[IEEE APCCAS 2008 - 2008 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) - Macao,...

4
A Low-Power Area-Efficient SRAM with Enhanced Read Stability in 0.18-μm CMOS Cihun-Siyong Alex Gong*, Ci-Tong Hong*, Kai-Wen Yao, and Muh-Tian Shiue** Department of Electrical Engineering National Central University Taiwan, R.O.C. Email: [email protected]; [email protected]; [email protected] *The first two authors contributed equally **Corresponding author Abstract— Read stability has been considered one of the dom- inant factors governing the overall performance and operation limitation of static random access memory (SRAM). Further- more, periodic precharge in read/write (R/W) cycle is the major source of power consumption in an SRAM circuit. To address these two concerns, a newly developed SRAM architecture, with specific concentration on read operation, is described in this paper. By utilizing a ”preequalize” scheme, direct connections of the bit lines to power-supply nodes at the beginning of read cycle no longer exist, thereby having the SRAM be provided with an improved power efficiency. The preequalize scheme also yields an increased read static noise margin (SNM) and a cell area comparable to that of the conventional counterpart, due to the similarity between the proposed SRAM cell and familiar inverter circuit in geometry (aspect) ratios of the transistors involved. Several concerns stemming from the proposed scheme are discussed. A 4-kb-capacity prototype designed in a 0.18- μm CMOS process achieves a more power-efficient operation as compared to that adopting conventional architecture. I. I NTRODUCTION SRAM continues to be an essential building block in modern digital signal processing systems. In today’s consumer electronics, especially in audio, video, and telecommunication chipsets, it is very possible that more than 40% active silicon area is consumed by memory: for instance, physical imple- mentations for fast fourier transform (FFT) and other similar algorithms. As a result, it is of major importance that the performance of SRAM can be optimized to an extent where area overhead, power dissipation, and fail-bit rate of the system can be minimized. With regard to the factors that restrict the performance of SRAM, the precharge in R/W cycle and read stability are of chief concerns. Reducing the power dissipation of write cycle resulting from the precharge has been addressed in [1]. The employment of sense-amplifying cell allows the cell node to be inverted when a V DD /6-only voltage difference between the bit lines is created. As for the enhancement of read stability, a read- SNM-free SRAM cell has been proposed to separate data retention elements and data output elements such that cell node will not be disturbed during read operation [2]. The work, however, requires eight transistors for a single cell, which results in almost 30% increase in SRAM area compared to that adopting conventional six-transistor (6-T) cell. The authors of [3] proposed a 7-T SRAM cell to not only obtain the same effect but also reduce the increased area overhead to 13%. It ,however, is still a bit significant as modern DSP processors generally contain several purpose-specific SRAMs in a single chip. In this paper, we present an architectural solution in con- sideration of the tradeoffs among power, area, speed, and read stability of SRAM. The newly developed SRAM architecture and simulation analysis are elaborated in Section II. Results on the designed chip are offered in Section III. Conclusions are given in Section IV. II. PROPOSED PREEQUALIZE SCHEME AND SIMULATION ANALYSIS The column configurations of the conventional and proposed SRAMs are, respectively, shown in Fig. 1(a) and (b). In general, for a read operation of the conventional SRAM, the bit lines BL and BLB are initially pulled up to an enough high voltage level by switching PC. After WL goes HIGH, the voltage difference between BL and BLB is sensed through a pair of latch-based sense amplifiers (SAs) whose gate terminals of their N-tail transistors are controlled by SE, provided that the differential voltage exceeds a predetermined threshold [4]. The two SAs separate the bit lines with heavy loads and data read-out lines a and b to reduce the power dissipation and delay of read. By contrast, it can be discovered that the vertical p-type precharge transistors of the proposed architecture have been replaced with two n-type transistors used for performing column selection (it should be noticed that the vertical p- type precharge transistors can still be used in the proposed architecture provided that several necessary changes in tran- sistor type and timing arrangement are made accordingly). One significant difference between the conventional and proposed architectures is that the horizontal transistor, driven by PC, has also been replaced with the transistor independently controlled by EQ. Referring to the timing diagram in the read operation of the proposed architecture (Fig. 1(c)), once a column is selected, BL and BLB are equalized more quickly to a voltage level, close to the input range enabling SAs to have a maximum voltage gain, as compared to the conventional architecture, thereby speeding up the read-out operation. Fur- thermore, because direct connections of the bit lines to the 729 978-1-4244-2342-2/08/$25.00 ©2008 IEEE.

Transcript of [IEEE APCCAS 2008 - 2008 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) - Macao,...

A Low-Power Area-Efficient SRAM with EnhancedRead Stability in 0.18-µm CMOS

Cihun-Siyong Alex Gong*, Ci-Tong Hong*, Kai-Wen Yao, and Muh-Tian Shiue**Department of Electrical Engineering

National Central UniversityTaiwan, R.O.C.

Email: [email protected]; [email protected]; [email protected]*The first two authors contributed equally **Corresponding author

Abstract— Read stability has been considered one of the dom-inant factors governing the overall performance and operationlimitation of static random access memory (SRAM). Further-more, periodic precharge in read/write (R/W) cycle is the majorsource of power consumption in an SRAM circuit. To addressthese two concerns, a newly developed SRAM architecture, withspecific concentration on read operation, is described in thispaper. By utilizing a ”preequalize” scheme, direct connectionsof the bit lines to power-supply nodes at the beginning of readcycle no longer exist, thereby having the SRAM be providedwith an improved power efficiency. The preequalize scheme alsoyields an increased read static noise margin (SNM) and a cellarea comparable to that of the conventional counterpart, due tothe similarity between the proposed SRAM cell and familiarinverter circuit in geometry (aspect) ratios of the transistorsinvolved. Several concerns stemming from the proposed schemeare discussed. A 4-kb-capacity prototype designed in a 0.18-µm CMOS process achieves a more power-efficient operationas compared to that adopting conventional architecture.

I. INTRODUCTION

SRAM continues to be an essential building block inmodern digital signal processing systems. In today’s consumerelectronics, especially in audio, video, and telecommunicationchipsets, it is very possible that more than 40% active siliconarea is consumed by memory: for instance, physical imple-mentations for fast fourier transform (FFT) and other similaralgorithms. As a result, it is of major importance that theperformance of SRAM can be optimized to an extent wherearea overhead, power dissipation, and fail-bit rate of the systemcan be minimized. With regard to the factors that restrict theperformance of SRAM, the precharge in R/W cycle and readstability are of chief concerns.

Reducing the power dissipation of write cycle resulting fromthe precharge has been addressed in [1]. The employmentof sense-amplifying cell allows the cell node to be invertedwhen a VDD/6-only voltage difference between the bit linesis created. As for the enhancement of read stability, a read-SNM-free SRAM cell has been proposed to separate dataretention elements and data output elements such that cell nodewill not be disturbed during read operation [2]. The work,however, requires eight transistors for a single cell, whichresults in almost 30% increase in SRAM area compared to thatadopting conventional six-transistor (6-T) cell. The authors of[3] proposed a 7-T SRAM cell to not only obtain the same

effect but also reduce the increased area overhead to 13%. It,however, is still a bit significant as modern DSP processorsgenerally contain several purpose-specific SRAMs in a singlechip.

In this paper, we present an architectural solution in con-sideration of the tradeoffs among power, area, speed, and readstability of SRAM. The newly developed SRAM architectureand simulation analysis are elaborated in Section II. Resultson the designed chip are offered in Section III. Conclusionsare given in Section IV.

II. PROPOSED PREEQUALIZE SCHEME AND SIMULATION

ANALYSIS

The column configurations of the conventional and proposedSRAMs are, respectively, shown in Fig. 1(a) and (b). Ingeneral, for a read operation of the conventional SRAM,the bit lines BL and BLB are initially pulled up to anenough high voltage level by switching PC. After WL goesHIGH, the voltage difference between BL and BLB is sensedthrough a pair of latch-based sense amplifiers (SAs) whosegate terminals of their N-tail transistors are controlled by SE,provided that the differential voltage exceeds a predeterminedthreshold [4]. The two SAs separate the bit lines with heavyloads and data read-out lines a and b to reduce the powerdissipation and delay of read.

By contrast, it can be discovered that the vertical p-typeprecharge transistors of the proposed architecture have beenreplaced with two n-type transistors used for performingcolumn selection (it should be noticed that the vertical p-type precharge transistors can still be used in the proposedarchitecture provided that several necessary changes in tran-sistor type and timing arrangement are made accordingly). Onesignificant difference between the conventional and proposedarchitectures is that the horizontal transistor, driven by PC, hasalso been replaced with the transistor independently controlledby EQ. Referring to the timing diagram in the read operationof the proposed architecture (Fig. 1(c)), once a column isselected, BL and BLB are equalized more quickly to avoltage level, close to the input range enabling SAs to havea maximum voltage gain, as compared to the conventionalarchitecture, thereby speeding up the read-out operation. Fur-thermore, because direct connections of the bit lines to the

729978-1-4244-2342-2/08/$25.00 ©2008 IEEE.

Fig. 1. (a) Conventional SRAM architecture. (b) Proposed SRAM architecture. (c) Timing diagram of proposed SRAM in a read operation. Note that WE,used for switching on the write driver, can be done by ANDing the input data with RW [4].

power-supply nodes have been removed for a read operation,reduction in the dynamic power dissipation caused by periodicprecharge in read cycle is achievable.

The proposed architecture, using ”preequalize” rather thanthe conventional ”precharge”, also yields remarkable improve-ments in the read SNM. In general, the geometry ratio (withthe channel length set to 180 nm) of M5 (M6) to M1 (M2)shown in Fig. 2 should be 1.5 to maintain sufficient readstability (as the bit lines of the conventional SRAM areprecharged to a HIGH level which is far away the input levelthat can have SAs achieve maximum voltage gain), and thegeometry ratio of M1 (M2) to M3 (M4) should be 1.5 aswell to maintain a proper write operation [4]. By constrainingthe cell size to be minimum, the W/L ratios of M5 (M6),M1 (M2), and M3 (M4) were set to 0.44/0.18, 0.35/0.18, and0.28/0.18 respectively (all sizes are in µm). Such a designpolicy, however, results in a poor read SNM.

Since the proposed architecture can have their bit linesachieve an opportune moment very soon for the operation ofSAs, there is no need to confine the W/L ratio of M3 and M4

to minimum. Contrarily, the W/L ratio of M3 and M4 can beset to 0.88/0.18, similar to that in conventional static inverter,to dramatically enhance the read SNM of the SRAM cell. Toget wise to how the SNM varies with mismatched thresholdvoltage (Vt), we have varied the threshold voltage M3 or M5

by 10%. The read SNMs with and without Vt mismatch areshown in Fig. 2. It is by far that the SNMs belonging to the cellof the proposed architecture are better and less variable thanthose of the conventional one. The cell area of the proposedand conventional architectures can be seen from the layoutsshown in Fig. 3. Because the heights of the bulks are thesame for each of the transistor types, the areas occupied bythe bulks are mainly determined by their widths governed bythe areas of the cell cores. As a result, we can only considerthe difference between the cell cores. The SRAM with the

Fig. 2. Comparison in read SNMs with and without Vt mismatch onconventional and proposed SRAM architectures.

proposed architecture demonstrates a cell area that is quitecompetitive with that of the conventional design.

Despite the advantages stemming from the proposed archi-tecture in a read operation, the size of M3 and M4 brings abouta concern in the performance of write cycle. Assuming that qis initially HIGH (qb=LOW) and intending to be dischargedto a valid logic LOW through BL when WL goes HIGH, ageometry ratio 0.4 of M1 (M2) to M3 (M4) may cause aninternal race penalty as a result of the strong pull-up abilityin M3, as can be understood from the illustration shown inFig. 4(a). It may affect the maximum permissible operationspeed of the SRAM. We have taken such a factor into account

730

Fig. 3. Cell layouts in (a) proposed and (b) conventional SRAMs.

Fig. 4. (a) Proposed SRAM without WPO. (b) Proposed SRAM with WPOtype I. (c) Proposed SRAM with WPO type II.

and performed several simulations of the write delay versuswidth of M3 and M4 for different process corners. The resultsare plotted in Fig. 5. It should be noted that the delays werecounted from that WL goes to its peak voltage (1.8 V) to thatq is discharged to 600 mV. Obviously, the delay increases withthe increase in width. The results also imply that there willbe a challenge to realizing an SRAM over few GHz using theproposed scheme as the fail-bit rate in a write operation isvery likely to be considerably increased.

The penalty may be eliminated by means of a write-power-off (WPO) scheme. The intention of introducing the WPOscheme is to generate a control signal Ctrl, which is with thesame rising edge as WL, but with reduced pulse width ascompared to that of WL, to dynamically shut the power supplyof the cell as often as a write operation is performed. As shownin Fig. 4(b), the signal can be applied to the gate terminal of apower p-transistor (PPT) that is shared by a row of the cells.The source terminal of PPT is connected to the nominal supplyvoltage and its drain is connected to the source terminals ofM3 and M4 of each of the cells. It may also be feasible toapply Ctrl to a real power transistor of one of the on-chippower regulators, such as low-dropout (LDO) regulator or DC-DC converter, that is in charge of its corresponding SRAM,as depicted in Fig. 4(c). If the WPO scheme is not feasiblefor the application adopting the proposed design, the penaltycan also be easily eliminated by rather reducing the widths ofM5 and M6, and increasing their lengths, than increasing thewidths of M3 and M4.

Fig. 5. Write delay versus width of M3 and M4.

Fig. 6. Impact of Vt mismatch on SAs. Low/High Vt represents0.9/1.1*VtNormal.

In addition to the above, to assure the proposed SRAM of anefficient performance, it is not trivial to investigate the impactof Vt mismatch on the employed SAs. We have performeda similar simulation process as that adopted for the SRAMcells (Fig. 2) for SAs and the results are presented in Fig.6. Lastly it is worthily noticed that the proper operation ofthe SRAM relies on nonoverlapping between WL and EQ. Asa result, a wordline pulsewidth shrink circuit, shown in Fig.7(a), has been proposed and will be used in the design of ournext-generation SRAM with the proposed architecture.

III. WHOLE-CHIP SIMULATION RESULTS

The proposed SRAM has been designed in a 0.18-µmCMOS technology. The screen capture of the whole-chiplayout from the Cadence design software is shown in Fig. 8.The silicon area consumed by the proposed SRAM is around0.147 mm2. After the SRAM was laid out, the results ofminimum supply voltage versus temperature on the proposedSRAM for all process corners were obtained. As can be seenfrom Fig. 9, the minimal supply voltage required by the SRAMincreases as temperature for all the corners, implying the fail-bit rate increases as temperature for a given supply voltage.Table I compares the presented work with the prior arts whoseprocess and capacity are the same as those adopted in our

731

RWWL

WL

EQ

Shrinked Pulse

Fig. 7. (a) Proposed wordline pulsewidth shrink circuit. (b) Illustrationof wordline pulsewidth shrink. Note that EQ should be properly delayed ineffect to synchronize Shrinked Pulse prior to being applied to gate terminalof equalize transistor – otherwise timing violation will occur.

Fig. 8. Layout photograph of proposed SRAM.

current design. In order to give insight into how significancethe improvement in the performance can be achieved, a 4-kbSRAM with the conventional architecture has been designed asa control. By comparison, the proposed SRAM demonstratesa reduction in the total power consumption as compared to theconventional counterpart, under the same operation conditions.

IV. CONCLUSION

We have reported on a novel SRAM architecture withpreequalize scheme. The presented design demonstrates thatlow power, small area, and good read stability can coexist.The concern of write delay as a result of the increasedtransconductance regarding the p-type driving transistors ofthe cell has been discussed and the solutions were provided.

Fig. 9. VDD-min dependence on temperature at an 1-GHz operating clock.

TABLE I

COMPARISON TABLE OF SRAMS

4T CMOS cell* [5]

Dynamic Vt [6]

500

Negative WL* [7]

Proposed

4 0.7654

Speed(MHz)

Technique SRAM Area(mm2)

Capacity(Kbits)

PowerConsumption (mW)

Supply(V)

152 1.8

667 4 1.0418 182 1.8

250 4 0.6295 23.201 1.8

1000 4 0.147 8.932 1.8

Process( m)0.18

0.18

0.18

0.18

* : measurement results

The impacts of Vt mismatch on the cell and sense amplifiersof the SRAM were analyzed. Because the pulse width of WLpulled HIGH must not be overlapped by the equalize pulse, awordline pulsewidth shrink circuit has been designed to assurethe design of a proper operation. A proof-of-concept prototypewith 4-kb capacity designed in a 0.18-µm CMOS demonstratesthe feasibility of the proposed design. Physical measurementis currently underway.

ACKNOWLEDGMENT

The authors would like to acknowledge chip implementationservice provided by CIC. Financial support under NSC 97-2627-B-008-002- is also appreciated.

REFERENCES

[1] K. Kanda, H. Sadaaki, and T. Sakurai, ”90% write power-saving SRAMusing sense-amplifying memory cell,” IEEE J. Solid-State Circuits, vol.36, pp. 927 - 933, Jun. 2004.

[2] L. Chang, D.M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R.K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W.Guarini, and W. Haensch, ”Stable SRAM cell design for the 32 nm nodeand beyond,” Proc. Symp. VLSI Technology Dig., Jun. 2005, pp. 128 -129.

[3] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii,and H. Kobatake, ”A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications,” IEEE J. Solid-State Circuits, vol. 41,pp. 113 - 121, Jan. 2006.

[4] D. A. Hodges, H. G. Jackson, R. A. Saleh, ”Analysis and Design ofDigital Integrated Circuits,” 3rd ed. NY: McGraw-Hill. Press, 2003.

[5] C. C. Wang, Y. L. Tseng, H.Y. Leo, and R. Hu, ”A 4-kB 500-MHz 4TCMOS SRAM using low-VTHN bitline drivers and high-VTHP latches,”IEEE Trans. VLSI Syst., vol. 12, pp. 901 - 909, Sep. 2004.

[6] C. C. Wang, T. H. Chen, and R. Hu, ”A 4-kb 667MHz CMOS SRAMusing dynamic threshold voltage wordline transistor,” Proc. SouthwestSymp. on Mixed-Signal Design, pp. 90 - 93, 2003.

[7] C. C. Wang, C. L. Lee, and W. J. Lin, ”A 4-kb low-power SRAM designwith negative word-line scheme,” IEEE Trans. Circuits Syst. I, vol. 54,pp. 1069 - 1076, May 2007.

732