[IEEE APCCAS 2008 - 2008 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) - Macao,...
A Low-Power Area-Efficient SRAM with Enhanced Read Stability in 0.18-µm CMOS
Cihun-Siyong Alex Gong*, Ci-Tong Hong*, Kai-Wen Yao, and Muh-Tian Shiue**
Department of Electrical Engineering
National Central University
Taiwan, R.O.C.
Email: [email protected]; [email protected]; [email protected]
*The first two authors contributed equally **Corresponding author
Abstract— Read stability is considered one of the dominant factors governing the overall performance and operating limits of static random access memory (SRAM). Furthermore, periodic precharge in the read/write (R/W) cycle is the major source of power consumption in an SRAM circuit. To address these two concerns, this paper describes a newly developed SRAM architecture that concentrates on the read operation. By utilizing a "preequalize" scheme, direct connections of the bit lines to the power-supply nodes at the beginning of the read cycle no longer exist, which improves the power efficiency of the SRAM. The preequalize scheme also yields an increased read static noise margin (SNM) and a cell area comparable to that of the conventional counterpart, owing to the similarity between the proposed SRAM cell and the familiar inverter circuit in the geometry (aspect) ratios of the transistors involved. Several concerns stemming from the proposed scheme are discussed. A 4-kb prototype designed in a 0.18-µm CMOS process operates more power-efficiently than one adopting the conventional architecture.
I. INTRODUCTION
SRAM continues to be an essential building block in modern digital signal processing systems. In today's consumer electronics, especially in audio, video, and telecommunication chipsets, more than 40% of the active silicon area may well be consumed by memory, for instance in physical implementations of the fast Fourier transform (FFT) and similar algorithms. As a result, it is of major importance to optimize the performance of the SRAM so that the area overhead, power dissipation, and fail-bit rate of the system are minimized. Among the factors that restrict the performance of SRAM, the precharge in the R/W cycle and read stability are of chief concern.
Reducing the power dissipation of the write cycle resulting from the precharge has been addressed in [1]. The use of a sense-amplifying cell allows the cell node to be inverted when a voltage difference of only VDD/6 is created between the bit lines. As for the enhancement of read stability, a read-SNM-free SRAM cell has been proposed that separates the data retention elements from the data output elements so that the cell node is not disturbed during a read operation [2]. That work, however, requires eight transistors for a single cell, which results in an almost 30% increase in SRAM area compared to a design adopting the conventional six-transistor (6-T) cell. The authors of [3] proposed a 7-T SRAM cell that not only obtains the same effect but also reduces the area overhead to 13%. This overhead, however, is still significant, as modern DSP processors generally contain several purpose-specific SRAMs in a single chip.
In this paper, we present an architectural solution that considers the tradeoffs among power, area, speed, and read stability of SRAM. The newly developed SRAM architecture and simulation analysis are elaborated in Section II. Results on the designed chip are offered in Section III. Conclusions are given in Section IV.
II. PROPOSED PREEQUALIZE SCHEME AND SIMULATION ANALYSIS
The column configurations of the conventional and proposed SRAMs are shown in Fig. 1(a) and (b), respectively. In general, for a read operation of the conventional SRAM, the bit lines BL and BLB are initially pulled up to a sufficiently high voltage level by switching PC. After WL goes HIGH, the voltage difference between BL and BLB is sensed through a pair of latch-based sense amplifiers (SAs) whose n-type tail transistors are gated by SE, provided that the differential voltage exceeds a predetermined threshold [4]. The two SAs isolate the heavily loaded bit lines from the data read-out lines a and b, reducing the power dissipation and delay of a read.
By contrast, the vertical p-type precharge transistors have been replaced in the proposed architecture with two n-type transistors that perform column selection (note that the vertical p-type precharge transistors can still be used in the proposed architecture provided that the necessary changes in transistor type and timing arrangement are made accordingly). Another significant difference between the conventional and proposed architectures is that the horizontal transistor, driven by PC, has been replaced with a transistor independently controlled by EQ. Referring to the timing diagram for a read operation of the proposed architecture (Fig. 1(c)), once a column is selected, BL and BLB are equalized more quickly than in the conventional architecture to a voltage level close to the input range at which the SAs have maximum voltage gain, thereby speeding up the read-out operation. Furthermore, because the direct connections of the bit lines to the power-supply nodes have been removed for a read operation, the dynamic power dissipation caused by periodic precharge in the read cycle can be reduced.
978-1-4244-2342-2/08/$25.00 ©2008 IEEE.
Fig. 1. (a) Conventional SRAM architecture. (b) Proposed SRAM architecture. (c) Timing diagram of proposed SRAM in a read operation. Note that WE, used for switching on the write driver, can be generated by ANDing the input data with RW [4].
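The power argument above can be illustrated with an idealized charge-sharing sketch. It assumes that equalization simply shorts BL and BLB together, and that a conventional precharge recharges a line from the supply; the bit-line capacitance and starting voltages are hypothetical values chosen for illustration and are not taken from the paper.

```python
def equalized_voltage(v_bl, v_blb, c_bl=1.0, c_blb=1.0):
    """Voltage reached when BL and BLB are shorted together (ideal charge sharing)."""
    return (c_bl * v_bl + c_blb * v_blb) / (c_bl + c_blb)


def precharge_supply_energy(c_line, vdd, v_start):
    """Energy drawn from the supply to pull one line from v_start up to vdd."""
    return c_line * vdd * (vdd - v_start)


VDD = 1.8        # supply voltage used in the paper (V)
C_BL = 100e-15   # hypothetical bit-line capacitance (F), assumption only

# After a full-swing write, one line sits at VDD and the other at 0 V.
v_eq = equalized_voltage(VDD, 0.0)

# Conventional precharge must recharge the discharged line from the supply.
e_pre = precharge_supply_energy(C_BL, VDD, 0.0)

print(f"equalized level: {v_eq:.2f} V")
print(f"supply energy for one precharge: {e_pre * 1e15:.0f} fJ")
```

In this idealized model the equalize step draws no charge from the supply, since it only redistributes charge already stored on the two lines, whereas each precharge costs C·VDD·ΔV. Real bit lines also see leakage and switch resistance, which the sketch ignores.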
The proposed architecture, using "preequalize" rather than the conventional "precharge", also yields a remarkable improvement in the read SNM. In general, the geometry ratio (with the channel length set to 180 nm) of M5 (M6) to M1 (M2) shown in Fig. 2 should be 1.5 to maintain sufficient read stability (as the bit lines of the conventional SRAM are precharged to a HIGH level, which is far from the input level at which the SAs achieve maximum voltage gain), and the geometry ratio of M1 (M2) to M3 (M4) should be 1.5 as well to maintain a proper write operation [4]. To keep the cell size minimal, the W/L ratios of M5 (M6), M1 (M2), and M3 (M4) were set to 0.44/0.18, 0.35/0.18, and 0.28/0.18, respectively (all sizes are in µm). Such a design policy, however, results in a poor read SNM.
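As a quick arithmetic check on the sizing just quoted, the sketch below computes the two geometry ratios from the stated W/L values. The dictionary keys and helper name are illustrative only; the numbers come from the text.

```python
# Transistor widths in µm as quoted in the text; every channel length is 0.18 µm.
LENGTH_UM = 0.18
widths_um = {"M5/M6": 0.44, "M1/M2": 0.35, "M3/M4": 0.28}


def w_over_l(width_um, length_um=LENGTH_UM):
    """Aspect (W/L) ratio of a transistor."""
    return width_um / length_um


# Ratio that governs read stability: M5 (M6) relative to M1 (M2).
read_ratio = w_over_l(widths_um["M5/M6"]) / w_over_l(widths_um["M1/M2"])

# Ratio that governs write operation: M1 (M2) relative to M3 (M4).
write_ratio = w_over_l(widths_um["M1/M2"]) / w_over_l(widths_um["M3/M4"])

print(f"M5/M1 ratio: {read_ratio:.2f}")
print(f"M1/M3 ratio: {write_ratio:.2f}")
```

With the minimum-area widths, both ratios come out at roughly 1.26 and 1.25, that is, below the 1.5 guideline, which is consistent with the remark that this sizing policy leaves the read SNM poor.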
Since the bit lines of the proposed architecture very quickly reach a level suitable for SA operation, there is no need to confine the W/L ratio of M3 and M4 to the minimum. Instead, the W/L ratio of M3 and M4 can be set to 0.88/0.18, similar to that in a conventional static inverter, to dramatically enhance the read SNM of the SRAM cell. To examine how the SNM varies with mismatched threshold voltage (Vt), we varied the threshold voltage of M3 or M5 by 10%. The read SNMs with and without Vt mismatch are shown in Fig. 2. The SNMs of the cell in the proposed architecture are clearly better and less variable than those of the conventional one. The cell areas of the proposed and conventional architectures can be seen from the layouts shown in Fig. 3. Because the heights of the bulks are the same for each of the transistor types, the areas occupied by the bulks are mainly determined by their widths, which are governed by the areas of the cell cores. As a result, we need only consider the difference between the cell cores. The SRAM with the proposed architecture demonstrates a cell area that is quite competitive with that of the conventional design.
Fig. 2. Comparison in read SNMs with and without Vt mismatch on conventional and proposed SRAM architectures.
Despite the read advantages of the proposed architecture, the size of M3 and M4 raises a concern about the performance of the write cycle. Assuming that q is initially HIGH (qb = LOW) and is to be discharged to a valid logic LOW through BL when WL goes HIGH, a geometry ratio of 0.4 of M1 (M2) to M3 (M4) may cause an internal race penalty as a result of the strong pull-up ability of M3, as illustrated in Fig. 4(a). This may limit the maximum permissible operating speed of the SRAM. We have taken this factor into account and performed several simulations of the write delay versus the width of M3 and M4 for different process corners. The results are plotted in Fig. 5. Note that the delays were measured from the time WL reaches its peak voltage (1.8 V) to the time q is discharged to 600 mV. As expected, the delay increases with the width. The results also imply that realizing an SRAM operating above a few GHz with the proposed scheme will be challenging, as the fail-bit rate in a write operation is very likely to increase considerably.
Fig. 3. Cell layouts in (a) proposed and (b) conventional SRAMs.
Fig. 4. (a) Proposed SRAM without WPO. (b) Proposed SRAM with WPO type I. (c) Proposed SRAM with WPO type II.
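The delay definition used for Fig. 5 (from WL reaching 1.8 V to q falling to 600 mV) can be sketched as a small post-processing routine over sampled waveforms. The function names and the waveform samples below are hypothetical and serve only to show the measurement; they are not simulation data from the paper.

```python
def first_crossing(times, values, level, rising):
    """Linearly interpolated time at which a sampled waveform first crosses level."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None  # the waveform never crosses the level


def write_delay(times, wl_wave, q_wave, wl_peak=1.8, q_level=0.6):
    """Delay from WL reaching wl_peak to q being discharged to q_level."""
    t_wl = first_crossing(times, wl_wave, wl_peak, rising=True)
    t_q = first_crossing(times, q_wave, q_level, rising=False)
    if t_wl is None or t_q is None:
        return None  # incomplete write within the captured window
    return t_q - t_wl


# Hypothetical samples (time in ps, voltages in V), for illustration only.
times_ps = [0.0, 100.0, 200.0, 300.0, 400.0]
wl_wave = [0.0, 1.8, 1.8, 1.8, 1.8]
q_wave = [1.8, 1.8, 1.2, 0.6, 0.0]

print(write_delay(times_ps, wl_wave, q_wave))
```

Returning `None` when q never reaches the threshold gives a simple way to flag a failed write at a given corner, which is the failure mode the text associates with wider M3 and M4.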
The penalty may be eliminated by means of a write-power-off (WPO) scheme. The WPO scheme generates a control signal Ctrl, which has the same rising edge as WL but a shorter pulse width, to dynamically shut off the power supply of the cell whenever a write operation is performed. As shown in Fig. 4(b), the signal can be applied to the gate terminal of a power p-transistor (PPT) that is shared by a row of cells. The source terminal of the PPT is connected to the nominal supply voltage, and its drain is connected to the source terminals of M3 and M4 of each of the cells. It may also be feasible to apply Ctrl to an actual power transistor of one of the on-chip power regulators, such as a low-dropout (LDO) regulator or DC-DC converter, that supplies the corresponding SRAM, as depicted in Fig. 4(c). If the WPO scheme is not feasible for the application adopting the proposed design, the penalty can also be easily eliminated by reducing the widths of M5 and M6 and increasing their lengths, rather than increasing the widths of M3 and M4.
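The timing relationship between WL and Ctrl described above can be sketched as follows. The `Pulse` type, the `make_ctrl` helper, and the `width_fraction` parameter are hypothetical names introduced for illustration; the paper does not specify how much shorter Ctrl is than WL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    rise: float  # time of the rising edge
    fall: float  # time of the falling edge

    @property
    def width(self):
        return self.fall - self.rise


def make_ctrl(wl, width_fraction):
    """Derive Ctrl from WL: same rising edge, shorter pulse width.

    While Ctrl is HIGH the p-type PPT is off, so the cell supply is cut
    only during the early part of the write.
    """
    if not 0.0 < width_fraction < 1.0:
        raise ValueError("Ctrl must be strictly narrower than WL")
    return Pulse(wl.rise, wl.rise + width_fraction * wl.width)


# Hypothetical WL pulse (arbitrary time units) and an assumed 50% Ctrl width.
wl_pulse = Pulse(rise=0.0, fall=1.0)
ctrl_pulse = make_ctrl(wl_pulse, width_fraction=0.5)

print(ctrl_pulse)
```

Ending Ctrl before WL falls lets the supply return while the access transistors are still on, so the cell can latch the newly written value before the wordline closes.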
Fig. 5. Write delay versus width of M3 and M4.
Fig. 6. Impact of Vt mismatch on SAs. Low/High Vt represents0.9/1.1*VtNormal.
In addition, to ensure that the proposed SRAM performs efficiently, it is important to investigate the impact of Vt mismatch on the employed SAs. We performed a simulation for the SAs similar to that adopted for the SRAM cells (Fig. 2), and the results are presented in Fig. 6. Lastly, it is worth noting that proper operation of the SRAM relies on WL and EQ not overlapping. To this end, a wordline pulsewidth shrink circuit, shown in Fig. 7(a), has been proposed and will be used in the design of our next-generation SRAM with the proposed architecture.
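The non-overlap requirement between WL and EQ lends itself to a simple timing check. The sketch below treats each pulse as a (start, end) interval and reports any overlapping pair; the function names and the interval values are hypothetical and are not timings from the design.

```python
def overlaps(a, b):
    """True if the half-open intervals a = (start, end) and b = (start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def nonoverlap_violations(wl_pulses, eq_pulses):
    """Return every (WL, EQ) pulse pair that is active at the same time."""
    return [(w, e) for w in wl_pulses for e in eq_pulses if overlaps(w, e)]


# Hypothetical timings in ns, for illustration only.
eq_pulses = [(0.00, 0.20)]
wl_ok = [(0.25, 0.70)]   # WL rises after EQ has fallen
wl_bad = [(0.15, 0.70)]  # WL rises while EQ is still HIGH

print(nonoverlap_violations(wl_ok, eq_pulses))
print(nonoverlap_violations(wl_bad, eq_pulses))
```

A check of this kind could be run over extracted simulation edges at each corner to confirm that the shrunk wordline pulse stays clear of the equalize pulse.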
III. WHOLE-CHIP SIMULATION RESULTS
The proposed SRAM has been designed in a 0.18-µm CMOS technology. A screen capture of the whole-chip layout from the Cadence design software is shown in Fig. 8. The silicon area consumed by the proposed SRAM is around 0.147 mm². After the SRAM was laid out, the minimum supply voltage versus temperature was obtained for all process corners. As can be seen from Fig. 9, the minimum supply voltage required by the SRAM increases with temperature for all corners, implying that the fail-bit rate increases with temperature for a given supply voltage. Table I compares the presented work with prior art whose process and capacity are the same as those adopted in our current design. To show how significant the improvement in performance can be, a 4-kb SRAM with the conventional architecture has been designed as a control. The proposed SRAM demonstrates a reduction in total power consumption compared with the conventional counterpart under the same operating conditions.
Fig. 7. (a) Proposed wordline pulsewidth shrink circuit. (b) Illustration of wordline pulsewidth shrink. Note that EQ should be properly delayed so that it is synchronized with Shrinked Pulse before being applied to the gate terminal of the equalize transistor; otherwise a timing violation will occur.
Fig. 8. Layout photograph of proposed SRAM.
IV. CONCLUSION
We have reported on a novel SRAM architecture with a preequalize scheme. The presented design demonstrates that low power, small area, and good read stability can coexist. The write-delay concern arising from the increased transconductance of the p-type driving transistors of the cell has been discussed, and solutions have been provided.
Fig. 9. VDD-min dependence on temperature at a 1-GHz operating clock.
TABLE I
COMPARISON TABLE OF SRAMS
Technique            Speed (MHz)  Capacity (Kbits)  SRAM Area (mm2)  Power Consumption (mW)  Supply (V)  Process (µm)
4T CMOS cell* [5]    500          4                 0.7654           152                     1.8         0.18
Dynamic Vt [6]       667          4                 1.0418           182                     1.8         0.18
Negative WL* [7]     250          4                 0.6295           23.201                  1.8         0.18
Proposed             1000         4                 0.147            8.932                   1.8         0.18
* : measurement results
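Because the designs in Table I run at different clock rates, one way to put the power figures on a common footing is energy per cycle, i.e. power divided by operating frequency. This metric is not reported in the paper; it is derived here from the table values as an illustrative sketch, and the comparison is only indicative since the table mixes measured and simulated results.

```python
# (technique, speed in MHz, area in mm^2, power in mW) taken from Table I.
table_rows = [
    ("4T CMOS cell [5]", 500, 0.7654, 152.0),
    ("Dynamic Vt [6]", 667, 1.0418, 182.0),
    ("Negative WL [7]", 250, 0.6295, 23.201),
    ("Proposed", 1000, 0.147, 8.932),
]


def energy_per_cycle_pj(power_mw, speed_mhz):
    """Energy per clock cycle in pJ (1 mW / 1 MHz = 1 nJ = 1000 pJ)."""
    return power_mw / speed_mhz * 1000.0


for name, speed_mhz, area_mm2, power_mw in table_rows:
    energy = energy_per_cycle_pj(power_mw, speed_mhz)
    print(f"{name:18s} {energy:8.2f} pJ/cycle  {area_mm2:.4f} mm^2")
```

On this derived metric the proposed design comes out at about 8.9 pJ per cycle, against roughly 93 pJ to 304 pJ for the prior 4-kb designs listed.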
The impacts of Vt mismatch on the cell and sense amplifiers of the SRAM were analyzed. Because the HIGH pulse of WL must not overlap the equalize pulse, a wordline pulsewidth shrink circuit has been designed to ensure proper operation. A proof-of-concept prototype with 4-kb capacity designed in a 0.18-µm CMOS process demonstrates the feasibility of the proposed design. Physical measurement is currently underway.
ACKNOWLEDGMENT
The authors would like to acknowledge the chip implementation service provided by CIC. Financial support under NSC 97-2627-B-008-002- is also appreciated.
REFERENCES
[1] K. Kanda, H. Sadaaki, and T. Sakurai, "90% write power-saving SRAM using sense-amplifying memory cell," IEEE J. Solid-State Circuits, vol. 39, pp. 927-933, Jun. 2004.
[2] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch, "Stable SRAM cell design for the 32 nm node and beyond," Proc. Symp. VLSI Technology Dig., Jun. 2005, pp. 128-129.
[3] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications," IEEE J. Solid-State Circuits, vol. 41, pp. 113-121, Jan. 2006.
[4] D. A. Hodges, H. G. Jackson, and R. A. Saleh, "Analysis and Design of Digital Integrated Circuits," 3rd ed. NY: McGraw-Hill Press, 2003.
[5] C. C. Wang, Y. L. Tseng, H. Y. Leo, and R. Hu, "A 4-kB 500-MHz 4T CMOS SRAM using low-VTHN bitline drivers and high-VTHP latches," IEEE Trans. VLSI Syst., vol. 12, pp. 901-909, Sep. 2004.
[6] C. C. Wang, T. H. Chen, and R. Hu, "A 4-kb 667MHz CMOS SRAM using dynamic threshold voltage wordline transistor," Proc. Southwest Symp. on Mixed-Signal Design, pp. 90-93, 2003.
[7] C. C. Wang, C. L. Lee, and W. J. Lin, "A 4-kb low-power SRAM design with negative word-line scheme," IEEE Trans. Circuits Syst. I, vol. 54, pp. 1069-1076, May 2007.