[IEEE ESSCIRC 2007 - 33rd European Solid-State Circuits Conference - Muenchen, Germany...

A 10-Gb/sec Unclocked Current-Mode Logic (CML) Analog Decision-Feedback Equalizer (ADFE) in 0.18-µm CMOS

Soumya Chandramouli, Franklin Bien, Hyoungsoo Kim, Edward Gebara, and Joy Laskar

Georgia Electronic Design Center, Dept. of ECE, Georgia Institute of Technology, 85 5th Street NW, Atlanta, GA 30308, USA

Abstract: An unclocked analog decision-feedback equalizer (ADFE) is implemented in a 0.18-µm CMOS process with a 40-GHz fT to equalize legacy FR-4 backplane channels at 8 to 10 Gb/sec. The critical first feedback-loop latency requirement of the DFE is met by using a novel unclocked feedback topology and current-mode logic (CML) circuit building blocks. The circuit consists of a 4-tap linear analog feed-forward filter that cancels pre-cursor inter-symbol interference (ISI) to partially open the eye and a novel 1-tap analog tunable CML feedback filter that enables cancellation of the first post-cursor at 10-Gb/sec without the use of smaller process nodes or speculative techniques. The chip with pads occupies 1.04 mm² and draws 240 mA DC current from a 1.8 V supply at a typical process corner. The ADFE is used to equalize 20 inches of FR-4 backplane traces at 8-Gb/sec and 10-Gb/sec.

I. INTRODUCTION

The decision-feedback equalizer (DFE) is a non-linear equalizer that has been used to cancel inter-symbol interference (ISI) and improve bit-error ratio (BER) in multi-Gb/sec wired data transmission systems [1]. The DFE enables ISI cancellation without the noise enhancement associated with linear equalizers. The conventional receiver DFE, shown in Fig. 1, consists of a linear feed-forward filter, whose taps are set to cancel pre-cursor ISI, a non-linear decision element, and a multi-tap feedback filter. The decision element is typically implemented as a clocked comparator, which makes a clean decision on the incoming data stream at every symbol period. This clean decision signal is used to make an appropriate correction to the following symbols by means of multiple feedback loops. In this manner, post-cursor ISI can be cancelled.

In order to cancel the ISI that the most recent symbol imposes on the present symbol, the latency of the first feedback loop, TLOOP, must be equal to the symbol period, which is 100 ps for a 10-Gb/sec signal. At multi-Gb/sec data rates, achieving this latency is a critical stumbling block in the implementation of the DFE due to the inherent clock-data delay and set-up time of the clocked comparator as well as the delay through the feedback filter. A major portion of the clock-data delay is the time needed to regenerate an analog signal to CMOS digital levels. State-of-the-art DFEs that have been implemented for multi-Gb/sec data transmission systems use techniques such as speculation and half-rate clocking to overcome this latency challenge [1].

This paper describes an unclocked analog technique and circuit architecture to overcome the first feedback-loop latency bottleneck exhibited by the conventional digital DFE architecture. The unclocked circuit can achieve a first feedback-loop latency of 100 ps, which is the symbol duration for a 10-Gb/sec signal. In addition, TLOOP can be controlled dynamically using an external control voltage that enables it to be tuned from 100 ps to 125 ps without the use of a clock, so that the first post-cursor can be cancelled at 8-Gb/sec and 10-Gb/sec. The tunable feature also enables dynamic compensation for deviations from the design target caused by process, voltage and temperature (PVT) variations.
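The relation between data rate and required loop latency can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming binary signalling with one bit per symbol; the function names are illustrative and do not come from the paper.

```python
def symbol_period_ps(bit_rate_gbps: float) -> float:
    """Symbol duration in picoseconds, assuming one bit per symbol."""
    return 1000.0 / bit_rate_gbps


# Tuning range of the first feedback-loop latency quoted in the text.
T_LOOP_MIN_PS = 100.0
T_LOOP_MAX_PS = 125.0


def loop_can_cancel_first_postcursor(bit_rate_gbps: float) -> bool:
    """True if one symbol period falls inside the TLOOP tuning range."""
    period = symbol_period_ps(bit_rate_gbps)
    return T_LOOP_MIN_PS <= period <= T_LOOP_MAX_PS


for rate in (8.0, 10.0):
    print(rate, symbol_period_ps(rate), loop_can_cancel_first_postcursor(rate))
```

Under this assumption, data rates outside the 8 to 10 Gb/sec window would require a loop latency that the quoted tuning range does not cover.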

II. ANALOG DECISION-FEEDBACK EQUALIZER DESIGN

Fig. 2 shows the block diagram of the ADFE with a 4-tap linear feed-forward filter and a 1-tap non-linear feedback filter. The forward filter taps are set to remove pre-cursor ISI. The variable gain blocks are implemented using a Gilbert cell topology and can provide bipolar gain of +1 to -1 [2]. The gain of the taps can be tuned using the control voltages VC0-VC3. The variable delay cells are implemented using active differential pairs with resistive loads. The product of the load resistance and the transistor and parasitic capacitances determines the signal delay. The delay cells can also be tuned depending on system requirements [2]. The control voltage VFF provides a degree of control to achieve the optimum delay value for equalization. The design of the linear filter building blocks has been described in a previous publication [2].
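The behavior of the tapped-delay-line forward filter can be illustrated with a discrete-time model. This is only a sketch under stated assumptions: the actual filter is a continuous-time analog circuit, the tap delay is idealized here as a whole number of samples, and the pulse values and tap gains below are hypothetical.

```python
def feed_forward_filter(samples, tap_gains, tap_delay_samples=1):
    """Weighted sum of delayed copies of the input (tapped-delay-line FIR).

    Each gain is limited to [-1, +1], mirroring the bipolar gain range
    quoted for the variable gain blocks.
    """
    if any(abs(g) > 1.0 for g in tap_gains):
        raise ValueError("each tap gain must lie in [-1, +1]")
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, gain in enumerate(tap_gains):
            idx = n - k * tap_delay_samples
            if idx >= 0:
                acc += gain * samples[idx]
        out.append(acc)
    return out


# Hypothetical pulse response: a 0.2 pre-cursor followed by a 1.0 main cursor.
pulse = [0.2, 1.0, 0.0, 0.0]
# Hypothetical 4-tap setting: main cursor on tap 1, small negative tap 0.
equalized = feed_forward_filter(pulse, [-0.2, 1.0, 0.0, 0.0])
print(equalized)
```

In this toy case the sample immediately before the main cursor is cancelled, at the cost of a smaller residual one symbol earlier.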

The non-linear feedback portion consists of a Gm block, an analog limiting amplifier (ALA) block and a loop latency control (LLC) block that can be used to control the latency through the decision-feedback loop over variations in PVT. A CML output driver (OD) is used to improve the eye-opening and drive a 50-ohm load. Post-cursor ISI cancellation occurs in the Gm block. The primary idea behind the ADFE is that a rail-to-rail digital signal is not needed for decision-feedback to take place. The Gm block uses current switches that are controlled by the decision signals of the previous symbol to steer a weighted tap current and make the appropriate correction to the current symbol. It is shown that a rail-to-rail signal is not needed to achieve the required tap current switching. CML techniques are used to achieve the speed and voltage swing required to control the current switches. For a 1.8 V supply, typical CML voltage levels are between 400 mV and 800 mV. At multi-Gb/sec data rates, CML circuits achieve much faster speeds than static CMOS logic gates [3].
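The decision-feedback principle described above can be sketched with an idealized discrete-time model. The circuit in this work is unclocked and continuous-time, so the sample-by-sample loop, the channel coefficients, and the function name below are assumptions made purely for illustration.

```python
def one_tap_dfe(received, tap_weight, threshold=0.0):
    """Slice each sample after removing tap_weight times the previous decision.

    Decisions are +1 or -1; before the first symbol no correction is applied.
    """
    decisions = []
    previous = 0
    for sample in received:
        corrected = sample - tap_weight * previous
        decision = 1 if corrected > threshold else -1
        decisions.append(decision)
        previous = decision
    return decisions


# Hypothetical channel: main cursor 0.4, first post-cursor 0.6.
sent = [1, 1, 1, -1, 1]
received = [0.4 * s + 0.6 * p for s, p in zip(sent, [0] + sent[:-1])]

without_feedback = one_tap_dfe(received, tap_weight=0.0)
with_feedback = one_tap_dfe(received, tap_weight=0.6)
print(without_feedback, with_feedback)
```

With the feedback tap disabled, the transition after the run of 1's is missed in this example; with the tap matched to the post-cursor, the transmitted pattern is recovered.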

1-4244-1125-4/07/$25.00 ©2007 IEEE.

Figure 1. Block diagram of a conventional receiver DFE.

A. Gm Block

Post-cursor ISI is the effect of previous symbols on the current symbol. The effect is most visible when a long stream of 1's (or high value) is followed by a 0 (or low value) as shown in Fig. 2. The falling edge is unable to reach its full value before the next symbol arrives, resulting in a missed transition. The decision block will be unable to resolve transitions that are smaller in magnitude than its input dynamic range, resulting in errors in the output data. The goal of decision-feedback equalization is to make an appropriate correction at the input of the decision block so that all transitions are large enough to be resolved by the decision circuit.

The Gm block performs this correction in the current domain [1]. Transistors M1 and M2 take in the input signal and generate a differential current. M3 and M4 are switches whose gates are controlled by the decision signals from the previous symbol, VCP and VCM. Depending on the previous symbol, the tap current ITAP is pulled either entirely through M3 or entirely through M4, making a correction to the signal at the input of the ALA. The transition can now be resolved by the decision block. M1 and M2 are designed to have small feature size to minimize the gate-source capacitance, CGS, which is the load capacitance seen by the linear forward filter. This stage is a unity-gain transconductance stage. The load resistor values are kept as small as possible to push out the pole due to RLOAD*CLOAD, where CLOAD is the capacitance at VOUTP and VOUTM. CLOAD is composed mostly of the CGS of the ALA input transistors.

An important point to note is that, for this application, the signal swing arriving at M1-M2 from the output of the forward filter is between 9 mV and 30 mV, owing to the low-frequency suppression behavior of the linear filter. The worst-case signal transition has a voltage swing of 9 mV. The minimum swing needed by the ALA to correctly resolve each symbol and drive the output buffer is 17 mV. Therefore, the amount of post-cursor ISI correction, VCORR, needed at the input of the ALA is about 8 mV. The minimum voltage swing needed for the decision signals VCP and VCM to completely switch the tap current, ITAP, to one side is given by VDECISION in (1) [3]. VCORR is expressed in (2).

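$$I_{TAP} = \frac{1}{2}\,\mu_n C_{ox}\,\frac{W}{L}\,\left(V_{DECISION}\right)^2 \;\Rightarrow\; V_{DECISION} = \sqrt{\frac{2\,I_{TAP}}{\mu_n C_{ox}\,\frac{W}{L}}} \qquad (1)$$

$$V_{CORR} = I_{TAP} \cdot R_{LOAD} \qquad (2)$$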

Figure 2. Block diagram of ADFE.

Knowing the maximum value of VCORR and RLOAD, the maximum ITAP that will be needed can be determined using (2), and the maximum value of VDECISION needed can be calculated using (1). If RLOAD decreases by a factor of M, the ITAP needed for the same VCORR increases by a factor of M. This increases the VDECISION needed for complete ITAP switching by a factor of √M, which in turn increases the output swing requirement of the LLC block by a factor of √M. The circuit is designed so that a VDECISION signal with a swing of 600 mV is sufficient to switch the maximum value of ITAP that will be used. This voltage swing can provide up to 90 mV of post-cursor ISI correction, if necessary. Knowing this, the feedback loop is designed to provide the decision signals VCP and VCM with a 600 mV swing.
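The sizing relationship between VCORR, RLOAD, ITAP and VDECISION can be worked through numerically. The sketch below applies (1) and (2) directly; the load resistance, the process transconductance parameter and the W/L ratio are hypothetical placeholders, since the paper does not list them.

```python
import math


def tap_current(v_corr, r_load):
    """Tap current needed for a given correction voltage, from (2)."""
    return v_corr / r_load


def v_decision(i_tap, mu_n_cox, w_over_l):
    """Decision swing needed to fully steer i_tap to one side, from (1)."""
    return math.sqrt(2.0 * i_tap / (mu_n_cox * w_over_l))


# From the text: 17 mV needed by the ALA, 9 mV worst-case transition.
V_CORR = 17e-3 - 9e-3

# Hypothetical device and load values, for illustration only.
R_LOAD = 100.0      # ohms
MU_N_COX = 300e-6   # A/V^2
W_OVER_L = 10.0

i_tap = tap_current(V_CORR, R_LOAD)
swing = v_decision(i_tap, MU_N_COX, W_OVER_L)

# Reducing RLOAD by M = 4 raises ITAP by 4 and VDECISION by sqrt(4) = 2.
M = 4.0
i_tap_scaled = tap_current(V_CORR, R_LOAD / M)
swing_scaled = v_decision(i_tap_scaled, MU_N_COX, W_OVER_L)
print(i_tap, swing, i_tap_scaled / i_tap, swing_scaled / swing)
```

The ratio printed at the end depends only on M, not on the placeholder device values, which is the scaling the text relies on.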

B. Analog Limiting Amplifier (ALA)

Instead of a clocked decision element, an unclocked high gain-bandwidth (GBW) ALA is used to make the decision on the incoming data stream. The key requirements for the ALA are high gain and low latency. The circuit needs to amplify the small signal at the output of the linear filter to levels large enough to drive the CML LLC block and the CML output driver (OD). Furthermore, the delay through the ALA needs to be less than 100 ps, which is the symbol duration for a 10-Gb/sec symbol rate. The ALA consists of two cascaded identical cells. Fig. 3 shows a schematic of the ALA cell. The small-signal DC gain of each differential pair is given in (3).

$$A_v = g_m \cdot R_L \qquad (3)$$

However, when the differential input voltage is ≥ Vin,min as defined in (4), the differential pair behaves in a CML fashion and the tail current, ISS, is switched completely to one side. The output signal swing is then Vout as defined in (5).

$$V_{in,min} = \sqrt{\frac{2\,I_{SS}}{\mu_n C_{ox}\,\frac{W}{L}}} \qquad (4)$$

$$V_{out} = I_{SS} \cdot R_L \qquad (5)$$
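Equations (3) to (5) can be combined into a simple piecewise model of one differential-pair stage. This is a rough sketch: a real stage moves smoothly from linear gain to limiting, and every numeric value below is a hypothetical placeholder rather than a design value from this work.

```python
import math


def vin_min(i_ss, mu_n_cox, w_over_l):
    """Minimum differential input for complete tail-current switching, from (4)."""
    return math.sqrt(2.0 * i_ss / (mu_n_cox * w_over_l))


def stage_output(v_in, g_m, r_l, i_ss, mu_n_cox, w_over_l):
    """Output swing of one stage: limited per (5) or linear per (3)."""
    v_full = i_ss * r_l
    if v_in >= vin_min(i_ss, mu_n_cox, w_over_l):
        return v_full
    # Below the switching threshold, apply the small-signal gain, capped
    # at the fully switched swing.
    return min(g_m * r_l * v_in, v_full)


# Hypothetical stage parameters, for illustration only.
params = dict(g_m=5e-3, r_l=400.0, i_ss=1e-3, mu_n_cox=300e-6, w_over_l=20.0)

print(stage_output(0.017, **params))  # small input: linear gain region
print(stage_output(1.0, **params))    # large input: full current switching
```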


In this manner, multiple stages can be cascaded, and the final stage will completely switch the tail current if its input is greater than Vin,min. Increasing the number of cascaded stages can improve the input dynamic range of the ALA. However, the second requirement of the ALA is low latency. The latency is inversely proportional to the 3-dB bandwidth of the circuit, which is determined by the pole at the output of each differential pair stage. Cascading multiple stages decreases the 3-dB bandwidth of the ALA and increases the signal latency through it. This imposes a limit on the value of RL and W/L for each input pair. Increasing W/L will increase gm and therefore the gain, but will also increase CGS and the Miller input capacitance. Increasing RL will also increase the small-signal gain of the first few stages and the output swing of the last stage. However, both of these actions will push the pole in, decreasing bandwidth and increasing latency. Therefore, the design of the ALA is a careful balance between the required gain and latency.

In order to maximize the GBW of each ALA cell, a modified Cherry-Hooper topology is utilized that uses an active feedback stage to reduce the load resistance at the drains of M1-M2, thereby improving GBW [4]. Furthermore, the latency through the ALA can vary over PVT, so some degree of control is needed for the total loop delay. The ALA was therefore limited to two stages. The two-stage structure has a delay of about 68 ps. The remaining delay is provided by the tunable LLC block, which uses CML techniques to provide the signal swing needed by the current switches in the Gm stage and to further improve the input dynamic range of the entire feedback loop.
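The bandwidth penalty of cascading can be estimated with the standard result for N identical first-order stages. The sketch below assumes each stage behaves as a single-pole amplifier, which a Cherry-Hooper cell with active feedback does not strictly satisfy, and the per-stage bandwidth used is a hypothetical figure.

```python
import math


def cascaded_bandwidth(stage_bw_hz, n_stages):
    """Overall 3-dB bandwidth of n identical single-pole stages in cascade."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    shrink = math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)
    return stage_bw_hz * shrink


STAGE_BW_HZ = 10e9  # hypothetical per-stage bandwidth

for n in (1, 2, 3, 4):
    print(n, cascaded_bandwidth(STAGE_BW_HZ, n) / STAGE_BW_HZ)
```

Under this assumption, each added stage shrinks the overall bandwidth further, which is consistent with the choice to stop at two stages and leave the remaining delay budget to the LLC block.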

Figure 3. ALA cell schematic.

C. Loop Latency Control (LLC)

The LLC block consists of two cascaded CML differential pair stages, D1 and D2, followed by a cross-coupled stage and a voltage-controlled current-steering folded bias block that enables different signal delays through the circuit. Fig. 4 shows a schematic of the LLC block. D1 and D2 are designed so that the output swing of the ALA is greater than the Vin,min defined in (4), as required for CML operation. The output swing of D1 is Vout = ISS*RLOAD, which is greater than Vin,min. In this way, each stage performs complete switching of ISS. The final stage consists of cross-coupled pairs, M5-M6 and M7-M8. The gates of M5-M6 get their input signal, VX, from the output of D2. The gates of M7-M8 get their input, VIN, directly from the input of the LLC block.

A voltage-controlled current-steering technique is used to control the path taken by the signal through the circuit. VCONT controls the bias current through transistors MN2 and MN3, which is mirrored over to MN4 and MN5. Depending on the ratio of ISLOW/IFAST, the signal can travel through the fast path (VIN through M7-M8 to VOUT) or the slow path (VIN through D1, D2, and M5-M6, M7-M8 to VOUT). When VCONT is low, all of ICONTROL goes through MN2 and is mirrored to MN4. The fast path is activated and the signal goes through transistors M7-M8. The dominant pole for this path is RLOAD*CLOAD, where CLOAD is the capacitive load seen by the LLC. When VCONT is high, all of ICONTROL is switched to MN3 and mirrored to MN5. In this case, the slow signal path is dominant. For intermediate values of VCONT, a varied delay can be achieved [2]. Using this tunable feature, the latency of the feedback loop can be tuned to the exact symbol period required and adjusted for PVT variations.
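The delay that the LLC block has to contribute can be estimated from the loop budget. The sketch below assumes that the loop delay is just the 68 ps ALA delay plus the LLC delay, neglecting the Gm block and wiring. It also assumes a linear blend between the fast-path and slow-path delays as the control current is steered. Both assumptions, and the path delays used, are hypothetical; the real delay-versus-VCONT curve would come from circuit simulation or measurement.

```python
ALA_DELAY_PS = 68.0  # two-stage ALA delay quoted in the text


def required_llc_delay_ps(bit_rate_gbps, ala_delay_ps=ALA_DELAY_PS):
    """LLC delay needed so that ALA plus LLC delay equals one symbol period."""
    return 1000.0 / bit_rate_gbps - ala_delay_ps


def slow_path_fraction(target_ps, fast_ps, slow_ps):
    """Share of the control current steered to the slow path (linear blend)."""
    if not fast_ps <= target_ps <= slow_ps:
        raise ValueError("target delay lies outside the tunable range")
    return (target_ps - fast_ps) / (slow_ps - fast_ps)


# Hypothetical fast-path and slow-path delays, for illustration only.
FAST_PS, SLOW_PS = 30.0, 60.0

for rate in (10.0, 8.0):
    target = required_llc_delay_ps(rate)
    print(rate, target, slow_path_fraction(target, FAST_PS, SLOW_PS))
```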

Figure 4. LLC schematic.

D. CML Output Driver (OD)

An OD consisting of identical cascaded CML stages, similar to D1 in the LLC block, is used after the ALA to improve the eye-opening and drive a 50-ohm load. The cascaded stages ensure that all signal transitions after the ALA are driven to the same output voltage swing, thereby decreasing the thickness of the eye-lid and improving SNR and BER.

III. MEASUREMENT RESULTS

The complete circuit is fabricated in TSMC's 0.18-µm mixed-mode CMOS process. Fig. 5 shows a die photo of the circuit. A pattern generator is used to generate differential pseudo-random bit sequence (PRBS) data. The data is sent through a test set-up consisting of an FR-4 backplane and 2 line cards with connector assemblies connecting the line cards to the backplane. The output of the channel is connected to the differential inputs VINP and VINM of the ADFE circuit. VC0-VC3 and VFF are used to control the linear filter tap gains and delay cell latency, respectively. VCONT is used to control the feedback filter latency and ITAP controls the feedback tap current. An oscilloscope is used to observe the eye diagram of the received signal at the differential outputs, VOUTP and VOUTM.

Figs. 6a and 6b show the eye diagrams of a 10-Gb/sec PRBS 2²³-1 data stream before and after 20 inches of FR-4. The received eye is completely closed after transmission through the FR-4, and equalization is needed to recover the data. Fig. 6c shows the signal after the FR-4 channel applied to the ADFE with the feedback tap current ITAP turned off. This shows the effect of just the linear forward filter and output driver. Fig. 6d shows the eye diagram with ITAP turned on. This shows the effect of analog decision-feedback. The eye is clearly more open with analog decision-feedback and will have a lower BER than the data in Fig. 6c. This measurement shows the benefit of decision-feedback equalization and proves the concept of the novel unclocked analog decision-feedback technique and circuit architecture. Clocking the received data at the center of the open eye can reduce the jitter of the output signal.

In addition, VCONT can be tuned to provide 125 ps of loop delay, which is the symbol duration for an 8-Gb/sec data stream. Figs. 7a-7d show the effect of the ADFE at 8-Gb/sec. Again, improved SNR and BER are obtained with analog decision-feedback.
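A PRBS 2²³-1 stimulus of the kind used in these measurements can be reproduced in software with a linear-feedback shift register. The sketch below assumes the generator polynomial x^23 + x^18 + 1 that ITU-T O.150 specifies for this sequence length; the paper does not state which polynomial or seed the pattern generator used, so both are assumptions.

```python
def prbs23(n_bits, seed=0x7FFFFF):
    """Return n_bits of a PRBS 2^23-1 sequence from a 23-bit Fibonacci LFSR.

    Assumes the polynomial x^23 + x^18 + 1; the seed must be non-zero.
    """
    mask = 0x7FFFFF
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        # Feedback taps at shift-register stages 23 and 18.
        new_bit = ((state >> 22) ^ (state >> 17)) & 1
        state = ((state << 1) | new_bit) & mask
        bits.append(new_bit)
    return bits


# Map bits to a differential pair of logic levels (1 -> +1/-1, 0 -> -1/+1).
pattern = prbs23(64)
differential = [(1, -1) if b else (-1, 1) for b in pattern]
print(pattern[:32])
```

A maximal-length sequence from a 23-bit register repeats after 2²³-1 = 8,388,607 bits, so a short excerpt such as this one exercises only a small part of the pattern.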

Figure 5. ADFE die photo.

Figure 6. Eye diagrams of a) 10-Gb/sec PRBS 2²³-1 input, b) after 20-inch FR-4, c) after ADFE with ITAP turned off, and d) after ADFE with ITAP turned on.

Figure 7. Eye diagrams of a) 8-Gb/sec PRBS 2²³-1 input, b) after 20-inch FR-4, c) after ADFE with ITAP turned off, and d) after ADFE with ITAP turned on.

IV. CONCLUSION

This work demonstrates a novel analog decision-feedback equalizer circuit implemented in 0.18-µm CMOS for FR-4 equalization. The unclocked architecture overcomes the first feedback-loop latency bottleneck of conventional DFE topologies without the use of speculative techniques or half-rate clocking. In addition, the unclocked analog technique enables 10-Gb/sec decision-feedback equalization in a 0.18-µm CMOS process and consumes less power than a digital or mixed-signal implementation. The chip with pads occupies 1.04 mm² and draws 240 mA DC current from a 1.8 V supply at a typical process corner. The circuit is used to equalize 20 inches of FR-4 at 8-Gb/sec and 10-Gb/sec.

ACKNOWLEDGMENT

The authors would like to thank Pirelli Labs for their support.

REFERENCES

[1] T. Beukema et al., “A 6.4-Gb/s CMOS SerDes core with feed-forward and decision-feedback equalization,” IEEE J. Solid-State Circuits, vol. 40, no. 12, pp. 2633-2645, December 2005.

[2] F. Bien, H. Kim, Y. Hur, M. Maeng, J. Cha, S. Chandramouli, E. Gebara, and J. Laskar, “A 10-Gb/s reconfigurable CMOS equalizer employing a transition-detector based output monitoring technique for band-limited serial links,” IEEE Trans. Microwave Theory and Tech., vol. 54, no. 12, pp. 4538-4547, December 2006.

[3] M. M. Green and U. Singh, “Design of CMOS CML circuits for high-speed broadband communications,” Proc. International Symposium on Circuits and Systems, vol. 2, pp. 204-207, May 2003.

[4] “10-Gb/s limiting amplifier and laser modulator/driver in 0.18-µm CMOS technology,” IEEE J. Solid-State Circuits, vol. 38, no. 12, pp. 2138-2146, December 2003.
