

Page 1: [IEEE 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers - San Francisco, CA, USA (5-7 Feb. 2001)] 2001 IEEE International Solid-State Circuits Conference.

• 2001 IEEE International Solid-State Circuits Conference 0-7803-6608-5 ©2001 IEEE

ISSCC 2001 / SESSION 12 / SIGNAL PROCESSING FOR STORAGE AND CODING / 12.4

12.4 A 300MHz Mixed-Signal FDTS/DFE Disk Read Channel in 0.6µm CMOS

Derrick Chunkai Wei¹, Daniel Q. Sun and Asad A. Abidi

Integrated Circuits & Systems Laboratory, University of California, Los Angeles, CA

¹Currently with Silicon Laboratories Inc., Austin, TX

Future read channel ICs must be faster, more sophisticated, and lower cost, compounding the challenge for the circuit designer. This CMOS realization of a mainly-analog decision feedback equalizer (DFE) with fixed delay tree search (FDTS) operates at 300MHz on rate-6/7 MTR-coded data. In theory, at typical user densities of 2.0 to 3.0, this read channel outperforms a conventional EPR4 read channel followed by a Viterbi detector (VA). Unlike an all-digital implementation, this read channel does not require a signal-path A/D converter, and offers fast clock recovery.
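As a rough illustration of what "rate-6/7 MTR-coded data" implies, the sketch below checks a maximum-transition-run constraint and computes the user data rate. It assumes the MTR(2,8) parameters quoted later in the text mean at most 2 consecutive transitions and at most 8 consecutive non-transitions in NRZI notation (1 = transition). The function names are hypothetical and not taken from the paper.

```python
def satisfies_mtr(nrzi_bits, j=2, k=8):
    """Return True if no run of 1s exceeds j and no run of 0s exceeds k.

    Assumption: nrzi_bits uses NRZI notation, where 1 marks a transition.
    """
    run_value, run_length = None, 0
    for bit in nrzi_bits:
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        limit = j if bit == 1 else k
        if run_length > limit:
            return False
    return True


def user_rate_mbps(channel_rate_mhz=300.0, code_rate=6 / 7):
    """User data rate when each clock cycle carries one coded channel bit."""
    return channel_rate_mhz * code_rate


print(satisfies_mtr([1, 1, 0, 1, 1, 0]))  # two transitions in a row are allowed
print(satisfies_mtr([1, 1, 1]))           # three in a row violate j = 2
print(round(user_rate_mbps(), 1))         # user Mb/s at a 300MHz channel clock
```

Under these assumptions, a 300MHz channel clock corresponds to roughly 257Mb/s of user data.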

The prototype IC (Figure 12.4.1) takes as input the amplified read waveform equalized by a programmable (or possibly adaptive) continuous-time analog filter to an EPR4 target [1]. The equalized waveform is sampled into a 5-tap analog delay line, which drives both the FIR forward equalizer and the discrete-time phase detector. Four continuously-variable transconductors weight the delay-line taps, and subtract their summed current from that of weighted transconductors whose sign is set by the feedback register. The difference drives the single comparator (slicer). In the presence of noise, it is known in advance that certain data patterns will likely trigger incorrect slicer decisions [2]. The Error Pattern Identifier (EPI) senses these patterns, and using additional threshold tests determines whether the past decision is wrong. An incorrect decision is reversed in the first stage of the feedback register, without contaminating the feedback signal which is derived from the second and subsequent stages. The EPI boosts performance of a simple DFE to that of a fixed delay tree search with depth of two [2].
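The signal flow described above (weighted delay-line taps, minus weighted past decisions, into a single slicer) can be sketched behaviorally. This is a minimal, noise-free model and not the authors' circuit. The EPR4 response 1 + D - D² - D³ is standard, but the tap weights used here (a pass-through forward filter and feedback weights that cancel the EPR4 tail) are illustrative assumptions, as is the forward-minus-feedback sign convention. The EPI correction is omitted.

```python
EPR4 = (1, 1, -1, -1)  # 1 + D - D^2 - D^3


def epr4_samples(bits):
    """Ideal EPR4 samples for a bit list, assuming all earlier symbols were -1."""
    symbols = [1 if b else -1 for b in bits]
    padded = [-1, -1, -1] + symbols
    return [
        sum(h * padded[k + 3 - i] for i, h in enumerate(EPR4))
        for k in range(len(symbols))
    ]


def dfe_detect(samples, ffe_taps, fb_taps):
    """Plain DFE: slicer input = weighted delay line - weighted past decisions."""
    delay_line = [0.0] * len(ffe_taps)   # models the sampled analog delay line
    feedback_reg = [-1] * len(fb_taps)   # assumed initial state: all -1
    decisions = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        forward = sum(w * s for w, s in zip(ffe_taps, delay_line))
        feedback = sum(w * d for w, d in zip(fb_taps, feedback_reg))
        decision = 1 if forward - feedback >= 0 else -1
        decisions.append(decision)
        feedback_reg = [decision] + feedback_reg[:-1]
    return decisions


bits = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
# Hypothetical weights: 4 forward taps (pass-through) and 4 feedback taps.
detected = dfe_detect(epr4_samples(bits), [1, 0, 0, 0], [1, -1, -1, 0])
print(detected == [1 if b else -1 for b in bits])
```

With correct past decisions the feedback term removes the three trailing EPR4 terms, so the slicer sees only the current symbol. A single wrong decision would corrupt later slicer inputs, which is the error propagation the EPI is meant to limit.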

The analog delay line is a cascade of 8 open-loop track-and-holds (T/H) (Figure 12.4.2). A FET switch in series with a charge-cancelling dummy samples the input waveform onto the gate capacitance of a pMOS differential pair. A replica circuit and two off-chip feedback loops, with greatly different bandwidths to avoid interaction, set the common-mode level of each differential pair and drive its voltage gain to exactly unity. The variable transconductors consist of nMOS differential pairs degenerated by voltage-controlled FET resistors. Reference [2] shows that four taps in the decision feedback are sufficient.

Bits in the feedback register generate an analog signal by toggling a fixed reference voltage through commutating switches into weighted transconductors. To lower capacitive load on the summation point, the tree-like arrangement of Figure 12.4.3b is used: the FFE and DFE outputs are summed separately, and a 4-input differential pair computes the final result and drives the slicer. With this arrangement, voltages settle to 5b accuracy at 320MHz.
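To get a feel for what settling to 5b accuracy at 320MHz demands of the summing node, the following back-of-the-envelope sketch computes the largest tolerable time constant. It assumes single-pole exponential settling, that "5b accuracy" means an error below half an LSB of a 5-bit scale, and that the whole clock period is available for settling. None of these assumptions is stated in the paper.

```python
import math


def max_time_constant_s(bits=5, clock_hz=320e6, settle_fraction=1.0):
    """Largest single-pole time constant that settles within 1/2 LSB.

    settle_fraction is the assumed share of the clock period left for settling.
    """
    window_s = settle_fraction / clock_hz
    n_tau = math.log(2 ** (bits + 1))  # exp(-t/tau) <= 2^-(bits+1)
    return window_s / n_tau


print(f"{max_time_constant_s() * 1e9:.3f} ns")
```

Under these assumptions the node needs about 4.2 time constants inside a 3.125ns period, so the time constant must stay below roughly 0.75ns. If only half the period is available, the bound halves.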

Timing information is retrieved from samples of the EPR4-equalized input waveform, not from the decisions (Figure 12.4.4). As the head seeks a track, the phase/frequency detector in Loop I of the PLL locks the VCO to a frequency synthesizer. When the head lands on the disk and starts to read the preamble, Loop II is enabled. Both Loop I and Loop II are second-order to correct any frequency error. The VCO consists of a differential ring oscillator with six delay stages. The Loop II phase detector brackets samples of the MTR-coded input waveform into one of five sublevels of the full-scale, and using a two-stage shift register and synchronous logic detects whether any of three patterns is present, from which it can compute phase error by operations on the samples in the analog delay line. These operations consist of sign reversal through a commutating switch, and the addition of transconductor output currents.
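The count of five sublevels matches what an ideal EPR4 response produces from two-level data, as the short enumeration below shows. The bracketing function is a hypothetical nearest-level quantizer. The thresholds actually used on the chip, and the three patterns the detector looks for, are not given in the text and are not modeled here.

```python
from itertools import product

EPR4_TAPS = (1, 1, -1, -1)  # 1 + D - D^2 - D^3


def epr4_levels():
    """All distinct ideal sample values for +/-1 input symbols."""
    return sorted(
        {
            sum(h * a for h, a in zip(EPR4_TAPS, symbols))
            for symbols in product((-1, 1), repeat=4)
        }
    )


def bracket(sample, levels):
    """Index of the nearest ideal level (assumed midpoint thresholds)."""
    return min(range(len(levels)), key=lambda i: abs(sample - levels[i]))


levels = epr4_levels()
print(levels)                 # five distinct values
print(bracket(1.7, levels))   # index of the level closest to 1.7
```

Normalizing by the largest value maps these levels onto five equally spaced points across the full scale.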

Accumulated DC offsets are a major source of error in an all-analog signal processor. As the waveform at the read heads has zero DC content, taps from the analog delay line are capacitively coupled into the tap-weight transconductors and the phase detector inputs. A buffer amplifier after the coupling capacitor isolates the delay line from kick-back from the regenerative comparator at the phase detector input [3]. Offset of up to 100mV after the coupling capacitors is sampled while the head seeks, and subtracted from the phase detector input during read mode [1]. Due to the different bandwidth of Loop I and Loop II (~1:10), two external loop filters are used. A voltage duplicator at the phase detector output equates the charges stored in both loop filter capacitors while Loop I is active. This avoids a jump in charge when switching from Loop I to Loop II, which might cause loss of lock. Phase is acquired in less than 100 clock cycles.

The EPI logic is the critical timing path in the DFE section. The main decision path consists of one clocked comparator C1 (Figure 12.4.5). The EPI is triggered by three consecutive bits with the same polarity. A second threshold check by C2 decides if the bit at Q1 should pass to the next register. A polarity check by comparator C3 decides whether Q1 or Q2 is to be corrected. Since Q1 is not tapped, a full clock cycle is available to make the correction. However, Q2 is tapped, and any decision to correct it is made at the next rising clock edge. The worst-case propagation delay through the EPI block is 0.8ns.
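The EPI trigger condition and its timing budget can be sketched as follows. Only the trigger rule (three consecutive same-polarity bits) and the 0.8ns worst-case delay come from the text. Which register stages are examined, and the C2/C3 threshold values, are not specified, so those tests are left out. The function names are hypothetical.

```python
def epi_trigger_points(decisions):
    """Indices k where decisions[k-2], decisions[k-1], decisions[k] share polarity."""
    return [
        k
        for k in range(2, len(decisions))
        if decisions[k] == decisions[k - 1] == decisions[k - 2]
    ]


def timing_slack_ns(clock_mhz=300.0, epi_delay_ns=0.8):
    """Clock period minus the worst-case EPI delay, in nanoseconds."""
    period_ns = 1e3 / clock_mhz
    return period_ns - epi_delay_ns


print(epi_trigger_points([1, 1, 1, -1, -1, -1, -1, 1]))
print(round(timing_slack_ns(), 2))
```

At 300MHz the clock period is about 3.33ns, so the EPI block consumes roughly a quarter of a cycle. The remainder has to cover the comparator decisions and register setup, which this sketch does not model.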

Prior to tapeout, the entire read channel is verified using mixed-signal and behavioral simulations in Analogy SABER. The chip, in 1P3M 0.6µm CMOS, occupies 1.84×1.82mm² active area, and when clocked at 300MHz dissipates 530mW from 3V. The measured lock range for Loop I spans 80MHz to 410MHz. With Loop II enabled, the RMS jitter on the clock recovered from random data is 36.7ps, equivalent to 2°. When the EPI block is turned off, many decision errors appear at the middle two levels of the constellation (Figure 12.4.5). Activating the EPI resolves all samples into the expected six levels (Figure 12.4.5). Figure 12.4.6 shows the measured BER with on-chip clock recovery. The curves depart from trend at higher than 10⁻³ BER because of error propagation in the decision feedback and clock recovery loops. At normal BER and user density of 2.5, the MTR(2,8)-coded EPR4/FDTS detector performs almost as well as the theoretical EPR4/VA, while at user density of 3.0 it is superior by 0.5dB. Since practical EPR4/VA implementations usually perform 1dB below the theoretical bound, it may be concluded that the proposed read channel, using simple hardware and mainly analog signal processing, performs better than EPR4/VA by 1dB. Figure 12.4.7 shows the chip micrograph.
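A few derived figures follow directly from the reported power, supply, clock, and die dimensions. The sketch below only restates that arithmetic. It assumes one channel bit per clock cycle, and the helper names are hypothetical.

```python
def supply_current_ma(power_mw=530.0, supply_v=3.0):
    """Average supply current implied by the reported power and supply."""
    return power_mw / supply_v


def energy_per_channel_bit_nj(power_mw=530.0, clock_mhz=300.0):
    """Energy per clock cycle; mW divided by MHz gives nJ."""
    return power_mw / clock_mhz


def active_area_mm2(width_mm=1.84, height_mm=1.82):
    """Active area from the reported dimensions."""
    return width_mm * height_mm


print(round(supply_current_ma(), 1))          # mA
print(round(energy_per_channel_bit_nj(), 2))  # nJ per channel bit
print(round(active_area_mm2(), 2))            # mm^2
```

These work out to roughly 177mA, 1.77nJ per channel bit, and 3.35mm².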

References:

[1] P. Pai et al., “A 160MHz analog front-end IC for EPR-IV PRML magnetic storage read channels,” IEEE J. of Solid-State Circuits, vol. 31, no. 11, pp. 1803-1816, Nov. 1996.

[2] D. Wei et al., “An analog EPR4 read channel with an FDTS detector,” Proc. of Intl. Communications Conf., vol. 2, pp. 678-682, Atlanta, GA, June 1998.

[3] K. Y. Kim et al., “A 10-b, 100-MS/s CMOS A/D converter,” IEEE J. of Solid-State Circuits, vol. 32, no. 3, pp. 302-311, March 1997.

ISSCC 2001 / February 6, 2001 / Salon 1-6 / 3:15 PM

Figure 12.4.1: FDTS/DF system architecture.

Figure 12.4.2: Four-tap analog delay line and buffers.

Figure 12.4.3: (a) DFE implementation; (b) Tree-like FFE/DFE summation.

Figure 12.4.4: Phase-lock loop architecture.

Figure 12.4.5: EPI implementation and slicer input with EPI on/off.

Figure 12.4.6: Measured bit error rate (BER).

Figure 12.4.7: Chip micrograph.
