[IEEE 2007 IEEE Asian Solid-State Circuits Conference - Jeju City, South Korea...
A 0.67µW/MHz, 5ps Jitter, 4 Locking Cycles, 65nm ADDLL
Jinn-Shyan Wang, Chun-Yuan Cheng, Yu-Chia Liu, and Yi-Ming Wang
Dept. of EE/SOC Research Center, Nat’l Chung-Cheng University, Taiwan
Abstract-This paper presents the design of a 1.0V 150-550MHz
65nm ADDLL using a novel coarse-fine architecture and differential
circuit techniques. When running at 550MHz, this nanometer
ADDLL achieves a peak-to-peak jitter of only 5ps with the shortest
lock-in time of 4 cycles, while consuming only 0.67µW/MHz, about
72% less than the most power-efficient existing ADDLL.
I. INTRODUCTION
Low power, small jitter, and fast lock-in are three equally
important design goals of the delay-locked loop (DLL).
All-digital design techniques offer promising solutions to the
process-variation challenges faced by submicron DLLs [1]-[5].
In the nanometer era, the problem of process variations gets
worse. In addition to all-digital techniques, we design a 65nm
ADDLL using a novel coarse-fine architecture for fast lock-in
and low power, and differential circuit techniques for small
jitter and high tolerance to process variations. The
implemented ADDLL achieves the highest power efficiency, the
smallest peak-to-peak jitter, and the fewest locking cycles
compared with state-of-the-art ADDLLs [1]-[5].
II. ARCHITECTURE AND LOCKING OPERATIONS
The proposed coarse-fine architecture, shown in Fig. 1, is
described first. It integrates an improved half-delay-line
skew-compensation circuit (iHDSC), a phase detector (PD),
two fine delay lines (FDLs), a SAR loop controller, and a
loadable shift register. The HDSC [6] is an open-loop circuit.
It achieves coarse lock within two cycles and is power
efficient because the active cells in the lock-in state can be
minimized. The iHDSC evolves from the HDSC by replacing the
priority encoder with a thermometer coder. The main reason
for this replacement is that the priority encoder has a long
critical path, which limits the maximal clock frequency
(fmax) of the HDSC. We notice that the output of the
time-to-digital converter (TDC) always consists of consecutive
“1” bits followed by consecutive “0” bits. Using the
thermometer coder instead therefore leaves the function
unchanged while shortening the critical path. The salient
features of fast lock-in and low power are preserved in the
iHDSC.
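The observation about the TDC output can be illustrated with a small behavioural sketch. The paper does not give the gate-level logic of the thermometer coder, so the boundary rule below (mark bit i when bit i is 1 and bit i+1 is 0) is an assumption, as are all function names. The sketch only shows that, for words made of consecutive 1s followed by consecutive 0s, a purely local rule selects the same position as a serial priority scan.

```python
def priority_one_hot(tdc):
    """Serial scan from the top bit down to the first 1 (models a priority chain)."""
    out = [0] * len(tdc)
    for i in range(len(tdc) - 1, -1, -1):
        if tdc[i]:
            out[i] = 1
            break
    return out


def boundary_one_hot(tdc):
    """Assumed local rule: bit i is marked when tdc[i] is 1 and tdc[i+1] is 0."""
    n = len(tdc)
    return [tdc[i] & (1 - (tdc[i + 1] if i + 1 < n else 0)) for i in range(n)]


def thermometer(k, n):
    """n-bit word with k consecutive 1s starting at bit 0 (list index = bit index)."""
    return [1] * k + [0] * (n - k)


n = 8
for k in range(n + 1):
    word = thermometer(k, n)
    # Both methods agree on every thermometer-coded word.
    assert priority_one_hot(word) == boundary_one_hot(word)

print(boundary_one_hot(thermometer(3, n)))
```

In the local rule each output depends on two neighbouring inputs only, whereas the serial scan carries a dependency across the whole word, which is one way to picture the shorter critical path.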
Fig. 2 shows the timing diagram of the proposed ADDLL.
Once the reset signal is released, the ADDLL performs the
coarse locking operation with the iHDSC and then the fine
locking operation with the other components. Please refer to
[6] for the coarse locking operation. At the end of coarse
locking, a control word is loaded into the shift register to
set the path delay provided by the coarse delay line (CDL).
Afterwards, the phase detector (PD) is activated, and the SAR
controller performs a binary successive-approximation search
to find the control word for the 4-bit fine delay line (FDL).
IEEE Asian Solid-State Circuits Conference
1-4244-1360-5/07/$25.00 2007 IEEE
November 12-14, 2007 / Jeju, Korea 11-2
Fig. 2. Operation waveforms and the binary search diagram
[Figure: waveforms of Ck_ext, Ck_int, Reset, TDC_start, TDC_stop, 0/180, init/Sel, TDC[n-1:0], TC[n-1:0], BR[n-1:0], Load, S[n-1:0], FS[3:0], and lead/lag across the Initialization, Measurement, Phase-align, Fine-lock, and Maintenance operations; 4-bit binary search diagram starting at 1000 with lead/lag branches.]
The last step takes at most 4 clock cycles to achieve fine
lock-in. Overall, the proposed ADDLL takes at most 6 clock
cycles for lock-in from releasing the reset signal.
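The fine-lock search can be sketched as a generic 4-bit successive-approximation loop. This is a minimal behavioural model, not the authors' controller: the decision polarity (keep the trial bit while more delay is still needed), the residual phase error of 62ps, and the names `sar_search` and `pd_says_more_delay` are assumptions; the 8.5ps step is the typical-corner FDL resolution quoted in Section IV.

```python
def sar_search(needs_more_delay, bits=4):
    """Generic SAR: test one bit per step, MSB first, keeping it if more delay is needed."""
    code = 0
    trials = []
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)
        trials.append(trial)
        if needs_more_delay(trial):
            code = trial
    return code, trials


STEP_PS = 8.5        # fine delay step at the typical corner
RESIDUAL_PS = 62.0   # hypothetical phase error left after coarse locking


def pd_says_more_delay(code):
    """Stand-in for the PD decision: the fine delay has not yet exceeded the residual."""
    return code * STEP_PS <= RESIDUAL_PS


code, trials = sar_search(pd_says_more_delay)
print(format(code, "04b"), [format(t, "04b") for t in trials])
# prints: 0111 ['1000', '0100', '0110', '0111']
```

One decision is made per bit, so a 4-bit FDL needs four decisions, consistent with the at most 4 clock cycles stated for fine lock-in.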
III. PHASE DETECTOR
The phase resolution and the operating speed are key
design parameters of the PD. The PD constructed with two
dynamic latches [7] (circuit A in Fig. 3(a)) can achieve high
speed and high resolution for the charge-pump based analog
DLL application. Unfortunately, the output signals of this
dynamic PD are short pulses with the pulse width being
proportional to the phase difference (waveforms in Fig. 3(a)).
When the phase difference approaches zero during the locking
process, the pulse width becomes too narrow to drive the
large load it may face. Our analysis shows that this PD cannot
be applied directly in the proposed ADDLL.
The proposed PD is based on the PD in [7] to retain its
advantages of high phase resolution and high operating speed,
but several new circuits are added to make it capable of
driving a large load, to provide higher resolution, and to be
more robust to process variations. The evolution of the new
PD is shown in Fig. 3(b). First, an NC2MOS latch plus an
inverter is attached to each output of circuit A, and other
inverters are added at the input stage to provide clocking
signals for the NC2MOS latches. Now, the PD becomes
circuit B. Because of the latching operation, the pulse width
of the output signals is always stretched to half of the clock
cycle no matter how large the phase difference is. If the clock
frequency is low, the lengthened pulse width is long enough
to drive a large load. In case of a high clock frequency, the
buffer in the dynamic latch can also be sized up to enhance
the driving capability. The phase resolution is still determined
by the input latch and not affected by the NC2MOS output
latch. Second, two inverters with one capacitor in between are
added to generate the new data input for the input latch, and
the PD becomes circuit C. With the extra capacitor, the data
input arrives later by an intentional skew δ. If the phase
error between int and ext is Δ, the equivalent phase error
seen by the input latch becomes Δ+δ. The latch resolves the
error once Δ+δ reaches its minimal detectable phase error ε,
that is, once Δ reaches ε-δ. The equivalent minimal
detectable phase error between int and ext therefore becomes
ε-δ, which means that the phase resolution is improved by δ.
Fig. 3. (a) Conventional PD, and (b) evolution of the proposed PD
Table I Performance comparisons of different PDs
Circuit  Resolution     Resolution     Resolution     Normalized  Driving cap.
         VDD: 1V        VDD: 1V        VDD: 0.9V                  VDD: 1V
         Temp: 25°C     Temp: 125°C    Temp: 125°C                Temp: 25°C
         Process: TT    Process: SS    Process: SS                Process: TT
                                                                  Freq: 550MHz
A        10ps           19.2ps         33.0ps         1.00        12.6µm/65nm
B        8ps            14.5ps         25.4ps         0.77        200µm/65nm
D        5ps            7.8ps          12.8ps         0.39        220µm/65nm
However, because of the extra capacitors, the two paths from
node ext to nodes B and C and the two paths from node int to
nodes A and D are not balanced. Under PVT variations, the
delays of the paths with and without the intentional skew
vary differently. To improve robustness against PVT
variations, we make the circuitry differential by adding a
unit capacitance to each path (circuit D in Fig. 3(b)).
Simulation results in Table I validate the effectiveness of
the differential circuitry. At the worst corner, the phase
resolution of the proposed design is 12.8ps, which is 61%
smaller than the 33.0ps of the conventional PD. Moreover, the
last column of Table I indicates that the driving capability
of the proposed PD (circuit D) is about 17 times that of the
dynamic PD [7], in terms of the maximal drivable MOS
transistor size.
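The figures quoted from Table I can be re-derived with a few lines of arithmetic. The sketch below uses the worst-corner resolutions and the drivable transistor widths from the table; the variable names are illustrative, and the widths are taken to be in micrometres, which is an assumption about the unit (the ratio does not depend on it).

```python
# Worst-corner (0.9V, 125°C, SS) phase resolution per circuit, in ps (Table I).
worst_corner_ps = {"A": 33.0, "B": 25.4, "D": 12.8}
# Maximal drivable transistor width per circuit (Table I), assumed to be in µm.
drive_width = {"A": 12.6, "B": 200.0, "D": 220.0}

# Resolution of each circuit relative to the conventional PD (circuit A).
normalized = {k: round(v / worst_corner_ps["A"], 2) for k, v in worst_corner_ps.items()}

# Relative improvement of circuit D over circuit A.
reduction_pct = round(100 * (1 - worst_corner_ps["D"] / worst_corner_ps["A"]))
drive_ratio = drive_width["D"] / drive_width["A"]

print(normalized)             # {'A': 1.0, 'B': 0.77, 'D': 0.39}
print(reduction_pct)          # 61
print(round(drive_ratio, 1))  # 17.5
```

The normalized values reproduce the "Normalized" column of Table I, which suggests that this column is the worst-corner resolution relative to circuit A.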
IV. DELAY LINES
The main design considerations of the delay line include
power consumption, delay resolution, and tolerance to PVT
variations. Power consumption of the delay line should be
minimized because the delay line is the main power consumer in
the locked state. The delay resolution must be fine enough
because it affects the jitter of the generated clock signal.
For a nanometer design, however, the most important design
requirement is tolerance to PVT variations. The proposed ADDLL
adopts a coarse delay line and a fine delay line. We carefully
select and design the delay cells to achieve high PVT
tolerance, high resolution, and low power simultaneously.
First, we study the characteristics of two state-of-the-art
fine delay cells to see which one suits the proposed FDL. One
is a differential delay cell [2] proposed by one of the
authors of this paper, and the other is a single-ended delay
cell [5]. Two kinds of 4-bit FDL are designed as shown in
Fig. 4(a). Evaluation results are listed in Table II. Both
FDLs are designed with a resolution of 8.5ps and a delay
range of about 0ps~130ps at the typical operating corner. At
the worst corner, i.e. (0.9V, 125°C, SS), the resolution
variation of the differential delay cell is 46%, whereas that
of the single-ended delay cell reaches 111%. This result
indicates that a nanometer design should adopt differential
circuitry for the delay cell, as it does for the phase
detector.
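The variation percentages in Table II follow from the corner resolutions relative to the 8.5ps typical value, as the short check below shows. The dictionary layout and names are illustrative. The full-scale estimate assumes that a 4-bit code spans 15 equal steps, which the paper does not state explicitly.

```python
TYPICAL_PS = 8.5  # resolution of both FDLs at (1V, 25°C, TT)

# Resolution in ps at the two slow corners, from Table II.
corner_ps = {
    "single-ended": {"1V/125C/SS": 14.1, "0.9V/125C/SS": 17.9},
    "differential": {"1V/125C/SS": 10.4, "0.9V/125C/SS": 12.4},
}

# Variation = relative increase over the typical-corner resolution, in percent.
variation_pct = {
    cell: {corner: round(100 * (res - TYPICAL_PS) / TYPICAL_PS)
           for corner, res in corners.items()}
    for cell, corners in corner_ps.items()
}
print(variation_pct)

# Assumed full-scale range of a 4-bit line with 15 equal steps.
full_scale_ps = (2 ** 4 - 1) * TYPICAL_PS
print(full_scale_ps)  # 127.5
```

The 127.5ps estimate is consistent with the stated delay range of about 0ps~130ps.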
For the CDL, it may seem natural to construct it with
differential delay cells as well (such as CDL1 in Fig. 4(b)),
as was done in [2]. However, the FDL already takes
responsibility for providing high delay resolution and process
tolerance. Therefore, reducing power consumption is the most
important design concern for the CDL. We propose to construct
the CDL with single-ended delay cells evolved from those used
in [6]; the proposed CDL is shown as CDL2 in Fig. 4(b).
Performance evaluation results are shown in Table III. The
proposed CDL saves 56~82% of the power, depending on the
number of active cells. The power reduction comes from fewer
devices being triggered along the signal path in the proposed
CDL.
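The 56~82% range can be reproduced from the power figures in Table III. The sketch below is only an arithmetic check, with illustrative names, comparing the single-ended CDL against the differential CDL for the same number of active cells.

```python
# Power at 550MHz in mW, keyed by the number of active cells (Table III).
differential_mw = {20: 0.9466, 1: 0.4309}
single_ended_mw = {20: 0.4185, 1: 0.0778}

# Power saving of the single-ended CDL relative to the differential CDL, in percent.
saving_pct = {
    cells: round(100 * (1 - single_ended_mw[cells] / differential_mw[cells]))
    for cells in differential_mw
}
print(saving_pct)  # {20: 56, 1: 82}
```

The saving is largest with a single active cell, the case in which the differential line still draws 0.4309mW against 0.0778mW for the single-ended line.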
Fig. 4. (a) FDLs, and (b) CDLs
[Figure labels: This Work; [5] (JSSC 07); In; Out; Sel<3:0>; Delay Cell; Demux; Binary-to-Thermometer Decoder; sizing factors x1, x2, x3.1, x5.4, x10.2.]
Table II Fine delay cell resolution at different corners
Delay cells   VDD: 1V        VDD: 1V                  VDD: 0.9V
              Temp: 25°C     Temp: 125°C              Temp: 125°C
              Process: TT    Process: SS              Process: SS
              resolution     resolution  variation    resolution  variation
Single-ended  8.5ps          14.1ps      66%          17.9ps      111%
Differential  8.5ps          10.4ps      22%          12.4ps      46%
Table III Comparison of power consumption of different CDLs
Feature \ Type    Differential cell        Single-ended cell
# of active cell  20          1            20          1
Delay time        2.79ns      137ps        2.65ns      132ps
Power @ 550MHz    0.9466mW    0.4309mW     0.4185mW    0.0778mW
V. EXPERIMENTAL RESULTS
The proposed ADDLL is realized in an industrial 65nm 1.0V
CMOS technology, as shown in Fig. 5. The PD, CDL, and FDL are
full-custom designs, while the controller is a cell-based
design. The maximal measured clock frequency is 550MHz, as
shown in the left part of Fig. 6. In this case, the initial
phase error of the experimental setup is measured to be
490ps, the measured peak-to-peak jitter is only 5ps, and the
ADDLL achieves lock-in within 4 clock cycles. The number of
active coarse delay cells is estimated by simulation to be 4.
The minimal measured clock frequency is 150MHz, with a 4ps
jitter and 4 lock-in cycles, as shown in the right part of
Fig. 6.
Fig. 5. The chip micrograph
Fig. 6. Measurement results
[Figure: CK_ext (O-PAD), CK_int (O-PAD), and reset waveforms at 1V / 550MHz (4-cycle lock-in, p-p jitter = 5 ps) and at 1V / 150MHz (4-cycle lock-in, p-p jitter = 4 ps).]
Table IV Performance comparisons
                      [1] (’03)      [2] (’05)      [3] (’05)      [4] (’06)        [5] (’07)        This Work
Process               0.13µm         0.25µm         0.18µm         0.35µm           0.18µm           65nm
Voltage               1.8V           1.0V           1.8V           3.3V             1.8V             1.0V
Resolution            n.a.           n.a.           <28 ps         <30 ps           10 ps            8.5 ps
Core area (mm2)       0.4641         0.09538        0.88           0.075            0.2              0.01189
fmax                  400MHz         100MHz         700MHz         260MHz           550MHz           550MHz
fmin                  66MHz          100MHz         2MHz           140MHz           40MHz            150MHz
Lock Time             <150 cycles    8 cycles       32 cycles      10 cycles        14 cycles @fmax  4 cycles @fmax
p-p Jitter            ≈20ps          30ps           17.6ps @fmax   24.4ps @250MHz   12ps @fmax       5ps @fmax
Power                 24mW @fmax     0.243mW @fmax  23mW @fmax     9.9mW @250MHz    12.6mW @fmax     0.368mW @fmax
Power index (µW/MHz)  60             2.43           32.85          39.6             22.9             0.67
VI. PERFORMANCE COMPARISONS AND CONCLUSIONS
Performance comparisons are listed in Table IV. At 550MHz,
this 65nm ADDLL consumes only 0.67µW/MHz, about 72% less than
the most power-efficient previous ADDLL [2]. It also achieves
the fastest lock-in and the smallest jitter among the
conventional ADDLLs [1]-[5]. To the best of our knowledge,
this ADDLL is the first reported 65nm design, and it occupies
the smallest core area.
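The power index in Table IV is the reported power divided by the frequency at which it was measured. The sketch below recomputes it from the table entries as a consistency check; the data layout and names are illustrative, and the values agree with the table to within rounding.

```python
# (power in mW, measurement frequency in MHz) for each design, from Table IV.
designs = {
    "[1]": (24.0, 400.0),
    "[2]": (0.243, 100.0),
    "[3]": (23.0, 700.0),
    "[4]": (9.9, 250.0),
    "[5]": (12.6, 550.0),
    "This work": (0.368, 550.0),
}

# Power index in µW/MHz: convert mW to µW, then divide by the frequency in MHz.
power_index = {name: 1000.0 * p_mw / f_mhz for name, (p_mw, f_mhz) in designs.items()}
for name, idx in power_index.items():
    print(f"{name}: {idx:.2f} µW/MHz")

# Reduction relative to [2], the most power-efficient prior design in the table.
reduction_pct = round(100 * (1 - power_index["This work"] / power_index["[2]"]))
print(reduction_pct)  # 72
```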
ACKNOWLEDGEMENT
The authors thank the National Science Council, the
Ministry of Economic Affairs, and the NSoC Project of
Taiwan for funding, and also thank UMC for supporting chip
fabrication.
REFERENCES
[1] J.-T. Kwak, C.-K. Kwon, K.-W. Kim, S.-H. Lee, and J.-S. Kih,
“Low cost high performance register-controlled digital DLL for
1Gbps x32 DDR SDRAM,” in Symp. VLSI Circuits Dig. Tech.
Papers, 2003, pp. 283-284.
[2] Jinn-Shyan Wang, Yi-Ming Wang, Chin-Hao Chen, and Yu-Chia
Liu, “An ultra low power, fast lock-in, small jitter, all digital
delay locked loop,” ISSCC Dig. Tech. Papers, 22.7, 2005.
[3] H.-H. Chang, J.-W. Lin, C.-Y. Yang, and Shen-Iuan Liu, “A
wide-range and fast-locking all-digital cycle-controlled
delay-locked loop,” IEEE J. Solid-State Circuits, vol. 40, no. 3,
pp. 661-670, Mar. 2005.
[4] You-Jen Wang, Shao-Ku Kao, and Shen-Iuan Liu, “All-digital
delay-locked loop/pulsewidth-control loop with adjustable duty
cycles,” IEEE J. Solid-State Circuits, vol. 41, no. 6,
pp. 1262-1274, June 2006.
[5] Rong-Jyi Yang and Shen-Iuan Liu, “A 40-550MHz
harmonic-free all-digital delay-locked loop using a variable SAR
algorithm,” IEEE J. Solid-State Circuits, vol. 42, no. 2,
pp. 361-373, Feb. 2007.
[6] Yi-Ming Wang and Jinn-Shyan Wang, “A low-power
half-delay-line fast skew-compensation circuit,” IEEE J.
Solid-State Circuits, vol. 39, pp. 906-918, June 2004.
[7] Ching-Yuan Yang and Shen-Iuan Liu, “A one-wire approach for
skew-compensating clock distribution based on bidirectional
techniques,” IEEE J. Solid-State Circuits, vol. 36, no. 2,
pp. 266-272, Feb. 2001.