[IEEE 2007 IEEE Asian Solid-State Circuits Conference - Jeju City, South Korea...

Page 1: [IEEE 2007 IEEE Asian Solid-State Circuits Conference - Jeju City, South Korea (2007.11.12-2007.11.14)] 2007 IEEE Asian Solid-State Circuits Conference - A 0.67μW/MHz, 5ps jitter,

A 0.67μW/MHz, 5ps Jitter, 4 Locking Cycles, 65nm ADDLL

Jinn-Shyan Wang, Chun-Yuan Cheng, Yu-Chia Liu, and Yi-Ming Wang

Dept. of EE/SOC Research Center, Nat’l Chung-Cheng University, Taiwan

Abstract-This paper presents the design of a 1.0V 150-550MHz 65nm ADDLL using a novel coarse-fine architecture and differential circuit techniques. When running at 550MHz, this nanometer ADDLL achieves a peak-to-peak jitter of only 5ps and locks in within only 4 cycles, while consuming only 0.67μW/MHz, about a 72% reduction compared to the most power-efficient existing ADDLL.

I. INTRODUCTION

Low power, small jitter, and fast lock-in are three equally important design goals of the delay-locked loop (DLL). All-digital design techniques provide promising solutions to the challenge of process variations faced by submicron DLLs [1]-[5]. In the nanometer era, the problem of process variations gets worse. In addition to all-digital techniques, we design a 65nm ADDLL using a novel coarse-fine architecture for fast lock-in and low power, and differential circuit techniques for small jitter and high tolerance to process variations. The implemented ADDLL achieves the highest power efficiency with the smallest peak-to-peak jitter and the fewest locking cycles among state-of-the-art ADDLLs [1]-[5].

II. ARCHITECTURE AND LOCKING OPERATIONS

The proposed coarse-fine architecture, shown in Fig. 1, is described first. It integrates an improved half-delay-line skew-compensation circuit (iHDSC), a phase detector (PD), two fine delay lines (FDLs), a SAR loop controller, and a loadable shift register. The HDSC [6] is an open-loop circuit. It achieves coarse lock within two cycles and is power efficient because the number of active cells in the lock-in state can be minimized. The iHDSC evolves from the HDSC by replacing the priority encoder in the HDSC with a thermometer coder. The main reason for this replacement is that the priority encoder has a long critical path, which limits the maximal clock frequency (fmax) of the HDSC. We notice that the output of the time-to-digital converter (TDC) always contains some consecutive “1” bits followed by consecutive “0” bits. Using the thermometer coder instead leaves the function unchanged but shortens the critical path. The salient features of fast lock-in and low power are preserved in the iHDSC.
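As a rough illustration of why the replacement can shorten the critical path, the Python sketch below contrasts the two encodings. It assumes the TDC word is a list of bits with all the “1” bits first, and that the thermometer coder marks the 1-to-0 boundary by comparing each bit only with its neighbor. The function names and this boundary-detect formulation are illustrative assumptions, not the circuit used in the paper.

```python
def priority_encode(bits):
    """Return the index of the last '1' bit.

    In a hardware priority encoder, each position depends on the
    positions before it, which forms a long logic chain.
    """
    position = -1
    for i, b in enumerate(bits):
        if b:
            position = i
    return position


def thermometer_boundary(bits):
    """Mark the 1->0 boundary using only adjacent bit pairs.

    Each output depends on two neighboring inputs only, so the logic
    depth does not grow with the word length (assumed formulation).
    """
    padded = list(bits) + [0]
    return [int(padded[i] and not padded[i + 1]) for i in range(len(bits))]


# Hypothetical TDC word: three 1s followed by 0s.
tdc = [1, 1, 1, 0, 0, 0, 0, 0]
one_hot = thermometer_boundary(tdc)
print(one_hot, priority_encode(tdc))
```

For a well-formed word, both functions identify the same boundary position; the difference lies only in how much logic sits between the inputs and each output.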

Fig. 2 shows the timing diagram of the proposed ADDLL. Once the reset signal is released, the ADDLL performs coarse locking with the iHDSC and then fine locking with the other components. Please refer to [6] for the coarse locking operation. At the end of coarse locking, a control word is loaded into the shift register to set the path delay provided by the coarse delay line (CDL). Afterwards, the PD is activated, and the SAR controller performs a binary successive-approximation search to find the control word for the 4-bit fine delay line (FDL).

IEEE Asian Solid-State Circuits Conference

1-4244-1360-5/07/$25.00 2007 IEEE

November 12-14, 2007 / Jeju, Korea

11-2

[Figure: waveforms of Ck_ext, Ck_int, Reset, TDC_start, TDC_stop, 0/180, init/Sel, TDC[n-1:0], TC[n-1:0], BR[n-1:0], Load, S[n-1:0], FS[3:0], and lead/lag across the Initialization, Measurement, Phase-align, Fine-lock, and Maintenance operations, with a 4-bit binary search diagram that starts from 1000 and branches on lag/lead.]

Fig. 2. Operation waveforms and the binary search diagram

The last step takes at most 4 clock cycles to achieve fine lock-in. Overall, the proposed ADDLL takes at most 6 clock cycles to lock in after the reset signal is released.
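The fine-lock step can be sketched as a standard successive-approximation loop that starts from the trial code 1000, as in the binary search diagram of Fig. 2. The Python below is a minimal model, not the authors' controller: `pd_keeps_bit` is a hypothetical stand-in for the PD decision, the mapping from lead/lag to keeping or clearing a bit is an assumption, and `residual_steps` is an invented example value.

```python
def sar_search(pd_keeps_bit, bits=4):
    """Return (final_code, trial_codes) after one PD decision per bit, MSB first."""
    code = 0
    trials = []
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)   # tentatively set the next bit
        trials.append(trial)
        if pd_keeps_bit(trial):     # assumed: True when the trial delay does not overshoot
            code = trial
    return code, trials


# Hypothetical residual phase error after coarse lock, in units of one FDL step.
residual_steps = 7.4
code, trials = sar_search(lambda trial: trial <= residual_steps)
print(format(code, "04b"), [format(t, "04b") for t in trials])
```

A 4-bit word needs exactly four decisions, one per trial code, which is consistent with the bound of at most 4 clock cycles for fine lock-in if each decision takes one cycle.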

III. PHASE DETECTOR

The phase resolution and the operating speed are key design parameters of the PD. A PD constructed with two dynamic latches [7] (circuit A in Fig. 3(a)) can achieve high speed and high resolution in charge-pump-based analog DLL applications. Unfortunately, the output signals of this dynamic PD are short pulses whose width is proportional to the phase difference (waveforms in Fig. 3(a)). When the phase difference approaches zero during the locking process, the pulse becomes too narrow to drive the large load it may face. Our analysis shows that this PD cannot be applied directly in the proposed ADDLL.

The proposed PD is based on the PD in [7] to retain its high phase resolution and high operating speed, but several new design features are added to make it capable of driving a large load, to provide higher resolution, and to make it more robust to process variations. The evolution of the new PD is shown in Fig. 3(b). First, an NC2MOS latch plus an inverter is attached to each output of circuit A, and additional inverters are added at the input stage to provide clocking signals for the NC2MOS latches. The PD then becomes circuit B. Because of the latching operation, the pulse width of the output signals is always stretched to half of the clock cycle, regardless of the phase difference. If the clock frequency is low, the lengthened pulse is long enough to drive a large load. At a high clock frequency, the buffer in the dynamic latch can also be sized up to enhance the driving capability. The phase resolution is still determined by the input latch and is not affected by the NC2MOS output latch. Second, two inverters with one capacitor in between are added to generate the new data input for the input latch, and the PD becomes circuit C. With the extra capacitor, the data input arrives later, with an intentional skew δ. If the phase error between int and ext is Δ, the equivalent phase error detected by the input latch becomes “Δ+δ”. In other words, if the minimal detectable phase error of the input latch is ε, the equivalent minimal detectable phase error between int and ext becomes “ε-δ”. This means that the phase resolution is improved by δ.
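The step from “Δ+δ” to “ε-δ” can be written out explicitly. The short derivation below uses the symbols of the text; the subscript name on the latch-side phase error and the condition 0 ≤ δ < ε are assumptions added for illustration.

```latex
\begin{align}
  \Delta_{\mathrm{latch}} &= \Delta + \delta
    && \text{(phase error seen by the input latch)} \\
  \Delta_{\mathrm{latch}} \ge \varepsilon
    \;&\Longleftrightarrow\; \Delta \ge \varepsilon - \delta
    && \text{(detectable condition, assuming } 0 \le \delta < \varepsilon\text{)}
\end{align}
```

The smallest phase error between int and ext that the latch can resolve therefore drops from ε to ε-δ.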

Fig. 3. (a) Conventional PD, and (b) evolution of the proposed PD

Table I Performance comparisons of different PDs

| Circuit | Resolution (VDD: 1V, Temp: 25°C, Process: TT) | Resolution (VDD: 1V, Temp: 125°C, Process: SS) | Resolution (VDD: 0.9V, Temp: 125°C, Process: SS) | Normalized | Driving cap. (VDD: 1V, Temp: 25°C, Process: TT, Freq: 550MHz) |
|---|---|---|---|---|---|
| A | 10ps | 19.2ps | 33.0ps | 1.00 | 12.6μm/65nm |
| B | 8ps | 14.5ps | 25.4ps | 0.77 | 200μm/65nm |
| D | 5ps | 7.8ps | 12.8ps | 0.39 | 220μm/65nm |


However, due to the extra capacitors, the two paths from node ext to nodes B and C and the two paths from node int to nodes A and D are not balanced. Under PVT variations, the delay of the path with the intentional skew varies differently from that of the path without it. To improve robustness against PVT variations, we make the circuitry differential by adding a unit capacitance to each path (circuit D in Fig. 3(b)). Simulation results in Table I validate the effectiveness of the differential circuitry. At the worst corner, the phase resolution of the proposed design is 12.8ps, which is 61% smaller than the 33.0ps of the conventional PD. Moreover, the simulation results in the last column of Table I indicate that the driving capability of the proposed PD (circuit D) is about 17 times that of the dynamic PD [7], in terms of the maximal drivable MOS transistor size.
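The two headline figures in this paragraph can be rechecked from the Table I entries. The snippet below is only an arithmetic check; the variable names are illustrative, and the driving capability is compared as a plain ratio of the listed transistor widths, so the result does not depend on the width unit.

```python
# Worst-corner resolution (0.9V, 125C, SS) in ps, taken from Table I.
worst_corner_ps = {"A": 33.0, "B": 25.4, "D": 12.8}

# Maximal drivable transistor width from the last column of Table I.
drive_width = {"A": 12.6, "B": 200.0, "D": 220.0}

resolution_reduction = 1 - worst_corner_ps["D"] / worst_corner_ps["A"]
drive_ratio = drive_width["D"] / drive_width["A"]

print(f"resolution reduction: {resolution_reduction:.0%}")
print(f"driving capability ratio: {drive_ratio:.1f}x")
```

The same ratios give the normalized column of Table I: 25.4/33.0 is about 0.77 and 12.8/33.0 is about 0.39.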

IV. DELAY LINES

The main design considerations for the delay line are power consumption, delay resolution, and tolerance to PVT variations. The power consumption of the delay line should be minimized because the delay line is the main power consumer in the locked state. The delay resolution must be fine enough because it affects the jitter of the generated clock signal. For a nanometer design, however, the most important requirement is tolerance to PVT variations. The proposed ADDLL adopts a coarse delay line and a fine delay line. We carefully select and design the delay cells to achieve high PVT tolerance, high resolution, and low power simultaneously.

First, we study the characteristics of two state-of-the-art fine delay cells to see which one is suitable for the proposed FDL. One is a differential delay cell [2] proposed by one of the authors of this paper, and the other is a state-of-the-art single-ended delay cell [5]. Two 4-bit FDLs are designed, as shown in Fig. 4(a). Evaluation results are listed in Table II. Both FDLs are designed with a resolution of 8.5ps and a delay range of about 0ps~130ps at the typical operating corner. At the worst corner, i.e., (0.9V, 125°C, SS), the resolution of the differential delay cell varies by 46%, whereas that of the single-ended delay cell varies by as much as 111%. This result indicates that a nanometer design should adopt differential circuitry for the delay cell, as it does for the phase detector.
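The variation percentages can be reproduced from the Table II resolutions. The sketch below assumes that "variation" means the relative increase over the typical-corner resolution, which matches the tabulated values, and that the 4-bit FDL has 15 uniform steps, which gives a range close to the quoted 130ps. Both definitions are assumptions for illustration.

```python
TYPICAL_PS = 8.5  # resolution at (1V, 25C, TT) for both cells

# Resolution in ps at the two slow corners, taken from Table II.
corner_resolution_ps = {
    "single-ended": {"1.0V/125C/SS": 14.1, "0.9V/125C/SS": 17.9},
    "differential": {"1.0V/125C/SS": 10.4, "0.9V/125C/SS": 12.4},
}


def variation_percent(resolution_ps, typical_ps=TYPICAL_PS):
    """Relative increase over the typical resolution, rounded to whole percent."""
    return round(100 * (resolution_ps / typical_ps - 1))


variations = {
    cell: {corner: variation_percent(r) for corner, r in corners.items()}
    for cell, corners in corner_resolution_ps.items()
}

# Delay range of a 4-bit line with uniform steps (assumed): 15 steps.
fdl_range_ps = (2**4 - 1) * TYPICAL_PS

print(variations)
print(fdl_range_ps)
```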

For the CDL, it may seem natural to construct it with differential delay cells as well (such as CDL1 in Fig. 4(b)), as was done in [2]. However, the FDL already takes responsibility for providing high delay resolution and process tolerance. Therefore, reducing power consumption is the most important design concern for the CDL. We propose to construct the CDL with single-ended delay cells evolved from those used in [6]; the proposed CDL is shown as CDL2 in Fig. 4(b). Performance evaluation results are shown in Table III. The proposed CDL saves 56~82% of the power, depending on the number of active cells. The power reduction comes from fewer triggered devices along the signal path in the proposed CDL.
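The 56~82% range follows directly from the power figures in Table III. The snippet below is an arithmetic check only; the dictionary layout and names are illustrative.

```python
# Power at 550MHz in mW, keyed by number of active cells, from Table III.
cdl_power_mw = {
    "differential": {20: 0.9466, 1: 0.4309},
    "single-ended": {20: 0.4185, 1: 0.0778},
}

savings = {
    cells: 1 - cdl_power_mw["single-ended"][cells] / cdl_power_mw["differential"][cells]
    for cells in (20, 1)
}

for cells, saving in savings.items():
    print(f"{cells} active cell(s): {saving:.0%} power saving")
```

The smaller saving corresponds to 20 active cells and the larger one to a single active cell.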

[Figure: (a) two 4-bit FDLs, labeled "This Work" and "[5] (JSSC 07)", with In, Out, Sel<3:0>, a demux, and delay cells of relative sizes x1 to x10.2; (b) two CDLs with In, Out, and binary-to-thermometer decoders driving the control signals T01 to T51.]

Fig. 4. (a) FDLs, and (b) CDLs

Table II Fine delay cell resolution at different corners

| Delay cells | Resolution (VDD: 1V, Temp: 25°C, Process: TT) | Resolution (VDD: 1V, Temp: 125°C, Process: SS) | Variation | Resolution (VDD: 0.9V, Temp: 125°C, Process: SS) | Variation |
|---|---|---|---|---|---|
| Single-ended | 8.5ps | 14.1ps | 66% | 17.9ps | 111% |
| Differential | 8.5ps | 10.4ps | 22% | 12.4ps | 46% |

Table III Comparison of power consumption of different CDLs

| Feature | Differential cell | Differential cell | Single-ended cell | Single-ended cell |
|---|---|---|---|---|
| # of active cells | 20 | 1 | 20 | 1 |
| Delay time | 2.79ns | 137ps | 2.65ns | 132ps |
| Power @ 550MHz | 0.9466mW | 0.4309mW | 0.4185mW | 0.0778mW |


V. EXPERIMENTAL RESULTS

The proposed ADDLL is realized in an industrial 65nm 1.0V CMOS technology, as shown in Fig. 5. The PD, CDL, and FDL are full-custom designs, whereas the controller is realized with a cell-based design. The maximal measured clock frequency is 550MHz, as shown in the left part of Fig. 6. In this case, the initial phase error of the experimental setup is measured to be 490ps, the measured peak-to-peak jitter is only 5ps, and the ADDLL achieves lock-in within 4 clock cycles. The number of active coarse delay cells is estimated to be 4 by simulation. The minimal measured clock frequency is 150MHz, with a 4ps jitter and 4 lock-in cycles, as shown in the right part of Fig. 6.

Fig. 5. The chip micrograph

[Figure: measured CK_ext (O-PAD), CK_int (O-PAD), and reset waveforms. At 1V / 150MHz: lock-in in 4 cycles, p-p jitter = 4 ps. At 1V / 550MHz: lock-in in 4 cycles, p-p jitter = 5 ps.]

Fig. 6. Measurement results

Table IV Performance comparisons

| | [1] (’03) | [2] (’05) | [3] (’05) | [4] (’06) | [5] (’07) | This Work |
|---|---|---|---|---|---|---|
| Process | 0.13μm | 0.25μm | 0.18μm | 0.35μm | 0.18μm | 65nm |
| Voltage | 1.8V | 1.0V | 1.8V | 3.3V | 1.8V | 1.0V |
| Resolution | n.a. | n.a. | <28 ps | <30 ps | 10 ps | 8.5 ps |
| Core area (mm2) | 0.4641 | 0.09538 | 0.88 | 0.075 | 0.2 | 0.01189 |
| fmax | 400MHz | 100MHz | 700MHz | 260MHz | 550MHz | 550MHz |
| fmin | 66MHz | 100MHz | 2MHz | 140MHz | 40MHz | 150MHz |
| Lock Time | <150 cycles | 8 cycles | 32 cycles | 10 cycles | 14 cycles @ fmax | 4 cycles @ fmax |
| p-p Jitter | ≈20ps | 30ps | 17.6ps @ fmax | 24.4ps @ 250MHz | 12ps @ fmax | 5ps @ fmax |
| Power | 24mW @ fmax | 0.243mW @ fmax | 23mW @ fmax | 9.9mW @ 250MHz | 12.6mW @ fmax | 0.368mW @ fmax |
| Power index (μW/MHz) | 60 | 2.43 | 32.85 | 39.6 | 22.9 | 0.67 |

VI. PERFORMANCE COMPARISONS AND CONCLUSIONS

Performance comparisons are given in Table IV. At 550MHz, this 65nm ADDLL consumes only 0.67μW/MHz, about a 72% reduction compared to the most power-efficient ADDLL [2]. It also achieves the fastest lock-in and the smallest jitter among the conventional ADDLLs [1]-[5]. To the best of our knowledge, this ADDLL is the first reported 65nm design, and it occupies the smallest core area.
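The power index and the 72% figure can be rechecked from the power and frequency entries of Table IV. The sketch below assumes the power index is simply the reported power divided by the frequency at which it was reported, which reproduces the tabulated values for [2] and this work; the names are illustrative.

```python
def power_index_uw_per_mhz(power_mw, freq_mhz):
    """Power index in uW/MHz, assumed to be reported power over reported frequency."""
    return power_mw * 1000.0 / freq_mhz


this_work = power_index_uw_per_mhz(0.368, 550)  # 0.368mW at 550MHz
ref_2 = power_index_uw_per_mhz(0.243, 100)      # 0.243mW at 100MHz, from [2]
reduction = 1 - this_work / ref_2

print(f"this work: {this_work:.2f} uW/MHz")
print(f"[2]: {ref_2:.2f} uW/MHz")
print(f"reduction: {reduction:.0%}")
```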

ACKNOWLEDGEMENT

The authors thank the National Science Council, the Ministry of Economic Affairs, and the NSoC Project of Taiwan for funding, and also thank UMC for supporting chip fabrication.

REFERENCES

[1] J.-T. Kwak, C.-K. Kwon, K.-W. Kim, S.-H. Lee, and J.-S. Kih, “Low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM,” in Symp. VLSI Circuits Dig. Tech. Papers, 2003, pp. 283-284.

[2] Jinn-Shyan Wang, Yi-Ming Wang, Chin-Hao Chen, and Yu-Chia Liu, “An ultra low power, fast lock-in, small jitter, all digital delay locked loop,” ISSCC Dig. Tech. Papers, 22.7, 2005.

[3] H.-H. Chang, J.-W. Lin, C.-Y. Yang, and Shen-Iuan Liu, “A wide-range and fast-locking all-digital cycle-controlled delay-locked loop,” IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 661-670, Mar. 2005.

[4] You-Jen Wang, Shao-Ku Kao, and Shen-Iuan Liu, “All-digital delay-locked loop/pulsewidth-control loop with adjustable duty cycles,” IEEE J. Solid-State Circuits, vol. 41, no. 6, pp. 1262-1274, June 2006.

[5] Rong-Jyi Yang and Shen-Iuan Liu, “A 40-550MHz harmonic-free all-digital delay-locked loop using a variable SAR algorithm,” IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 361-373, Feb. 2007.

[6] Yi-Ming Wang and Jinn-Shyan Wang, “A low-power half-delay-line fast skew-compensation circuit,” IEEE J. Solid-State Circuits, vol. 39, pp. 906-918, June 2004.

[7] Ching-Yuan Yang and Shen-Iuan Liu, “A one-wire approach for skew-compensating clock distribution based on bidirectional techniques,” IEEE J. Solid-State Circuits, vol. 36, no. 2, pp. 266-272, Feb. 2001.
