# Area Efficient Phase Calibration of a 1.6 GHz Multiphase DLL

Ankur Agrawal\* IBM Research, Yorktown Heights, NY

Pavan Kumar Hanumolu Oregon State University, Corvallis, OR

Gu-Yeon Wei Harvard University, Cambridge, MA

Abstract: This paper describes a digital calibration scheme that corrects phase spacing errors in a multiphase clock-generating delay-locked loop (DLL). To measure phase offsets, the calibration scheme employs sub-sampling with a clock that is offset in frequency from the DLL reference clock. The phase-correction circuit shares one digital-to-analog converter across eight variable-delay buffers, reducing area consumption by 62%. The test chip, designed in a 130 nm CMOS process, demonstrates an 8-phase 1.6 GHz DLL with a worst-case phase error of 450 fs.

#### I. INTRODUCTION

Multiphase clock generators are widely used in clock and data recovery circuits (CDRs), time-to-digital converters, time-interleaved analog-to-digital converters, and clock multiplication circuits. These generators often use delay-locked loops (DLLs) to generate the multiphase clocks, given their simplicity and small footprint. However, DLLs are susceptible to both systematic and random sources of mismatch that introduce phase spacing errors between adjacent clock phases [1]. These errors compromise the performance of the overlying system, and calibration is required to compensate for them.

Analog calibration schemes [2], [3] are sensitive to process, voltage and temperature (PVT) variations that plague circuits in modern CMOS technologies. Various digital calibration techniques for multiphase clock generators have also been proposed [4]-[7]. However, digital calibration often has high area overhead. To reduce this area overhead, the authors in [6], [7] propose sharing one calibration logic block to calibrate the multiple phases out of the DLL. Paper [4] uses system-level performance information to perform phase error measurement with low overhead. However, these schemes still require multiple digital-to-phase converters to translate the digital word produced by the calibration logic into phase skew to perform the correction, making them area intensive.

This paper introduces a shared digital-to-analog converter (DAC) topology to implement area-efficient digital phase calibration of an 8-phase, 1.6-GHz DLL, which reduces the worst-case phase spacing error from 37 ps to 450 fs.

#### **II. DIGITAL PHASE CALIBRATION ARCHITECTURE**

Fig. 1 shows a block diagram of the proposed phase-calibration scheme. The 8 un-calibrated clock phases ($\Phi_{1,2,\ldots,8}$) out of the DLL feed a bank of 8 variable-delay buffers that can compensate phase spacing errors and produce equally-spaced clock phases ($\Phi_{C1,2,\ldots,8}$). The DLL in this prototype test-chip is intended for a CDR application, and thus the 8 calibrated phases drive both the calibration circuitry and 8 interleaved datapath samplers. The digital calibration circuit first measures the phase error between adjacent clock phases and, depending on the sign of the error, increments or decrements the digital control word for the corresponding delay buffer. A clock with a small frequency offset with respect to the DLL reference clock enables accurate measurement of the phase spacings between adjacent output clock phases of the DLL.

Fig. 1. DLL with shared D/A converter based phase calibration

\*This work was done while Ankur Agrawal was at Harvard University.

In contrast to most implementations where digital control bits directly drive digital variable-delay buffers, this design uses one DAC to generate the bias voltage for multiple analog-controlled variable-delay buffers. This shared-DAC topology saves area while preserving high precision. This calibration loop operates independently and does not interfere with the datapath's operation.

The three main components of this calibration circuit are the digital-to-phase conversion circuit, the phase spacing error measurement circuit, and the digital calibration logic. The rest of this section describes each of these circuits in detail.

## A. Digital-to-Phase Conversion Circuit with Shared DAC

Fig. 2 presents details of the proposed variable-delay buffer. A 7-b DAC sets a bias voltage according to a digital control code. Two back-to-back current-starved inverters function as a variable-delay buffer that enables sub-picosecond correction resolution, provides more than $\pm 50$ ps of range, and preserves monotonicity. While programmable capacitor-based variable-delay buffers can be more compact, their small size comes at the expense of limited range or non-monotonicity. Experimentally measured delay vs. control code characterization, also shown in Fig. 2, verifies that the current-starved buffer has sufficient range, resolution, and monotonicity. Due to the high area cost of each 7-b DAC, using eight of these would prohibitively increase the overall area overhead for calibration. Instead, all eight buffers share a single DAC in a fashion similar to time-division multiplexing. A switch and a 75-fF metal-metal capacitor, constructed from 6 metal layers, enable this sharing. When the DAC does not actively drive a particular $V_{\text{bias}}$ node, the capacitor acts as an analog memory element and holds the $V_{\text{bias}}$ node steady. The switch and the capacitor occupy only $45\mu m^2$, in contrast to a 7-b DAC that occupies $1800\mu m^2$.

Fig. 2. Variable-delay buffer: (a) schematic (b) experimental characterization

978-1-4577-0223-5/11/\$26.00 ©2011 IEEE

Fig. 3 illustrates how a single DAC toggles through the 8 digital registers and 8 delay buffers via select pulses *regck[1:8]* and *bufck[1:8]*, respectively. The *bufck* pulses are narrower than the *regck* pulses to allow the DAC to settle prior to setting the control voltage on the $V_{\text{bias}}$ node of each buffer. Since the 75-fF capacitor cannot hold the node voltage indefinitely, periodic refreshing of the $V_{\text{bias}}$ node is needed. The refresh interval must be short enough to minimize the accumulation of unwanted voltage wander, which otherwise translates to jitter on the calibrated clock phases.
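The cyclic refresh described above can be illustrated with a small numerical sketch. It is not the authors' pulse generator: the equal time slots, the `settle_fraction` parameter, and the leakage current are all hypothetical values chosen for illustration, while the 75-fF capacitor, the eight buffers, and the 0.5 $\mu$s refresh interval come from the text.

```python
# Sketch of cyclic bias refresh with one shared DAC (illustrative model only).
C_HOLD_F = 75e-15          # hold capacitor per V_bias node (from the text)
NUM_BUFFERS = 8            # variable-delay buffers sharing the DAC (from the text)


def droop_volts(leak_amps, refresh_interval_s, cap_farads=C_HOLD_F):
    """Voltage wander accumulated on a floating node between two refreshes."""
    return leak_amps * refresh_interval_s / cap_farads


def refresh_schedule(refresh_interval_s, settle_fraction=0.5):
    """Return (buffer index, regck start, bufck start, slot end) for one pass.

    Assumption: every buffer gets an equal slot. regck selects the register
    for the whole slot; bufck closes the switch only after `settle_fraction`
    of the slot, so the DAC output settles before it drives the V_bias node.
    """
    slot = refresh_interval_s / NUM_BUFFERS
    schedule = []
    for k in range(NUM_BUFFERS):
        start = k * slot
        schedule.append((k + 1, start, start + settle_fraction * slot, start + slot))
    return schedule


if __name__ == "__main__":
    i_leak = 10e-12  # hypothetical leakage current, not a measured value
    for t_refresh in (0.5e-6, 10e-6, 50e-6):
        dv = droop_volts(i_leak, t_refresh)
        print(f"refresh {t_refresh * 1e6:5.1f} us -> droop {dv * 1e6:8.1f} uV")
    for idx, reg_on, buf_on, end in refresh_schedule(0.5e-6):
        print(f"buffer {idx}: regck at {reg_on * 1e9:6.2f} ns, "
              f"bufck at {buf_on * 1e9:6.2f} ns, slot ends at {end * 1e9:6.2f} ns")
```

The droop scales linearly with the refresh interval, which is why a short interval keeps the voltage wander, and hence the added jitter, small.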

Since the $V_{\text{bias}}$ node is high-impedance most of the time, care must be taken to minimize capacitive coupling from other nodes. As shown in Fig. 2, two inverters connect to the top and bottom current sources, so that the coupling contributions from these inverters cancel each other to first order.

The 7-b current-mode DAC used in this design is a hybrid of binary-coded and thermometer-coded current branches. Thermometer-coded branches for the 3 MSBs ensure monotonicity in the code vs. current transfer function. A digital counter-based pulse-generation circuit produces the *regck[1:8]* and *bufck[1:8]* pulses needed for DAC sharing.
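A minimal sketch of such a hybrid decoder follows. It assumes ideal unit currents, seven equal thermometer branches of 16 units each for the 3 MSBs, and four binary-weighted branches for the remaining 4 LSBs; the function names are hypothetical and the actual branch sizing in the test chip is not given in the text.

```python
def dac_branches(code):
    """Split a 7-bit code into thermometer (3 MSBs) and binary (4 LSBs) branches."""
    if not 0 <= code <= 127:
        raise ValueError("code must fit in 7 bits")
    msb = code >> 4            # upper 3 bits, value 0..7
    lsb = code & 0xF           # lower 4 bits, value 0..15
    # Thermometer part: branch i is on when i < msb, so raising the MSB
    # field only ever switches additional branches on.
    thermometer = [1 if i < msb else 0 for i in range(7)]
    # Binary part: branches weighted 1, 2, 4, 8 units (index 0 is the LSB).
    binary = [(lsb >> b) & 1 for b in range(4)]
    return thermometer, binary


def dac_current_units(code):
    """Total output current in unit currents, assuming ideal branch weights."""
    thermometer, binary = dac_branches(code)
    i_thermo = 16 * sum(thermometer)
    i_binary = sum((1 << b) for b, on in enumerate(binary) if on)
    return i_thermo + i_binary


if __name__ == "__main__":
    for code in (0, 15, 16, 53, 127):
        thermo, binary = dac_branches(code)
        print(code, thermo, binary, dac_current_units(code))
```

With ideal weights the transfer function is simply the code itself; the point of the thermometer section is that the largest steps never require one big branch to turn on while several others turn off.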



Fig. 3. DAC sharing and cyclic-bias refresh for variable-delay buffers

This shared DAC scheme can be integrated with any digital phase calibration technique to reduce the area consumption in its digital-to-phase conversion circuits.

## B. Phase Spacing Error Measurement



Fig. 4. Sub-Sampling a clock with another frequency offset clock

In digital phase calibration schemes, the resolution of phase-error detection determines the maximum steady-state error after calibration. We employ sub-sampling [8] to perform high-resolution yet simple and flexible digital measurement. In this technique, the DLL output clock phases sample another clock with a small (<1000 ppm) frequency offset with respect to the reference clock frequency. As illustrated in Fig. 4, the output signal from each of these samplers is a sub-sampled clock whose frequency equals the frequency offset between the two clocks. The phase relationships between the sub-sampled clocks are proportional to those between the DLL output clocks, with the time axis effectively stretched by a large factor, possibly exceeding 1000x. This enables high-resolution digital measurement of the phase spacing mismatch by counting the number of reference clock cycles between adjacent sub-sampled clock rising edges.

Fig. 5. Digital phase spacing measurement scheme

Fig. 5 presents a block diagram of the error-measurement circuit. Multiple clock phases out of the DLL sample the frequency-offset clock. Two back-to-back flip-flops (MH) provide metastability hardening of the sampler outputs. Cycle-to-cycle jitter in the clocks creates bouncing transitions of the sub-sampled clocks. A digital de-bouncing circuit, comprising a simple counter that calculates the average position of the rising transitions of the sub-sampled clocks, generates de-bounced clocks with clean transitions. The rising edges of these de-bounced clocks serve as the start and stop signals for an 8-bit saturating counter that counts the number of reference clock cycles in each interval to produce a digital measure of phase spacing.
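The time-stretching effect of sub-sampling can be reproduced with a short idealized simulation. This is a sketch under stated assumptions, not the measurement circuit itself: the clocks are jitter-free (so no de-bouncing is needed), the sampled clock is assumed to run slower than the reference by exactly 1 MHz, and the helper names are hypothetical. Exact rational arithmetic avoids rounding ambiguity at the sampling instants.

```python
from fractions import Fraction

F_REF_HZ = 1_600_000_000        # DLL reference frequency (from the text)
F_OFFSET_HZ = 1_000_000         # frequency offset (value quoted in the results)
T_REF_PS = Fraction(625)        # reference period in picoseconds
NUM_PHASES = 8
DELTA = Fraction(F_OFFSET_HZ, F_REF_HZ)   # fractional frequency offset
STRETCH = F_REF_HZ // F_OFFSET_HZ         # samples per sub-sampled period


def first_rising_edge(phase_ps, max_samples=2 * STRETCH):
    """Sample index of the first 0->1 transition seen by one DLL phase.

    The DLL phase samples once per reference cycle at offset `phase_ps`.
    The sampled clock is an ideal 50% duty-cycle square wave running at
    f_ref * (1 - DELTA), an assumption about the sign of the offset.
    """
    u = Fraction(phase_ps) / T_REF_PS
    prev = None
    for n in range(max_samples + 1):
        cycles = (n + u) * (1 - DELTA)            # phase of sampled clock, in cycles
        bit = 1 if cycles % 1 < Fraction(1, 2) else 0
        if prev == 0 and bit == 1:
            return n
        prev = bit
    raise RuntimeError("no rising edge found")


def spacing_counts(phases_ps):
    """Reference cycles between rising edges of adjacent sub-sampled clocks."""
    edges = [first_rising_edge(p) for p in phases_ps]
    # The sub-sampled clocks repeat every STRETCH samples, so wrap modulo that.
    return [(edges[k + 1] - edges[k]) % STRETCH for k in range(len(edges) - 1)]


if __name__ == "__main__":
    ideal = [Fraction(625 * k, 8) for k in range(NUM_PHASES)]
    print("ideal spacing counts: ", spacing_counts(ideal))
    skewed = list(ideal)
    skewed[3] += 10                                # hypothetical 10 ps skew on one phase
    print("skewed spacing counts:", spacing_counts(skewed))
```

In this model a 10 ps skew on one phase shifts the two neighbouring counts by roughly 26 in opposite directions, which illustrates how a sub-picosecond timing difference becomes a countable number of reference cycles.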

An 8-b counter provides a measurement resolution of  $\leq 0.5$  ps, well below the expected jitter on the output clock phases. However, it operates at the frequency of the reference clock (1.6 GHz), which makes it difficult for a standard-cell based counter in this technology to meet timing requirements. Instead, we implement a ripple-counter based on toggle flip-flops. Although the output bits of a ripple counter do not toggle at the same instant, this is not an issue in this design since the down-stream digital logic waits for the counter output to reliably settle to its final value.
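As a rough consistency check on these numbers, assume the 1 MHz frequency offset quoted in the experimental results; the symbols $N_{\text{stretch}}$, $\delta t$, and $N_{\text{ideal}}$ are introduced here only for illustration. The time-stretch factor, the real-time value of one count, and the ideal count then work out as

$$
\begin{aligned}
N_{\text{stretch}} &= \frac{f_{\text{ref}}}{\Delta f} = \frac{1.6\ \text{GHz}}{1\ \text{MHz}} = 1600, \\
\delta t &= \frac{T_{\text{ref}}}{N_{\text{stretch}}} = \frac{625\ \text{ps}}{1600} \approx 0.39\ \text{ps}, \\
N_{\text{ideal}} &= \frac{78.125\ \text{ps}}{\delta t} = 200.
\end{aligned}
$$

Under this assumption one count is worth about 0.39 ps, in line with the stated resolution, and the ideal count of 200 fits within the maximum of 255 of an 8-b counter. A spacing that is too wide by more than roughly $55\,\delta t \approx 21$ ps would saturate the counter, but the sign of the error would still be correct.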

The samplers are based on back-to-back StrongARM latches [9]. Offsets in the samplers add errors to the phase spacing measurements, since these offsets can advance or delay the rising edge of the sub-sampled clock. However, as the input clocks are full-swing signals with sharp edges, the residual errors are small.

## C. Digital Calibration Logic

The on-chip digital calibration logic, implemented using an automated design flow, compares the counter output with the ideal phase spacing count and, depending on the sign of the error, increments or decrements the control word for each clock phase. The counter output updates every period of the sub-sampled clock, and a clock derived from the sub-sampled clock drives the digital calibration logic. Under typical operating conditions, the sub-sampled clock frequency is $\sim 1$ MHz and the digital logic consumes very little power.
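The sign-based update can be sketched as a simple behavioral model. Everything beyond the increment/decrement rule is an assumption made for illustration: the ideal count of 200 (which follows from a 1 MHz offset), a linear buffer with $\pm 50$ ps spread over 128 codes, phase 1 used as the fixed reference, holding the code when the error is zero, and the example phase offsets. It is not the synthesized logic of the test chip.

```python
IDEAL_SPACING_PS = 78.125                 # 625 ps reference period / 8 phases
IDEAL_COUNT = 200                         # assumed ideal count for a 1 MHz offset
COUNT_RES_PS = IDEAL_SPACING_PS / IDEAL_COUNT   # about 0.39 ps per count
CODE_BITS = 7
MID_CODE = 1 << (CODE_BITS - 1)           # start every buffer at mid-scale
MAX_CODE = (1 << CODE_BITS) - 1
LSB_PS = 100.0 / (1 << CODE_BITS)         # hypothetical linear delay step per code


def positions(uncal_ps, codes):
    """Calibrated edge positions: uncalibrated position plus buffer delay."""
    return [p + (c - MID_CODE) * LSB_PS for p, c in zip(uncal_ps, codes)]


def worst_spacing_error(uncal_ps, codes):
    """Largest deviation of any adjacent spacing from the ideal spacing."""
    pos = positions(uncal_ps, codes)
    return max(abs(pos[k] - pos[k - 1] - IDEAL_SPACING_PS)
               for k in range(1, len(pos)))


def calibrate(uncal_ps, iterations=500):
    """Step each code by one LSB per update, based only on the error sign."""
    codes = [MID_CODE] * len(uncal_ps)
    for _ in range(iterations):
        pos = positions(uncal_ps, codes)  # all spacings measured before any update
        for k in range(1, len(pos)):
            count = round((pos[k] - pos[k - 1]) / COUNT_RES_PS)
            if count > IDEAL_COUNT and codes[k] > 0:
                codes[k] -= 1             # spacing too wide: reduce delay of phase k
            elif count < IDEAL_COUNT and codes[k] < MAX_CODE:
                codes[k] += 1             # spacing too narrow: add delay to phase k
    return codes


if __name__ == "__main__":
    offsets_ps = [0, 10, -15, 22, 5, -12, 8, -3]   # hypothetical mismatch pattern
    uncal = [k * IDEAL_SPACING_PS + off for k, off in enumerate(offsets_ps)]
    start = [MID_CODE] * len(uncal)
    print("worst error before:", worst_spacing_error(uncal, start), "ps")
    codes = calibrate(uncal)
    print("codes after calibration:", codes)
    print("worst error after: ", worst_spacing_error(uncal, codes), "ps")
```

Because each update moves a code by a single step, the loop in this model settles into a small limit cycle around the ideal spacing rather than a fixed point; the residual error is set by the delay step and the counter resolution.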

#### III. EXPERIMENTAL RESULTS

The phase calibration test chip was fabricated in a $0.13\mu$m CMOS process. Measurements from multiple test chips demonstrate accurate calibration. Fig. 6 presents the measured phase spacing error in the clock phases before and after calibration for 6 chips: 2 typical (TT) and 4 slow (SS) corner chips. The frequency offset of the calibration clock is 1 MHz. With calibration turned off, the maximum phase error is 37 ps, or 47% of the ideal phase spacing of 78.125 ps. After calibration, this error reduces to 0.45 ps (0.6%). A zoom-in of the phase errors after calibration reveals a mean error of 0.3 ps, an artifact of a small bug in the digital logic that inadvertently treats the zero-phase-error case the same as a negative-error case.
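The quoted percentages follow from the ideal spacing of an 8-phase clock at 1.6 GHz; the worked numbers below simply restate that arithmetic, with $T_{\text{ideal}}$ introduced here as shorthand for the ideal phase spacing:

$$
T_{\text{ideal}} = \frac{1}{8 \times 1.6\ \text{GHz}} = 78.125\ \text{ps}, \qquad
\frac{37\ \text{ps}}{78.125\ \text{ps}} \approx 47\%, \qquad
\frac{0.45\ \text{ps}}{78.125\ \text{ps}} \approx 0.6\%.
$$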

For this circuit to be fully self-contained, a simple, low-quality frequency-locked loop (FLL) may be included on-chip to generate the frequency-offset clock. The figure also plots calibration results for a scenario with 15 ps of RMS noise added to the frequency-offset clock to mimic a low-quality FLL and demonstrates robust operation. While the jittered frequency-offset clock hardly affects the mean position of the calibrated clock phases, it impacts the jitter on these clock phases; the additional input jitter translates into noise in the phase error measurement that, in turn, manifests as jitter on the calibrated clock phases. Fig. 7 plots this increase in jitter when varying amounts of additional jitter are added to the frequency-offset clock and shows a relatively small increase in the jitter on the phase-calibrated clocks.

Fig. 6. Uncalibrated and Calibrated DLL DNL measurement from 6 chips

Fig. 7. Calibrated clock rms jitter vs. frequency-offset clock rms jitter

Wander on the bias nodes of the buffers can add unwanted jitter to the calibrated clocks. Thus, to evaluate the performance impact of the shared-DAC scheme, Fig. 8 plots peak-to-peak jitter on the calibrated clock vs. refresh interval. We observe that peak-to-peak jitter does not increase appreciably until the refresh interval is in the tens of $\mu$s range. Given the refresh interval of 0.5 $\mu$s under normal operating conditions, the cyclic-refresh scheme does not impose any performance penalty.

We summarize the area savings of the shared-DAC scheme in Table I. The circuits required to enable DAC sharing (MUX, cyclic refresh clock generators, switches and capacitors, DAC) consume the area equivalent to less than three DACs, while providing the functionality of eight DACs. The overall area savings in the buffers is 62% when compared to conventional digitally-controlled buffers. The digital logic consumes  $5000\mu m^2$ . While this implementation employed separate logic blocks to calibrate each of the phases, it is possible to share the circuitry and perform the calibration serially [6].
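The area figures can be cross-checked with a few lines of arithmetic on the Table I values. The sketch assumes that the conventional baseline consists of eight dedicated DACs plus the same eight current-starved inverter pairs; that baseline definition is an assumption made here, chosen because it reproduces the quoted 62%.

```python
# Area values in square micrometers, taken from Table I.
AREA_UM2 = {
    "inverters_x8": 680,
    "dac": 1800,
    "input_mux": 1400,
    "switch_caps_x8": 360,
    "refresh_pulse_gen": 1500,
}


def sharing_overhead_in_dacs(area=AREA_UM2):
    """Area of all DAC-sharing circuits, expressed in units of one DAC."""
    sharing = (area["dac"] + area["input_mux"]
               + area["switch_caps_x8"] + area["refresh_pulse_gen"])
    return sharing / area["dac"]


def buffer_area_savings(area=AREA_UM2, num_buffers=8):
    """Fractional saving versus an assumed baseline of one DAC per buffer."""
    shared = (area["inverters_x8"] + area["dac"] + area["input_mux"]
              + area["switch_caps_x8"] + area["refresh_pulse_gen"])
    conventional = area["inverters_x8"] + num_buffers * area["dac"]
    return 1.0 - shared / conventional


if __name__ == "__main__":
    print(f"sharing circuits = {sharing_overhead_in_dacs():.2f} DAC equivalents")
    print(f"buffer area savings = {buffer_area_savings() * 100:.1f} %")
```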

The test chip consumes 20 mW of power from a 1.2V supply, with an estimated 27% of the power consumed in the calibration circuits. Fig. 9 shows a micrograph of the test chip with floor-plan overlay.

TABLE I. Area consumption of various circuit sub-blocks (in $\mu m^2$)

| Circuit sub-block               | Area ($\mu m^2$) |
|---------------------------------|------------------|
| Current-starved inverters (x8)  | 680              |
| DAC                             | 1800             |
| Input MUX                       | 1400             |
| Output switch + capacitors (x8) | 360              |
| Cyclic refresh pulse generator  | 1500             |
| Digital logic                   | 5000             |



Fig. 8. Calibrated clock jitter vs. refresh interval



Fig. 9. Chip micrograph with floor-plan overlay

#### **IV. CONCLUSION**

This paper describes an accurate phase calibration technique for multiphase clock generation circuits. The technique, applied to a DLL in our test chip, employs sub-sampling for phase error measurement and an area-efficient shared-DAC topology for error correction. This technique reduces the DNL in phase spacings from 47% to less than 0.6%. The shared-DAC scheme reduces the area consumption of the digitally-controlled variable-delay buffers by more than 60%.

#### **ACKNOWLEDGMENTS**

The authors would like to thank UMC for chip fabrication.

#### REFERENCES

- [1] A. Agrawal, P.K. Hanumolu, and G.Y. Wei, "A 8× 5 Gb/s source-synchronous receiver with clock generator phase error correction," *IEEE Custom Integrated Circuits Conference 2008*, pp. 459–462, 2008.
- [2] L. Wu and W.C. Black, "A Low-Jitter Skew-Calibrated Multi-Phase Clock Generator for Time-Interleaved Applications," in *Digest of Technical Papers. ISSCC*, 2001, pp. 396–397.
- [3] K-J. Hsiao and T-C. Lee, "A Low-Jitter 8-to-10GHz Distributed DLL for Multiple-Phase Clock Generation," in *Digest of Technical Papers. ISSCC*, 2008, pp. 514–515.
- [4] M. El-Chammas and B. Murmann, "A 12-GS/s 81-mW 5-bit Time-Interleaved Flash ADC with Background Timing Skew Calibration," in VLSI Symposium, 2010, pp. 157–158.
- [5] V. Balan, J. Caroselli, and J. Chern, "A 4.8-6.4-Gb/s serial link for backplane applications using decision feedback equalization," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 9, pp. 1957–1967, 2005.
- [6] F. Baronti, D. Lunardini, R. Roncella, and R. Saletti, "A self-calibrating delay-locked delay line with shunt-capacitor circuit scheme," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 2, pp. 385, 2004.
- [7] H.H. Chang, J.Y. Chang, C.Y. Kuo, and S.I. Liu, "A 0.7–2-GHz Self-Calibrated Multiphase Delay-Locked Loop," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 5, pp. 1051, 2006.
- [8] P.K. Das, B. Amrutur, J. Sridhar, and V. Visvanathan, "On-chip clock network skew measurement using sub-sampling," in *IEEE Asian Solid-State Circuits Conference*, 2008. A-SSCC'08, 2008, pp. 401–404.
- [9] P.K. Hanumolu, G.Y. Wei, and U. Moon, "A Wide-Tracking Range Clock and Data Recovery Circuit," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 2, pp. 425–439, 2008.