# ADC-Based Backplane Receiver Design-Space Exploration

Hayun Chung, Member, IEEE, and Gu-Yeon Wei, Member, IEEE

Abstract—Demand for higher throughput backplane communications, coupled with a desire for design portability and flexibility, has led to high-speed backplane receivers that use front-end analog-to-digital converters (ADCs) and digital equalization. Unfortunately, the power and complexity of such receivers can be high and require careful design. This paper presents a parameterized ADC-based backplane receiver model that facilitates design-space exploration to optimize the tradeoffs between power and performance: an accurate behavioral model of front-end ADCs is presented for performance estimation, and detailed power models for the digital equalizer (EQ) blocks are developed for power estimation. Model-based simulations suggest that comparator offset correction resolution is the most critical ADC design parameter where overall receiver performance is concerned. Further receiver design-space exploration reveals that a Pareto optimal frontier exists, which can be used as a guideline to set the initial receiver configurations depending on given power and performance constraints.

Index Terms—Analog-to-digital converter (ADC)-based receiver, design-space exploration, high-level model, high-speed.

## I. INTRODUCTION

The continued push for high-speed backplane communications has led to a variety of innovations in high-speed transceiver design. In recent years, designers have begun to consider high-speed backplane receivers that rely on high-speed front-end analog-to-digital converters (ADCs) followed by digital equalizers (EQs) to exploit the benefits of sophisticated digital signal processing techniques [1]–[5]. These ADC-based receivers further confer the benefits of design portability and reuse, as digital EQs can flexibly reconfigure themselves to accommodate a variety of channel environments. However, the power and complexity of such receivers can be high and require thorough exploration of the design space to find the right balance between power and EQ performance. The primary challenge is that the design space of such receivers is quite large and sensitive to a variety of design parameters, such as ADC and digital EQ resolutions and the number of EQ taps.

Manuscript received February 20, 2013; revised June 13, 2013; accepted July 25, 2013. Date of publication August 15, 2013; date of current version June 23, 2014. This work was supported by the Mixed-Signal Communications IC Design Group at IBM T. J. Watson Research Center, Yorktown Heights, NY, USA.

H. Chung was with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138 USA. He is now with Korea Advanced Institute of Science and Technology, Daejeon 305-701, South Korea (e-mail: hayun4@gmail.com).

G.-Y. Wei is with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138 USA (e-mail: guyeon@eecs.harvard.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2013.2275742

Fig. 1. Overall transceiver architecture.

To explore a large design space in terms of tradeoffs between power and EQ performance, a high-level model of the receiver system that can estimate both power consumption and EQ performance is required. High-level performance models of high-speed link systems exist [6]–[10]. However, to enable design-space exploration, they require power models as well. Sredojevic *et al.* [11] proposed a high-speed link design-space exploration framework based on high-level power and performance models of a high-speed link system that relies on analog-type receive-side EQs. This paper presents a design-space exploration framework for an ADC-based backplane receiver based on a set of accurate and parameterized power and performance models of the front-end ADCs and digital EQ blocks.

The proposed framework for design-space exploration consists of: 1) a simple-yet-accurate model of high-speed ADCs [12] and 2) a model of digital EQs. Note that once an incoming symbol stream has been sampled and digitized by front-end ADCs, digital EQ blocks are straightforward to model with discrete-time functional models. Hence, it is important to develop a detailed ADC model that carefully considers a variety of nonidealities such as clock jitter, bandwidth limitations, and so on (Section II-A). Upon thorough verification of this model with experimental measurements from a test-chip prototype in [13], the remainder of the model focuses on the power and delay tradeoffs of the digital EQ blocks (Section II-B). To facilitate full exploration, we assume a wide range of flexibility to tune the resolution of computation; interleaving of computation to trade off speed versus power via parallel hardware; and optimal $V_{\rm dd}$ and device sizing ($V_{\rm th}$ is left constant) (Section IV). Although the analysis assumes a 65-nm CMOS process, the model can easily accommodate a range of process technologies.

## II. MODELING

Fig. 1 shows the overall backplane transceiver system architecture with an ADC-based receiver. Once the differential data stream passes through a band-limited backplane channel, an auto-gain controller (AGC) scales the channel output such that its voltage swing matches the ADC input voltage range. To be consistent with the verified model of the front-end ADC, the model consists of two-way interleaved 6.25-GS/s Flash ADCs, which enable 12.5-Gb/s data communication. The digital EQ consists of a feed-forward equalizer (FFE) and a decision-feedback equalizer (DFE) to cancel out both pre- and postcursor intersymbol interference (ISI) components.

1063-8210 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Fig. 2. N-bit Flash ADC architecture based on [13].

Clock-and-data recovery (CDR) is another component of high-speed transceivers that significantly affects EQ performance by shaping clock jitter. As this paper focuses on investigating the ADC and digital EQ design tradeoffs, a detailed study of the CDR remains future work.

## A. Front-End ADC

Front-end ADCs sample and digitize an incoming symbol stream to generate digital codes that feed into the digital EQ. Thus, front-end ADCs require an accurate behavioral model, as any error that occurs during the sampling and digitizing process of the analog input signal can directly degrade EQ performance. To keep the model simple, the model only includes major nonidealities that significantly affect ADC and EQ performance. Because the ADC assumes a Flash structure, a simple power model scales power with respect to the number of comparators.

1) Behavioral Model: One of the most important components of the proposed backplane receiver model is an accurate behavioral model of the high-speed ADC—simple to minimize simulation time and parameterized to understand tradeoffs. In this paper, we assume a differential Flash ADC architecture (shown in Fig. 2) with a two-stage track-and-hold (T/H), based on a test-chip prototype [13]. A two-stage T/H design supports high input bandwidth by buffering large capacitive loading, imposed by multiple comparators, from the input stage. The ADC model further assumes an offset compensation in the comparator via a digital calibration loop that fine-tunes reference levels, also found in [13]. Hence, the proposed model consists of two major components: 1) the two-stage T/H and 2) multiple 1-bit comparators with an offset correction.

While several nonideal error sources have been found in CMOS T/H circuits [14], [15], for simplicity, the model only includes the dominant sources of error that noticeably affect ADC performance [12]; most error sources are negligible because they can be masked by coarse-grained quantization and/or compensated by simple digital correction circuits. The model includes only three major error sources: 1) sampling clock jitter; 2) input-dependent sampling instant; and 3) $R_{\rm ON}$ variations.



Fig. 3. Behavioral model of nonideal (a) two-stage T/H and (b) 1-bit comparator.

Fig. 3(a) shows the resulting nonideal model of the two-stage T/H, developed in MATLAB. Sampling clock jitter, modeled as Gaussian noise, adds random time offsets to an ideal clock edge ($\Phi_{\rm samp}$) with parameterized rise time. To model the input-dependent sampling instant error, a voltage-dependent MOSFET switch nonideality, the shifted clock edge is compared with an input signal to determine when $V_{\rm GS} - V_{\rm TH} = (V_{\rm clk} - V_{\rm in}) - V_{\rm TH} = 0$, which corresponds to the actual sampling instant. To model $R_{\rm ON}$ variation, another voltage-dependent MOSFET switch nonideality, an input-voltage-dependent RC low-pass filter models the $R_{\rm ON}$ variation before capturing the sampled voltage with an ideal switch. Because the second T/H stage tracks a settled output, it is much more immune to various circuit nonidealities. Thus, we model the second T/H stage as an ideal switch followed by a low-pass filter that models the bandwidth limitation due to capacitive loading imposed by the large number of comparators. To parameterize ADC resolution in the model, capacitive loading ($C_L$) on the second T/H stage scales with the number of comparators ($N_{\rm comp}$).

The 1-bit comparators that follow the two-stage T/H also suffer from nonidealities, which require careful modeling. As shown in Fig. 3(b), the comparator model includes two major components of error: 1) residual offset and 2) metastability. While we assume offset compensation via digital calibration, the finite offset correction resolution leads to residual comparator offset that cannot be ignored. This residual offset adds a uniformly distributed random voltage, with bounds defined by the limited resolution of offset correction, to the ideal reference voltage level. While an ideal comparator can resolve arbitrarily small input differences, noise governs how a real comparator resolves them. Hence, to capture this second error component, the reference voltage includes an additional Gaussian-distributed random offset ($\mu=0$, $\sigma=0.1\%$ of ADC supply voltage).
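The comparator model described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB model: the function names, the single-ended unit input range, the uniformly spaced reference levels, the interpretation of the residual offset bounds as plus or minus half the correction step, and the 1-V supply used to scale the noise are all assumptions made for illustration.

```python
import random

def make_comparator_bank(n_comp, offset_res_lsb, v_range=1.0, seed=0):
    """Reference levels of a Flash ADC with residual offsets (hypothetical helper).

    Each reference gets a uniformly distributed residual offset bounded by
    +/- offset_res_lsb / 2 LSB (assumed reading of "bounds defined by the
    limited resolution of offset correction").
    """
    rng = random.Random(seed)
    lsb = v_range / (n_comp + 1)
    half_step = 0.5 * offset_res_lsb * lsb
    return [(k + 1) * lsb + rng.uniform(-half_step, half_step)
            for k in range(n_comp)]

def quantize(v_in, refs, noise_sigma=0.001, rng=None):
    """Thermometer-code sum: count comparators whose noisy reference is exceeded.

    noise_sigma = 0.001 corresponds to 0.1% of an assumed 1-V supply.
    Pass a shared rng to get independent noise on successive calls.
    """
    rng = rng or random.Random(1)
    return sum(1 for ref in refs if v_in > ref + rng.gauss(0.0, noise_sigma))

# Example: a 15-comparator (4-bit) bank with 0.5-LSB offset correction resolution.
refs = make_comparator_bank(15, 0.5)
code = quantize(0.52, refs)
```

With `offset_res_lsb=0` and `noise_sigma=0` the sketch reduces to an ideal quantizer, which makes it easy to isolate the effect of each error source.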



Fig. 4. Front-end ADC behavioral model verification at 6 GS/s. (a) Nyquist-frequency (3.001 GHz) input. (b) Data-frequency (5.999 GHz) input.

To verify the proposed ADC behavioral model, Fig. 4 compares the simulated output FFT of the ADC model operating at 6 GS/s with both Nyquist- and data-frequency (assuming two-way time interleaved half-rate ADCs, this frequency is close to the sampling clock frequency) inputs to experimental measurements from the test-chip prototype in [13]. The overlays of FFT plots and SNDR numbers reveal good agreement between simulated and measured results, which verifies the model's ability to accurately emulate high-speed ADC behavior.

2) Power Model: As the front-end ADCs require very high input bandwidth and high sampling rate, they tend to suffer from high power consumption. Although the two-stage T/H structure can significantly reduce power consumption in the T/H circuitry [13], the switching power of the multiple comparators remains high. The ADC power is modeled as follows:

$$P_{\text{ADC}} = P_{T/H} + (P_{\text{comp}} \times N_{\text{comp}}) \tag{1}$$

where  $P_{T/H}$  is T/H power,  $P_{\rm comp}$  is power of a single comparator, and  $N_{\rm comp}$  is the number of comparators that scales exponentially with ADC bit resolution. To avoid bandwidth degradation due to capacitance from multiple comparators,  $P_{T/H}$  also scales with  $N_{\rm comp}$ .
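A minimal sketch of (1), under stated assumptions: the comparator count is taken as $2^N - 1$ rounded to an integer (the text only says it scales exponentially with resolution), the T/H power is taken as linear in $N_{\rm comp}$ (the text only says it scales with $N_{\rm comp}$), and the per-comparator power values are placeholders rather than measured numbers.

```python
def flash_adc_power(bits, p_comp_mw, p_th_per_comp_mw):
    """Evaluate P_ADC = P_T/H + P_comp * N_comp (Eq. 1) with assumed scaling.

    bits              ADC resolution; half-bit values are rounded to a whole
                      number of comparators (assumption)
    p_comp_mw         power of one comparator, placeholder value
    p_th_per_comp_mw  T/H power per comparator driven, placeholder value
    Returns (total power in mW, number of comparators).
    """
    n_comp = round(2 ** bits) - 1
    p_th = p_th_per_comp_mw * n_comp  # assumed linear scaling of T/H power
    return p_th + p_comp_mw * n_comp, n_comp

# Example with placeholder numbers: a 4-bit ADC has 15 comparators.
total_mw, n_comp = flash_adc_power(4, p_comp_mw=1.0, p_th_per_comp_mw=0.5)
```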

## B. Digital EQs

Fig. 1 shows the EQ architecture, which consists of two blocks—FFE and DFE. When data rate is high, these digital EQ blocks can suffer from high hardware complexity and power consumption. Thus, they require carefully constructed models based on the detailed design choices to estimate power consumption. The proposed model enables a wide range of parameter sweeps, such as bit resolution, degree of interleaving, pipelining, and so on. For EQ performance estimation, functional models of both FFE and DFE blocks enable fast simulations.



Fig. 5. Functional model of FFE.



Fig. 6. Implementation block diagram of FFE.

1) Feed-Forward Equalizer: The FFE consists of a multitap finite impulse response (FIR) filter to cancel out the precursor ISI components. Fig. 5 shows the functional model of the FFE. As the FFE requires multibit multipliers and adders, it can suffer from high power consumption and complexity even when the FFE has a small number of taps.

Fig. 6 shows an example of a one-tap FFE implementation with two-way time interleaving. Note that time interleaving and pipelining can be applied to overcome speed limitations at the cost of extra hardware complexity and latency. Except for the final adder that converts the carry-save representation to two's complement, all multipliers and adders assume a carry-save adder (CSA) array structure to efficiently accumulate partial sums and to allow reconfiguration. The final adder uses a Sklansky [16] structure because it has been shown to have good energy-delay product characteristics and to be amenable to bit-level resolution tuning for power savings [17].

Fig. 7. Power consumption of FFE across interleaving depth.

To estimate FFE power, let us start with the gate delay expression in (2), based on [18]

$$t_d = \frac{C_L V_{\text{dd}}}{I_d} = \frac{(C_{\text{wire}} + C_{\text{gate}}) V_{\text{dd}}}{\mu C_{\text{ox}} \frac{W}{L} (V_{\text{dd}} - V_{\text{th}})^{1.2}} \tag{2}$$

where $C_{\rm wire}$ and $C_{\rm gate}$ are the parasitic wire and gate capacitance, respectively. For simplicity, $C_{\rm wire}$ is assumed to be the same for all the gates and $C_{\rm gate}$ is scaled proportionally to transistor size, which can be swept. Note that gate delay is a function of supply voltage $V_{\rm dd}$. Thus, when the delay requirement on the critical path is relaxed by pipelining and/or time interleaving, one can lower $V_{\rm dd}$ to an optimal supply voltage ($V_{\rm dd\_opt}$) level, where the critical path delay minimally satisfies the delay requirement, to save power. Assuming each stage of the CSA array is pipelined and $l$-way time interleaving is allowed, the critical path delay is the delay of one CSA cell and $V_{\rm dd\_opt}$ can be found using (3)

$$\sum_{\text{CSA\_cell}} \left( \frac{(C_{\text{wire}} + C_{\text{gate}}) V_{\text{dd\_opt}}}{\mu C_{\text{ox}} \frac{W}{L} (V_{\text{dd\_opt}} - V_{\text{th}})^{1.2}} \right) = \frac{l}{f_{\text{data}}} \tag{3}$$

where $f_{\text{data}}$ is the data rate. Once an optimal supply voltage is found, optimal power can be computed with (4)–(7)

$$E_{\rm dyn} = \sum \alpha C_L V_{\rm dd\_opt}^2 \tag{4}$$

$$P_{\rm dyn} = E_{\rm dyn} \times \left(\frac{f_{\rm data}}{l}\right) \tag{5}$$

$$P_{\text{leak}} = \sum I_{\text{leak}} V_{\text{dd\_opt}} \tag{6}$$

$$P_{\text{total}} = P_{\text{dyn}} \times (1 + (l - 1) \times 0.05) + P_{\text{leak}} \tag{7}$$

where $\alpha$ is the activity factor and $I_{\rm leak}$ is leakage current, which is proportional to gate width ($W$). To consider the impact of wiring complexity overhead, the model adds 5% overhead to FFE dynamic power for each additional level of interleaving (i.e., $P_{\rm dyn} \times ((l-1) \times 0.05)$). The model also includes the power due to extra flip-flops required for pipelining. Fig. 7 shows the power trend of a one-tap FFE, which consists of two CSA multipliers, one CSA, and a Sklansky adder per parallel lane. First, the plot shows that FFE power scales linearly with the number of bits, because the number of CSA cells increases linearly with EQ resolution. Second, the plot shows an optimal interleaving depth. As interleaving depth increases, critical path delay relaxes and $V_{\rm dd\_opt}$ decreases to reduce power consumption. However, once $V_{\rm dd\_opt}$ reaches its lower limit, further increases in interleaving (beyond eight) make FFE power grow linearly, because the dynamic power penalty from extra wiring increases, as does leakage power due to additional computational blocks. These added costs offset the benefits of further interleaving. Note that if $V_{\rm dd}$ did not scale, FFE power would simply increase linearly with interleaving depth because of the 5% dynamic power penalty and increased leakage power.
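The supply-scaling procedure in (2)–(7) can be sketched in Python as follows. The sketch lumps $(C_{\rm wire} + C_{\rm gate}) / (\mu C_{\rm ox} W/L)$ into one constant `k`, solves (3) by bisection (valid because the delay in (2) decreases monotonically with $V_{\rm dd}$ above $V_{\rm th}$), and assumes the sums in (4) and (6) run over all `l` parallel lanes. All names and numeric values are hypothetical placeholders, not device parameters from the paper, and the flip-flop power is omitted.

```python
def gate_delay(vdd, k, vth, exponent=1.2):
    """Delay per Eq. (2): t_d = k * Vdd / (Vdd - Vth)**1.2, with lumped constant k."""
    return k * vdd / (vdd - vth) ** exponent

def find_vdd_opt(t_target, k, vth, vdd_min, vdd_max, tol=1e-6):
    """Lowest supply in [vdd_min, vdd_max] whose delay meets t_target (Eq. 3)."""
    if gate_delay(vdd_max, k, vth) > t_target:
        raise ValueError("delay target cannot be met even at vdd_max")
    if gate_delay(vdd_min, k, vth) <= t_target:
        return vdd_min  # supply is clamped at its lower limit
    lo, hi = vdd_min, vdd_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, k, vth) > t_target:
            lo = mid   # too slow: raise the supply
        else:
            hi = mid   # fast enough: try a lower supply
    return hi

def ffe_power(l, f_data, k, vth, c_sw_lane, i_leak_lane, vdd_min=0.5, vdd_max=1.2):
    """Total power per Eqs. (4)-(7) for interleaving depth l.

    c_sw_lane   sum of alpha * C_L over one lane (placeholder, farads)
    i_leak_lane leakage current of one lane (placeholder, amperes)
    """
    vdd = find_vdd_opt(l / f_data, k, vth, vdd_min, vdd_max)
    e_dyn = l * c_sw_lane * vdd ** 2           # Eq. (4), summed over l lanes
    p_dyn = e_dyn * f_data / l                 # Eq. (5)
    p_leak = l * i_leak_lane * vdd             # Eq. (6), summed over l lanes
    p_total = p_dyn * (1 + (l - 1) * 0.05) + p_leak  # Eq. (7)
    return p_total, vdd

# Sweep interleaving depth with placeholder constants.
for depth in (2, 4, 8, 16):
    power, vdd = ffe_power(depth, 12.5e9, k=1e-10, vth=0.3,
                           c_sw_lane=1e-12, i_leak_lane=1e-4)
    print(depth, round(vdd, 3), power)
```

Once the supply is clamped at `vdd_min`, the sketch reproduces the qualitative behavior described above: power rises with further interleaving because only the wiring and leakage terms keep growing.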

2) Decision-Feedback Equalizer: While an FIR filter can also be used to cancel out the postcursor components, power and complexity overheads can increase significantly as the postcursor cancellation tends to require a larger number of taps.



Fig. 8. Functional model of DFE.

Therefore, we assume a DFE structure for the postcursor cancellation. As shown in Fig. 8, the decision slice level of each symbol depends on a 1-bit decision history with its length set by the number of taps. Given the tight feedback loop inherent to DFE structures, loop unrolling alleviates computational speed limitations and complexity by precomputing different slicing levels and comparisons [5]. Fig. 9(a) shows a three-tap DFE implementation with loop unrolling. A set of multiplexers routes the appropriate decision to the output based on the three-level decision history.
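Loop unrolling as described above can be illustrated with a short functional sketch that compares a direct-feedback DFE with an unrolled one. The tap values, the initial decision history of all −1, and the function names are assumptions for illustration; the sketch models behavior only, not the hardware structure of Fig. 9.

```python
from itertools import product

def dfe_direct(samples, taps):
    """Direct DFE: subtract ISI computed from past decisions, then slice."""
    history = (-1,) * len(taps)  # most recent decision first (assumed start state)
    out = []
    for x in samples:
        isi = sum(c * d for c, d in zip(taps, history))
        d = 1 if x > isi else -1
        out.append(d)
        history = (d,) + history[:-1]
    return out

def dfe_unrolled(samples, taps):
    """Loop-unrolled DFE: precompute one slicing level per decision history."""
    levels = {h: sum(c * d for c, d in zip(taps, h))
              for h in product((-1, 1), repeat=len(taps))}  # 2**N levels
    history = (-1,) * len(taps)
    out = []
    for x in samples:
        # All comparisons are speculative; in hardware they run in parallel.
        candidates = {h: (1 if x > level else -1) for h, level in levels.items()}
        d = candidates[history]  # multiplexer selects using the decision history
        out.append(d)
        history = (d,) + history[:-1]
    return out

samples = [0.9, -0.2, 0.4, -1.1, 0.05, 0.3, -0.6, 0.8]
taps = [0.3, 0.2, 0.1]
assert dfe_direct(samples, taps) == dfe_unrolled(samples, taps)
```

The `levels` dictionary holds $2^N$ entries for $N$ taps, which is the exponential growth in precomputation hardware that drives DFE power.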

A DFE with loop unrolling still suffers from a speed bottleneck due to the single-cycle feedback loop required to select the final multiplexer output. To overcome this final speed bottleneck, extra combinational logic can extend the loop delay by one cycle, as shown in Fig. 9(b) [19]. Note that the combinational logic required for this delay extension resides outside of the feedback loop and can be pipelined. Finally, Fig. 9(c) shows the DFE structure assumed in the model for DFE power estimation. We assume a factor-of-two feedback loop delay extension and two-way time interleaving to further reduce speed limitations. For a factor-of-two feedback loop delay extension, the required logic simply reduces to two 2:1 multiplexers.

For the slicer, a cascade of multiple 1-bit magnitude comparators forms a multibit magnitude comparator. Note that the size of the precomputation circuitry grows exponentially with the number of taps. Therefore, depending on the number of taps, DFE power can quickly grow to substantially high levels. Fig. 10 shows the DFE power trend in our model as a function of the number of DFE taps and EQ resolution. DFE power increases exponentially with the number of taps and scales linearly with EQ resolution, because the number of magnitude comparator cells, which dominates the overall DFE power, scales linearly with EQ resolution. Note that each stage of cascaded magnitude comparators and 2:1 multiplexers can be pipelined to enable operation at lower $V_{\rm dd\_opt}$ levels and save power.

## III. SIMULATION AND ANALYSIS FRAMEWORK

To explore the design space of an ADC-based backplane receiver, simulations of power and equalization quality estimations were performed based on the ADC, FFE, and DFE models described in the previous section.

Fig. 9. Implementation of a three-tap DFE (a) with loop unrolling, (b) with loop unrolling and loop delay extension, and (c) with loop unrolling, loop delay extension, and two-way time interleaving.

The simulation flow is divided into two parts: mean square error (mse) estimation and power estimation. EQ mse acts as a measure of equalization quality to avoid the long simulation times otherwise required for bit-error rate (BER) estimation. Using the Q-function formula, an mse of 0.005/0.03 translates to a BER of 1E-12/1E-3. The mse and power estimation results are then combined to explore different tradeoffs in the overall design space. All simulations are performed in MATLAB, and low-level circuit simulations are avoided.
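The mse-to-BER translation can be sketched with the Gaussian Q-function. The paper does not state the signal normalization; the sketch assumes binary levels of ±0.5 (a unit peak-to-peak swing) and Gaussian-distributed error with variance equal to the mse, a choice that gives BER values of the same order of magnitude as the quoted pairs.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mse_to_ber(mse, half_swing=0.5):
    """BER estimate assuming Gaussian error around levels of +/- half_swing.

    half_swing = 0.5 is an assumption, not a value stated in the paper.
    """
    return q_function(half_swing / math.sqrt(mse))

for mse in (0.005, 0.03):
    print(mse, mse_to_ber(mse))
```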

Fig. 10. Power consumption of DFE across different number of taps.

For the mse estimation, we assume a full transceiver that consists of a bandwidth-limited transmitter with 1 ps (rms) clock jitter, a backplane channel, and the ADC-based receiver model developed in previous sections (i.e., behavioral model of front-end ADCs and functional model of digital EQ blocks). Note that to accurately model the front-end ADC behavior, we use fine time steps ($t_{\rm step} = t_{\rm symbol}/1000$) for the continuous-time signals. In particular, for the continuous-time to discrete-time conversion (performed in the two-stage T/H), we use an even finer time resolution ($t_{\rm fine} = t_{\rm step}/100$), relying on shape-preserving cubic interpolation to further improve accuracy without significantly increasing simulation times. Once the digitized discrete-time ADC output is generated, digital EQ simulations run fast using the functional models of the FFE and DFE. For equalization tap coefficient updates, sign-error least mean square (LMS) and sign-sign LMS algorithms are used in the FFE and DFE, respectively.
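The two tap-update rules named above can be sketched as follows. The update functions are the standard sign-error and sign-sign LMS forms; the adaptation demo is a hypothetical, noise-free two-postcursor channel with ideal sampling, chosen only to show the DFE taps settling near the postcursor values, and does not reproduce the paper's simulation setup.

```python
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def sign_error_lms_step(weights, inputs, error, mu):
    """Sign-error LMS: uses the sign of the error and the full input value."""
    return [w + mu * sign(error) * x for w, x in zip(weights, inputs)]

def sign_sign_lms_step(weights, inputs, error, mu):
    """Sign-sign LMS: uses only the signs of the error and of the input."""
    return [w + mu * sign(error) * sign(x) for w, x in zip(weights, inputs)]

def adapt_dfe(n_symbols=5000, postcursors=(0.4, 0.2), mu=0.001, seed=0):
    """Adapt DFE taps with sign-sign LMS on a hypothetical noise-free channel."""
    rng = random.Random(seed)
    n_taps = len(postcursors)
    sent = [0.0] * n_taps     # transmitted-symbol history, most recent first
    decided = [1.0] * n_taps  # decision history, most recent first
    taps = [0.0] * n_taps
    for _ in range(n_symbols):
        a = rng.choice((-1.0, 1.0))
        x = a + sum(h * s for h, s in zip(postcursors, sent))  # channel output
        y = x - sum(c * d for c, d in zip(taps, decided))      # ISI removed
        d = sign(y)                                            # slicer decision
        taps = sign_sign_lms_step(taps, decided, y - d, mu)
        sent = [a] + sent[:-1]
        decided = [d] + decided[:-1]
    return taps

print(adapt_dfe())
```

Because the sign-sign rule moves every tap by a fixed step `mu` per symbol, the taps keep dithering around their final values by a few `mu`, which is one reason fine EQ resolution helps convergence.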

For the power estimation, the power consumption of each building block (i.e., front-end ADC, FFE, and DFE) is added up to estimate the total receiver power. The ADC power estimation relies on the measurement result of a test-chip prototype. For power estimation of digital EQ blocks, detailed power models based on practical implementation choices are used. Optimal $V_{\rm dd}$ values for the FFE and DFE blocks are found based on the critical path delay versus $V_{\rm dd}$ relationship.

Device parameters from 65-nm CMOS PTM models [20] are used for the simulation of both performance and power estimation. For the channel environment, we use backplane channel models found in [21]. Fig. 11 shows the frequency and pulse responses of the short (1 in) and long (20 in) copper traces on FR4 with connectors, provided by Intel and used in the design-space exploration.

## IV. EXPLORING BACKPLANE RECEIVER DESIGN SPACE

High-level simulations of mse and power estimation based on the ADC, FFE, and DFE models enable thorough exploration of the receiver design space that uses ADCs and digital EQs. The design space comprises sweeps of various design parameters associated with the ADC and digital EQ blocks.

## A. ADC Design Parameters

Fig. 11. Backplane channel models. (a) Frequency response. (b) Pulse response.

Fig. 12. Effect of ADC design parameters on ADC performance. (a) Sampling clock jitter and rise time. (b) Residual comparator offset.

The experimentally verified behavioral model of front-end ADCs allows early stage investigations to establish targets for critical specifications that dictate design effort and govern ADC performance. For instance, the design of PLLs and buffers affects clock jitter; buffer sizing affects clock edge rates and power; and minimum resolution for offset compensation affects the complexity of the digital calibration circuitry. For this analysis, we assume a 4.5-bit ADC operating at 6.25 GS/s. Fig. 12 shows the simulated SNDR, with both Nyquist- and data-frequency inputs, across a range of values for clock jitter, clock rise time, and offset correction resolution with baseline values of 1 ps (rms), 15 ps, and 0.5 LSB, respectively. Fig. 12(a) shows that clock jitter has the most significant impact on ADC performance, and the effects of clock edge rates are small. Fig. 12(b) shows the average and peak-to-peak SNDR for 100 sets of simulations with random uniformly distributed offsets. While ADC performance improves with a finer offset correction, the overall impact is small compared with the effects of clock jitter.
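As a rough cross-check on why clock jitter matters at these input frequencies, the textbook jitter-limited SNR bound for a sinusoidal input, $-20\log_{10}(2\pi f_{\rm in}\sigma_j)$, can be compared with the ideal quantization limit $6.02N + 1.76$ dB. Neither formula comes from the paper's behavioral model, and applying the quantization formula to a half-bit resolution is an approximation; the sketch only gives first-order bounds.

```python
import math

def jitter_limited_snr_db(f_in_hz, sigma_jitter_s):
    """Textbook SNR bound for a sine input sampled with rms jitter sigma."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_jitter_s)

def ideal_quantization_snr_db(bits):
    """Ideal quantizer SNR for a full-scale sine input."""
    return 6.02 * bits + 1.76

# 1 ps (rms) jitter at roughly Nyquist- and data-frequency inputs for 6.25 GS/s.
for f_in in (3.125e9, 6.25e9):
    print(f_in, round(jitter_limited_snr_db(f_in, 1e-12), 2))
print(round(ideal_quantization_snr_db(4.5), 2))
```

With 1 ps (rms) jitter, the bound is about 34 dB near 3.125 GHz and about 28 dB near 6.25 GHz, which is comparable to the roughly 28.9 dB ideal limit of a 4.5-bit quantizer.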

To study the impact of these ADC design parameters on the receiver performance, we apply the same parameter sweeps to the ADC-based receiver model. Two backplane channel models, short (1 in) and long (20 in), are used for the analysis. To compensate for these channels, the simulation assumes a 4.5-bit ADC + 9-bit digital EQ (one-tap FFE + five-tap DFE) structure for the short channel and a 5.5-bit ADC + 10-bit digital EQ (two-tap FFE + eight-tap DFE) structure for the long channel.



Fig. 13. Effect of ADC design parameters on receiver performance. (a) Sampling clock jitter and rise time. (b) Residual comparator offset.

TABLE I
SUMMARY OF RECEIVER PARAMETER SWEEPS USED IN
DESIGN-SPACE EXPLORATION

| Parameter              | Sweep range | Step    | Baseline (⊚) |
|------------------------|-------------|---------|--------------|
| ADC resolution         | 3.5–6 bit   | 0.5 bit | 5 bit        |
| EQ resolution          | 7–14 bit    | 1 bit   | 11 bit       |
| Number of FFE taps     | 0–1         | 1       | 1            |
| Number of DFE taps     | 5–8         | 1       | 6            |
| FFE interleaving depth | 2–20        | 2       | 8            |
| FFE transistor width   | 0.2–1 μm    | 0.2 μm  | 0.2 μm       |
| DFE transistor width   | 0.2–1 μm    | 0.2 μm  | 0.6 μm       |

Fig. 13 compares the effects of clock jitter, clock edge rise time, and resolution of offset error correction (i.e., residual error) on the receiver performance. These simulations again assume the same set of baseline values (identified by the big symbols) as in the ADC-only analysis and show interesting trends. While clock jitter is clearly the dominant error source that degrades SNDR in the ADC, residual offsets in the ADC also play a significant role in the resulting mse for the overall receiver. This is because ADC output nonlinearity due to residual offsets degrades the performance of the receiver-side EQs, which rely on linear operations. Note that the mse for the short channel increases very sharply when coarse-grain offset correction is used, as we assume a relatively low ADC resolution (i.e., 4.5 bits) for the short channel. On average, efforts to improve the offset correction by 0.25 LSB offer larger gains in receiver performance than efforts to eliminate clock jitter. Extensive investigations of such relationships and tradeoffs allow the designer to efficiently allocate design effort during the early phase of design.

## B. Power-Performance Tradeoff: Sweeping Design Parameters

A thorough design-space exploration for ADC-based backplane receivers is performed based on the proposed ADC, FFE, and DFE models. Simulations for power and performance estimation were run across a wide range of parameter settings.



Fig. 14. ADC-based receiver design-space exploration result and effect of receiver design parameters. (a) Design-space exploration result. (b) ADC resolution. (c) EQ resolution. (d) Number of DFE taps.

Table I summarizes the receiver parameter sweeps used in the design-space exploration. Only the long (20 in) channel is used for this analysis.
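The sweep ranges in Table I can be enumerated as a Cartesian product, as in the sketch below. Whether every combination was actually simulated is not stated in the paper, so the full-factorial grid is an assumption, and the dictionary keys are hypothetical names.

```python
from itertools import product

def frange(start, stop, step):
    """Inclusive range of floats, rounded to avoid accumulation error."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(n)]

# Sweep ranges and steps copied from Table I.
SWEEPS = {
    "adc_bits": frange(3.5, 6.0, 0.5),
    "eq_bits": list(range(7, 15)),
    "ffe_taps": [0, 1],
    "dfe_taps": list(range(5, 9)),
    "ffe_interleave": list(range(2, 21, 2)),
    "ffe_width_um": frange(0.2, 1.0, 0.2),
    "dfe_width_um": frange(0.2, 1.0, 0.2),
}

def design_points(sweeps=SWEEPS):
    """Yield one dict per design point of the (assumed) full-factorial grid."""
    keys = list(sweeps)
    for values in product(*(sweeps[key] for key in keys)):
        yield dict(zip(keys, values))

n_points = sum(1 for _ in design_points())
print(n_points)
```

A grid of this size is one reason fast high-level models are preferable to circuit-level simulation for each design point.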

Fig. 14(a) shows the design-space exploration result based on a wide range of parameter sweeps, where each point corresponds to a specific design point. Note that the plot reveals a Pareto optimal frontier, whose points can be considered optimal designs. One can clearly trade mse performance for lower power consumption and vice versa. Thus, receiver designers can use this Pareto frontier as a guideline to initially set the receiver configurations depending on the given power and performance constraints. To further study the effects of different parameters, we chose one of the Pareto optimal points (marked as ⊚) and set it as the baseline for multiple parameter sweeps (this baseline setting is also shown in Table I).
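Extracting a Pareto frontier from simulated (power, mse) pairs can be sketched as follows, with both quantities minimized. The sample points are made-up values for illustration only, not results from the paper.

```python
def pareto_front(points):
    """Return the (power, mse) points not dominated by any other point.

    A point is dominated if another point has power and mse that are both
    no worse and at least one that is strictly better.
    """
    front = []
    best_mse = float("inf")
    # Sort by power (then mse); keep a point only if it improves on the best mse so far.
    for power, mse in sorted(points):
        if mse < best_mse:
            front.append((power, mse))
            best_mse = mse
    return front

# Hypothetical design points: (power in mW, mse).
points = [(100, 0.02), (200, 0.01), (150, 0.03), (300, 0.01), (250, 0.004)]
print(pareto_front(points))
```

Sorting first keeps the extraction at O(n log n), which matters when the sweep produces tens of thousands of design points.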

Fig. 14(b) shows the effect of ADC resolution on the power-mse relationship. As ADC quantization errors affect the digital EQ performance, mse increases sharply when ADC resolution is reduced, especially for coarse-grain ADCs. Fig. 14(c) shows the effect of EQ resolution. Both mse and receiver power show moderate sensitivity to EQ resolution, as digital EQ power increases linearly with EQ resolution and fine resolutions (compared with ADC resolutions) are assumed for EQ blocks to guarantee convergence. Fig. 14(d) shows the effect of DFE taps. As DFE power increases exponentially with the number of DFE taps, due to the precomputation structure discussed in Section II-B2, the receiver power consumption is most sensitive to the number of DFE taps. However, mse is relatively less sensitive to the number of DFE taps, which suggests one can trade a small increase in mse for a large amount of power savings (e.g., roughly 60% of power can be saved for a 0.0025 increase in mse by reducing the number of DFE taps from eight to six).

Fig. 15. Power breakdown of the baseline ADC-based receiver.

Fig. 15 shows the power breakdown of the baseline ADC-based receiver. While the FFE consumes less power than the other blocks, as only one tap is used, the ADC and DFE dissipate the rest of the power roughly evenly.

## C. Effect of Design Conditions: Channel, Data Rate, and Technology Node

ADC-based receivers with different design conditions (i.e., channel, technology node, and data rate) may reveal different power-performance tradeoff profiles. Fig. 16 shows four design-space exploration results with different design conditions: Fig. 16(a) shows the results of a 12.5 Gb/s receiver in 65-nm CMOS for a long (20 in) channel discussed in the previous section, while Fig. 16(b)–(d) shows the results assuming a shorter (1 in) channel, a lower data rate (6.25 Gb/s), and a more advanced technology node (22-nm CMOS), respectively.

Fig. 16. ADC-based receiver design-space exploration result and effect of receiver design parameters. (a) Long (20 in) channel, 65-nm CMOS, 12.5 Gb/s. (b) Short (1 in) channel, 65-nm CMOS, 12.5 Gb/s. (c) Long channel, 65-nm CMOS, 6.25 Gb/s. (d) Long channel, 22-nm CMOS, 12.5 Gb/s.

Fig. 16(b) shows the design-space exploration results when a short (1 in) channel is applied. Since channel loss is lower, a significant amount of digital EQ power can be saved by reducing the number of DFE taps. Thus, the impact of ADC resolution on the overall power consumption becomes higher, as the portion of ADC power is larger than EQ power. More importantly, the power-performance plot suggests that coarse ADCs (e.g., 4-bit ADC) can be used while maintaining reasonable mse (e.g., 0.005) as the channel loss is lower.

Fig. 16(c) shows the design-space exploration results at 6.25 Gb/s. Operating at a lower data rate also reduces the required number of taps in digital EQs, significantly reducing the EQ power consumption. Thus, similarly to the case of the short channel, ADC resolution has a higher impact on the overall receiver power consumption. Since a lower data rate reduces channel loss, the ADC-based receiver can achieve reasonable mse (0.005) with only a 4-bit ADC, which further reduces overall power consumption.

Fig. 16(d) shows the design-space exploration results in a 22-nm CMOS process. Using an advanced fabrication technology scales both the ADC and digital EQ power consumption. Therefore, the design-space exploration result in 22-nm CMOS shown in Fig. 16(d) has a power-performance profile similar to the result in 65-nm CMOS in Fig. 16(a). In the 65-nm CMOS design, power consumption of the ADC-based receiver with reasonable mse (i.e., 0.005) is around 450 mW (which translates to 36 pJ/bit), which is somewhat high. However, in 22-nm CMOS the overall power consumption reduces to 120 mW (which translates to 9.6 pJ/bit), which is comparable with that of conventional analog-type EQs. Considering the analog circuit design overheads in advanced technologies (e.g., worsening on-die variation), ADC-based receivers will benefit more in future technology nodes.
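The energy-per-bit figures quoted above follow directly from dividing power by the 12.5-Gb/s data rate, as this small check shows (the function name is hypothetical).

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ: mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/bit."""
    return power_mw / data_rate_gbps

print(energy_per_bit_pj(450, 12.5))  # 65-nm CMOS estimate
print(energy_per_bit_pj(120, 12.5))  # 22-nm CMOS estimate
```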

## V. CONCLUSION

This paper demonstrates a simulation framework for design-space exploration of ADC-based backplane receivers in terms of EQ performance and power. Detailed models of the receiver components (front-end ADC, FFE, and DFE) are presented to estimate EQ performance and receiver power consumption. As front-end ADC nonidealities can affect the EQ performance, an accurate behavioral model of high-speed ADCs is presented and verified with test-chip prototype measurements. Although the digital EQ blocks (FFE and DFE) are straightforward to model functionally, close attention to modeling their power is required, as they can suffer from high power consumption and complexity. Thus, parameterized power models for digital EQ blocks are presented based on the detailed implementation choices, such as CSA multipliers, Sklansky adders, and cascaded magnitude comparators.

Analysis of front-end ADC design parameters reveals that although sampling clock jitter is the dominant error source degrading ADC-only performance, residual offsets in the comparators affect the receiver performance the most. Simulation results from design-space exploration reveal that a Pareto optimal frontier exists, which can be used as a guideline to set the initial receiver configuration for given power and performance constraints. Analysis of the effect of different design parameters shows that the number of DFE taps affects receiver power the most and ADC resolution affects mse the most, while EQ resolution has a moderate impact on both receiver power and mse. Design-space exploration results assuming a shorter channel or a lower data rate show that the overall power consumption is more sensitive to ADC power consumption, as reductions in the number of EQ taps significantly lower digital EQ power consumption. In addition, the design-space exploration result assuming a more advanced technology node reveals that the power consumption of ADC-based receivers will become comparable with that of conventional analog-type receivers, as the power consumption of digital blocks scales well with process technology.
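As an illustration of how a Pareto optimal frontier can be extracted from a set of simulated receiver configurations, the following minimal sketch keeps only the configurations that no other configuration beats in both power and mse. The configuration names and the (power, mse) values are hypothetical placeholders, not results from this paper, and the sketch does not model the swept parameters themselves (ADC resolution, EQ resolution, and number of taps).

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str        # hypothetical label for a receiver configuration
    power_mw: float  # estimated receiver power (lower is better)
    mse: float       # estimated equalization error (lower is better)


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the configurations not dominated in (power, mse).

    After sorting by ascending power, a configuration is on the frontier
    only if its mse is strictly lower than that of every cheaper one.
    """
    frontier: List[Config] = []
    best_mse = float("inf")
    for c in sorted(configs, key=lambda c: (c.power_mw, c.mse)):
        if c.mse < best_mse:
            frontier.append(c)
            best_mse = c.mse
    return frontier


# Hypothetical sample points, for illustration only.
candidates = [
    Config("A", 100.0, 0.020),
    Config("B", 150.0, 0.010),
    Config("C", 200.0, 0.012),  # dominated by B (more power, higher mse)
    Config("D", 300.0, 0.005),
    Config("E", 320.0, 0.006),  # dominated by D
    Config("F", 450.0, 0.004),
]

frontier = pareto_frontier(candidates)
for c in frontier:
    print(f"{c.name}: {c.power_mw:.0f} mW, mse = {c.mse:.3f}")
```

Given a power budget or an mse target, an initial configuration can then be chosen as the frontier point that meets the constraint at the lowest cost in the other metric.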

#### REFERENCES

- [1] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, and T. Willwerth, "A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery," in *Proc. IEEE ISSCC*, Feb. 2007, pp. 436–591.
- [2] J. Cao, B. Zhang, U. Singh, D. Cui, A. Vasani, A. Garg, W. Zhang, N. Kocaman, P. Deyi, B. Raghavan, H. Pan, I. Fujimori, and A. Momtaz, "A 500 mW digitally calibrated AFE in 65 nm CMOS for 10 Gb/s serial links over backplane and multimode fiber," in *Proc. IEEE ISSCC*, Feb. 2009, pp. 370–371.
- [3] H. Yamaguchi, H. Tamura, Y. Doi, Y. Tomita, T. Hamada, M. Kibune, S. Ohmoto, K. Tateishi, O. Tyshchenko, A. Sheikholeslami, T. Higuchi, J. Ogawa, T. Saito, H. Ishida, and K. Gotoh, "A 5 Gb/s transceiver with an ADC-based feedforward CDR and CMA adaptive equalizer in 65 nm CMOS," in *Proc. IEEE ISSCC*, Feb. 2010, pp. 168–169.
- [4] E.-H. Chen, R. Yousry, and C.-K. Yang, "Power optimized ADC-based serial link receiver," *IEEE J. Solid-State Circuits*, vol. 47, no. 4, pp. 938–951, Apr. 2012.
- [5] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, H. Tamura, and M. Kibune, "A 5 Gb/s speculative DFE for 2× blind ADC-based receivers in 65-nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2010, pp. 69–70.
- [6] J. Kim, E.-H. Chen, J. Ren, B. S. Leibowitz, P. Satarzadeh, J. L. Zerbe, and C.-K. K. Yang, "Equalizer design and performance trade-offs in ADC-based serial links," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 9, pp. 2096–2107, Sep. 2011.
- [7] G. Balamurugan, B. Casper, J. Jaussi, M. Mansuri, F. O'Mahony, and J. Kennedy, "Modeling and analysis of high-speed I/O links," *IEEE Trans. Adv. Packag.*, vol. 32, no. 2, pp. 237–247, May 2009.
- [8] B. Casper, M. Haycock, and R. Mooney, "An accurate and efficient analysis method for multi-Gb/s chip-to-chip signaling schemes," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2002, pp. 54–57.
- [9] V. Stojanovic and M. Horowitz, "Modeling and analysis of high-speed links," in *Proc. IEEE CICC*, Sep. 2003, pp. 589–594.
- [10] P. Hanumolu, B. Casper, R. Mooney, G.-Y. Wei, and U.-K. Moon, "Analysis of PLL clock jitter in high-speed serial links," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 879–886, Nov. 2003.

- [11] R. Sredojevic and V. Stojanovic, "Optimization-based framework for simultaneous circuit-and-system design-space exploration: A high-speed link example," in *Proc. IEEE/ACM ICCAD*, Nov. 2008, pp. 314–321.
- [12] H. Chung and G.-Y. Wei, "Design-space exploration of backplane receivers with high-speed ADCs and digital equalization," in *Proc. IEEE CICC*, Sep. 2009, pp. 555–558.
- [13] H. Chung, A. Rylyakov, Z. T. Deniz, J. Bulzacchelli, G.-Y. Wei, and D. Friedman, "A 7.5-GS/s 3.8-ENOB 52-mW flash ADC with clock duty cycle control in 65 nm CMOS," in *Proc. IEEE Symp. VLSI Circuits*, Jun. 2009, pp. 268–269.
- [14] B. Razavi, Principles of Data Conversion System Design. New York, NY, USA: Wiley, 1995.
- [15] W. Kester, The Data Conversion Handbook. Burlington, MA, USA: Newnes, 2005.
- [16] J. Sklansky, "Conditional-sum addition logic," *IEEE Trans. Electron. Comput.*, vol. EC-9, no. 2, pp. 226–231, Jun. 1960.
- [17] D. Patil, O. Azizi, M. Horowitz, R. Ho, and R. Ananthraman, "Robust energy-efficient adder topologies," in *Proc. IEEE Symp. Comput. Arith*metic, Jun. 2007, pp. 16–28.
- [18] K. Chen, C. Hu, P. Fang, M. Lin, and D. Wollesen, "Predicting CMOS speed with gate oxide and voltage scaling and interconnect loading effects," *IEEE Trans. Electron Devices*, vol. 44, no. 11, pp. 1951–1957, Nov. 1997.
- [19] S. Kasturia and H. Winters, "Techniques for high-speed implementation of nonlinear cancellation," *IEEE J. Sel. Areas Commun.*, vol. 9, no. 5, pp. 711–717, Jun. 1991.
- [20] (2011, Jun.). Predictive Technology Model [Online]. http://www.eas.asu.edu/~ptm/
- [21] W. Peters. (2005, Jul.). IEEE P802.3ap Task Force Channel Model Material: Improved HVM ATCA Measurement Data (B1, B12, B20, M1, M20, T1, T12, T20) [Online]. Available: http://www.ieee802.org/3/ap/public/channel\_model



**Hayun Chung** (S'06–M'06) received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 2002 and 2004, respectively, and the Ph.D. degree in engineering sciences from Harvard University, Cambridge, MA, USA, in 2009.

She was with Keio University, Tokyo, Japan, from 2009 to 2011, as a Research Associate. She is currently with the Korea Advanced Institute of Science and Technology, Daejeon, Korea, as a Post-Doctoral Researcher. Her current research interests include variability-aware re-configurable circuit designs, digitally assisted analog circuits, and inductive-coupling interfaces.



**Gu-Yeon Wei** (M'00) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1994, 1997, and 2001, respectively.

He is a Gordon McKay Professor of electrical engineering and computer science with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA. His current research interests include multiple layers of a computing system: mixed-signal integrated circuits, power electronics, computer architecture, and compilers for automatic code parallelization. His research efforts primarily focus on opportunities across these layers to develop energy-efficient solutions for a broad range of systems from flapping-wing microrobots to large-scale servers.