# Characterizing and Evaluating Voltage Noise in Multi-Core Near-Threshold Processors

Xuan Zhang, Tao Tong, Svilen Kanev, Sae Kyu Lee, Gu-Yeon Wei, David Brooks Harvard University, Cambridge, MA 02138

Abstract—Lowering the supply voltage to improve energy efficiency leads to higher load current and elevated supply sensitivity. In this paper, we provide the first quantitative analysis of voltage noise in multi-core near-threshold processors in a future 10nm technology across SPEC CPU2006 benchmarks. Our results reveal larger guardband requirement and significant energy efficiency loss due to power delivery nonidealities at near threshold, and highlight the importance of accurate voltage noise characterization for design exploration of energy-centric computing systems using near-threshold cores.

# I. INTRODUCTION

While transistor density keeps scaling, multi-core processor's performance no longer follows the exponential trajectory due to the power wall. To get more performance under a fixed power budget, designers must fundamentally improve the energy efficiency of computation. One effective way is lowering the supply voltage ( $V_{DD}$ ). Prior work [2] has shown the core voltage that optimizes the trade-offs between energy efficiency and performance is slightly above the transistor threshold voltage. While such an approach reduces single-thread performance, the system throughput can still be improved with more cores running parallel applications. Based on these findings, near-threshold computing (NTC) has been proposed as one potential solution to the impending "dark silicon" crisis.

Many design issues have to be addressed before we can fully unleash NTC's promised performance and efficiency. Although the problem of aggravated process variation has received the most recent attention [10], [7], voltage noise is another important challenge for NTC that has yet to be thoroughly investigated. With a fixed power budget, lower  $V_{DD}$  leads to higher current and more pronounced leakage and delay sensitivity to voltage fluctuations. Voltage noise plays a vital role in determining the overall energy efficiency of the system, since the guardband required to accommodate worst case voltage droop may result in considerable energy overhead. Moreover, additional energy has to be consumed by the power delivery network to distribute a lower  $V_{DD}$ . These implications make the effect of voltage noise rather profound in NTC, as compared to super-threshold computing (STC).

In this paper, we present the first quantitative analysis of voltage noise in multi-core near-threshold processors. To accurately characterize voltage noise and evaluate the system-level efficiency of NTC, we developed VN-Scope, a system-level tool that simulates transient voltage fluctuations of the core over a wide range of supply voltages in response to its switching activity. With it, we are able to evaluate energy efficiency using long-running benchmarks (SPEC CPU2006) at voltages spanning NTC and STC regions. Our results provide useful insights for early-stage design exploration in the following aspects:



Fig. 1: Power delivery system in modern processors

- The required guardband for near-threshold processors can be as large as 36% of the supply voltage according to our voltage noise characterization based on worst-case voltage droop simulation.
- Our quantitative analysis reveals the drastic difference between energy efficiency under ideal and realistic power delivery assumptions (Figure 10). The efficiency loss caused by power delivery imperfections is close to 61% for near-threshold cores.
- We are able to sweep system-level parameters, such as loadline impedance, extrinsic decoupling capacitance, and the number of C4 bumps, and estimate their impact on end design objectives early in the design process.

# II. BACKGROUND

The power delivery system in modern processors consists of connections and voltage regulator modules (VRMs) [5] at the board level, as illustrated by Figure 1. Ideally, the impedance of the power delivery network (PDN) should be low and flat across all frequencies to minimize IR drop and voltage droop. However, due to physical limitations, the target PDN impedance has stagnated around  $1m\Omega$  according to the ITRS roadmap. A dominant impedance peak, several times higher than  $1m\Omega$ , exists at a mid-range frequency around 100MHz.

Voltage noise results from the non-ideal power delivery system and the fluctuation of power consumption under varying workload. Noise can be separated into static IR-drop and dynamic Ldi/dt-drop (inductive noise). The former is the static voltage drop due to the resistance of the interconnects and is proportional to the DC impedance; the latter is caused by the inductance and the capacitance in the PDN and represents the transients of voltage noise when current load changes. One special case of inductive noise happens when the load current fluctuates at the resonance frequency of the PDN, exciting its peak impedance. Generally speaking, resonance is the cause for worst-case voltage noise in modern processors [9].
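To make the resonance behavior concrete, the following is a minimal sketch of a single-stage PDN, assuming a series resistance and inductance feeding a decoupling capacitor in parallel with the load. The function name `pdn_impedance` and the component values are hypothetical, chosen only to place the resonance near 100MHz; they are not the parameters of the PDN modeled in this paper.

```python
import math


def pdn_impedance(freq_hz, r_ohm, l_henry, c_farad):
    """Impedance seen by the load: (R + jwL) in parallel with 1/(jwC)."""
    w = 2.0 * math.pi * freq_hz
    series = complex(r_ohm, w * l_henry)
    # Parallel combination simplifies to (R + jwL) / (1 - w^2 LC + jwRC).
    denom = complex(1.0 - w * w * l_henry * c_farad, w * r_ohm * c_farad)
    return series / denom


# Hypothetical values (assumption), not taken from Table I.
R, L, C = 1e-3, 3.56e-12, 712e-9

# Resonance frequency of the LC pair.
f_res = 1.0 / (2.0 * math.pi * math.sqrt(L * C))

z_dc = abs(pdn_impedance(1e3, R, L, C))      # low frequency: sets the IR-drop
z_peak = abs(pdn_impedance(f_res, R, L, C))  # resonance: worst-case droop

print(f"f_res = {f_res / 1e6:.1f} MHz")
print(f"|Z| at 1 kHz = {z_dc * 1e3:.2f} mOhm, |Z| at f_res = {z_peak * 1e3:.2f} mOhm")
```

At low frequency the impedance collapses to the DC resistance, which governs the static IR-drop, while a load current that fluctuates near `f_res` sees an impedance several times larger.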

To avoid functional errors caused by voltage fluctuation, extra voltage margin has to be allocated based on the worst case voltage droop. This can hurt energy efficiency, because it increases the supply voltage without getting any return in performance. If we define energy per cycle  $(E_{cyc})$  as:

$$E_{cyc} = I_{lkg} (V_{DD}^0 + \Delta V) T_{clk} + \alpha C_{eff} (V_{DD}^0 + \Delta V)^2 \quad (1)$$

978-1-4799-1235-3/13/\$31.00 ©2013 IEEE


Symposium on Low Power Electronics and Design



Fig. 2: Normalized  $E_{cyc}$  as a function of voltage margin

Fig. 3: Equivalent circuit of the power delivery network (VRM, motherboard, socket and package, C4 bumps)

where  $I_{lkg}$  is the leakage current,  $V_{DD}^0$  is the nominal core voltage with no margin,  $\Delta V$  is the voltage margin,  $T_{clk}$  is the clock cycle,  $\alpha$  stands for activity factor, and  $C_{eff}$  is the equivalent capacitance representing dynamic power. Both the leakage and the dynamic part of  $E_{cyc}$  increase with  $\Delta V$ .

NTC can exacerbate this efficiency loss. Figure 2 plots the change of energy per cycle normalized by  $E_{cyc}$  at  $\Delta V = 0$  as a function of voltage margin. An activity factor of 0.2 is chosen to represent the typical activity of a processor core. Compared to STC, the NTC curve has a steeper response slope, because  $\Delta V$ 's impact is more pronounced at lower  $V_{DD}^0$ . Clearly, accurate evaluation of energy efficiency in NTC requires accounting for adequate voltage margin.
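Equation (1) can be evaluated directly to see why the NTC curve is steeper. The sketch below is illustrative only: the leakage current and effective capacitance are hypothetical placeholders held equal for both cases, and leakage current is treated as independent of voltage exactly as Equation (1) is written. The nominal voltages and clock frequencies are the STC and NTC values used later in Table IV.

```python
def energy_per_cycle(v_nominal, dv, i_lkg, t_clk, alpha, c_eff):
    """Equation (1): leakage energy plus dynamic energy at V_DD^0 + dV."""
    v = v_nominal + dv
    return i_lkg * v * t_clk + alpha * c_eff * v * v


def normalized_energy(v_nominal, dv, i_lkg, t_clk, alpha, c_eff):
    """E_cyc at margin dV divided by E_cyc at zero margin."""
    base = energy_per_cycle(v_nominal, 0.0, i_lkg, t_clk, alpha, c_eff)
    return energy_per_cycle(v_nominal, dv, i_lkg, t_clk, alpha, c_eff) / base


ALPHA = 0.2    # activity factor used for Figure 2
I_LKG = 1.0    # amperes, hypothetical placeholder
C_EFF = 1e-9   # farads, hypothetical placeholder

# Same 100 mV margin applied at the two nominal operating points.
stc = normalized_energy(0.765, 0.1, I_LKG, 1.0 / 2.8e9, ALPHA, C_EFF)
ntc = normalized_energy(0.360, 0.1, I_LKG, 1.0 / 750e6, ALPHA, C_EFF)

print(f"STC: {stc:.3f}x, NTC: {ntc:.3f}x")
```

For the same absolute margin, the relative increase is larger at the lower nominal voltage, because  $\Delta V$  is a larger fraction of  $V_{DD}^0$ .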

# III. MODELING METHODOLOGY

To simulate voltage noise over a wide supply range and thus determine the necessary margin, we developed a tool called VN-Scope that captures the voltage dependency in leakage and dynamic power. Our goal is to achieve fast voltage transient simulation using the PDN and the core models described below to handle simulations of benchmark traces on the order of billions of clock cycles for early-stage design evaluation. It is validated against SPICE simulation using a 10nm predictive technology model (PTM) [11].

# A. Power Delivery Network Modeling

The power delivery network is simplified into the equivalent RLC network in Figure 3. The values of the resistances, inductances, and capacitances in the off-chip model are extracted from regulator design guidelines [5] for state-of-the-art boards and packages for high-performance server-class processors. The parameters used for C4 bumps are obtained from recent literature [13]. We summarize the RLC values in our PDN model in Table I.

# B. Core Modeling

The circuit model we used for the core captures the scaling of leakage and dynamic power from STC to NTC.

TABLE I: RLC parameters in the power delivery network

| Resistance     | Value         | Inductance | Value | Capacitance | Value       |
|----------------|---------------|------------|-------|-------------|-------------|
| $R_{B,P}$      | $0.3m\Omega$  | $L_{B,P}$  | 40pH  | $C_{B,P}$   | $1256\mu F$ |
| $R_{B,S1(S2)}$ | $0.1m\Omega$  | $L_{B,S}$  | 45pH  |             |             |
| $R_{cav}$      | $0.15m\Omega$ | $L_{cav}$  | 20pH  | $C_{cav}$   | $1222\mu F$ |
| $R_{P,S}$      | $0.2m\Omega$  | $L_{P,S}$  | 6pH   |             |             |
| $R_{P,P}$      | $0.54m\Omega$ | $L_{P,P}$  | 2.5pH | $C_{P,P}$   | $120\mu F$  |
| $R_{bump}$     | $10m\Omega$   | $L_{bump}$ | 50pH  |             |             |



Fig. 4: Core circuit model

Fig. 5: Normalized leakage over  $V_{core}$  in 45nm CMOS

As illustrated in Figure 4, the non-switching part of the core is modeled by a variable resistance  $R_{lkg}$  to account for the leakage and a fixed capacitor  $C_{d,int}$  to represent the intrinsic decoupling effect, with  $C_{d,ext}$  being the extrinsic decoupling capacitor. We model the dynamic power by a variable resistor whose value is proportional to  $\frac{1}{\alpha C_{eff}F_{clk}}$ . In this way, different workloads can be represented by activity ( $\alpha$ ) traces.
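The activity-dependent resistor can be illustrated with a few lines of code. This is a minimal sketch, assuming a proportionality constant of one so that the resistor dissipates exactly  $\alpha C_{eff} V^2 F_{clk}$ ; the function names and the numeric values are hypothetical, and  $R_{lkg}$  is passed in as a plain number rather than modeled as a function of voltage.

```python
def dynamic_resistance(alpha, c_eff, f_clk):
    """Equivalent resistance of the switching part of the core.

    Assumes a proportionality constant of 1, so that V^2 / R equals the
    usual dynamic power expression alpha * C_eff * V^2 * F_clk.
    """
    if alpha <= 0.0:
        raise ValueError("activity factor must be positive")
    return 1.0 / (alpha * c_eff * f_clk)


def core_current(v_core, alpha, c_eff, f_clk, r_lkg):
    """Total core current: leakage branch plus dynamic branch."""
    return v_core / r_lkg + v_core / dynamic_resistance(alpha, c_eff, f_clk)


# Hypothetical numbers for illustration only.
V, ALPHA, C_EFF, F_CLK, R_LKG = 0.36, 0.2, 1e-9, 750e6, 0.5

r_dyn = dynamic_resistance(ALPHA, C_EFF, F_CLK)
p_dyn = V * V / r_dyn
print(f"R_dyn = {r_dyn:.3f} ohm, P_dyn = {p_dyn * 1e3:.2f} mW")
print(f"I_core = {core_current(V, ALPHA, C_EFF, F_CLK, R_LKG):.3f} A")
```

Because the resistance is inversely proportional to  $\alpha$ , an activity trace translates directly into a time-varying load: doubling the activity halves the resistance and doubles the dynamic current at a given voltage.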

To extract the values of  $R_{lkg}$  and  $C_{d,int}$ , we characterized the DC and AC behavior of an array of inverter pairs as shown in the shaded view in Figure 4. Instead of using arrays of mixed logic gates to represent the leakage current and the intrinsic capacitance of the core [12], we find inverter pairs are sufficient to reflect how leakage current scales with the core voltage. This is verified by the SPICE simulation of leakage currents in different digital blocks, normalized by their leakage at  $V_{core} = 1V$ , shown in Figure 5. Since all these diverse digital blocks share a similar leakage scaling trend, extracting the parameters from an array of inverter pairs adequately captures the voltage dependency of the leakage current. Although the simulations are from 45nm CMOS, we believe this trend can extend to future technology nodes as well.



Fig. 6: Frequency response of  $V_{core}$  to activity stimulus



Fig. 7: Transient waveforms of voltage noise for validation

# C. Fast Transient Simulation

To achieve fast simulation for long-running benchmark traces, we resort to a convolution-based algorithm. Instead of using the impedance of the PDN as the transfer function [3], the transfer function  $H_{\alpha}(s)$  in VN-Scope is defined in the frequency domain as the ratio of the output core voltage  $V_{core}(s)$  to the input activity stimulus  $A(s)$ :  $H_{\alpha}(s) = \frac{V_{core}(s)}{A(s)}$ . Our activity-based transfer function takes into account the voltage dependency of the dynamic power and can generate accurate transient voltage waveforms without sacrificing the simulation speed.
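In the time domain, applying a transfer function amounts to convolving the activity trace with the corresponding impulse response. The sketch below illustrates the idea with a direct causal convolution. It is not the VN-Scope implementation: the impulse response `H_ALPHA` is a hypothetical toy sequence, and linearizing around a reference activity `alpha_ref` and reference voltage `v_ref` is an assumption made for this example.

```python
def convolve_causal(h, x):
    """Direct-form causal convolution, truncated to len(x) samples."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * x[n - k]
        y.append(acc)
    return y


def core_voltage_trace(activity, h_alpha, alpha_ref, v_ref):
    """Core voltage as v_ref plus the response to activity deviations.

    h_alpha is the sampled impulse response of the activity-to-voltage
    transfer function (volts per unit activity per sample).
    """
    deviation = [a - alpha_ref for a in activity]
    droop = convolve_causal(h_alpha, deviation)
    return [v_ref + d for d in droop]


# Hypothetical toy impulse response: a rise in activity first pulls the
# voltage down, followed by a decaying ring.
H_ALPHA = [-0.05, 0.03, -0.01]

# An activity step from 0.2 to 0.5 at sample 3.
activity = [0.2, 0.2, 0.2, 0.5, 0.5, 0.5, 0.5, 0.5]
for n, v in enumerate(core_voltage_trace(activity, H_ALPHA, 0.2, 0.36)):
    print(n, round(v, 4))
```

The direct form costs time proportional to the product of the trace length and the impulse response length; for very long traces, block-wise FFT-based convolution is a common alternative.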

The frequency response of  $H_{\alpha}(s)$  for the STC and NTC core configurations in Table IV is obtained by AC analysis and is presented in Figure 6. While the mid-frequency peak response persists for both cases, NTC exhibits a lower peak magnitude shifted to a lower frequency due to its increased amount of intrinsic decoupling capacitance from more cores. Since the low frequency response corresponds to the magnitude of the static IR-drop, Figure 6 also shows that the IR-drop gets worse in NTC because of larger current load.

# D. Model Validation

Results from VN-Scope using the extracted  $R_{lkg}$  and  $C_{d,int}$  values are validated against Cadence transient simulation of the transistor-level netlist of the inverter arrays shown in Figure 4. Our focus is high-performance NTC multi-core processors with power consumption of tens of watts. Because such designs would require a prohibitively large die area in a current technology node, we choose to model the system in 10nm using the PTM-MG model [11]. To capture the worst case voltage fluctuation in the system, we constructed synthetic activity traces at the resonance frequency with various waveforms (e.g., sine, square, and triangle), as well as periodic pulse and step traces. Examples of the synthetic transient waveforms are presented in Figure 7. In addition to validating against Cadence simulations, we also compared our model with the conventional linearized RLC model [3], [8]. In the zoom-in boxes of Figure 7a and Figure 7b, the core voltage waveforms calculated by our model more closely track the Cadence waveforms, while the linearized model exaggerates the magnitude of the voltage fluctuation.

Fig. 8: Transient waveforms of sample trace from 481.wrf.wrf

TABLE II: Worst Voltage Droop Simulation Comparison

|          | resonate | pls@200 | pls@500 | mcf ph2 | gcc ph4 | avg SPEC |
|----------|----------|---------|---------|---------|---------|----------|
| Cadence  | 198mV    | 204mV   | 201mV   | 104mV   | 113mV   | 108mV    |
| VN-Scope | 199mV    | 209mV   | 205mV   | 105mV   | 113mV   | 108mV    |
| Error    | 0.7%     | 2.3%    | 1.7%    | 0.6%    | 0.1%    | 0.3%     |
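The synthetic resonance stimuli described in this subsection can be sketched as periodic activity waveforms whose period matches the PDN resonance. This is a minimal illustration under stated assumptions: the function name `synthetic_activity`, the activity bounds `lo` and `hi`, and the example clock and resonance frequencies are hypothetical and are not the values used for the validation in Table II.

```python
import math


def synthetic_activity(shape, n_cycles, f_clk, f_res, lo=0.1, hi=0.5):
    """Per-cycle activity trace oscillating between lo and hi at f_res."""
    period = f_clk / f_res            # clock cycles per resonance period
    mid = 0.5 * (hi + lo)
    amp = 0.5 * (hi - lo)
    trace = []
    for n in range(n_cycles):
        phase = (n / period) % 1.0    # position within the period, in [0, 1)
        if shape == "sine":
            u = math.sin(2.0 * math.pi * phase)
        elif shape == "square":
            u = 1.0 if phase < 0.5 else -1.0
        elif shape == "triangle":
            # Rises from -1 to +1 over the first half, falls back over the second.
            u = 4.0 * phase - 1.0 if phase < 0.5 else 3.0 - 4.0 * phase
        else:
            raise ValueError(f"unknown shape: {shape}")
        trace.append(mid + amp * u)
    return trace


# Hypothetical example: 750 MHz clock, 100 MHz resonance.
for shape in ("sine", "square", "triangle"):
    trace = synthetic_activity(shape, 16, 750e6, 100e6)
    print(shape, [round(a, 2) for a in trace])
```

Such traces can then be fed to a transient simulator to excite the peak impedance and expose the worst-case droop.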

Sample traces from SPEC CPU2006 benchmarks are also used for validation. For example, Figure 8 presents the waveforms of a trace slice from phase 2 of 481.wrf.wrf. Sample slices are taken from every representative phase of SPEC CPU2006 benchmarks, and each slice consists of an activity trace of 5M clock cycles. We summarize the validation results in Table II and Table III. Table II shows that the worst voltage droop calculated by VN-Scope is within 2.5% of Cadence simulation.

We validated the leakage and dynamic power, as well as the total power consumed by the processor core and by the off-chip power source. Again, VN-Scope is able to estimate these power figures to within 1.5%, and the simulation speed-up of VN-Scope compared to Cadence is more than 180x. It is worth noting that for sample benchmark traces, the average leakage power is three times the average dynamic power in NTC cores. Although only NTC results are shown here, similar validation has been performed at various supply voltages with the same level of accuracy and speed.

# IV. SIMULATION FRAMEWORK

This section describes the simulation framework employed in our study to evaluate voltage noise and energy efficiency of multi-core processors at different supply voltage configurations, while keeping the thermal design power (TDP) of the processor constant.

TABLE III: Power Calculation Comparison

Fig. 9: Simulation Infrastructure

# A. Simulation Infrastructure

Figure 9 illustrates the simulation infrastructure. The central VN-Scope block consists of modules to model transient behavior of key components, such as processor core and power delivery network. Basic technology parameters (device and wire models, the number of metal layers, C4 bump size and pitch) are used to configure the core and grid models, while the PDN model requires input from the off-chip design parameters at the package and board level.

The input to VN-Scope is the activity trace  $\alpha(t_n)$  generated by an architecture simulator. It is used to calculate the core voltage trace  $V_{core}(t_n)$ , as well as the leakage and dynamic current traces  $(I_{lkg}(t_n) \text{ and } I_{dyn}(t_n))$  and the source current trace  $I_{src}(t_n)$ . Core and source power consumption obtained from these transient traces, combined with performance data from the simulator, allows us to evaluate the energy efficiency of the processor. Given VN-Scope's fast simulation speed, it is possible to interact with the simulator by feeding back the instantaneous  $V_{core}(t_n)$ , so that architecture-level techniques can be applied to mitigate noise.

# B. System Configuration

Our goal is to evaluate the performance and efficiency of a many-core processor design optimized for energy efficiency, as opposed to single-thread performance. Such designs have been gaining traction for servers built out of low-power simple cores [1], especially for workloads where ample thread- and request-level parallelism is available.

A few assumptions are made to derive the system parameters in our simulation. All the systems are configured assuming a constant TDP of 80W and 10nm technology. The total TDP is split into 75% dynamic power and 25% leakage power during STC operation, and the dynamic power budget is further divided into 25% core power and 25% network-on-chip (NoC) power [4]. Considering that the I/O circuits usually do not share the same voltage domain as the core logic, we do not account for the I/O power in the TDP budget.

We also limit our simulation to multi-core systems with multi-program workloads, so that the aggregated throughput scales linearly with the number of cores. To simulate a multi-program workload with the single-core power trace of an x86 Atom processor generated by XIOSim [6], we construct the multi-core traces by assuming each core starts its own workload at random time intervals modeled by a Poisson distribution. We use the L2 misses calculated by XIOSim as a proxy to represent the intensity of network traffic, because for multi-core systems with private caches running multi-program workloads, network traffic largely originates from last-level cache misses.

TABLE IV: System Configurations (TDP=80W)

|                | STC      | UTC      | MTC       | LTC       | NTC       |
|----------------|----------|----------|-----------|-----------|-----------|
| # cores        | 25       | 50       | 100       | 200       | 300       |
| min $V_{core}$ | 765mV    | 600mV    | 480mV     | 400mV     | 360mV     |
| $V_{src}$      | 975mV    | 820mV    | 711mV     | 625mV     | 565mV     |
| $V_{margin}$   | 210mV    | 220mV    | 231mV     | 225mV     | 205mV     |
| max $F_{clk}$  | 2.8GHz   | 2.2GHz   | 1.6GHz    | 1.0GHz    | 750MHz    |
| die area       | $42mm^2$ | $84mm^2$ | $164mm^2$ | $326mm^2$ | $488mm^2$ |
| I/O C4         | 246      | 347      | 472       | 572       | 636       |
| power C4       | 279      | 656      | 1497      | 3336      | 5218      |
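The multi-core trace construction described above can be sketched as follows. This is an illustrative interpretation rather than the exact procedure used in the paper: core start times are drawn as arrivals of a Poisson process (exponentially distributed gaps between consecutive starts), each core is idle before its start, and the function name `multicore_trace` and its parameters are hypothetical.

```python
import random


def multicore_trace(single_core, n_cores, mean_gap_cycles, seed=0):
    """Sum time-shifted copies of a single-core trace.

    Start offsets follow a Poisson arrival process: the gap between
    consecutive core starts is exponential with the given mean (assumption).
    Returns the aggregated trace and the list of start offsets.
    """
    rng = random.Random(seed)
    n = len(single_core)
    total = [0.0] * n
    offsets = []
    start = 0
    for _ in range(n_cores):
        offsets.append(start)
        # Cores contribute nothing before they start; the tail is truncated.
        for t in range(start, n):
            total[t] += single_core[t - start]
        start += int(rng.expovariate(1.0 / mean_gap_cycles))
    return total, offsets


# Hypothetical single-core power trace (arbitrary units).
single = [1.0, 3.0, 2.0, 1.0, 4.0, 1.0, 1.0, 2.0] * 4
total, offsets = multicore_trace(single, n_cores=4, mean_gap_cycles=5.0, seed=1)
print("start offsets:", offsets)
print("aggregate:", total)
```

Staggering the starts avoids the unrealistic case in which every core executes the same program phase in lockstep, which would artificially align their current swings.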

The clock frequency, die area, and number of cores of the STC system configuration in Table IV are derived from an Atom core in 45nm technology, which operates at 1.6GHz, consumes 4W TDP, and occupies  $52mm^2$ . Since 1.6GHz is equivalent to 49 fanout-of-4 (FO4) delays in 45nm, we assume our multi-core system in 10nm has a similar FO4-delay-based clock frequency  $(F_{clk})$  of 2.8GHz. The scaling factor embedded in the PTM model suggests that a 4W TDP core in 45nm will convert to 3.2W in 10nm; therefore our STC system consists of 25 Atom-like cores to reach a total TDP of 80W with a total die area of  $42mm^2$ . The total number of C4 bumps is calculated based on the typical bump pitch of  $284\mu m$  [13]. To allocate I/O bumps, we assume 2 DDR3 memory controllers are needed to meet the throughput of 25 cores at 2.8GHz, and each DDR requires 80 I/O pads, with a 10% overhead that scales with throughput. An additional 70 pads are allocated for I/O functions that do not scale with the number of cores. All the remaining C4 bumps are dedicated to power. For STC, the minimum core voltage (765mV) is the nominal supply voltage for 10nm transistors.

VN-Scope takes the fixed STC configuration derived above and scales it to other supply voltages with the same TDP budget. Take the NTC configuration as an example. First, the minimum  $V_{core}$  is set to 360mV to meet the minimum operating voltage of an 8T SRAM cell in 10nm technology, which in turn sets the maximum  $F_{clk}$  at 750MHz. With a lower  $V_{core}$  and a lower  $F_{clk}$ , each NTC core consumes only 1/12 of the STC core power; hence the NTC processor consists of a total of 300 cores and occupies  $488mm^2$  of die area. The number of I/O bumps is assumed to scale linearly with the increase in throughput.
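The core-count and I/O-bump arithmetic above can be checked with a short sketch. Two points are assumptions of this example rather than statements from the text: throughput is taken to be proportional to the number of cores times  $F_{clk}$ , and the 10% overhead is applied to the DDR pads before scaling. The function names are hypothetical. Under these assumptions the STC and NTC entries of Table IV (25 and 300 cores, 246 and 636 I/O bumps) are reproduced.

```python
def core_count(tdp_w, core_power_w):
    """Number of cores that fit in the TDP budget."""
    return round(tdp_w / core_power_w)


def io_bumps(n_cores, f_clk_ghz, base_cores=25, base_f_ghz=2.8,
             ddr_controllers=2, pads_per_ddr=80, overhead=0.10, fixed_pads=70):
    """I/O C4 bumps: a throughput-scaled part plus a fixed part.

    Assumes throughput is proportional to n_cores * f_clk (assumption).
    """
    scaled_at_base = ddr_controllers * pads_per_ddr * (1.0 + overhead)
    throughput_ratio = (n_cores * f_clk_ghz) / (base_cores * base_f_ghz)
    return round(scaled_at_base * throughput_ratio + fixed_pads)


STC_CORE_POWER_W = 3.2                    # 4W at 45nm scaled to 10nm
NTC_CORE_POWER_W = STC_CORE_POWER_W / 12  # 1/12 of the STC core power

print("STC cores:", core_count(80, STC_CORE_POWER_W))
print("NTC cores:", core_count(80, NTC_CORE_POWER_W))
print("STC I/O bumps:", io_bumps(25, 2.8))
print("NTC I/O bumps:", io_bumps(300, 0.75))
```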

We determine voltage margins by an iterative process. Synthetic stressmarks are constructed to emulate the worst case voltage droop, which is simulated using VN-Scope. The off-chip source voltages  $V_{src}$  are chosen to ensure that the worst voltage droop from executing all stressmarks meets the minimum  $V_{core}$  within 3mV. To characterize noise at different supply voltages, we also choose three intermediate voltages (UTC, MTC, LTC) between STC and NTC, and derive their corresponding configurations following the same process. The system configurations are summarized in Table IV. It is worth noting that even though the absolute value of the voltage margin  $V_{margin}$  remains relatively constant as the supply lowers, the cause of the worst case droop changes from resonance noise in STC to Ldi/dt noise induced by an abrupt current step in NTC. Nonetheless, a 205mV margin results in a larger percentage guardband (36%) at NTC.

Fig. 10: Energy efficiency measured in MIPS/Watt in STC and NTC with and without voltage margin
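The guardband percentages follow directly from Table IV. The sketch below computes the margin as  $V_{src}$  minus the minimum  $V_{core}$  and expresses it as a fraction of  $V_{src}$ ; normalizing by the source voltage is our reading of "percentage guardband", and the helper names are hypothetical.

```python
# (min V_core, V_src) in millivolts, copied from Table IV.
CONFIGS_MV = {
    "STC": (765, 975),
    "UTC": (600, 820),
    "MTC": (480, 711),
    "LTC": (400, 625),
    "NTC": (360, 565),
}


def margin_mv(min_vcore_mv, vsrc_mv):
    """Voltage margin: source voltage minus minimum core voltage."""
    return vsrc_mv - min_vcore_mv


def guardband_pct(min_vcore_mv, vsrc_mv):
    """Margin as a percentage of the source voltage (assumed definition)."""
    return 100.0 * margin_mv(min_vcore_mv, vsrc_mv) / vsrc_mv


for name, (v_min, v_src) in CONFIGS_MV.items():
    print(f"{name}: margin = {margin_mv(v_min, v_src)} mV, "
          f"guardband = {guardband_pct(v_min, v_src):.1f}%")
```

The absolute margin stays within a narrow band across all five configurations, while its share of the supply grows steadily as the supply voltage falls.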

# V. RESULTS EVALUATION

In this section, we present the simulation results for the systems described in Table IV. We are able to quantitatively evaluate the impact of voltage noise on energy efficiency as supply voltage scales, as well as to estimate its sensitivity to various system-level parameters.

# A. Energy Efficiency

As discussed in Section II, additional voltage margins increase the energy spent per cycle and thus degrade energy efficiency. This is verified by our simulation results from SPEC CPU2006 benchmarks in Figure 10. The maximum energy efficiency assuming ideal power delivery with no voltage margin is represented by the light blue (STC) and light red (NTC) bars measured in MIPS/Watt. Under such ideal conditions, the energy efficiency of NTC outperforms that of STC by  $2.24 \times$ .

However, when the same evaluation is performed with a non-ideal power delivery network accounting for extra voltage guardband, energy efficiency is penalized by 52% for STC and 61% for NTC. Since near-threshold operation suffers more than its super-threshold counterpart, its energy advantage is reduced to  $1.83 \times$ .
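As a consistency check on these figures, the reduced advantage can be approximated by scaling the ideal ratio by the fraction of efficiency each design retains. Here  $\eta$  is introduced only for this illustration to denote energy efficiency with margin, and the inputs are the rounded percentages quoted above:

$$\frac{\eta_{NTC}}{\eta_{STC}} \approx 2.24 \times \frac{1 - 0.61}{1 - 0.52} = 2.24 \times \frac{0.39}{0.48} \approx 1.82$$

This agrees with the reported  $1.83 \times$  to within the rounding of the quoted percentages.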

# B. Sources of Energy Loss

Since both no-margin and with-margin configurations have the same operating frequency, the discrepancy in their energy efficiency can be entirely attributed to the higher power consumption of the non-ideal system, which can be broken down into four parts:





- Computation power: the power consumed by the ideal core with zero voltage margin.
- Voltage margin overhead: additional power burned by the core voltage to account for the worst-case droop.
- Power delivery loss: some amount of the power is lost in the delivery network mostly due to IR-drop.
- Off-chip regulator inefficiency: the off-chip regulator has efficiency loss, which is often a function of the output voltage. Lower supply voltage as demanded by NTC leads to higher loss in the off-chip regulator.

Our power consumption breakdown analysis shown in Figure 11 reveals that voltage margin overhead is by far the dominant source of energy loss, accounting for 38% of the total power in STC and 31% in NTC. As we move to a lower supply voltage in NTC, both the IR-drop-induced power delivery loss and the off-chip regulator loss increase. The percentage of computation power drops from 48% in STC to 39% in NTC, which explains the more pronounced negative impact of voltage noise on NTC. Despite their performance differences, the SPEC benchmarks show only a small variance in their power consumption break-down represented by the error bars in Figure 11, because despite transient differences, averaged activities of the benchmarks are more similar over the entire execution time.

We can repeat the same analysis for multiple voltage configurations. Figure 12 shows the change of energy efficiency and power break-downs for several supply voltages under the same TDP budget. Although NTC achieves the best efficiency among these configurations, LTC comes in as a close second with 67% of the die area. Evaluations of such trade-offs between efficiency and system cost are important in real-world implementation and deployment of NTC.

Fig. 14: Energy efficiency changes with decoupling cap

# C. Sensitivity to Physical Parameters

Given the considerable energy loss due to non-ideal power delivery, reducing the parasitic resistance and inductance of PDNs (by better board or package design) can improve efficiency. Figure 13 shows that the computation proportion increases only slightly with lower impedance. This suggests that reducing the PDN impedance uniformly by 25% is unlikely to solve the voltage noise problem. A similar conclusion can be made for C4 bump parameters, as the impedance from power bumps is only a fraction of the overall PDN impedance. This insensitivity to DC resistance stems from the dominance of Ldi/dt noise in determining the worst voltage droop.

One way to reduce Ldi/dt noise is to lower the PDN peak impedance by adding extrinsic on-chip decoupling capacitance. To investigate its potential improvement, we sweep the decoupling capacitor from 100nF to  $3.2\mu F$  for STC and NTC cores. Given a gate capacitance density of  $160nF/mm^2$  in 10nm technology, the trade-off involved in sizing decoupling capacitors is between voltage margin and die area, as summarized in Table V. A bigger decap benefits STC more, because its  $C_{d,int}$  is much smaller to begin with. Figure 14 shows a 50% energy efficiency improvement with  $20mm^2$  of additional die area for STC cores.

TABLE V: Margin and Die Area with Different  $C_{d,ext}$ 

| $C_{d,ext}$ (nF)   | 100  | 200  | 400  | 800  | 1600 | 3200 |
|--------------------|------|------|------|------|------|------|
| STC margin (mV)    | 242  | 234  | 210  | 169  | 133  | 104  |
| STC area ($mm^2$)  | 41.2 | 41.8 | 43.1 | 45.6 | 50.6 | 60.6 |
| NTC margin (mV)    | 205  | 204  | 198  | 190  | 183  | 175  |
| NTC area ($mm^2$)  | 488  | 489  | 490  | 492  | 497  | 507  |
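The area cost in Table V follows from the stated gate capacitance density. A minimal sketch of the conversion is shown below; the function name `decap_area_mm2` is hypothetical, and the only input taken from the text is the density of  $160nF/mm^2$ .

```python
GATE_CAP_DENSITY_NF_PER_MM2 = 160.0  # gate capacitance density quoted for 10nm


def decap_area_mm2(c_ext_nf):
    """Die area needed to implement an extrinsic decap of c_ext_nf nanofarads."""
    return c_ext_nf / GATE_CAP_DENSITY_NF_PER_MM2


for c_nf in (100, 200, 400, 800, 1600, 3200):
    print(f"{c_nf:5d} nF -> {decap_area_mm2(c_nf):6.3f} mm^2")

# Area added when sweeping from the smallest to the largest decap.
extra = decap_area_mm2(3200) - decap_area_mm2(100)
print(f"sweep adds {extra:.3f} mm^2")
```

The largest capacitor in the sweep occupies  $20mm^2$ , and going from 100nF to 3200nF adds about  $19.4mm^2$ , which matches the growth of the STC area row from 41.2 to  $60.6mm^2$ . The same absolute increase is a far smaller fraction of the  $488mm^2$  NTC die.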

To summarize, the voltage margin dictated by worst case voltage droop presents a huge energy penalty. Physical design alone cannot solve the efficiency loss caused by voltage noise, which calls for a new design perspective (e.g., design for the average case with resilience to tolerate or recover from the worst case). The simulation infrastructure we developed for system-level noise characterization and evaluation can be used to further identify these opportunities.

# VI. CONCLUSION

Voltage noise can significantly limit the improvement of energy efficiency in modern processors as we move to lower supply voltages, such as in near-threshold computing. Using a compact voltage noise tool like VN-Scope, we are able to accurately characterize voltage noise over a wide range of supply voltages spanning NTC and STC for multi-core processors running full-scale benchmark workloads in 10nm technology. In addition to evaluating the impact of voltage noise on energy efficiency and identifying the dominant loss mechanism involved, we also perform a sensitivity study on various physical design parameters and workload combinations, which helps to reveal future architecture-level opportunities to mitigate and eliminate the worst case voltage fluctuations.

#### ACKNOWLEDGMENT

The material reported in this paper is based partly upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-13-C-0022.

#### REFERENCES

- [1] D. G. Andersen et al. FAWN: a Fast Array of Wimpy Nodes. In *SOSP*, 2009.
- [2] R. G. Dreslinski et al. Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits. *Proceedings of the IEEE*, 2010.
- [3] M. S. Gupta et al. DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors. In *HPCA*, 2008.
- [4] Y. Hoskote et al. A 5-GHz Mesh Interconnect for a Teraflops Processor. *Micro*, *IEEE*, 2007.
- [5] Intel Corp. Voltage Regulator Module and Enterprise Voltage Regulator-Down. 11.1 edition, 2009.
- [6] S. Kanev et al. XIOSim: Power-Performance Modeling of Mobile x86 Cores. In *ISLPED*, 2012.
- [7] U. R. Karpuzcu et al. VARIUS-NTV: A microarchitectural model to capture the increased sensitivity of manycores to process variations at near-threshold voltages. In *DSN*, 2012.
- [8] M. Ketkar and E. Chiprout. A Microarchitecture-Based Framework for pre- and post-Silicon Power Delivery Analysis. In *MICRO*, 2009.
- [9] Y. Kim and L. John. Automated di/dt Stressmark Generation for Microprocessor Power Delivery Networks. In *ISLPED*, 2011.
- [10] S. Seo, R. G. Dreslinski, M. Woh, Y. Park, C. Chakrabarti, S. Mahlke, D. Blaauw, and T. Mudge. Process variation in near-threshold wide SIMD architectures. In *DAC*, 2012.
- [11] S. Sinha et al. Exploring sub-20nm FinFET Design with Predictive Technology Models. In *DAC*, 2012.
- [12] A. A. Sinkar, H. Wang, and N. S. Kim. Workload-aware voltage regulator optimization for power efficient multi-core processors. In *DATE*, 2012.
- [13] R. Zhang et al. Some limits of power delivery in the multicore era. In *WEED-12 at ISCA-39*, 2012.