Publications by Type: Journal Article

2007
Xiaoyao Liang, Ramon Canal, Gu Wei, and David Brooks. 12/2007. “Process Variation Tolerant Register Files Based On Dynamic Memories.” Workshop on Architectural Support for Gigascale Integration, held with Int’l Symposium on Computer Architecture (ISCA-34).
Transistor gate length and threshold voltage variability due to process variations will greatly impact the stability, leakage power, and performance of future microprocessors. These variations are especially detrimental to continued scaling of 6T SRAM (6-transistor static memory) structures. This paper proposes replacing traditional SRAM-based cells in multiported register files with cells based on 3T1D DRAM (3-transistor, 1-diode dynamic memory) cells, which can absorb the effects of device physical variations into a single parameter: the data retention time. By leveraging the transient data in the processor and dependency slack in the pipeline, retention time variation can be hidden in the existing processor architecture. Thus the proposed register file can effectively tolerate very large process variation with little or even no impact on performance, address stability concerns, and reduce power consumption, when compared with ideal SRAM-based designs. Detailed circuit and architectural simulations and analysis verify a 1% normalized performance loss even under very large process variations, and 22% average power savings.
David Brooks, Robert Dick, Russ Joseph, and Li Shang. 5/2007. “Power, thermal, and reliability modeling in nanometer-scale microprocessors.” Micro, IEEE, 27, 3, Pp. 49–62.
System integration and performance requirements are dramatically increasing the power consumptions and power densities of high-performance microprocessors. High power consumption introduces challenges to various aspects of microprocessor and computer system design. It increases the cost of cooling and packaging design, reduces system reliability, complicates power supply circuitry design, and reduces battery life. Researchers have recently dedicated intensive effort to power-related design problems. Modeling is the essential first step toward design optimization. In this article, the power, thermal and reliability modeling problems are explained and recent advances in their accurate and efficient analysis are surveyed.
Benjamin Lee and David Brooks. 5/2007. “Spatial Sampling and Regression Strategies.” Micro, IEEE, 27, 3, Pp. 74–93.
This new simulation paradigm for microarchitectural design evaluation and optimization counters growing simulation costs stemming from the exponentially increasing size of design spaces. The authors demonstrate how to obtain a more comprehensive understanding of the design space by selectively simulating a modest number of designs from that space and then more effectively leveraging the simulation data using techniques in statistical inference.
Wonyoung Kim, Meeta Gupta, Gu-Wei, and David Brooks. 2007. “Enabling on-chip switching regulators for multi-core processors using current staggering.” Proceedings of the Work. on Architectural Support for Gigascale Integration.

Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in portable systems, but DVFS effectiveness suffers from the fact that voltage transitions occur on the order of tens of microseconds. Voltage regulators that are integrated on the same chip as the microprocessor core provide the benefit of both nanosecond-scale voltage switching and improved power delivery. However, the implementation of on-chip regulators presents many challenges including regulator efficiency and output voltage transient characteristics. In this paper, we discuss architectural support for on-chip regulator designs. Specifically, we show that in a chip-multiprocessor system, current staggering can be employed by restricting the simultaneous enabling/disabling of cores due to clock gating. We discuss tradeoffs between current staggering and regulator circuit design parameters, and we show that regulation efficiency of greater than 80% is possible for a variety of multi-threaded applications.

Benjamin Lee and David Brooks. 2007. “Statistical inference for efficient microarchitectural analysis.” SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, Pp. 130–es.

Microarchitectural design exploration is often inefficient and ad hoc due to computational costs of simulators. Trends toward multi-core, multi-threading lead to diversity in viable core designs, thereby requiring comprehensive design exploration while exponentially increasing design space size. Similarly, application performance topology is a function of input parameters, but models to optimize performance and/or predict scalability are increasingly difficult to derive analytically due to system complexity. We collect measurements sampled sparsely, uniformly at random from the space of interest and formulate non-linear regression models. We demonstrate the broad effectiveness of regression for predicting (1) the power and performance of a microarchitectural design space with median error rates of 5.5 to 7.5 percent using 1K samples from a 1B point space and (2) the performance of parallel applications, Semicoarsening Multigrid and High-Performance Linpack, with median error rates of 2.5 to 5.0 percent using 500 samples from more than 3K points.

2006
Yingmin Li, Benjamin Lee, David Brooks, Zhigang Hu, and Kevin Skadron. 12/22/2006. “Impact of thermal constraints on multi-core architectures.” 10th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronics Systems, San Diego.
This paper shows how thermal constraints affect the multidimensional design space for chip multiprocessors, considering the inter-related variables of CPU count, pipeline depth, superscalar width, L2 cache size, and operating voltage and frequency. The results show the importance of thermal modeling and the need for new thermal modeling capabilities and hence the need for collaboration between the thermal engineering and computer architecture communities. Thermal constraints both shift the optimal intra- and inter-core organization, and dominate other physical constraints such as pin bandwidth and power delivery. Different thermal constraints also require different optimization strategies. For aggressive cooling solutions, reducing power density is at least as important as reducing total power, while for low-cost cooling solutions, reducing total power is more important.
BC Lee and David Brooks. 10/20/2006. “Accurate and Efficient Regression Modeling for Microarchitectural Performance and Power Prediction.” SIGOPS Operating Systems Review, 40, 5, Pp. 185–194.

We propose regression modeling as an efficient approach for accurately predicting performance and power for various applications executing on any microprocessor configuration in a large microarchitectural design space. This paper addresses fundamental challenges in microarchitectural simulation cost by reducing the number of required simulations and using simulated results more effectively via statistical modeling and inference. Specifically, we derive and validate regression models for performance and power. Such models enable computationally efficient statistical inference, requiring the simulation of only 1 in 5 million points of a joint microarchitecture-application design space while achieving median error rates as low as 4.1 percent for performance and 4.3 percent for power. Although both models achieve similar accuracy, the sources of accuracy are strikingly different. We present optimizations for a baseline regression model to obtain (1) application-specific models to maximize accuracy in performance prediction and (2) regional power models leveraging only the most relevant samples from the microarchitectural design space to maximize accuracy in power prediction. Assessing sensitivity to the number of samples simulated for model formulation, we find fewer than 4,000 samples from a design space of approximately 22 billion points are sufficient. Collectively, our results suggest significant potential in accurate and efficient statistical inference for microarchitectural design space exploration via regression models.

Benjamin Lee, David Brooks, Bronis Supinski, and Martin Schulz. 9/29/2006. “Regression Modeling Strategies for Parameter Space Exploration”.
Increasing system and algorithmic complexity, combined with a growing number of tunable application parameters, pose significant challenges for analytical performance modeling. This report outlines a series of robust techniques that enable efficient parameter space exploration based on empirical statistical modeling. In particular, this report applies statistical techniques such as clustering, association, and correlation analyses to understand the parameter space better. Results from these statistical techniques guide the construction of piecewise polynomial regression models. Residual and significance tests ensure the resulting model is unbiased and efficient. We demonstrate these techniques in R, a statistical computing environment, for predicting the performance of semicoarsening multigrid. 50 and 75 percent of predictions achieve error rates of 5.5 and 10.0 percent or less, respectively.
B Lee and David Brooks. 6/18/2006. “Statistically rigorous regression modeling for the microprocessor design space.” ISCA-33: Workshop on Modeling, Benchmarking, and Simulation.
Regression models enhance existing techniques in detailed microarchitectural simulation by reducing the number of required simulations and using simulation data more efficiently to identify trends and trade-offs. We present a rigorous derivation of such models for microprocessor performance and power prediction, emphasizing the need to apply domain-specific knowledge when performing statistical inference. In particular, we propose sampling observations uniformly at random from a large design space, discuss approaches for identifying statistically significant predictors, and detail strategies for effectively modeling predictor interaction and non-linearity. The resulting models enable computationally efficient statistical inference, requiring the simulation of only 1 in every 5 million points of a joint microarchitecture-application design space while achieving median prediction error rates as low as 4.1 percent for performance and 4.3 percent for power.
Qiang Wu, Margaret Martonosi, Douglas Clark, Vijay Reddi, Dan Connors, Youfeng Wu, Jin Lee, and David Brooks. 1/2006. “Dynamic-compiler-driven control for microprocessor energy and performance.” Micro, IEEE, 26, 1, Pp. 119–129.
A general dynamic-compilation environment offers power and performance control opportunities for microprocessors. The authors propose a dynamic-compiler-driven runtime voltage and frequency optimizer. A prototype of their design, implemented and deployed in a real system, achieves energy savings of up to 70 percent.
Xiaoyao Liang and David Brooks. 2006. “Latency adaptation for multiported register files to mitigate the impact of process variations.” Workshop on Architectural Support for Gigascale Integration (ASGI-06, held in conjunction with ISCA-33).

Design variability due to die-to-die and within-die process variations has the potential to significantly reduce the maximum operating frequency and the effective yield of high-performance microprocessors in future process technology generations. This variability manifests itself by increasing the frequency variance and decreasing the mean frequency of fabricated chips. In this paper we develop a model for the impact of variability on the performance of multiported SRAM-based structures such as physical register files, which are key architectural components that may encounter variability problems. We find that naively resizing or increasing the access latency of these performance-critical datapath resources can have frequency benefits, but may incur a significant IPC loss that limits overall system performance. We propose an extension to latency adaptation called port switching, which more efficiently exploits the technique to remedy the IPC loss. We find that even under a conservative, worst-case study, 18% mean frequency improvement with less than 5% IPC loss is possible for the 65nm technology node. Finally, we contrast the impact of die-to-die and within-die variations on chip performance and demonstrate that the proposed technique can compensate for the frequency loss mainly due to within-die variations.

Benjamin Lee and David Brooks. 2006. “Regression modeling strategies for microarchitectural performance and power prediction.” Proceedings of the 2006 ASPLOS Conference, Pp. 185–194.

We propose regression modeling as an effective approach for accurately predicting performance and power for various applications executing on any microprocessor configuration in a large microarchitectural design space. This report addresses fundamental challenges in microarchitectural simulation costs via statistical modeling. Specifically, we derive and validate regression models for performance and power. Such models enable computationally efficient statistical inference, requiring the simulation of only 1 in 5 million points of a joint microarchitecture-application design space while achieving error rates as low as 4.1 percent for performance and 4.3 percent for power. Although both models achieve similar accuracy, the sources of accuracy are strikingly different. We present optimizations for a baseline regression model to obtain (1) per-benchmark, application-specific models designed to maximize accuracy in performance prediction and (2) regional power models leveraging only the most relevant samples from the microarchitectural design space to maximize accuracy in power prediction. Assessing model sensitivity to sample and region sizes, we find 4,000 samples from a design space of approximately 22 billion points are sufficient for both application-specific and regional modeling and prediction. Collectively, our results suggest significant potential in accurate and efficient statistical inference for microarchitectural design space exploration via regression models.

Benjamin Lee and David Brooks. 2006. “Statistical inference”.
2005
Hanumolu Kumar, Gu Wei, and Moon Ku. 6/2005. “Equalizers for high-speed serial links.” International journal of high speed electronics and systems, 15, 02, Pp. 429–458.

In this tutorial paper we present equalization techniques to mitigate inter-symbol interference (ISI) in high-speed communication links. Both transmit and receive equalizers are analyzed and high-speed circuits implementing them are presented. It is shown that a digital transmit equalizer is the simplest to design, while a continuous-time receive equalizer generally provides better performance. Decision feedback equalizer (DFE) is described and the loop latency problem is addressed. Finally, techniques to set the equalizer parameters adaptively are presented.

Benjamin Lee and David Brooks. 1/2005. “Effects of pipeline complexity on SMT/CMP power-performance efficiency.” Power, 106, Pp. 1.
We consider processor core complexity and its implications for the power-performance efficiency of SMT and CMP architectures, exploring fundamental trade-offs between the efficiency of multi-core architectures and the complexity of their cores from a power-performance perspective. Taking pipeline depth and width as proxies for core complexity, we conduct power-performance simulations of several SMT and CMP architectures employing cores of varying complexity. Our analyses identify efficient pipeline dimensions and outline the implications of using a power-performance efficiency metric for core complexity. Collectively, our results suggest SMT architectures enable efficient increases in pipeline dimensions and core complexity. Furthermore, reducing pipeline dimensions in CMP cores is inefficient, assuming ideal power-performance scaling from voltage/frequency scaling and circuit re-tuning. Given these conclusions, we formulate guidelines for complexity-effective design.
Jayanth Srinivasan, Sarita Adve, Pradip Bose, Jude Rivers, Y. Li, David Brooks, Z Hu, K Skadron, V Srinivasan, and M Gschwind. 2005. “The case for microarchitectural awareness of lifetime reliability.” IEEE Micro, 25, 3, Pp. 70–80.
Xiaoyao Liang and David Brooks. 2005. “Highly accurate power modeling method for SRAM structures with simple circuit simulation.” 2nd Watson Conf. Interaction Between Architecture, Circuits, Compilers.
2004
David Brooks and Joerg Henkel. 8/2004. “High level power modeling and analysis.” International Symposium on Low Power Electronics and Design: Proceedings of the 2004 international symposium on Low power electronics and design.
Victor Zyuban, David Brooks, Viji Srinivasan, Michael Gschwind, Pradip Bose, Philip Strenski, and Philip Emma. 8/2004. “Integrated analysis of power and performance for pipelined microprocessors.” Computers, IEEE Transactions on, 53, 8, Pp. 1004–1016.
Choosing the pipeline depth of a microprocessor is one of the most critical design decisions that an architect must make in the concept phase of a microprocessor design. To be successful in today’s cost/performance marketplace, modern CPU designs must effectively balance both performance and power dissipation. The choice of pipeline depth and target clock frequency has a critical impact on both of these metrics. In this paper, we describe an optimization methodology based on both analytical models and detailed simulations for power and performance as a function of pipeline depth. Our results for a set of SPEC2000 applications show that, when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of our energy models. Finally, we discuss the potential risks in design quality for overly aggressive or conservative choices of pipeline depth.
David Brooks, Pradip Bose, and Margaret Martonosi. 3/2004. “Power-performance simulation: design and validation strategies.” ACM SIGMETRICS Performance Evaluation Review, 31, 4, Pp. 13–18.

Microprocessor research and development increasingly relies on detailed simulations to make design choices. As such, the structure, speed, and accuracy of microarchitectural simulators is of critical importance to the field. This paper describes our experiences in building two simulators, using related but distinct approaches. One of the most important attributes of a simulator is its ability to accurately convey design trends as different aspects of the microarchitecture are varied. In this work, we break down accuracy, a broad term, into two sub-types: relative and absolute accuracy. We then discuss typical abstraction errors in power-performance simulators and show when they do (or do not) affect the design rule choices a user of those simulators might make. By performing this validation study using the Wattch and Power Timer simulators, the work addresses validation issues both broadly and in the specific case of a fairly widely-used simulator.

