Publications

2007
Ratnayake NS, F Haratsch, and Gu Wei. 9/2007. “Serial Sum-Product Architecture for Low-Density Parity-Check Codes.” In 2007 16th International Conference on Computer Communications and Networks, Pp. 154–158. IEEE.
A serial sum-product architecture for low-density parity-check (LDPC) codes is presented. In the proposed architecture, a standard bit node processing unit computes the bit to check node messages sequentially, while the check node computations are broken up into several steps and computed on the fly. This bit node centric architecture requires considerably less memory compared to other serial architectures, including the check node centric architecture.
Meeta Gupta, Krishna Rangan, Michael Smith, Gu Wei, and David Brooks. 8/27/2007. “Towards a software approach to mitigate voltage emergencies.” In Low Power Electronics and Design (ISLPED), 2007 ACM/IEEE International Symposium on, Pp. 123–128. IEEE.
Increases in peak current draw and reductions in the operating voltages of processors continue to amplify the importance of dealing with voltage fluctuations in processors. One approach suggested has been to not only react to these fluctuations but also attempt to eliminate future occurrences of these fluctuations by dynamically modifying the executing program. This paper investigates the potential of a very simple dynamic scheme to appreciably reduce the number of run-time voltage emergencies. It shows that we can map many of the voltage emergencies in the execution of the SPEC benchmarks on an aggressive superscalar design to a few static loops, categorize the microarchitectural cause of the emergencies in each important loop through simple observations and a simple priority function, and finally apply straightforward software optimization strategies to mitigate up to 70% of the future voltage swings.
Helal M, Straayer Z, Gu Wei, and Perrott H. 6/14/2007. “A low jitter 1.6 GHz multiplying DLL utilizing a scrambling time-to-digital converter and digital correlation.” In 2007 IEEE Symposium on VLSI Circuits, Pp. 166–167. IEEE.
This paper presents a 1.6 GHz multiplying delay-locked loop (MDLL) that leverages time-to-digital conversion and a digital correlation technique to achieve low deterministic jitter while still maintaining low random jitter. A proposed time-to-digital converter consists of a ring oscillator that is gated on and off to accurately measure time and scramble the measurement's residual error. Using a 50 MHz reference, the prototype system has measured reference spurs less than -59 dBc and an overall measured jitter of 1.41 ps.
David Brooks, Robert Dick, Russ Joseph, and Li Shang. 5/2007. “Power, thermal, and reliability modeling in nanometer-scale microprocessors.” Micro, IEEE, 27, 3, Pp. 49–62.
System integration and performance requirements are dramatically increasing the power consumptions and power densities of high-performance microprocessors. High power consumption introduces challenges to various aspects of microprocessor and computer system design. It increases the cost of cooling and packaging design, reduces system reliability, complicates power supply circuitry design, and reduces battery life. Researchers have recently dedicated intensive effort to power-related design problems. Modeling is the essential first step toward design optimization. In this article, the power, thermal and reliability modeling problems are explained and recent advances in their accurate and efficient analysis are surveyed.
Benjamin Lee and David Brooks. 5/2007. “Spatial Sampling and Regression Strategies.” Micro, IEEE, 27, 3, Pp. 74–93.
This new simulation paradigm for microarchitectural design evaluation and optimization counters growing simulation costs stemming from the exponentially increasing size of design spaces. The authors demonstrate how to obtain a more comprehensive understanding of the design space by selectively simulating a modest number of designs from that space and then more effectively leveraging the simulation data using techniques in statistical inference.
Meeta Gupta, Jarod Oatley, Russ Joseph, Gu Wei, and David Brooks. 4/16/2007. “Understanding voltage variations in chip multiprocessors using a distributed power-delivery network.” In Design, Automation & Test in Europe Conference & Exhibition, 2007. DATE'07, Pp. 1–6. Nice, France: IEEE.
Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and power management require that designers be increasingly cognizant of power supply variations. These variations, primarily due to fast changes in supply current, can be attributed to architectural gating events that reduce power dissipation. In order to study this problem, the authors propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip supply fluctuations in high-performance microprocessors. Using this model, the authors analyze voltage variations in the context of next-generation chip-multiprocessor (CMP) architectures using both real applications and synthetic current traces. They find that the activity of distinct cores in CMPs presents several new design challenges when considering power supply noise, and they describe potentially problematic activity sequences that are unique to CMP architectures.
Benjamin Lee, David Brooks, Bronis Supinski, Martin Schulz, Karan Singh, and Sally McKee. 3/2007. “Methods of inference and learning for performance modeling of parallel applications.” In Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, Pp. 249–258. ACM.

Increasing system and algorithmic complexity, combined with a growing number of tunable application parameters, pose significant challenges for analytical performance modeling. We propose a series of robust techniques to address these challenges. In particular, we apply statistical techniques such as clustering, association, and correlation analysis to understand the application parameter space better. We construct and compare two classes of effective predictive models: piecewise polynomial regression and artificial neural networks. We compare these techniques with theoretical analyses and experimental results. Overall, both regression and neural networks are accurate with median error rates ranging from 2.2 to 10.5 percent. The comparable accuracy of these models suggests differentiating features will arise from ease of use, transparency, and computational efficiency.

Benjamin Lee and David Brooks. 2/10/2007. “Illustrative design space studies with microarchitectural regression models.” In High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on, Pp. 340–351. Phoenix, Arizona, USA: IEEE.
We apply a scalable approach for practical, comprehensive design space evaluation and optimization. This approach combines design space sampling and statistical inference to identify trends from a sparse simulation of the space. The computational efficiency of sampling and inference enables new capabilities in design space exploration. We illustrate these capabilities using performance and power models for three studies of a 260,000 point design space: (1) Pareto frontier analysis, (2) pipeline depth analysis, and (3) multiprocessor heterogeneity analysis. For each study, we provide an assessment of predictive error and sensitivity of observed trends to such error. We construct Pareto frontiers and find predictions for Pareto optima are no less accurate than those for the broader design space. We reproduce and enhance prior pipeline depth studies, demonstrating constrained sensitivity studies may not generalize when many other design parameters are held at constant values. Lastly, we identify efficient heterogeneous core designs by clustering per-benchmark optimal architectures. Collectively, these studies motivate the application of techniques in statistical inference for more effective use of modern simulator infrastructure.
Wonyoung Kim, Meeta Gupta, Gu-Wei, and David Brooks. 2007. “Enabling on-chip switching regulators for multi-core processors using current staggering.” Proceedings of the Work. on Architectural Support for Gigascale Integration.

Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in portable systems, but DVFS effectiveness suffers from the fact that voltage transitions occur on the order of tens of microseconds. Voltage regulators that are integrated on the same chip as the microprocessor core provide the benefit of both nanosecond-scale voltage switching and improved power delivery. However, the implementation of on-chip regulators presents many challenges including regulator efficiency and output voltage transient characteristics. In this paper, we discuss architectural support for on-chip regulator designs. Specifically, we show that in a chip-multiprocessor system, current staggering can be employed by restricting the simultaneous enabling/disabling of cores due to clock gating. We discuss tradeoffs between current staggering and regulator circuit design parameters, and we show that regulation efficiency of greater than 80% is possible for a variety of multi-threaded applications.

Benjamin Lee and David Brooks. 2007. “Statistical inference for efficient microarchitectural analysis.” SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, Pp. 130–es.

Microarchitectural design exploration is often inefficient and ad hoc due to computational costs of simulators. Trends toward multi-core, multi-threading lead to diversity in viable core designs, thereby requiring comprehensive design exploration while exponentially increasing design space size. Similarly, application performance topology is a function of input parameters, but models to optimize performance and/or predict scalability are increasingly difficult to derive analytically due to system complexity. We collect measurements sampled sparsely, uniformly at random from the space of interest and formulate non-linear regression models. We demonstrate the broad effectiveness of regression for predicting (1) the power and performance of a microarchitectural design space with median error rates of 5.5 to 7.5 percent using 1K samples from a 1B point space and (2) the performance of parallel applications, Semicoarsening Multigrid and High-Performance Linpack, with median error rates of 2.5 to 5.0 percent using 500 samples from more than 3K points.

Mark Hempstead, Nikhil Tripathi, Patrick Mauro, Gu Wei, and David Brooks. 2007. “Ultra low power system for sensor network applications.” 32nd International Symposium on Computer Architecture (ISCA'05).
Recent years have seen a burgeoning interest in embedded wireless sensor networks with applications ranging from habitat monitoring to medical applications. Wireless sensor networks have several important attributes that require special attention to device design. These include the need for inexpensive, long-lasting, highly reliable devices coupled with very low performance requirements. Ultimately, the "holy grail" of this design space is a truly untethered device that operates off of energy scavenged from the ambient environment. In this paper, we describe an application-driven approach to the architectural design and implementation of a wireless sensor device that recognizes the event-driven nature of many sensor-network workloads. We have developed a full-system simulator for our sensor node design to verify and explore our architecture. Our simulation results suggest one to two orders of magnitude reduction in power dissipation over existing commodity-based systems for an important class of sensor network applications. We are currently in the implementation stage of design, and plan to tape out the first version of our system within the next year.
2006
Yingmin Li, Benjamin Lee, David Brooks, Zhigang Hu, and Kevin Skadron. 12/22/2006. “Impact of thermal constraints on multi-core architectures.” 10th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronics Systems, San Diego.
This paper shows how thermal constraints affect the multidimensional design space for chip multiprocessors, considering the inter-related variables of CPU count, pipeline depth, superscalar width, L2 cache size, and operating voltage and frequency. The results show the importance of thermal modeling and the need for new thermal modeling capabilities and hence the need for collaboration between the thermal engineering and computer architecture communities. Thermal constraints both shift the optimal intra- and inter-core organization, and dominate other physical constraints such as pin bandwidth and power delivery. Different thermal constraints also require different optimization strategies. For aggressive cooling solutions, reducing power density is at least as important as reducing total power, while for low-cost cooling solutions, reducing total power is more important.
Fang Chi, Sharon Kedar, Susan Owen, Gu Wei, David Brooks, and Jonathan Lees. 12/18/2006. “System-on-chip architecture design for intelligent sensor networks.” In 2006 International Conference on Intelligent Information Hiding and Multimedia, Pp. 579–582. IEEE.
While wireless sensor networks can generically be used for a wide variety of applications, breakthrough innovations are most often achieved when driven by a genuine need or application, with its specific system-level and science-related requirements and objectives. Hence, our work focuses on the development of wireless sensor network system-on-chip devices and supporting software for volcano monitoring, which we call Sensor Network for Active Volcanoes (SNAV). In this paper we present preliminary results of our research and development work on intelligent sensor networks for monitoring hazardous environments, especially the SNAV system-on-chip design for active volcanoes monitoring.
Xiaoyao Liang and David Brooks. 12/9/2006. “Mitigating the impact of process variations on processor register files and execution units.” In Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, Pp. 504–514. IEEE.
Design variability due to die-to-die and within-die process variations has the potential to significantly reduce the maximum operating frequency and the effective yield of high-performance microprocessors in future process technology generations. One serious manifestation of this increased variability is a reduction in the mean frequency of fabricated chips due to fluctuations in device characteristics causing reduced circuit performance. In this paper, we propose to mitigate the impact of variations through variable-latency register files and execution units, which are key architectural components that may encounter variability problems. We also illustrate the importance of closing the gap in expected delay of these distinct structures. A post-fabrication test and configuration strategy is proposed. We find that a 23% mean frequency improvement with an average IPC loss of 3% (and never exceeding 5% for worst-case chips) is possible for the 65nm technology node by properly adopting the proposed schemes.
Benjamin Lee and David Brooks. 12/2006. “Accurate and efficient regression modeling for microarchitectural performance and power prediction.” In ACM SIGOPS Operating Systems Review, 5th ed., 40: Pp. 185–194. ACM.

We propose regression modeling as an efficient approach for accurately predicting performance and power for various applications executing on any microprocessor configuration in a large microarchitectural design space. This paper addresses fundamental challenges in microarchitectural simulation cost by reducing the number of required simulations and using simulated results more effectively via statistical modeling and inference. Specifically, we derive and validate regression models for performance and power. Such models enable computationally efficient statistical inference, requiring the simulation of only 1 in 5 million points of a joint microarchitecture-application design space while achieving median error rates as low as 4.1 percent for performance and 4.3 percent for power. Although both models achieve similar accuracy, the sources of accuracy are strikingly different. We present optimizations for a baseline regression model to obtain (1) application-specific models to maximize accuracy in performance prediction and (2) regional power models leveraging only the most relevant samples from the microarchitectural design space to maximize accuracy in power prediction. Assessing sensitivity to the number of samples simulated for model formulation, we find fewer than 4,000 samples from a design space of approximately 22 billion points are sufficient. Collectively, our results suggest significant potential in accurate and efficient statistical inference for microarchitectural design space exploration via regression models.

Xiaoyao Liang and David Brooks. 11/5/2006. “Microarchitecture parameter selection to optimize system performance under process variation.” In Computer-Aided Design, 2006. ICCAD'06. IEEE/ACM International Conference on, Pp. 429–436. IEEE.

Design variability due to within-die and die-to-die process variations has the potential to significantly reduce the maximum operating frequency and the effective yield of high-performance microprocessors in future process technology generations. This variability manifests itself by increasing the number and criticality of long delay paths. To quantify this impact, we use an architectural process variation model that is appropriate for the analysis of system performance in the early stages of the design process. We propose a method of selecting microarchitectural parameters to mitigate the frequency impact due to process variability for distinct structures, while minimizing IPC (instructions-per-cycle) loss. We propose an optimization procedure to be used for system-level design decisions, and we find that joint architecture and statistical timing analysis can be more advantageous than pure circuit-level optimization. Overall, the technique can improve the 90% yield frequency by about 14% with 3% IPC loss for a baseline machine with a 20FO4 logic depth per pipestage. This approach is sensitive to the selection of processor pipeline depth, and we demonstrate that machines with aggressive pipelines will experience greater challenges in coping with process variability.

Lukasz Strozek and David Brooks. 10/22/2006. “Efficient architectures through application clustering and architectural heterogeneity.” In Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, Pp. 190–200. ACM.

Customizing architectures for particular applications is a promising approach to yield highly energy-efficient designs for embedded systems. This work explores the benefits of architectural customization for a class of embedded architectures typically used in energy-constrained application domains such as sensor node and multimedia processing. We implement a process flow that analyzes runtime profiles of applications and combines this information with a model for our architectural design space providing a robust customization engine built upon a fully automated method for determining an efficient architecture (together with appropriate application transformations). By profiling embedded benchmarks from a variety of sensor and multimedia applications, the paper shows the relative energy savings resulting from various architectural optimizations and identifies the number of architectures that achieves near-optimal savings for a group of applications. This paper proposes the use of heterogeneous chip-multiprocessors as a cost-effective approach to capitalize on the potential energy savings provided by application customization while executing a range of applications efficiently.

BC Lee and David Brooks. 10/20/2006. “Wild and Crazy Ideas Session-Session 5-Estimation and Prediction of Power and Performance-Accurate and Efficient Regression Modeling for Microarchitectural Performance and Power Prediction.” SIGOPS Operating Systems Review, 40, 5, Pp. 185–194.

We propose regression modeling as an efficient approach for accurately predicting performance and power for various applications executing on any microprocessor configuration in a large microarchitectural design space. This paper addresses fundamental challenges in microarchitectural simulation cost by reducing the number of required simulations and using simulated results more effectively via statistical modeling and inference. Specifically, we derive and validate regression models for performance and power. Such models enable computationally efficient statistical inference, requiring the simulation of only 1 in 5 million points of a joint microarchitecture-application design space while achieving median error rates as low as 4.1 percent for performance and 4.3 percent for power. Although both models achieve similar accuracy, the sources of accuracy are strikingly different. We present optimizations for a baseline regression model to obtain (1) application-specific models to maximize accuracy in performance prediction and (2) regional power models leveraging only the most relevant samples from the microarchitectural design space to maximize accuracy in power prediction. Assessing sensitivity to the number of samples simulated for model formulation, we find fewer than 4,000 samples from a design space of approximately 22 billion points are sufficient. Collectively, our results suggest significant potential in accurate and efficient statistical inference for microarchitectural design space exploration via regression models.

Mark Hempstead, Gu Wei, and David Brooks. 10/2006. “Architecture and circuit techniques for low-throughput, energy-constrained systems across technology generations.” In Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, Pp. 368–378. ACM.

Rising interest in the applications of wireless sensor networks has spurred research in the development of computing systems for low-throughput, energy-constrained applications. Unlike traditional performance-oriented applications, sensor network nodes are primarily constrained by operation lifetime, which is limited by power consumption. Advanced CMOS process technologies provide ever-increasing transistor density and improved performance characteristics. However, shrinking feature size and decreasing threshold voltages also lead to significant increases in leakage current, which is especially troublesome for applications with significant idle times. This work investigates tradeoffs between leakage and active power for low-throughput applications. We study these issues across a range of process technologies on a computing architecture that provides explicit support for fine-grain leakage-control techniques such as Vdd-gating and adaptive body bias. We present a methodology for selecting design parameters, including choice of process technology, that makes the optimal tradeoff between active power and leakage power for a given workload. Our results show that leakage power will dominate the selection of process technology, and architectures that support advanced leakage control techniques at the circuit level will be essential. We argue that without advanced low-power architectures, future nano-scale process technologies will not be suited for sensor network applications.

Benjamin Lee, David Brooks, Bronis Supinski, and Martin Schulz. 9/29/2006. “Regression Modeling Strategies for Parameter Space Exploration.”
Increasing system and algorithmic complexity, combined with a growing number of tunable application parameters, pose significant challenges for analytical performance modeling. This report outlines a series of robust techniques that enable efficient parameter space exploration based on empirical statistical modeling. In particular, this report applies statistical techniques such as clustering, association, and correlation analyses to understand the parameter space better. Results from these statistical techniques guide the construction of piecewise polynomial regression models. Residual and significance tests ensure the resulting model is unbiased and efficient. We demonstrate these techniques in R, a statistical computing environment, for predicting the performance of semicoarsening multigrid. 50 and 75 percent of predictions achieve error rates of 5.5 and 10.0 percent or less, respectively.
