Publications by Year: 2001

2001
David Brooks, Margaret Martonosi, John Wellman, and Pradip Bose. 12/2001. “Power-performance modeling and tradeoff analysis for a high end microprocessor.” Power-Aware Computer Systems, Pp. 126–136. Publisher's VersionAbstract
We describe a new power-performance modeling toolkit, developed to aid in the evaluation and definition of future power-efficient, PowerPC TM processors. The base performance models in use in this project are: (a) a fast but cycle-accurate, parameterized research simulator and (b) a slower, pre-RTL reference model that models a specific high-end machine in full, latchaccurate detail. Energy characterizations are derived from real, circuit-level power simulation data. These are then combined to form higher-level energy models that are driven by microarchitecture-level parameters of interest. The overall methodology allows us to conduct power-performance tradeoff studies in defining the follow-on design points within a given product family. We present a few experimental results to illustrate the kinds of tradeoffs one can study using this tool.
Power-performance modeling and tradeoff analysis for a high end microprocessor
David Brooks and Martonosi (advisor) Margaret. 11/2001. “Design and modeling of power-efficient computer architectures.” Princeton University ProQuest Dissertations Publishing. Publisher's VersionAbstract

Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and compiler writers, in addition to circuit designers. Most traditional power analysis tools achieve high accuracy by calculating power estimates for designs only after the circuit design, layout, and floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.

This thesis presents a methodology for estimating power dissipation at a much earlier stage in the design cycle and at a much higher level. Watch and PowerTimer are two working examples of the use of this methodology. Both tools provide a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level. These tools are 1000X or more faster than existing layout-level power tools, and yet maintain accuracy within 10% of their estimates as verified using industry tools on leading-edge designs. These tools can allow architects to explore and cull the design space early on and opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

This thesis also considers several applications of architectural-level power modeling to propose specific architectural-level power and temperature saving optimizations—value-based clock gating and dynamic thermal management. Value-based clock gating is a technique which exploits the dynamic bitwidth requirements of typical applications to save power within arithmetic units and the memory hierarchy. We have demonstrated that this technique can save roughly 50% of the power in the integer execution units. With dynamic thermal management, temperature sensors and throttling techniques are combined to adaptively slow down the CPU for extended periods of particularly high-power code sequences. This allows the CPU package and power delivery system to be designed for a much lower maximum power rating, with minimal performance impact for typical applications.

The techniques presented in this thesis represent some of the first work in the area of high-performance, low-power processor design at the architectural level. We hope that this work, and the other research in the area of low-power architectural modeling and design, will help future generations of processor architectures to meet the many new challenges in this area.

Design and modeling of power-efficient computer architectures
Gu Wei. 11/2001. “Energy-efficient I/O interface design with adaptive power-supply regulation”. Publisher's VersionAbstract
This paper presents a low-power high-speed CMOS signaling interface that operates off of an adaptively regulated supply. A feedback loop adjusts the supply voltage on a chain of inverters until the delay through the chain is equal to half of the input period. This voltage is then distributed to the I/O subsystem through an efficient switching power-supply regulator. Dynamically scaling the supply with respect to frequency leads to a simple and robust design consisting mostly of digital CMOS gates, while enabling maximum energy efficiency. The interface utilizes high-impedance drivers for operation across a wide range of voltages and frequencies, a dual-loop delay-locked loop for accurate timing recovery, and an input receiver whose bandwidth tracks with the I/O frequency to filter out high-frequency noise. Test chips fabricated in a 0.35-/spl mu/m CMOS technology achieve transfer rates of 0.2-1.0 Gb/s/pin with a regulated supply ranging from 1.3-3.2 V.
Energy-efficient I/O interface design with adaptive power-supply regulation
Russ Joseph, David Brooks, and Margaret Martonosi. 8/2001. “Live, runtime power measurements as a foundation for evaluating power/performance tradeoffs.” Workshop on Complexity Effectice Design WCED, held in conjunction with ISCA, 28.Abstract
Of the many ways one could gauge the complexity-effectiveness of a design or design element, one candidate approach is to consider a design's power/performance tradeoffs. This paper describes our early-stage results in a broad effort to evaluate the power-performance tradeoffs of a range of benchmarks and microarchitectures. In particular, this paper presents power data collected on-the-fly on real x86 machines as they execute carefully-constructed microbenchmarks. The microbenchmarks exercise aspects of the system such as data cache and branch predictor. They are parametrically-variable to consider how load dependence, cache miss rate, branch mispredict rate, and branch distance all impact the power and performance of a CPU. For example, from these experiments, we learn that CPU performance increases essentially monotonically with cache hit rate, while CPU power encounters a maximum at roughly 80-90% cache hit rates. Likewise, we show results demonstrating that performance-neutral issues such as bit populations in the data cache values can display interesting power trends. While the experimental results are preliminary, we feel that the techniques described in this paper will o er a useful foundation for a broad range of power/performance tradeoffs.
Alper Buyuktosunoglu, Stanley Schuster, David Brooks, Pradip Bose, Peter Cook, and David Albonesi. 6/11/2001. “An adaptive issue queue for reduced power at high performance.” Power-Aware Computer Systems, Pp. 25–39. Publisher's VersionAbstract

Increasing power dissipation has become a major constraint for future performance gains in the design of microprocessors. In this paper, we present the circuit design of an issue queue for a superscalar processor that leverages transmission gate insertion to provide dynamic low-cost configurability of size and speed. A novel circuit structure dynamically gathers statistics of issue queue activity over intervals of instruction execution. These statistics are then used to change the size of an issue queue organization on-the-fly to improve issue queue energy and performance. When applied to a fixed, full-size issue queue structure, the result is up to a 70% reduction in energy dissipation. The complexity of the additional circuitry to achieve this result is almost negligible. Furthermore, self-timed techniques embedded in the adaptive scheme can provide a 56% decrease in cycle time of the CAM array read of the issue queue when we change the adaptive issue queue size from 32 entries (largest possible) to 8 entries (smallest possible in our design).

An adaptive issue queue for reduced power at high performance
Alper Buyuktosunoglu, David Albonesi, Stanley Schuster, David Brooks, Pradip Bose, and Peter Cook. 3/2001. “A circuit level implementation of an adaptive issue queue for power-aware microprocessors.” In Proceedings of the 11th Great Lakes symposium on VLSI, Pp. 73–78. ACM. Publisher's VersionAbstract
Increasing power dissipation has become a major constraint for future per~brmartce gains in the design of microproces- sors. In this paper, we present the circuit design of an issue queue for a superscalar processor that leverages transmis- sion gate insertion to provide dynamic low-cost configura- bility of size and speed. A novel circuit structure dynami- cally gathers statistics of issue queue activity over intervals of instruction execution. These statistics are then used to change the size of an issue queue organization on-the-fly to improve issue queue energy and performance. When applied to a fixed, full-size issue queue structure, the result is up to a 70% reduction in energy dissipation. The complexity of the additional circuitry to achieve this result is almost neg- ligible. Furthermore, self-timed techniques embedded in the adaptive scheme can provide a 56% decrease in cycle time of the CAM array read of the issue queue when we change the adaptive issue queue size f¥om 32 entries (largest possible) to 8 entries (smallest possible in our design). 
A circuit level implementation of an adaptive issue queue for power-aware microprocessors
David Brooks and Margaret Martonosi. 1/19/2001. “Dynamic thermal management for high-performance microprocessors.” In High-Performance Computer Architecture, 1/19/2001. HPCA. The Seventh International Symposium on, Pp. 171–182. IEEE. Publisher's VersionAbstract

With the increasing clock rate and transistor count of today's microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chip's rated maximum power dissipation. However system designers still must design thermal heat sinks to withstand the worse-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma and we consider the effects of hardware and software implementations. With approximate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.

Pradip Bose, Margaret Martonosi, and David Brooks. 2001. “Modeling and Analyzing CPU Power and Performance: Metrics, Methods, and Abstractions.” Tutorial, ACM SIGMETRICS. Modeling and Analyzing CPU Power and Performance: Metrics, Methods, and Abstractions