Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and compiler writers, in addition to circuit designers. Most traditional power analysis tools achieve high accuracy by calculating power estimates for designs only after the circuit design, layout, and floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.
This thesis presents a methodology for estimating power dissipation at a much earlier stage in the design cycle and at a much higher level of abstraction. Wattch and PowerTimer are two working examples of this methodology. Both tools provide a framework for analyzing and optimizing microprocessor power dissipation at the architecture level. They are 1000X or more faster than existing layout-level power tools, yet their accuracy stays within 10% of the layout-level tools' estimates, as verified using industry tools on leading-edge designs. These tools allow architects to explore and cull the design space early on, and they open up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
This thesis also considers several applications of architectural-level power modeling and proposes two specific architectural-level optimizations for saving power and reducing temperature: value-based clock gating and dynamic thermal management. Value-based clock gating is a technique that exploits the dynamic bitwidth requirements of typical applications to save power within arithmetic units and the memory hierarchy. We have demonstrated that this technique can save roughly 50% of the power in the integer execution units. With dynamic thermal management, temperature sensors and throttling techniques are combined to adaptively slow down the CPU during extended periods of particularly high-power code sequences. This allows the CPU package and power delivery system to be designed for a much lower maximum power rating, with minimal performance impact for typical applications.
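The operand test behind value-based clock gating can be illustrated with a minimal sketch. It assumes a 64-bit two's-complement datapath and a 16-bit narrow threshold. Both values, and the names `is_narrow` and `gated_fraction`, are hypothetical choices for illustration and are not taken from the thesis. An operand counts as narrow when its upper bits are only a sign extension of its low bits, in which case the upper portion of a functional unit could in principle be gated for that operation.

```python
def is_narrow(value: int, narrow_bits: int = 16, width: int = 64) -> bool:
    """Return True if `value`, viewed as a width-bit two's-complement word,
    is just a sign extension of its low `narrow_bits` bits.

    narrow_bits and width are illustrative assumptions, not thesis parameters.
    """
    word = value & ((1 << width) - 1)           # wrap into a width-bit word
    upper = word >> (narrow_bits - 1)           # narrow sign bit and everything above it
    all_ones = (1 << (width - narrow_bits + 1)) - 1
    return upper == 0 or upper == all_ones      # all zeros or all ones


def gated_fraction(operand_pairs, narrow_bits: int = 16, width: int = 64) -> float:
    """Fraction of two-operand operations in which both operands are narrow,
    i.e. operations for which the upper datapath bits could be gated."""
    pairs = list(operand_pairs)
    if not pairs:
        return 0.0
    gated = sum(
        1
        for a, b in pairs
        if is_narrow(a, narrow_bits, width) and is_narrow(b, narrow_bits, width)
    )
    return gated / len(pairs)
```

In hardware, the equivalent check would be a zero/ones detect on the upper operand bits. The sketch only counts gating opportunities in a trace of operand pairs and does not model the resulting power savings.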
The techniques presented in this thesis represent some of the first work in the area of high-performance, low-power processor design at the architectural level. We hope that this work, and the other research in the area of low-power architectural modeling and design, will help future generations of processor architectures to meet the many new challenges in this area.
This paper presents a low-power high-speed CMOS signaling interface that operates off of an adaptively regulated supply. A feedback loop adjusts the supply voltage on a chain of inverters until the delay through the chain is equal to half of the input period. This voltage is then distributed to the I/O subsystem through an efficient switching power-supply regulator. Dynamically scaling the supply with respect to frequency leads to a simple and robust design consisting mostly of digital CMOS gates, while enabling maximum energy efficiency. The interface utilizes high-impedance drivers for operation across a wide range of voltages and frequencies, a dual-loop delay-locked loop for accurate timing recovery, and an input receiver whose bandwidth tracks with the I/O frequency to filter out high-frequency noise. Test chips fabricated in a 0.35-μm CMOS technology achieve transfer rates of 0.2-1.0 Gb/s/pin with a regulated supply ranging from 1.3-3.2 V.
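The supply-regulation loop described above can be sketched numerically. The sketch below is a software stand-in, not the paper's circuit: it models the inverter-chain delay with an alpha-power-law expression and uses bisection in place of the analog/digital feedback loop. The delay model, its constants (`k`, `vt`, `alpha`), the voltage limits, and the function names are all assumptions made for illustration.

```python
def chain_delay(vdd: float, k: float = 1.0e-9, vt: float = 0.5, alpha: float = 1.5) -> float:
    """Hypothetical alpha-power-law delay (seconds) of the inverter chain.

    The delay falls monotonically as vdd rises above the threshold vt.
    k, vt, and alpha are made-up constants, not measured values.
    """
    if vdd <= vt:
        return float("inf")
    return k * vdd / (vdd - vt) ** alpha


def regulate_supply(freq_hz: float, v_min: float = 0.6, v_max: float = 3.3,
                    tol: float = 1.0e-4) -> float:
    """Find the lowest supply (within tol volts) at which the chain delay
    is no more than half the input period, using bisection."""
    target = 0.5 / freq_hz                      # half of the input period
    if chain_delay(v_max) > target:
        raise ValueError("frequency too high for the available supply range")
    if chain_delay(v_min) <= target:
        return v_min                            # already fast enough at the floor
    lo, hi = v_min, v_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chain_delay(mid) > target:
            lo = mid                            # chain too slow: raise the supply
        else:
            hi = mid                            # chain fast enough: try a lower supply
    return hi
```

Under this model, a higher input frequency shortens the target delay and so settles at a higher supply voltage, which is the frequency-tracking behavior the abstract describes.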