Publications by Author: Pradip Bose

2017
Ramon Bertran, Pradip Bose, David Brooks, Jeff Burns, Alper Buyuktosunoglu, Nandhini Chandramoorthy, Eric Cheng, Martin Cochet, Schuyler Eldridge, Daniel Friedman, and others. 11/5/2017. “Very low voltage (VLV) design.” In 2017 IEEE International Conference on Computer Design (ICCD), Pp. 601–604. Boston, MA, USA: IEEE.
This paper is a tutorial-style introduction to a special session on: Effective Voltage Scaling in the Late CMOS Era. It covers the fundamental challenges and associated solution strategies in pursuing very low voltage (VLV) designs. We discuss the performance and system reliability constraints that are key impediments to VLV. The associated trade-offs across power, performance and reliability are helpful in inferring the optimal operational voltage-frequency point. This work was performed under the auspices of an ongoing DARPA program (named PERFECT) that is focused on maximizing system-level energy efficiency.
2015
Sam Xi, Hans Jacobson, Pradip Bose, Gu Wei, and David Brooks. 2/7/2015. “Quantifying Sources of Error in McPAT and Potential Impacts on Architectural Studies.” In International Symposium on High Performance Computer Architecture (HPCA).
Architectural power modeling tools are widely used by the computer architecture community for rapid evaluations of high-level design choices and design space explorations. Currently, McPAT is the de facto power model, but the literature does not yet contain a careful examination of its modeling accuracy. In addition, the issue of how greatly power modeling error can affect architectural-level studies has not been quantified before. In this work, we present the first rigorous assessment of McPAT’s core power and area models with a detailed, validated power modeling toolchain used in current industrial practice. We find that McPAT’s predictions can have significant error because some of the models are either incomplete, too high-level, or assume implementations of structures that differ from that of the core at hand. We demonstrate that large errors are possible when using McPAT’s dynamic power estimates in the context of voltage noise and thermal hotspots, but for steady-state properties, accurately modeling leakage power is more important. Based on our analysis, we are able to provide guidelines for creating accurate McPAT models, even without access to detailed industrial power modeling tools. We conclude that in spite of its accuracy gaps, McPAT is still a very useful tool for many architectural studies, and its limitations can often be adequately addressed for a given research study of interest.
2014
Pradip Bose, David Brooks, Subhasish Mitra, Karthick Rajamani, Mircea Stan, Kevin Skadron, and Gu Wei. 4/1/2014. “Cross-Layer Modeling Framework for Energy-Efficient Resilience.”

We describe a novel cross-layer, resilience focused integrated modeling framework. This is targeted to help define ultra energy-efficient embedded systems in the post-14nm CMOS design era, without compromising system-level resilience. The targeted application domain is represented by the suite of applications and kernels announced as part of the ongoing PERFECT program sponsored by DARPA MTO.

2009
Meeta Gupta, Jude Rivers, Pradip Bose, Gu Wei, and David Brooks. 12/12/2009. “Tribeca: design for PVT variations with local recovery and fine-grained adaptation.” In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Pp. 435–446. New York, NY, USA: IEEE.

With continued advances in CMOS technology, parameter variations are emerging as a major design challenge. Irregularities during the fabrication of a microprocessor and variations of voltage and temperature during its operation widen worst-case timing margins of the design, degrading performance significantly. Because runtime variations like supply voltage droops and temperature fluctuations depend on the activity signature of the processor's workload, there are several opportunities to improve performance by dynamically adapting margins. This paper explores the power-performance efficiency gains that result from designing for typical conditions while dynamically tuning frequency and voltage to accommodate the runtime behavior of workloads. Such a design depends on a fail-safe mechanism that allows it to protect against margin violations during adaptation; we evaluate several such mechanisms, and we propose a local recovery scheme that exploits spatial variation among the units of the processor. While a processor designed for worst-case conditions might only be capable of a frequency that is 75% of an ideal processor with no parameter variations, we show that a fine-grained global frequency tuning mechanism improves power-performance efficiency (BIPS³/W) by 40% while operating at 91% of an ideal processor's frequency. Moreover, a per-unit voltage tuning mechanism aims to reduce the effect of within-die spatial variations to provide a 55% increase in power-performance efficiency. The benefits reported are clearly substantial in light of the <1% area overhead relative to existing global recovery mechanisms.

2005
Jayanth Srinivasan, Sarita Adve, Pradip Bose, Jude Rivers, Y. Li, David Brooks, Z Hu, K Skadron, V Srinivasan, and M Gschwind. 2005. “The case for microarchitectural awareness of lifetime reliability.” IEEE Micro, 25, 3, Pp. 70–80.
2004
Yingmin Li, David Brooks, Zhigang Hu, Kevin Skadron, and Pradip Bose. 8/11/2004. “Understanding the energy efficiency of simultaneous multithreading.” In Proceedings of the 2004 international symposium on Low power electronics and design, Pp. 44–49. Newport Beach, CA, USA: ACM.
Simultaneous multithreading (SMT) has proven to be an effective method of increasing the performance of microprocessors by extracting additional instruction-level parallelism from multiple threads. In current microprocessor designs, power-efficiency is of critical importance, and we present modeling extensions to an architectural simulator to allow us to study the power-performance efficiency of SMT. After a thorough design space exploration we find that SMT can provide a performance speedup of nearly 20% for a wide range of applications with a power overhead of roughly 24%. Thus, SMT can provide a substantial benefit for energy-efficiency metrics such as ED². We also explore the underlying reasons for the power uplift, analyze the impact of leakage-sensitive process technologies, and discuss our model validation strategy.
Victor Zyuban, David Brooks, Viji Srinivasan, Michael Gschwind, Pradip Bose, Philip Strenski, and Philip Emma. 8/2004. “Integrated analysis of power and performance for pipelined microprocessors.” IEEE Transactions on Computers, 53, 8, Pp. 1004–1016.
Choosing the pipeline depth of a microprocessor is one of the most critical design decisions that an architect must make in the concept phase of a microprocessor design. To be successful in today’s cost/performance marketplace, modern CPU designs must effectively balance both performance and power dissipation. The choice of pipeline depth and target clock frequency has a critical impact on both of these metrics. In this paper, we describe an optimization methodology based on both analytical models and detailed simulations for power and performance as a function of pipeline depth. Our results for a set of SPEC2000 applications show that, when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of our energy models. Finally, we discuss the potential risks in design quality for overly aggressive or conservative choices of pipeline depth.
David Brooks, Pradip Bose, and Margaret Martonosi. 3/2004. “Power-performance simulation: design and validation strategies.” ACM SIGMETRICS Performance Evaluation Review, 31, 4, Pp. 13–18.

Microprocessor research and development increasingly relies on detailed simulations to make design choices. As such, the structure, speed, and accuracy of microarchitectural simulators are of critical importance to the field. This paper describes our experiences in building two simulators, using related but distinct approaches. One of the most important attributes of a simulator is its ability to accurately convey design trends as different aspects of the microarchitecture are varied. In this work, we break down accuracy (a broad term) into two sub-types: relative and absolute accuracy. We then discuss typical abstraction errors in power-performance simulators and show when they do (or do not) affect the design rule choices a user of those simulators might make. By performing this validation study using the Wattch and PowerTimer simulators, the work addresses validation issues both broadly and in the specific case of a fairly widely-used simulator.

2003
David Brooks, Pradip Bose, Vijayalakshmi Srinivasan, Michael Gschwind, Philip Emma, and Michael Rosenfield. 9/2003. “New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors.” IBM Journal of Research and Development, 47, 5.6, Pp. 653–670.
The PowerTimer toolset has been developed for use in early-stage, microarchitecture-level power-performance analysis of microprocessors. The key component of the toolset is a parameterized set of energy functions that can be used in conjunction with any given cycle-accurate microarchitectural simulator. The energy functions model the power consumption of primitive and hierarchically composed building blocks which are used in microarchitecture-level performance models. Examples of structures modeled are pipeline stage latches, queues, buffers and component read/write multiplexers, local clock buffers, register files, and cache array macros. The energy functions can be derived using purely analytical equations that are driven by organizational, circuit, and technology parameters or behavioral equations that are derived from empirical, circuit-level simulation experiments. After describing the modeling methodology, we present analysis results in the context of a current-generation superscalar processor simulator to illustrate the use and effectiveness of such early-stage models. In addition to average power and performance tradeoff analysis, PowerTimer is useful in assessing the typical and worst-case power (or current) swings that occur between successive cycle windows in a given workload execution. Such a characterization of workloads at the early stage of microarchitecture definition helps pinpoint potential inductive noise problems on the voltage rail that can be addressed by designing an appropriate package or by suitably tuning the dynamic power management controls within the processor.
2002
Viji Srinivasan, David Brooks, Michael Gschwind, Pradip Bose, Victor Zyuban, Philip Strenski, and Philip Emma. 11/18/2002. “Optimizing pipelines for power and performance.” In Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35), Pp. 333–344. IEEE.
During the concept phase and definition of next generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
Alper Buyuktosunoglu, David Albonesi, Stanley Schuster, David Brooks, Pradip Bose, and Peter Cook. 1/2002. “Power-efficient issue queue design.” In Power aware computing, Pp. 35–58. Kluwer Academic Publishers.

Increasing levels of power dissipation threaten to limit the performance gains of future high-end, out-of-order issue microprocessors. Therefore, it is imperative that designers devise techniques that significantly reduce the power dissipation of the key hardware structures on the chip without unduly compromising performance. Such a key structure in out-of-order designs is the issue queue. Although crucial in achieving high performance, the issue queues are often a major contributor to the overall power consumption of the chip, potentially affecting both thermal issues related to hot spots and energy issues related to battery life. In this chapter, we present two techniques that significantly reduce issue queue power while maintaining high performance operation. First, we evaluate the power savings achieved by implementing a CAM/RAM structure for the issue queue as an alternative to the more power-hungry latch-based issue queue used in many designs. We then present the microarchitecture and circuit design of an adaptive issue queue that leverages transmission gate insertion to provide dynamic low-cost configurability of size and speed. We compare two different dynamic adaptation algorithms that use issue queue utilization and parallelism metrics in order to size the issue queue on-the-fly during execution. Together, these two techniques provide over a 70% average reduction in issue queue power dissipation for a collection of the SPEC CPU2000 integer benchmarks, with only a 3% overall performance degradation.

Pradip Bose, David Brooks, Viji Srinivasan, and Philip Emma. 2002. Power-Performance and Power Swing Characterization in Adaptive Microarchitectures. Technical Paper Archive - Research Reports. IBM Research.
In this paper, we present an analysis of some of the fundamental power-performance tradeoffs in processors that employ adaptive techniques to vary sizes, bandwidths, clock-gating modes and clock frequencies. Initial expectations are set using simple analytical reasoning models. Later, simulation-based data is presented in the context of a simple, low-power superscalar processor prototype (called LPX) that is currently under development as a test vehicle. There are three fundamental issues that we attempt to address in this paper: (a) Does dynamic adaptation, in clocking or microarchitectural resources, help extend the power-performance efficiency range of wider-issue superscalars? (b) What factors of power and power-density reductions are within practical reach in future adaptive processors? (c) Does the presence of dynamic adaptation modes cause unacceptably large, worst-case power (or current) swings in affected sub-units?
2001
David Brooks, Margaret Martonosi, John Wellman, and Pradip Bose. 12/2001. “Power-performance modeling and tradeoff analysis for a high end microprocessor.” Power-Aware Computer Systems, Pp. 126–136.
We describe a new power-performance modeling toolkit, developed to aid in the evaluation and definition of future power-efficient, PowerPC™ processors. The base performance models in use in this project are: (a) a fast but cycle-accurate, parameterized research simulator and (b) a slower, pre-RTL reference model that models a specific high-end machine in full, latch-accurate detail. Energy characterizations are derived from real, circuit-level power simulation data. These are then combined to form higher-level energy models that are driven by microarchitecture-level parameters of interest. The overall methodology allows us to conduct power-performance tradeoff studies in defining the follow-on design points within a given product family. We present a few experimental results to illustrate the kinds of tradeoffs one can study using this tool.
Alper Buyuktosunoglu, Stanley Schuster, David Brooks, Pradip Bose, Peter Cook, and David Albonesi. 6/11/2001. “An adaptive issue queue for reduced power at high performance.” Power-Aware Computer Systems, Pp. 25–39.

Increasing power dissipation has become a major constraint for future performance gains in the design of microprocessors. In this paper, we present the circuit design of an issue queue for a superscalar processor that leverages transmission gate insertion to provide dynamic low-cost configurability of size and speed. A novel circuit structure dynamically gathers statistics of issue queue activity over intervals of instruction execution. These statistics are then used to change the size of an issue queue organization on-the-fly to improve issue queue energy and performance. When applied to a fixed, full-size issue queue structure, the result is up to a 70% reduction in energy dissipation. The complexity of the additional circuitry to achieve this result is almost negligible. Furthermore, self-timed techniques embedded in the adaptive scheme can provide a 56% decrease in cycle time of the CAM array read of the issue queue when we change the adaptive issue queue size from 32 entries (largest possible) to 8 entries (smallest possible in our design).

Alper Buyuktosunoglu, David Albonesi, Stanley Schuster, David Brooks, Pradip Bose, and Peter Cook. 3/2001. “A circuit level implementation of an adaptive issue queue for power-aware microprocessors.” In Proceedings of the 11th Great Lakes symposium on VLSI, Pp. 73–78. ACM.
Increasing power dissipation has become a major constraint for future performance gains in the design of microprocessors. In this paper, we present the circuit design of an issue queue for a superscalar processor that leverages transmission gate insertion to provide dynamic low-cost configurability of size and speed. A novel circuit structure dynamically gathers statistics of issue queue activity over intervals of instruction execution. These statistics are then used to change the size of an issue queue organization on-the-fly to improve issue queue energy and performance. When applied to a fixed, full-size issue queue structure, the result is up to a 70% reduction in energy dissipation. The complexity of the additional circuitry to achieve this result is almost negligible. Furthermore, self-timed techniques embedded in the adaptive scheme can provide a 56% decrease in cycle time of the CAM array read of the issue queue when we change the adaptive issue queue size from 32 entries (largest possible) to 8 entries (smallest possible in our design).
Pradip Bose, Margaret Martonosi, and David Brooks. 2001. “Modeling and Analyzing CPU Power and Performance: Metrics, Methods, and Abstractions.” Tutorial, ACM SIGMETRICS.
2000
David Brooks, Pradip Bose, Stanley Schuster, Hans Jacobson, Prabhakar Kudva, Alper Buyuktosunoglu, J Wellman, Victor Zyuban, Manish Gupta, and Peter Cook. 2000. “Power-aware microarchitecture: Design and modeling challenges for next-generation microprocessors.” IEEE Micro, 20, 6, Pp. 26–44.