Choosing the pipeline depth of a microprocessor is one of the most critical design decisions that an architect must make in the concept phase of a microprocessor design. To be successful in today’s cost/performance marketplace, modern CPU designs must effectively balance both performance and power dissipation. The choice of pipeline depth and target clock frequency has a critical impact on both of these metrics. In this paper, we describe an optimization methodology based on both analytical models and detailed simulations for power and performance as a function of pipeline depth. Our results for a set of SPEC2000 applications show that, when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of our energy models. Finally, we discuss the potential risks in design quality for overly aggressive or conservative choices of pipeline depth.
The PowerTimer toolset has been developed for use in early-stage, microarchitecture-level power-performance analysis of microprocessors. The key component of the toolset is a parameterized set of energy functions that can be used in conjunction with any given cycle-accurate microarchitectural simulator. The energy functions model the power consumption of primitive and hierarchically composed building blocks which are used in microarchitecture-level performance models. Examples of structures modeled are pipeline stage latches, queues, buffers and component read/write multiplexers, local clock buffers, register files, and cache array macros. The energy functions can be derived using purely analytical equations that are driven by organizational, circuit, and technology parameters or behavioral equations that are derived from empirical, circuit-level simulation experiments. After describing the modeling methodology, we present analysis results in the context of a current-generation superscalar processor simulator to illustrate the use and effectiveness of such early-stage models. In addition to average power and performance tradeoff analysis, PowerTimer is useful in assessing the typical and worst-case power (or current) swings that occur between successive cycle windows in a given workload execution. Such a characterization of workloads at the early stage of microarchitecture definition helps pinpoint potential inductive noise problems on the voltage rail that can be addressed by designing an appropriate package or by suitably tuning the dynamic power management controls within the processor.
During the concept phase and definition of next generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.