Publications by Author: Meeta Gupta

2015
Vijay Reddi, Meeta Gupta, Glenn Holloway, Gu Wei, Michael Smith, and David Brooks. 2015. “Adaptive event-guided system and method for avoiding voltage emergencies.”
In a preferred embodiment, the present invention is a system for avoiding voltage emergencies. The system comprises a microprocessor, an actuator for throttling the microprocessor, a voltage emergency detector and a voltage emergency predictor. The voltage emergency detector may comprise, for example, a checkpoint recovery mechanism or a sensor. The voltage emergency predictor of a preferred embodiment comprises means for tracking control flow instructions and microarchitectural events, means for storing voltage emergency signatures that cause voltage emergencies, means for comparing current control flow and microarchitectural events with stored voltage emergency signatures to predict voltage emergencies, and means for actuating said actuator to throttle said microprocessor to avoid predicted voltage emergencies.
2010
Vijay Reddi, Simone Campanoni, Meeta Gupta, Michael Smith, Gu Wei, David Brooks, and Kim Hazelwood. 9/2010. “Eliminating voltage emergencies via software-guided code transformations.” ACM Transactions on Architecture and Code Optimization (TACO), 7, 2, Pp. 1–28.
In recent years, circuit reliability in modern high-performance processors has become increasingly important. Shrinking feature sizes and diminishing supply voltages have made circuits more sensitive to microprocessor supply voltage fluctuations. These fluctuations result from the natural variation of processor activity as workloads execute, and when left unattended, they can lead to timing violations or even transistor lifetime issues. In this paper, we present a hardware-software collaborative approach to mitigate voltage fluctuations. A checkpoint-recovery mechanism rectifies errors when voltage violates maximum tolerance settings, while a run-time software layer reschedules the program’s instruction stream to prevent recurring violations at the same program location. The run-time layer, combined with the proposed code rescheduling algorithm, removes 60% of all violations with minimal overhead, thereby significantly improving overall performance. Our solution is a radical departure from the current industry-standard approach, which circumvents the issue altogether by designing for the worst-case voltage fluctuation and thereby severely compromises power and performance efficiency, especially in future technology generations. Such conservative approaches will severely limit the ability to deliver efficient microprocessors. The proposed technique recasts a traditional reliability problem as a runtime performance optimization problem, allowing processors to be designed for typical-case operation with intelligent algorithms that prevent recurring violations.
Vijay Reddi, Meeta Gupta, Glenn Holloway, Michael Smith, Gu Wei, and David Brooks. 1/2010. “Predicting voltage droops using recurring program and microarchitectural event activity.” IEEE Micro, 30, 1.
Shrinking feature size and diminishing supply voltage are making circuits more sensitive to supply voltage fluctuations within a microprocessor. If left unattended, voltage fluctuations can lead to timing violations or even transistor lifetime issues. A mechanism that dynamically learns to predict dangerous voltage fluctuations based on program and microarchitectural events can help steer the processor clear of danger.
2009
Meeta Gupta, Jude Rivers, Pradip Bose, Gu Wei, and David Brooks. 12/12/2009. “Tribeca: design for PVT variations with local recovery and fine-grained adaptation.” In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Pp. 435–446. New York, NY, USA: IEEE.
With continued advances in CMOS technology, parameter variations are emerging as a major design challenge. Irregularities during the fabrication of a microprocessor and variations of voltage and temperature during its operation widen the worst-case timing margins of the design, degrading performance significantly. Because runtime variations like supply voltage droops and temperature fluctuations depend on the activity signature of the processor's workload, there are several opportunities to improve performance by dynamically adapting margins. This paper explores the power-performance efficiency gains that result from designing for typical conditions while dynamically tuning frequency and voltage to accommodate the runtime behavior of workloads. Such a design depends on a fail-safe mechanism that allows it to protect against margin violations during adaptation; we evaluate several such mechanisms, and we propose a local recovery scheme that exploits spatial variation among the units of the processor. While a processor designed for worst-case conditions might only be capable of a frequency that is 75% of an ideal processor with no parameter variations, we show that a fine-grained global frequency tuning mechanism improves power-performance efficiency (BIPS³/W) by 40% while operating at 91% of an ideal processor's frequency. Moreover, a per-unit voltage tuning mechanism aims to reduce the effect of within-die spatial variations and provides a 55% increase in power-performance efficiency. The benefits reported are substantial in light of the <1% area overhead relative to existing global recovery mechanisms.
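The efficiency figures above use the BIPS³/W metric, which cubes throughput before dividing by power. A minimal sketch of how such a percentage gain would be computed follows; the function names are illustrative, and the inputs in the tests are placeholders, not values from the paper.

```python
def bips3_per_watt(bips: float, watts: float) -> float:
    """Power-performance efficiency metric: (billions of instructions/s)^3 / W."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return bips ** 3 / watts


def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline` (0.40 means +40%)."""
    return new / baseline - 1.0
```

Because throughput is cubed, a 10% throughput gain at unchanged power raises the metric by 1.1³ − 1 ≈ 33%, which is why frequency recovered from timing margins weighs so heavily in this metric.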
Vijay Reddi, Meeta Gupta, Michael Smith, Gu Wei, David Brooks, and Simone Campanoni. 7/26/2009. “Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack.” In Proceedings of the 46th Annual Design Automation Conference, Pp. 788–793. San Francisco, CA.
Power-constrained designs are becoming increasingly sensitive to supply voltage noise. We propose a hardware-software collaborative approach to enable aggressive operating margins: a checkpoint-recovery mechanism corrects margin violations, while a run-time software layer reschedules the program's instruction stream to prevent recurring margin crossings at the same program location. The run-time layer removes 60% of these events with minimal overhead, thereby significantly improving overall performance.
Vijay Reddi, Meeta Gupta, Michael Smith, Gu Wei, David Brooks, and Simone Campanoni. 7/26/2009. “Software-assisted hardware reliability: abstracting circuit-level challenges to the software stack.” In 2009 46th ACM/IEEE Design Automation Conference, Pp. 788–793. San Francisco, CA: IEEE.
Meeta Gupta, Vijay Reddi, Glenn Holloway, Gu Wei, and David Brooks. 4/20/2009. “An event-guided approach to reducing voltage noise in processors.” In Design, Automation & Test in Europe Conference & Exhibition (DATE'09), Pp. 160–165. Nice, France: IEEE.
Vijay Reddi, Meeta Gupta, Glenn Holloway, Gu Wei, Michael Smith, and David Brooks. 2/14/2009. “Voltage emergency prediction: Using signatures to reduce operating margins.” In 2009 IEEE 15th International Symposium on High Performance Computer Architecture, Pp. 18–29. Raleigh, NC, USA: IEEE.
Inductive noise forces microprocessor designers to sacrifice performance in order to ensure correct and reliable operation of their designs. The possibility of wide fluctuations in supply voltage means that timing margins throughout the processor must be set pessimistically to protect against worst-case droops and surges. While sensor-based reactive schemes have been proposed to deal with voltage noise, inherent sensor delays limit their effectiveness. Instead, this paper describes a voltage emergency predictor that learns the signatures of voltage emergencies (the combinations of control flow and microarchitectural events leading up to them) and uses these signatures to prevent recurrence of the corresponding emergencies. In simulations of a representative superscalar microprocessor in which fluctuations beyond 4% of nominal voltage are treated as emergencies (an aggressive configuration), these signatures can pinpoint the likelihood of an emergency some 16 cycles ahead of time with 90% accuracy. This lead time allows machines to operate with much tighter voltage margins (4% instead of 13%) and up to 13.5% higher performance, which closely approaches the 14.2% performance improvement possible with an ideal oracle-based predictor.
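The signature idea can be illustrated with a toy predictor: record the recent history of control-flow and microarchitectural events whenever the detector reports an emergency, and flag a likely emergency (the cue to throttle) when that same history recurs. This is a minimal sketch under simplifying assumptions; the class, the exact-match signature, the event names such as "L2_MISS", and the FIFO eviction policy are all hypothetical and are not the paper's hardware design.

```python
from collections import deque


class SignaturePredictor:
    """Toy voltage-emergency predictor keyed on recent event history.

    Simplifying assumption: a signature is the exact tuple of the last
    `history_len` events (for example taken-branch PCs and microarchitectural
    event tags such as "L2_MISS"); real hardware would compress this history.
    """

    def __init__(self, history_len: int = 4, capacity: int = 64) -> None:
        self.history: deque = deque(maxlen=history_len)
        # A dict used as an insertion-ordered set, so the oldest signature
        # can be evicted first when the table is full.
        self.signatures: dict = {}
        self.capacity = capacity

    def observe(self, event: str) -> None:
        """Shift one control-flow or microarchitectural event into the history."""
        self.history.append(event)

    def predict(self) -> bool:
        """True if the current history matches a learned emergency signature."""
        return tuple(self.history) in self.signatures

    def record_emergency(self) -> None:
        """Learn the current history as a signature after a detected emergency."""
        sig = tuple(self.history)
        if sig in self.signatures:
            return
        if len(self.signatures) >= self.capacity:
            del self.signatures[next(iter(self.signatures))]  # FIFO eviction
        self.signatures[sig] = None


if __name__ == "__main__":
    p = SignaturePredictor(history_len=3)
    for e in ["br@0x40", "L2_MISS", "FLUSH"]:
        p.observe(e)
    p.record_emergency()  # first occurrence: detected by the fail-safe, then learned
    for e in ["br@0x80", "br@0x40", "L2_MISS", "FLUSH"]:
        p.observe(e)
    print(p.predict())  # True: the same event sequence has recurred
```

The first emergency still has to be caught by a fail-safe mechanism; the predictor only helps with recurrences, which is why the abstract stresses recurring signatures.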
Vijay Reddi, Meeta Gupta, Glenn Holloway, Michael Smith, Gu-Yeon Wei, and David Brooks. 2/14/2009. “Voltage emergency prediction: Using Signatures to Reduce Operating Margins.” In 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
Vijay Reddi, Meeta Gupta, Krishna Rangan, Simone Campanoni, Glenn Holloway, Michael Smith, Gu Wei, and David Brooks. 1/2009. “Voltage noise: Why it’s bad, and what to do about it.” 5th IEEE Workshop on Silicon Errors in Logic-System Effects (SELSE), Palo Alto, CA.
Power-constrained designs are becoming increasingly sensitive to supply voltage noise. We propose hardware-software collaboration to enable aggressive voltage margins: a fail-safe hardware mechanism tolerates margin violations in order to train a run-time software layer that reschedules instructions to avoid recurring violations. Additionally, the software controls an emergency signature-based predictor that throttles execution to suppress emergencies that code rescheduling cannot eliminate.
2008
Meeta Gupta, Krishna Rangan, Michael Smith, Gu Wei, and David Brooks. 2/16/2008. “DeCoR: A delayed commit and rollback mechanism for handling inductive noise in processors.” In 2008 IEEE 14th International Symposium on High Performance Computer Architecture, Pp. 381–392. IEEE.
Increases in peak current draw and reductions in the operating voltage of processors stress the importance of dealing with voltage fluctuations in processors. Noise-margin violations lead to undesired effects, like timing violations, which may result in incorrect execution of applications. Several recent architectural solutions for inductive noise have been proposed that, unfortunately, are strongly tied to the underlying power-delivery package model and require a feedback loop that is largely constrained by the voltage/current sensor characteristics. The resulting solutions are not robust across a wide range of microprocessor designs and packaging technologies. This paper proposes a delayed commit and rollback scheme (DeCoR) that guarantees correctness regardless of the package model or the responsiveness of the voltage sensors. In particular, our approach recovers from, rather than attempting to avoid, voltage emergencies. This approach incurs a small performance penalty compared to an ideal machine that does not have voltage emergencies. We show that explicit checkpoint-recovery schemes, intended to handle infrequent events such as radiation-induced soft errors, suffer from large performance overheads for frequently occurring voltage emergencies. DeCoR requires very few modifications to modern processor designs, as it leverages the existing store queue and reorder buffer. Unlike conventional designs that conservatively protect all components of the processor from inductive noise with overly large timing margins, our approach only requires conservative protection of the architected register state and cache write paths.
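The core of delayed commit can be modeled in a few lines: hold each finished instruction for a window at least as long as the sensor's detection latency, and squash everything still waiting if an emergency is reported. The sketch below is a hypothetical software model, not DeCoR's hardware; the `delay` parameter, the `Instr` record, and the deque standing in for the reorder buffer are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Instr:
    seq: int         # program-order sequence number
    done_cycle: int  # cycle at which execution finished


class DelayedCommit:
    """Toy delayed-commit/rollback model (hypothetical, not DeCoR's hardware).

    A finished instruction commits only after `delay` cycles, chosen to cover
    the voltage sensor's detection latency; any emergency reported inside that
    window therefore squashes the instruction before it updates architected
    state.
    """

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self.pending: deque = deque()  # stands in for the reorder buffer
        self.committed: list = []

    def complete(self, instr: Instr) -> None:
        """Enqueue a finished instruction (assumed to arrive in program order)."""
        self.pending.append(instr)

    def tick(self, cycle: int) -> None:
        """Commit, oldest first, every instruction whose window has elapsed."""
        while self.pending and cycle - self.pending[0].done_cycle >= self.delay:
            self.committed.append(self.pending.popleft().seq)

    def emergency(self) -> list:
        """Sensor fired: squash all uncommitted work and return it for replay."""
        squashed = [i.seq for i in self.pending]
        self.pending.clear()
        return squashed


if __name__ == "__main__":
    dc = DelayedCommit(delay=3)
    dc.complete(Instr(seq=0, done_cycle=0))
    dc.complete(Instr(seq=1, done_cycle=1))
    dc.tick(3)               # seq 0 has waited 3 cycles, seq 1 only 2
    print(dc.committed)      # [0]
    print(dc.emergency())    # [1] is rolled back and must re-execute
```

The trade-off the abstract describes shows up directly here: a longer `delay` tolerates slower sensors but keeps more instructions in flight, which is why the real scheme reuses existing store-queue and reorder-buffer capacity.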
Wonyoung Kim, Meeta Gupta, Gu Wei, and David Brooks. 2/16/2008. “System level analysis of fast, per-core DVFS using on-chip switching regulators.” In 2008 IEEE 14th International Symposium on High Performance Computer Architecture, Pp. 123–134. Salt Lake City, UT, USA: IEEE.
Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in digital systems, but the effectiveness of DVFS is hampered by slow voltage transitions that occur on the order of tens of microseconds. In addition, the recent trend towards chip-multiprocessors (CMP) executing multi-threaded workloads with heterogeneous behavior motivates the need for per-core DVFS control mechanisms. Voltage regulators that are integrated onto the same chip as the microprocessor core provide the benefit of both nanosecond-scale voltage switching and per-core voltage control. We show that these characteristics provide significant energy-saving opportunities compared to traditional off-chip regulators. However, the implementation of on-chip regulators presents many challenges including regulator efficiency and output voltage transient characteristics, which are significantly impacted by the system-level application of the regulator. In this paper, we describe and model these costs, and perform a comprehensive analysis of a CMP system with on-chip integrated regulators. We conclude that on-chip regulators can significantly improve DVFS effectiveness and lead to overall system energy savings in a CMP, but architects must carefully account for overheads and costs when designing next-generation DVFS systems and algorithms.
2007
Meeta Gupta, Krishna Rangan, Michael Smith, Gu Wei, and David Brooks. 8/27/2007. “Towards a software approach to mitigate voltage emergencies.” In Proceedings of the 2007 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), Pp. 123–128. IEEE.
Increases in peak current draw and reductions in the operating voltages of processors continue to amplify the importance of dealing with voltage fluctuations in processors. One suggested approach is not only to react to these fluctuations but also to eliminate their future occurrences by dynamically modifying the executing program. This paper investigates the potential of a very simple dynamic scheme to appreciably reduce the number of run-time voltage emergencies. It shows that we can map many of the voltage emergencies in the execution of the SPEC benchmarks on an aggressive superscalar design to a few static loops, categorize the microarchitectural cause of the emergencies in each important loop through simple observations and a simple priority function, and finally apply straightforward software optimization strategies to mitigate up to 70% of the future voltage swings.
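The observation that emergencies concentrate in a few static loops suggests a simple first step: count emergencies per loop and pick the smallest set of loops that covers most of them. This is a minimal sketch assuming each emergency has already been attributed to a loop id; the `hot_loops` function and its `coverage` parameter are hypothetical, and the paper's cause-categorization priority function is not reproduced here.

```python
from collections import Counter


def hot_loops(emergency_loops: list, coverage: float = 0.9) -> list:
    """Fewest static loops that together account for `coverage` of emergencies.

    `emergency_loops` holds one entry per observed emergency: the id of the
    static loop it was attributed to (the attribution step is assumed here).
    Loops are taken greedily from most to least frequent.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    counts = Counter(emergency_loops)
    target = coverage * sum(counts.values())
    chosen, covered = [], 0
    for loop, n in counts.most_common():
        if covered >= target:
            break
        chosen.append(loop)
        covered += n
    return chosen


if __name__ == "__main__":
    trace = ["loopA"] * 7 + ["loopB"] * 2 + ["loopC"]
    print(hot_loops(trace, coverage=0.9))  # ['loopA', 'loopB']
```

Taking loops in descending frequency order is optimal for this unweighted counting problem, since every emergency counts equally toward the coverage target.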
Meeta Gupta, Jarod Oatley, Russ Joseph, Gu Wei, and David Brooks. 4/16/2007. “Understanding voltage variations in chip multiprocessors using a distributed power-delivery network.” In Design, Automation & Test in Europe Conference & Exhibition (DATE'07), Pp. 1–6. Nice, France: IEEE.
Recent efforts to address microprocessor power dissipation through aggressive supply voltage scaling and power management require that designers be increasingly cognizant of power supply variations. These variations, primarily due to fast changes in supply current, can be attributed to architectural gating events that reduce power dissipation. In order to study this problem, the authors propose a fine-grain, parameterizable model for power-delivery networks that allows system designers to study localized, on-chip supply fluctuations in high-performance microprocessors. Using this model, the authors analyze voltage variations in the context of next-generation chip-multiprocessor (CMP) architectures using both real applications and synthetic current traces. They find that the activity of distinct cores in CMPs presents several new design challenges when considering power supply noise, and they describe potentially problematic activity sequences that are unique to CMP architectures.
Wonyoung Kim, Meeta Gupta, Gu Wei, and David Brooks. 2007. “Enabling on-chip switching regulators for multi-core processors using current staggering.” In Proceedings of the Workshop on Architectural Support for Gigascale Integration.
Portable, embedded systems place ever-increasing demands on high-performance, low-power microprocessor design. Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce energy in portable systems, but DVFS effectiveness suffers from the fact that voltage transitions occur on the order of tens of microseconds. Voltage regulators that are integrated on the same chip as the microprocessor core provide the benefit of both nanosecond-scale voltage switching and improved power delivery. However, the implementation of on-chip regulators presents many challenges including regulator efficiency and output voltage transient characteristics. In this paper, we discuss architectural support for on-chip regulator designs. Specifically, we show that in a chip-multiprocessor system, current staggering can be employed by restricting the simultaneous enabling/disabling of cores due to clock gating. We discuss tradeoffs between current staggering and regulator circuit design parameters, and we show that regulation efficiency of greater than 80% is possible for a variety of multi-threaded applications.
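Current staggering restricts how many cores may switch clock-gating state at the same time, so the regulator never sees all cores' current step at once. The arbiter below is a hypothetical illustration of that restriction; the `stagger` function, the per-cycle limit `max_per_cycle`, and the first-come ordering are assumptions, since the abstract does not specify the actual mechanism or its granularity.

```python
def stagger(requests: list, max_per_cycle: int) -> list:
    """Schedule clock-gating toggles so at most `max_per_cycle` cores switch per cycle.

    `requests` holds (requested_cycle, core_id) pairs; the result is a list of
    (core_id, granted_cycle) pairs. Earlier requests are served first (ties by
    core id), and any request that would exceed the per-cycle limit slips to
    the next cycle with spare capacity.
    """
    if max_per_cycle < 1:
        raise ValueError("max_per_cycle must be at least 1")
    usage: dict = {}  # cycle -> number of toggles already granted
    granted = []
    for cycle, core in sorted(requests):
        t = cycle
        while usage.get(t, 0) >= max_per_cycle:
            t += 1
        usage[t] = usage.get(t, 0) + 1
        granted.append((core, t))
    return granted


if __name__ == "__main__":
    # Four cores all ask to wake up in cycle 0.
    print(stagger([(0, c) for c in range(4)], max_per_cycle=2))
    # [(0, 0), (1, 0), (2, 1), (3, 1)]
```

The cost of the limit is latency: deferred cores wake a few cycles late, which is the kind of tradeoff against regulator design parameters that the abstract says must be balanced.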