Publications by Type: Miscellaneous

2019
Lillian Pentecost, Udit Gupta, Elisa Ngan, Gu Wei, David Brooks, Johanna Beyer, and Michael Behrisch. 11/17/2019. “CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance.” ProTools workshop co-located with Supercomputing. Publisher's VersionAbstract
Performance analysis and optimization are essential tasks for hardware and software engineers. In the age of datacenter-scale computing, it is particularly important to conduct comparative performance analysis to understand discrepancies and limitations among different hardware systems and applications. However, there is a distinct lack of productive visualization tools for these comparisons. We present CHAMPVis [1], a web-based, interactive visualization tool that leverages the hierarchical organization of hardware systems to enable productive performance analysis. With CHAMPVis, users can make definitive performance comparisons across applications or hardware platforms. In addition, CHAMPVis provides methods to rank and cluster based on performance metrics to identify common optimization opportunities. Our thorough task analysis reveals three types of datacenter-scale performance analysis tasks: summarization, detailed comparative analysis, and interactive performance bottleneck identification. We propose techniques for each class of tasks including (1) 1-D feature space projection for similarity analysis; (2) Hierarchical parallel coordinates for comparative analysis; and (3) User interactions for rapid diagnostic queries to identify optimization targets. We evaluate CHAMPVis by analyzing standard datacenter applications and machine learning benchmarks in two different case studies.
CHAMPVis: Comparative Hierarchical Analysis of Microarchitectural Performance
Gu Wei, Tao Tong, David Brooks, and Sae Lee. 2019. “Capacitor reconfiguration of a single-input, multi-output, switched-capacitor converter”.
2017
Gu Wei and David Brooks. 2017. “An SoC Platform Architecture for Mini Autonomous Drones.” National Science Foundation Award Abstract # 1551044. Publisher's Version An SoC Platform Architecture for Mini Autonomous Drones
2016
Michael Karpelson, Wood J, and Gu Wei. 2/9/2016. “System and method for efficient drive of capacitive actuators with voltage amplification”.Abstract
A circuit for driving a plurality of capacitive actuators, the circuit having a low-voltage side, a high voltage side and a flyback transformer between the two. The low-voltage side comprises first and second pairs of low-side switches connected in series across an input voltage. The flyback transformer has a primary winding connected to the two pairs of switches. The high-voltage side has a pair of switches connected between the secondary winding of the flyback transformer and a ground and a plurality of capacitive loads and bidirectional switches to connect the loads to the secondary winding of the flyback transformer and a ground.
System and method for efficient drive of capacitive actuators with voltage amplification
Gu Wei, David Brooks, Simone Campanoni, Kevin Brownell, and Svilen Kanev. 2016. “Methods and apparatus for parallel processing”.
2015
Vijay Reddi, Meeta Gupta, Glenn Holloway, Gu Wei, Michael Smith, and David Brooks. 2015. “Adaptive event-guided system and method for avoiding voltage emergencies”.Abstract
In a preferred embodiment, the present invention is a system for avoiding voltage emergencies. The system comprises a microprocessor, an actuator for throttling the microprocessor, a voltage emergency detector and a voltage emergency predictor. The voltage emergency detector may comprise, for example, a checkpoint recovery mechanism or a sensor. The voltage emergency predictor of a preferred embodiment comprises means for tracking control flow instructions and microarchitectural events, means for storing voltage emergency signatures that cause voltage emergencies, means for comparing current control flow and microarchitectural events with stored voltage emergency signatures to predict voltage emergencies, and means for actuating said actuator to throttle said microprocessor to avoid predicted voltage emergencies.
Adaptive event-guided system and method for avoiding voltage emergencies
2014
Pradip Bose, David Brooks, Subhasish Mitra, Karthick Rajamani, Mircea Stan, Kevin Skadron, and Gu Wei. 4/1/2014. “Cross-Layer Modeling Framework for Energy-Efficient Resilience”. Publisher's VersionAbstract

We describe a novel cross-layer, resilience focused integrated modeling framework. This is targeted to help define ultra energy-efficient embedded systems in the post-14nm CMOS design era, without compromising system-level resilience. The targeted application domain is represented by the suite of applications and kernels announced as part of the ongoing PERFECT program sponsored by DARPA MTO.

Cross-Layer Modeling Framework for Energy-Efficient Resilience
2006
Yingmin Li, Kevin Skadron, Benjamin Lee, and David Brooks. 2006. “Quantifying Latency and Throughput Compromises in CMP Design”. Publisher's VersionAbstract

Designers of chip multiprocessors will increasingly be called upon to optimize for a combination of design metrics under a variety of design constraints. The adoption of chip multiprocessors has also led to a shift in design metrics toward aggregate throughput and away from single thread latency. We examine the compromises between latency and throughput under various power, thermal, area, and bandwidth constraints to quantify the latency penalties of a purely throughput optimized design. We consider a large chip multiprocessor design space that includes core count, core complexity (pipeline dimensions, in-order versus out-of-order execution), and cache hierarchy sizes. We demonstrate an approach to effectively assess trade-offs given a comprehensive core model, a set of optimization criteria, and a set of design constraints. We perform a number of case studies to evaluate these trade-offs, exposing significant single thread latency penalties when optimizing solely for throughput and neglecting other measures of performance. As single thread latency continues to be one of several design metrics, any choice to compromise latency should be well understood before implementation. Collectively, our results suggest single thread latency is still a design metric of importance given that optimizing throughput alone will significantly compromise latency. Furthermore, the case for simple, in-order cores should be taken with caution given this balanced view of performance.

Quantifying Latency and Throughput Compromises in CMP Design
2004
David Brooks. 2004. “Integrated Architectural Level Power-Performance Modeling Toolkit”.Abstract
We are currently developing a robust, integrated infrastructure for studying power-performance issues across a range of systems.  By leveraging a common ISA and shared simulation infrastructure, we will be able to perform apples-to-apples comparisons between processors intended for specific design spaces.  For example, recently there has been significant attention brought to the idea of reusing microprocessor cores in multiple design spaces.  In particular, there has been much interest in exploring the possibility of using multiple low-power, embedded processors in blade systems or SMP-on-a-chip designs for server workloads.  There has also been interest in taking server-class microprocessors and bringing them into use in lower-end systems.  For example, the processor core of the original POWER4 microprocessor has recently been introduced as the PowerPC970 -- a 64-bit microprocessor for use in blade servers and desktop (and potentially laptop) systems. We utilize the MET/Turandot toolkit originally developed at IBM TJ Watson Research Center as the underlying PowerPC microarchitecture performance simulator [3].  Turandot is flexible enough to model a broad range of microarchitectures and has undergone extensive validation [3].  In addition,  Turandot has been augmented with power models to explore power-performance tradeoffs in an internal IBM tool called PowerTimer [4]. Turandot is freely available to the research community through licensing arrangements with IBM, and we are currently working with IBM to develop an external, public release of PowerTimer. 
Integrated Architectural Level Power-Performance Modeling Toolkit