Performance analysis and optimization are essential tasks for hardware and software engineers. In the age of datacenter-scale computing, it is particularly important to conduct comparative performance analysis to understand discrepancies and limitations among different hardware systems and applications. However, there is a distinct lack of productive visualization tools for these comparisons. We present CHAMPVis, a web-based, interactive visualization tool that leverages the hierarchical organization of hardware systems to enable productive performance analysis. With CHAMPVis, users can make definitive performance comparisons across applications or hardware platforms. In addition, CHAMPVis provides methods to rank and cluster applications or platforms by performance metrics to identify common optimization opportunities. Our task analysis reveals three types of datacenter-scale performance analysis tasks: summarization, detailed comparative analysis, and interactive performance bottleneck identification. We propose a technique for each class of tasks: (1) 1-D feature space projection for similarity analysis; (2) hierarchical parallel coordinates for comparative analysis; and (3) user interactions for rapid diagnostic queries to identify optimization targets. We evaluate CHAMPVis by analyzing standard datacenter applications and machine learning benchmarks in two case studies.
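To make the idea of a 1-D feature space projection concrete, the following is a minimal sketch only. The abstract does not say which projection CHAMPVis uses; this example assumes a projection onto the first principal component, approximated by power iteration in plain Python. The function names `project_1d` and `rank_by_projection`, the starting vector, and the sample metric values are all hypothetical.

```python
import math


def project_1d(rows, iters=200):
    """Project metric vectors onto an approximate first principal component.

    Assumption: PCA is used here for illustration; the tool's actual
    projection method is not specified in the abstract.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Non-uniform starting vector (a heuristic choice) so that it is
    # unlikely to be orthogonal to the dominant direction.
    v = [1.0 / (j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    # Power iteration on X^T X, where X is the centered data matrix.
    for _ in range(iters):
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data (for example, all rows identical)
        v = [x / norm for x in w]

    # One coordinate per input row along the estimated direction.
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def rank_by_projection(names, rows):
    """Return (name, coordinate) pairs sorted by 1-D coordinate."""
    return sorted(zip(names, project_1d(rows)), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical metric vectors (two made-up metrics per application).
    apps = ["a", "b", "c", "d"]
    metrics = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [10.0, 20.0]]
    for name, coord in rank_by_projection(apps, metrics):
        print(name, round(coord, 3))
```

Items that land close together on the resulting axis have similar metric profiles under this projection. In practice the metrics would likely need to be normalized to comparable scales first, which this sketch omits.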
A circuit for driving a plurality of capacitive actuators, the circuit having a low-voltage side, a high-voltage side, and a flyback transformer between the two. The low-voltage side comprises first and second pairs of low-side switches connected in series across an input voltage. The flyback transformer has a primary winding connected to the two pairs of switches. The high-voltage side has a pair of switches connected between the secondary winding of the flyback transformer and a ground, as well as a plurality of capacitive loads and bidirectional switches that connect the loads to the secondary winding of the flyback transformer and a ground.
In a preferred embodiment, the present invention is a system for avoiding voltage emergencies. The system comprises a microprocessor, an actuator for throttling the microprocessor, a voltage emergency detector and a voltage emergency predictor. The voltage emergency detector may comprise, for example, a checkpoint recovery mechanism or a sensor. The voltage emergency predictor of a preferred embodiment comprises means for tracking control flow instructions and microarchitectural events, means for storing voltage emergency signatures that cause voltage emergencies, means for comparing current control flow and microarchitectural events with stored voltage emergency signatures to predict voltage emergencies, and means for actuating said actuator to throttle said microprocessor to avoid predicted voltage emergencies.
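The predictor described above can be illustrated with a small software sketch. This is an assumption-laden model, not the claimed hardware: it treats a signature as the tuple of the most recent control flow and microarchitectural events, stores the signature whenever the detector reports an emergency, and predicts an emergency when the current history matches a stored signature. The class name `EmergencyPredictor`, the history length, and the event strings are all hypothetical.

```python
from collections import deque


class EmergencyPredictor:
    """Toy signature-based voltage emergency predictor (illustrative only)."""

    def __init__(self, history_len=4):
        # Sliding window of recent control flow / microarchitectural events.
        self.history = deque(maxlen=history_len)
        # Signatures previously observed to precede an emergency.
        self.signatures = set()

    def observe(self, event):
        """Record an event; return True if the history matches a signature.

        A True result is where the actuator would be asked to throttle.
        """
        self.history.append(event)
        return tuple(self.history) in self.signatures

    def record_emergency(self):
        """Learn the current history as a signature.

        Intended to be called when the detector (for example, a sensor or a
        checkpoint recovery mechanism) reports an emergency.
        """
        self.signatures.add(tuple(self.history))


if __name__ == "__main__":
    # Hypothetical event trace; the event names are made up.
    trace = ["branch_taken@0x40", "l2_miss", "pipeline_flush", "branch_taken@0x80"]
    predictor = EmergencyPredictor(history_len=4)

    first_pass = [predictor.observe(e) for e in trace]
    predictor.record_emergency()  # detector fired after the first pass
    second_pass = [predictor.observe(e) for e in trace]

    print(first_pass)
    print(second_pass)
```

On the first pass nothing is predicted because no signature has been learned yet. Once the detector fires and the signature is stored, a repeat of the same event sequence is flagged, which is the point at which throttling would be applied to avoid the emergency.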
We describe a novel cross-layer, resilience-focused integrated modeling framework. The framework is intended to help define ultra-energy-efficient embedded systems in the post-14nm CMOS design era without compromising system-level resilience. The targeted application domain is represented by the suite of applications and kernels announced as part of the ongoing PERFECT program sponsored by DARPA MTO.
Designers of chip multiprocessors will increasingly be called upon to optimize for a combination of design metrics under a variety of design constraints. The adoption of chip multiprocessors has also led to a shift in design metrics toward aggregate throughput and away from single-thread latency. We examine the compromises between latency and throughput under various power, thermal, area, and bandwidth constraints to quantify the latency penalties of a purely throughput-optimized design. We consider a large chip multiprocessor design space that includes core count, core complexity (pipeline dimensions, in-order versus out-of-order execution), and cache hierarchy sizes. We demonstrate an approach to effectively assess trade-offs given a comprehensive core model, a set of optimization criteria, and a set of design constraints. We perform a number of case studies to evaluate these trade-offs, exposing significant single-thread latency penalties when optimizing solely for throughput and neglecting other measures of performance. As single-thread latency continues to be one of several design metrics, any choice to compromise latency should be well understood before implementation. Collectively, our results suggest that single-thread latency is still an important design metric, given that optimizing for throughput alone will significantly compromise latency. Furthermore, the case for simple, in-order cores should be treated with caution given this balanced view of performance.
We are currently developing a robust, integrated infrastructure for studying power-performance issues across a range of systems. By leveraging a common ISA and shared simulation infrastructure, we will be able to perform apples-to-apples comparisons between processors intended for specific design spaces. For example, recently there has been significant attention brought to the idea of reusing microprocessor cores in multiple design spaces. In particular, there has been much interest in exploring the possibility of using multiple low-power, embedded processors in blade systems or SMP-on-a-chip designs for server workloads. There has also been interest in taking server-class microprocessors and bringing them into use in lower-end systems. For example, the processor core of the original POWER4 microprocessor has recently been introduced as the PowerPC970 -- a 64-bit microprocessor for use in blade servers and desktop (and potentially laptop) systems. We utilize the MET/Turandot toolkit originally developed at IBM TJ Watson Research Center as the underlying PowerPC microarchitecture performance simulator. Turandot is flexible enough to model a broad range of microarchitectures and has undergone extensive validation. In addition, Turandot has been augmented with power models to explore power-performance tradeoffs in an internal IBM tool called PowerTimer. Turandot is freely available to the research community through licensing arrangements with IBM, and we are currently working with IBM to develop an external, public release of PowerTimer.