Parameter variations have become a dominant challenge in microprocessor design. Voltage variation is especially daunting because it happens so rapidly. We measure and characterize voltage variation in a running Intel Core2 Duo processor. By sensing on-die voltage as the processor runs single-threaded, multi-threaded, and multi-program workloads, we determine the average supply voltage swing of the processor to be only 4%, far from the processor’s 14% worst-case operating voltage margin. While such large margins guarantee correctness, they penalize performance and power efficiency. We investigate and quantify the benefits of designing a processor for typical-case (rather than worst-case) voltage swings, assuming that a fail-safe mechanism protects it from infrequently occurring large voltage fluctuations. With today’s processors, such resilient designs could yield 15% to 20% performance improvements. But we also show that in future systems, these gains could be lost as increasing voltage swings intensify the frequency of fail-safe recoveries. After characterizing microarchitectural activity that leads to voltage swings within multi-core systems, we show that a voltage-noise-aware thread scheduler in software can co-schedule phases of different programs to mitigate error recovery overheads in future resilient processor designs.
Flapping-wing mechanisms inspired by biological insects have the potential to enable a new class of small, highly maneuverable aerial robots with hovering capabilities. In order for such devices to operate without an external power source, it is necessary to address a complex system design challenge: the integration of all of the required components on board the robot. This paper discusses the flight energetics of flapping-wing robotic insects with the goal of selecting design parameters that enable power autonomy and maximize flight time. The subsystems of the robot are analyzed both from a broad perspective and using a detailed set of models for a piezoelectrically driven two-wing design. The models are used to perform a system-level optimization for the maximum flight time permitted by current technology, compare the resulting robot configurations to biological insects across several key metrics, and discuss the effect of performance gains in various subsystems of the robot.
We propose and apply a new simulation paradigm for microarchitectural design evaluation and optimization. This paradigm enables more comprehensive design studies by combining spatial sampling and statistical inference. Specifically, this paradigm (i) defines a large, comprehensive design space, (ii) samples points from the space for simulation, and (iii) constructs regression models based on sparse simulations. This approach greatly improves the computational efficiency of microarchitectural simulation and enables new capabilities in design space exploration.
We illustrate new capabilities in three case studies for a large design space of approximately 260,000 points: (i) Pareto frontier, (ii) pipeline depth, and (iii) multiprocessor heterogeneity analyses. In particular, regression models are exhaustively evaluated to identify Pareto optimal designs that maximize performance for given power budgets. These models enable pipeline depth studies in which all parameters vary simultaneously with depth, thereby more effectively revealing interactions with nondepth parameters. Heterogeneity analysis combines regression-based optimization with clustering heuristics to identify efficient design compromises between similar optimal architectures. These compromises are potential core designs in a heterogeneous multicore architecture. Increasing heterogeneity can improve bips^3/w efficiency by as much as 2.4×, a theoretical upper bound on heterogeneity benefits that neglects contention between shared resources as well as design complexity. Collectively these studies demonstrate regression models' ability to expose trends and identify optima in diverse design regions, motivating the application of such models in statistical inference for more effective use of modern simulator infrastructure.
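The Pareto frontier step described above can be illustrated with a minimal sketch. It assumes each design point already carries a predicted power and performance value (for instance, from a fitted regression model). The names `pareto_frontier` and `best_under_budget`, the dictionary keys, and the toy numbers are hypothetical and are not taken from the paper.

```python
def pareto_frontier(designs):
    """Return the designs that no other design beats on both axes.

    Each design is a dict with hypothetical keys "power" (lower is better)
    and "perf" (higher is better); values stand in for model predictions.
    """
    # Sort by ascending power; among equal power, highest performance first.
    ordered = sorted(designs, key=lambda d: (d["power"], -d["perf"]))
    frontier = []
    best_perf = float("-inf")
    for design in ordered:
        # Keep a design only if it outperforms every cheaper-or-equal design.
        if design["perf"] > best_perf:
            frontier.append(design)
            best_perf = design["perf"]
    return frontier


def best_under_budget(frontier, power_budget):
    """Pick the highest-performance frontier design within a power budget."""
    feasible = [d for d in frontier if d["power"] <= power_budget]
    return max(feasible, key=lambda d: d["perf"]) if feasible else None


# Toy predicted values, purely illustrative.
designs = [
    {"power": 10, "perf": 1.0},
    {"power": 12, "perf": 0.9},
    {"power": 15, "perf": 1.5},
    {"power": 15, "perf": 1.4},
    {"power": 20, "perf": 1.5},
]
frontier = pareto_frontier(designs)
choice = best_under_budget(frontier, power_budget=12)
```

Because the sketch evaluates every point once after sorting, it mirrors the idea of exhaustively evaluating an inexpensive model over the whole space rather than simulating each design.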
In recent years, circuit reliability in modern high-performance processors has become increasingly important. Shrinking feature sizes and diminishing supply voltages have made circuits more sensitive to microprocessor supply voltage fluctuations. These fluctuations result from the natural variation of processor activity as workloads execute, but when left unattended, these voltage fluctuations can lead to timing violations or even transistor lifetime issues. In this paper, we present a hardware-software collaborative approach to mitigate voltage fluctuations. A checkpoint-recovery mechanism rectifies errors when voltage violates maximum tolerance settings, while a run-time software layer reschedules the program’s instruction stream to prevent recurring violations at the same program location. The run-time layer, combined with the proposed code rescheduling algorithm, removes 60% of all violations with minimal overhead, thereby significantly improving overall performance. Our solution is a radical departure from the ongoing industry standard approach to circumvent the issue altogether by optimizing for the worst case voltage flux, which compromises power and performance efficiency severely, especially looking ahead to future technology generations. Existing conservative approaches will have severe implications on the ability to deliver efficient microprocessors. The proposed technique reassembles a traditional reliability problem as a runtime performance optimization problem, thus allowing us to design processors for typical case operation by building intelligent algorithms that can prevent recurring violations.
Recent research has shown the potential benefits of subthreshold or near-threshold operation, which gives up a substantial degree of speed in order to reduce energy per operation. This is an excellent trade-off for many tasks, such as cyberphysical systems. This prolegomenon summarizes the benefits and challenges of subthreshold or near-threshold operation.
Recent years have seen an increased interest in Micro Air Vehicles (MAVs) with applications ranging from search-and-rescue to mimicking insect behavior. MAVs have several challenging design requirements that impact processor design. These include real time processing demands and severe power/weight budgets. In this paper, we describe the characteristics of MAV applications and propose hardware acceleration to improve the power, performance, and portability of MAV system designs.
A circuit having dynamically controllable power. The circuit comprises a plurality of pipelined stages, each of the pipelined stages comprising two clocking domains, a plurality of switching circuits, each switching circuit being connected to one of the pipelined stages, first and second power sources connected to each of the plurality of pipelined stages through the switching circuits, the first power source supplying a first voltage and the second power source supplying a second voltage, wherein the first and second power sources each may be applied to a pipelined stage independently of other pipelined stages, first and second complementary clocks, and a plurality of latches connected to the first and second complementary clocks and to the plurality of pipelined stages for providing latch-based clocking to control the first and second clocking domains and to enable time-borrowing across the plurality of switching circuits. The first voltage differs from the second voltage and the plurality of pipelined stages interpolates between the first and second voltages to provide differing effective voltages between the first and second voltages.
Hardware acceleration can increase performance and reduce energy consumption. To maximize these benefits, accelerator-based systems that emphasize computation on accelerators (rather than on general purpose cores) should be used. We introduce the “accelerator store,” a structure for sharing memory between accelerators in these accelerator-based systems. The accelerator store simplifies accelerator I/O and reduces area by mapping memory to accelerators when needed at runtime. Preliminary results demonstrate a 30% system area reduction with no energy overhead and less than 1% performance overhead in contrast to conventional DMA schemes.
Shrinking feature size and diminishing supply voltage are making circuits more sensitive to supply voltage fluctuations within a microprocessor. If left unattended, voltage fluctuations can lead to timing violations or even transistor lifetime issues. A mechanism that dynamically learns to predict dangerous voltage fluctuations based on program and microarchitectural events can help steer the processor clear of danger.