Hardware Accelerators

gem5-Aladdin

Increasing demand for power-efficient, high-performance computing has spurred a growing number and diversity of hardware accelerators in mobile and server Systems on Chip (SoCs). We argue that co-designing the accelerator microarchitecture with the system to which it belongs is critical to balanced, efficient accelerator microarchitectures. Data movement and coherence management for accelerators are significant yet often unaccounted-for components of total accelerator runtime, and ignoring them leads to misleading performance predictions and inefficient accelerator designs. To explore the design space of accelerator-system co-design, we develop gem5-Aladdin, an SoC simulator that captures dynamic interactions between accelerators and the SoC platform, and validate it to within 6% against real hardware. Our co-design studies show that the optimal energy-delay product (EDP) of an accelerator microarchitecture can improve by up to 7.4× when system-level effects are considered, compared to optimizing accelerators in isolation.
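To make the EDP argument concrete, here is a minimal sketch of how the EDP-optimal design point can shift once data-movement time is added to the accelerator's runtime. It is not gem5-Aladdin code: the `DesignPoint` class, the `best_design` helper, the three design points, and every number are hypothetical, and energy is modeled simply as average power multiplied by total runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """A hypothetical accelerator configuration (all values are made up)."""
    name: str
    compute_cycles: int  # cycles spent computing, ignoring data movement
    power: float         # average power, arbitrary units


def edp(point: DesignPoint, transfer_cycles: int) -> float:
    """Energy-delay product, with energy modeled as power * total delay."""
    delay = point.compute_cycles + transfer_cycles
    energy = point.power * delay
    return energy * delay


def best_design(points, transfer_cycles: int) -> DesignPoint:
    """Return the design point with the lowest EDP."""
    return min(points, key=lambda p: edp(p, transfer_cycles))


# Wider datapaths finish sooner but draw more power (illustrative numbers).
points = [
    DesignPoint("narrow", compute_cycles=4000, power=10.0),
    DesignPoint("medium", compute_cycles=2000, power=22.0),
    DesignPoint("wide", compute_cycles=1000, power=50.0),
]

# Optimizing in isolation: no data-movement time is charged.
print(best_design(points, transfer_cycles=0).name)     # wide

# Same designs once every one pays an assumed 3000 cycles of data movement.
print(best_design(points, transfer_cycles=3000).name)  # narrow
```

Under these assumed numbers, the most aggressive design wins in isolation, but once a fixed data-movement cost dominates runtime its extra parallelism buys little delay and costs a lot of energy, so a leaner design has the better EDP.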

gem5-Aladdin has been released! To download the source code, click here.

People: Sophia Shao and Sam Xi

Publications

    • Yakun Sophia Shao, Sam (Likun) Xi, Vijayalakshmi Srinivasan, Gu-Yeon Wei, David Brooks (2016): Co-Designing Accelerators and SoC Interfaces using gem5-Aladdin. In: International Symposium on Microarchitecture (MICRO), 2016.

Aladdin

Customized architectures composed of CPUs, GPUs, and accelerators can be seen in mobile systems and are beginning to emerge in servers and desktops. We envision the integration of more of these hardware accelerators to address the dark silicon problem and support further improvements in computing performance. However, current systems that pull together hard IP blocks into a single integrated substrate—simply continuing the multi-decade trend of SoC integration—cannot leverage higher-level coordination and optimization between traditional general-purpose cores, accelerators, and shared resources such as cache hierarchies and on-chip networks. Given the importance of hardware acceleration in future systems, there is a clear need for a design methodology that facilitates broad design space exploration of next-generation customized architectures.

To address this need, we developed Aladdin, a pre-RTL power-performance simulator designed to enable rapid design space search of accelerator-centric SoCs. The framework takes high-level language descriptions of algorithms as input and uses dynamic data dependence graphs (DDDGs) to represent an accelerator without having to generate low-level RTL. Starting with an unconstrained program DDDG, which corresponds to an initial representation of the accelerator hardware, Aladdin applies optimizations and constraints to the graph to create a realistic model of accelerator activity, with validated models of performance, power, and area (errors within 7%). We can then combine this high-level representation of accelerators with existing architectural simulators to architect efficient systems that consider and balance system-wide constraints. By including the impact of the system's memory hierarchy, Aladdin uncovers significant high-level design trade-offs.
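The DDDG idea can be illustrated with a small sketch. This is not Aladdin's implementation: `build_dddg` and `schedule` are hypothetical helpers, every operation is assumed to take one cycle, only read-after-write dependences are tracked, and the only hardware constraint modeled is the number of functional units available per cycle.

```python
from collections import defaultdict


def build_dddg(trace):
    """Build a dependence graph from a dynamic trace.

    trace: list of (dest, srcs) pairs in execution order.
    Returns {node_index: [indices of the operations it depends on]},
    tracking read-after-write dependences only.
    """
    last_writer = {}  # value name -> index of the op that last wrote it
    preds = {}
    for i, (dest, srcs) in enumerate(trace):
        preds[i] = sorted({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i
    return preds


def schedule(preds, units):
    """Greedily schedule unit-latency ops, at most `units` per cycle.

    Nodes are visited in trace order, which is a valid topological order.
    Returns the total number of cycles.
    """
    issue_cycle = {}
    used = defaultdict(int)  # cycle -> functional units already occupied
    for node in sorted(preds):
        cycle = max((issue_cycle[p] + 1 for p in preds[node]), default=0)
        while used[cycle] >= units:
            cycle += 1
        used[cycle] += 1
        issue_cycle[node] = cycle
    return max(issue_cycle.values()) + 1 if issue_cycle else 0


# Dynamic trace of a 4-element dot product: four multiplies, then an add tree.
trace = [
    ("t0", ("a0", "b0")),
    ("t1", ("a1", "b1")),
    ("t2", ("a2", "b2")),
    ("t3", ("a3", "b3")),
    ("s0", ("t0", "t1")),
    ("s1", ("t2", "t3")),
    ("s", ("s0", "s1")),
]

graph = build_dddg(trace)
print(schedule(graph, units=len(trace)))  # 3 (limited only by dependences)
print(schedule(graph, units=2))           # 4
print(schedule(graph, units=1))           # 7 (fully serialized)
```

The first call corresponds to the unconstrained graph, where only data dependences limit the schedule. Tightening the `units` limit stands in for the resource constraints applied to the graph, and the cycle count grows accordingly.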

To download Aladdin, click here.

People: Sophia Shao and Brandon Reagen

Publications

    • Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, David Brooks (2014): Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures. In: International Symposium on Computer Architecture (ISCA), 2014.
    • Brandon Reagen, Yakun Sophia Shao, Gu-Yeon Wei, David Brooks (2013): Quantifying Acceleration: Power/Performance Trade-Offs of Application Kernels in Hardware. In: International Symposium on Low Power Electronics and Design (ISLPED), 2013.

MachSuite

Recent high-level synthesis and accelerator-related architecture papers show a great disparity in workload selection among projects and research groups. To provide standardization within the accelerator research community, we present MachSuite, a benchmark suite for high-level synthesis tools and accelerator-centric architectures. MachSuite is a collection of workloads carefully selected to cover a diverse application space and a range of algorithm choices. All the benchmarks in MachSuite are implemented to be well suited to high-level synthesis. A thorough characterization further demonstrates the diverse behaviors among the benchmarks, which are representative of different customization challenges. MachSuite enables commensurability across research projects while reducing the burden of accelerator implementation and workload selection.

MachSuite source code is available on GitHub.

People: Brandon Reagen, Robert Adolf, Sophia Shao

Publications

    • Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, David Brooks (2014): MachSuite: Benchmarks for Accelerator Design and Customized Architectures. In: IEEE International Symposium on Workload Characterization (IISWC), 2014.

Other projects

WIICA: An ISA-independent workload characterization tool for accelerators.

Die photo analysis: Analysis of commercial SoC die photos to gain insights into their architecture.