Chip Gallery

Chip prototyping provides several important benefits for our research. Silicon implementation offers the opportunity to learn about power and variability issues through real measurements in ways that simulations alone cannot, and our chip prototypes let us demonstrate the benefits of our proposed approaches more convincingly. In addition, the design process instills an appreciation of the complexity, testing, and validation issues encountered when creating real hardware. Our group has designed prototype chips for several projects.

We thank IBM, TSMC, SRC, and UMC for fabrication support for these projects.

16nm always-on processor for IoT DNN inference tasks

16nm always-on processor for IoT DNN inference tasks featuring:  

  • Calibration-free automatic voltage/frequency scaling via tracking of Razor timing-error rates.
  • Multi-cycle banked SRAM scheme to relax the SRAM read cycle time.
  • Fast adaptive clocking scheme that provides a safety net, allowing operation with minimal margins.
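The closed-loop idea behind the error-rate-driven scaling above can be sketched in software. This is a minimal illustrative controller, not the chip's actual logic: all names, constants, and step sizes are hypothetical.

```python
# Sketch of error-rate-driven frequency control: creep the clock up while the
# observed Razor timing-error rate stays below a target, and back off sharply
# when errors exceed it. All names and constants are illustrative.

def adapt_frequency(freq_mhz, error_rate, target=0.001,
                    step_up=5.0, step_down=25.0,
                    f_min=100.0, f_max=1000.0):
    """Return the next clock frequency given the measured timing-error rate."""
    if error_rate > target:
        freq_mhz -= step_down          # errors too frequent: large back-off
    else:
        freq_mhz += step_up            # margin available: creep upward
    return min(max(freq_mhz, f_min), f_max)

f = 500.0
f = adapt_frequency(f, error_rate=0.0)     # below target -> 505.0
f = adapt_frequency(f, error_rate=0.01)    # above target -> 480.0
```

The asymmetric step sizes make the loop react quickly to error bursts while ramping up gradually, which is the usual shape of such controllers; the silicon replaces the explicit loop with fast adaptive clocking.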

A 16-nm always-on DNN processor with adaptive clocking and multi-cycle banked SRAMs

25mm2 SoC in 16nm FinFET targeting flexible acceleration of compute-intensive kernels in DNN, DSP, and security algorithms

25mm2 SoC in 16nm FinFET targeting flexible acceleration of compute-intensive kernels in DNN, DSP, and security algorithms. The SoC includes an always-on subsystem, a dual-core Arm Cortex-A53 CPU cluster, an embedded FPGA array (eFPGA), and a quad-core cache-coherent accelerator (CCA) cluster. Measurement results demonstrate:

  • Accelerator flexibility-efficiency (GOPS/W) spans 3.1x (A53+SIMD), 16.5x (eFPGA), and 54.5x (CCA) relative to the dual-core CPU baseline on comparable tasks.
  • Energy per inference on MobileNet-128 CNN shows a peak improvement of 47.6x.

A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators

SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices

16nm programmable accelerator (PGMA) for unsupervised probabilistic machine perception tasks

16nm programmable accelerator (PGMA) for unsupervised probabilistic machine perception tasks that performs Bayesian inference on probabilistic models mapped onto a 2D Markov Random Field, using MCMC.

Exploiting two degrees of parallelism, it performs Gibbs sampling inference up to 1380x faster with 1965x less energy than an Arm Cortex-A53 on the same SoC, and 1.5x faster with 6.3x less energy than an embedded FPGA in the same technology.
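To make the inference task concrete, here is a toy single-site Gibbs sampler for a 2D Ising-style Markov Random Field; it is illustrative only (the model, temperature, and sizes are hypothetical), and unlike the chip it updates sites serially rather than exploiting checkerboard parallelism.

```python
import numpy as np

# Toy Gibbs sampler for a 2D Ising-style MRF: each +/-1 spin is resampled
# from its conditional distribution given its four nearest neighbors.

def gibbs_sweep(grid, beta, rng):
    """One full sweep of single-site Gibbs updates on a +/-1 spin grid."""
    n, m = grid.shape
    for i in range(n):
        for j in range(m):
            # Sum of the four nearest neighbors (toroidal wrap-around).
            s = (grid[(i - 1) % n, j] + grid[(i + 1) % n, j] +
                 grid[i, (j - 1) % m] + grid[i, (j + 1) % m])
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * s))  # P(spin=+1 | nbrs)
            grid[i, j] = 1 if rng.random() < p_up else -1
    return grid

rng = np.random.default_rng(0)
grid = rng.choice([-1, 1], size=(16, 16))
for _ in range(50):
    gibbs_sweep(grid, beta=0.8, rng=rng)   # high beta -> neighboring spins align
```

Because non-adjacent sites have independent conditionals, all "black" sites of a checkerboard can be sampled in parallel, then all "white" sites, which is the parallelism an accelerator can exploit.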

A 3mm2 Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm

A Scalable Bayesian Inference Accelerator for Unsupervised Learning

Voltage Interpolation/Variable Latency FPU test chip, 0.13um UMC CMOS

Voltage Interpolation/Variable Latency FPU test chip, implemented in 0.13um UMC CMOS.

Measurement results appear in Liang et al., ISSCC 2008, and provide a companion to our ISCA 2008 architectural studies.

Three level DC-DC converter that demonstrates fast voltage scaling

A three level DC-DC converter that demonstrates fast voltage scaling (on the order of nanoseconds) for fast DVFS. Implemented in UMC 130nm CMOS.

A Fully-Integrated 3-Level DC-DC Converter for Nanosecond-Scale DVFS

Prototype near-threshold voltage stacking test-chip

Prototype near-threshold voltage stacking test-chip comprising a 3×3 array of power-consuming cores, fabricated in MIT Lincoln Lab’s 150nm near-threshold optimized fully-depleted SOI (FDSOI) process.
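Voltage stacking connects layers of cores in series so the external supply divides across them. A toy resistive model (all values illustrative, not measurements from the test chip) shows both the appeal and the central challenge:

```python
# Toy model of voltage stacking: layers of cores in series across the supply.
# Modeling each layer as an effective resistance, a layer drawing more
# current (lower resistance) sees less voltage -- the balance problem that
# stacking designs must manage. Values are illustrative only.

def layer_voltages(v_supply, layer_resistances):
    """Series resistive divider: voltage across each stacked layer."""
    total = sum(layer_resistances)
    return [v_supply * r / total for r in layer_resistances]

# Balanced stack: 1.5 V across three identical layers -> 0.5 V each,
# i.e. near-threshold operation without an explicit down-converter.
print(layer_voltages(1.5, [10.0, 10.0, 10.0]))   # [0.5, 0.5, 0.5]
# Activity imbalance skews the division across layers:
print(layer_voltages(1.5, [8.0, 10.0, 12.0]))    # [0.4, 0.5, 0.6]
```

The balanced case illustrates why stacking suits near-threshold operation: the implicit series conversion delivers a low per-layer voltage from a standard supply, while the imbalanced case shows the inter-layer voltage noise that the evaluation must characterize.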

Evaluation of voltage stacking for near-threshold multicore computing

Fully Integrated Battery-Connected Switched-Capacitor 4:1 Voltage Regulator

A Fully Integrated Battery-Connected Switched-Capacitor 4:1 Voltage Regulator with 70% Peak Efficiency Using Bottom-Plate Charge Recycling


The prototype microrobotic SoC designed for the RoboBee contains a fully integrated high efficiency switched-capacitor voltage regulator, a 32-bit ARM Cortex-M0 general-purpose processor with 128 KB on-chip memories, a programmable voltage-tracking adaptive-frequency clock, and a low-power high-precision frequency reference.

Supply-Noise Resilient Adaptive Clocking for Battery-Powered Aerial Microrobotic System-on-Chip in 40nm CMOS

Programmable DNN Classifier for IoT

Programmable DNN Classifier for IoT featuring:

  • Parallelism/reuse: 8-way SIMD, 10x data reuse @ 128b/cycle BW.
  • Small data types: 8-bit weights, -30% energy.
  • Sparse activation data: +4x throughput and -4x energy.
  • Algorithmic resilience: +50% throughput or -30% energy.
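The sparse-activation idea above can be sketched in a few lines: any zero activation (common after ReLU) lets the engine skip an entire column of multiply-accumulates, while 8-bit weights cut storage and energy. This is an illustrative software analogue, not the chip's microarchitecture; all names and sizes are hypothetical.

```python
import numpy as np

# Sketch of sparsity exploitation in a fully-connected layer with quantized
# 8-bit weights: zero activations contribute nothing, so their MACs are
# skipped entirely instead of being computed and discarded.

def sparse_fc_layer(activations, weights_q, scale):
    """y = W @ x with 8-bit weights, skipping zero activations."""
    out = np.zeros(weights_q.shape[0], dtype=np.float32)
    for j, a in enumerate(activations):
        if a == 0.0:
            continue                      # skip: no MACs issued for this input
        out += a * weights_q[:, j].astype(np.float32) * scale
    return out

rng = np.random.default_rng(1)
x = np.maximum(rng.standard_normal(64), 0.0)      # ReLU-style activations
x[x < 1.0] = 0.0                                  # make the vector sparser
w_q = rng.integers(-128, 128, size=(16, 64)).astype(np.int8)
y = sparse_fc_layer(x, w_q, scale=1.0 / 128.0)
```

In software this saves cycles in proportion to sparsity; in hardware, skipped columns also avoid weight fetches, which is where much of the energy saving comes from.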

A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications

DNN ENGINE: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications