Chip Gallery

Chip prototyping provides several important benefits for our research. Silicon implementation offers the opportunity to learn about power and variability issues through real measurements in ways that simulations alone cannot, and our chip prototypes let us demonstrate the benefits of our proposed approaches more convincingly. In addition, the design process instills an appreciation of the complexity, testing, and validation issues encountered when creating real hardware. Our group has designed prototype chips for several projects.

We thank IBM, TSMC, SRC, and UMC for fabrication support for these projects.

16nm always-on processor for IoT DNN inference tasks

16nm always-on processor for IoT DNN inference tasks featuring:  

  • Calibration-free automatic voltage/frequency scaling via tracking of Razor timing-error rates.
  • Multi-cycle banked SRAM scheme to relax the SRAM read cycle time.
  • Fast adaptive clocking scheme that provides a safety net, allowing operation with minimal margins.
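The closed-loop idea behind the error-rate-driven scaling above can be sketched in software. This is a minimal illustrative controller, not the chip's actual logic: all names, constants, and step sizes are hypothetical.

```python
# Sketch of error-rate-driven frequency control: creep the clock up while the
# observed Razor timing-error rate stays below a target, and back off sharply
# when errors exceed it. All names and constants are illustrative.

def adapt_frequency(freq_mhz, error_rate, target=0.001,
                    step_up=5.0, step_down=25.0,
                    f_min=100.0, f_max=1000.0):
    """Return the next clock frequency given the measured timing-error rate."""
    if error_rate > target:
        freq_mhz -= step_down          # errors too frequent: large back-off
    else:
        freq_mhz += step_up            # margin available: creep upward
    return min(max(freq_mhz, f_min), f_max)

f = 500.0
f = adapt_frequency(f, error_rate=0.0)     # below target -> 505.0
f = adapt_frequency(f, error_rate=0.01)    # above target -> 480.0
```

The asymmetric step sizes make the loop react quickly to error bursts while ramping up gradually, which is the usual shape of such controllers; the silicon replaces the explicit loop with fast adaptive clocking.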

A 16-nm always-on DNN processor with adaptive clocking and multi-cycle banked SRAMs

25mm2 SoC in 16nm FinFET targeting flexible acceleration of compute-intensive kernels in DNN, DSP, and security algorithms

25mm2 SoC in 16nm FinFET targeting flexible acceleration of compute-intensive kernels in DNN, DSP, and security algorithms. The SoC includes an always-on subsystem, a dual-core Arm Cortex-A53 CPU cluster, an embedded FPGA array (eFPGA), and a quad-core cache-coherent accelerator (CCA) cluster. Measurement results demonstrate:

  • Accelerator flexibility-efficiency (GOPS/W) spans 3.1x (A53+SIMD), 16.5x (eFPGA), and 54.5x (CCA) relative to the dual-core CPU baseline on comparable tasks.
  • Energy per inference on MobileNet-128 CNN shows a peak improvement of 47.6x.

A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators

SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices

16nm programmable accelerator (PGMA) for unsupervised probabilistic machine perception tasks

16nm programmable accelerator (PGMA) for unsupervised probabilistic machine perception tasks that performs Bayesian inference on probabilistic models mapped onto a 2D Markov Random Field, using MCMC.

Exploiting two degrees of parallelism, it performs Gibbs sampling inference up to 1380x faster with 1965x less energy than an Arm Cortex-A53 on the same SoC, and 1.5x faster with 6.3x less energy than an embedded FPGA in the same technology.
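To make the inference task concrete, here is a toy single-site Gibbs sampler for a 2D Ising-style Markov Random Field; it is illustrative only (the model, temperature, and sizes are hypothetical), and unlike the chip it updates sites serially rather than exploiting checkerboard parallelism.

```python
import numpy as np

# Toy Gibbs sampler for a 2D Ising-style MRF: each +/-1 spin is resampled
# from its conditional distribution given its four nearest neighbors.

def gibbs_sweep(grid, beta, rng):
    """One full sweep of single-site Gibbs updates on a +/-1 spin grid."""
    n, m = grid.shape
    for i in range(n):
        for j in range(m):
            # Sum of the four nearest neighbors (toroidal wrap-around).
            s = (grid[(i - 1) % n, j] + grid[(i + 1) % n, j] +
                 grid[i, (j - 1) % m] + grid[i, (j + 1) % m])
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * s))  # P(spin=+1 | nbrs)
            grid[i, j] = 1 if rng.random() < p_up else -1
    return grid

rng = np.random.default_rng(0)
grid = rng.choice([-1, 1], size=(16, 16))
for _ in range(50):
    gibbs_sweep(grid, beta=0.8, rng=rng)   # high beta -> neighboring spins align
```

Because non-adjacent sites have independent conditionals, all "black" sites of a checkerboard can be sampled in parallel, then all "white" sites, which is the parallelism an accelerator can exploit.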

A 3mm2 Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm

A Scalable Bayesian Inference Accelerator for Unsupervised Learning

Voltage Interpolation/Variable Latency FPU test chip, 0.13um UMC CMOS

Voltage Interpolation/Variable Latency FPU test chip, implemented in 0.13um UMC CMOS.

Measurement results appear in Liang et al., ISSCC 2008, and provide a companion to our ISCA 2008 architectural studies.

Three level DC-DC converter that demonstrates fast voltage scaling

A three level DC-DC converter that demonstrates fast voltage scaling (on the order of nanoseconds) for fast DVFS. Implemented in UMC 130nm CMOS.

A Fully-Integrated 3-Level DC-DC Converter for Nanosecond-Scale DVFS

Prototype near-threshold voltage stacking test-chip

Prototype near-threshold voltage stacking test-chip comprising a 3×3 array of power-consuming cores, fabricated in MIT Lincoln Lab’s 150nm near-threshold optimized fully-depleted SOI (FDSOI) process.
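Voltage stacking connects layers of cores in series so the external supply divides across them. A toy resistive model (all values illustrative, not measurements from the test chip) shows both the appeal and the central challenge:

```python
# Toy model of voltage stacking: layers of cores in series across the supply.
# Modeling each layer as an effective resistance, a layer drawing more
# current (lower resistance) sees less voltage -- the balance problem that
# stacking designs must manage. Values are illustrative only.

def layer_voltages(v_supply, layer_resistances):
    """Series resistive divider: voltage across each stacked layer."""
    total = sum(layer_resistances)
    return [v_supply * r / total for r in layer_resistances]

# Balanced stack: 1.5 V across three identical layers -> 0.5 V each,
# i.e. near-threshold operation without an explicit down-converter.
print(layer_voltages(1.5, [10.0, 10.0, 10.0]))   # [0.5, 0.5, 0.5]
# Activity imbalance skews the division across layers:
print(layer_voltages(1.5, [8.0, 10.0, 12.0]))    # [0.4, 0.5, 0.6]
```

The balanced case illustrates why stacking suits near-threshold operation: the implicit series conversion delivers a low per-layer voltage from a standard supply, while the imbalanced case shows the inter-layer voltage noise that the evaluation must characterize.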

Evaluation of voltage stacking for near-threshold multicore computing

Fully Integrated Battery-Connected Switched-Capacitor 4:1 Voltage Regulator

A Fully Integrated Battery-Connected Switched-Capacitor 4:1 Voltage Regulator with 70% Peak Efficiency Using Bottom-Plate Charge Recycling


The prototype microrobotic SoC designed for the RoboBee contains a fully integrated high efficiency switched-capacitor voltage regulator, a 32-bit ARM Cortex-M0 general-purpose processor with 128 KB on-chip memories, a programmable voltage-tracking adaptive-frequency clock, and a low-power high-precision frequency reference.

Supply-Noise Resilient Adaptive Clocking for Battery-Powered Aerial Microrobotic System-on-Chip in 40nm CMOS

Programmable DNN Classifier for IoT

Programmable DNN Classifier for IoT featuring:

  • Parallelism/reuse: 8-way SIMD, 10x data reuse @ 128b/cycle BW.
  • Small data types: 8-bit weights, -30% energy.
  • Sparse activation data: +4x throughput and -4x energy.
  • Algorithmic resilience: +50% throughput or -30% energy.
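The sparse-activation idea above can be sketched in a few lines: any zero activation (common after ReLU) lets the engine skip an entire column of multiply-accumulates, while 8-bit weights cut storage and energy. This is an illustrative software analogue, not the chip's microarchitecture; all names and sizes are hypothetical.

```python
import numpy as np

# Sketch of sparsity exploitation in a fully-connected layer with quantized
# 8-bit weights: zero activations contribute nothing, so their MACs are
# skipped entirely instead of being computed and discarded.

def sparse_fc_layer(activations, weights_q, scale):
    """y = W @ x with 8-bit weights, skipping zero activations."""
    out = np.zeros(weights_q.shape[0], dtype=np.float32)
    for j, a in enumerate(activations):
        if a == 0.0:
            continue                      # skip: no MACs issued for this input
        out += a * weights_q[:, j].astype(np.float32) * scale
    return out

rng = np.random.default_rng(1)
x = np.maximum(rng.standard_normal(64), 0.0)      # ReLU-style activations
x[x < 1.0] = 0.0                                  # make the vector sparser
w_q = rng.integers(-128, 128, size=(16, 64)).astype(np.int8)
y = sparse_fc_layer(x, w_q, scale=1.0 / 128.0)
```

In software this saves cycles in proportion to sparsity; in hardware, skipped columns also avoid weight fetches, which is where much of the energy saving comes from.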

A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications

DNN ENGINE: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications