# Chip Gallery

 



Chip prototyping provides several important benefits for our research. Silicon implementation lets us study power and variability issues through real measurements in ways that simulation alone cannot, and our chip prototypes let us demonstrate the benefits of our proposed approaches more convincingly. In addition, the design process instills an appreciation for the complexity, testing, and validation issues encountered when building real hardware. Our group has designed prototype chips for several projects.

We thank IBM, TSMC, SRC, and UMC for fabrication support for these projects.



 

### [16nm always-on processor for IoT DNN inference tasks](/publications/16-nm-always-dnn-processor-adaptive-clocking-and-multi-cycle-banked-srams)

Key features:

- Calibration-free automatic voltage/frequency scaling by tracking Razor timing-error rates.
- A multi-cycle banked SRAM scheme that relaxes the SRAM read cycle time.
- A fast adaptive clocking scheme that provides a safety net, allowing operation with minimal margins.

[**A 16-nm always-on DNN processor with adaptive clocking and multi-cycle banked SRAMs**](/publications/16-nm-always-dnn-processor-adaptive-clocking-and-multi-cycle-banked-srams)
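Error-rate tracking can be pictured as a feedback loop: count Razor-detected timing errors over a window, raise the supply when the error rate exceeds a target, and lower it when the rate is comfortably below. The sketch below is a minimal illustration under assumed parameters, not the chip's actual controller. `ErrorRateVoltageController`, `toy_error_rate`, and all voltages, steps, and thresholds are hypothetical. It adjusts voltage only; a frequency knob could be driven the same way.

```python
from dataclasses import dataclass


@dataclass
class ErrorRateVoltageController:
    """Hypothetical controller: nudges supply voltage to hold a target error rate."""
    target_rate: float = 1e-3  # assumed target timing errors per cycle
    step_mv: int = 5           # assumed voltage step per observation window
    v_min_mv: int = 500        # assumed lower supply limit
    v_max_mv: int = 900        # assumed upper supply limit
    v_mv: int = 800            # assumed starting supply voltage

    def update(self, errors: int, cycles: int) -> int:
        """Adjust voltage from one window's error count; return the new voltage."""
        rate = errors / cycles
        if rate > self.target_rate:
            # Too many timing errors: back off by raising the supply.
            self.v_mv = min(self.v_mv + self.step_mv, self.v_max_mv)
        elif rate < self.target_rate / 2:
            # Comfortably below target: shave margin by lowering the supply.
            self.v_mv = max(self.v_mv - self.step_mv, self.v_min_mv)
        # Between target_rate / 2 and target_rate the voltage holds (hysteresis band).
        return self.v_mv


def toy_error_rate(v_mv: int) -> float:
    """Made-up silicon model: errors appear below 720 mV and grow linearly."""
    return max(0.0, (720 - v_mv) / 1000)


ctrl = ErrorRateVoltageController()
cycles = 10_000
for _ in range(100):
    errors = int(cycles * toy_error_rate(ctrl.v_mv))
    ctrl.update(errors, cycles)
print(ctrl.v_mv)  # dithers between 715 and 720 mV, at the toy model's error edge
```

Because the loop settles wherever errors begin on a given die, it needs no per-chip voltage table, which is the intuition behind calibration-free operation. A real design would also need a fast recovery path for the errors it deliberately provokes.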



 

   ![16nm always-on processor for IoT DNN inference tasks](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/img-chip-16nm.png?itok=-BfTtu3-) 

 

 

 

### [25mm² SoC in 16nm FinFET](/publications/16nm-25mm2-soc-545x-flexibility-efficiency-range-dual-core-arm-cortex-a53)

The SoC targets flexible acceleration of compute-intensive kernels in DNN, DSP, and security algorithms. It includes an always-on subsystem, a dual-core Arm Cortex-A53 CPU cluster, an embedded FPGA (eFPGA) array, and a quad-core cache-coherent accelerator (CCA) cluster. Measurement results show the following:

- Relative to the dual-core CPU baseline on comparable tasks, the accelerators' flexibility-efficiency range (in GOPS/W) spans 3.1x (A53+SIMD), 16.5x (eFPGA), and 54.5x (CCA).
- Energy per inference on the MobileNet-128 CNN improves by up to 47.6x.

[**A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators**](/publications/16nm-25mm2-soc-545x-flexibility-efficiency-range-dual-core-arm-cortex-a53)

[**SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices**](/publications/smiv-16nm-soc-efficient-and-flexible-dnn-acceleration-intelligent-iot-devices)



 

   ![25mm2 SoC in 16nm FinFET targeting flexible acceleration of compute intensive kernels in DNN, DSP and security algorithms](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/img-chip-25mm.png?itok=hu1uJ8WW) 

 

 

 

### [16nm programmable accelerator (PGMA) for unsupervised probabilistic machine perception tasks](/publications/3mm2-programmable-bayesian-inference-accelerator-unsupervised-machine)

The PGMA performs Bayesian inference on probabilistic models mapped onto a 2D Markov random field (MRF) using Markov chain Monte Carlo (MCMC) sampling.

By exploiting two degrees of parallelism, it performs Gibbs sampling inference up to 1380x faster with 1965x less energy than an Arm Cortex-A53 on the same SoC, and 1.5x faster with 6.3x less energy than an embedded FPGA in the same technology.

[**A 3mm2 Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm**](/publications/3mm2-programmable-bayesian-inference-accelerator-unsupervised-machine)

[**A Scalable Bayesian Inference Accelerator for Unsupervised Learning**](/publications/scalable-bayesian-inference-accelerator-unsupervised-learning)
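As a generic illustration of Gibbs sampling on a 2D MRF, the sketch below samples a binary Ising-style grid using red/black (checkerboard) ordering, a standard way to expose parallelism in grid MRFs. Whether this corresponds to either of the two degrees of parallelism the PGMA exploits is an assumption. `checkerboard_gibbs`, the Ising energy model, and all parameter values are hypothetical.

```python
import math
import random


def checkerboard_gibbs(field, coupling, sweeps, seed=0):
    """Gibbs sampling on a 2D grid MRF with binary states in {-1, +1}.

    field[i][j] is a per-site bias (e.g., evidence from an observation) and
    coupling rewards agreement between 4-connected neighbors (Ising model).
    """
    h, w = len(field), len(field[0])
    rng = random.Random(seed)
    s = [[rng.choice((-1, 1)) for _ in range(w)] for _ in range(h)]
    for _ in range(sweeps):
        # Red/black ordering: sites of one color share no edges, so given the
        # other color they are conditionally independent and could be sampled
        # in parallel. Here they are visited sequentially for clarity.
        for color in (0, 1):
            for i in range(h):
                for j in range(w):
                    if (i + j) % 2 != color:
                        continue
                    nb = sum(s[x][y]
                             for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                             if 0 <= x < h and 0 <= y < w)
                    local = coupling * nb + field[i][j]
                    # P(s = +1 | neighbors) = 1 / (1 + exp(-2 * local)); clamp to avoid overflow.
                    z = max(-50.0, min(50.0, 2.0 * local))
                    p_up = 1.0 / (1.0 + math.exp(-z))
                    s[i][j] = 1 if rng.random() < p_up else -1
    return s


# Usage: a uniform positive bias should drive nearly every site to +1.
sample = checkerboard_gibbs([[3.0] * 8 for _ in range(8)], coupling=0.5, sweeps=20)
print(sum(v == 1 for row in sample for v in row), "of 64 sites are +1")
```

Each half-sweep touches only one color, so a hardware array could update all sites of that color at once while reading the frozen values of the other color.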



 

   ![16nm programmable accelerator (PGMA) for unsupervised probabilistic machine perception tasks](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/img-chip-pgma.png?itok=BlrWDucq) 

 

 

 

  

 

 

 

### [A 16-Core Voltage-Stacked System With Adaptive Clocking and an Integrated Switched-Capacitor DC–DC Converter](/publications/16-core-voltage-stacked-system-adaptive-clocking-and-integrated-switched)

[**A Fully Integrated Reconfigurable Switched-Capacitor DC-DC Converter With Four Stacked Output Channels for Voltage Stacking Applications**](https://ieeexplore.ieee.org/document/7516564)



 

![Die photo of the 16-core voltage-stacked system](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/die_photo_v2-150x150.png?itok=cn3RuE_g)

 

 

 

### [An integrated 300 V drive stage](/publications/design-and-analysis-integrated-driver-piezoelectric-actuators)

The stage drives piezoelectric actuators used in micro-robotic systems.

[**Design and analysis of an integrated driver for piezoelectric actuators**](/publications/design-and-analysis-integrated-driver-piezoelectric-actuators)



 

![Integrated 300 V drive stage for piezoelectric actuators used in micro-robotic systems](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/img-chip-lok_ecce.png?itok=RR4KldmL)

 

 

 

### [Wireless sensor network test chip, ULP-1, 0.18 µm IBM CMOS](https://ieeexplore.ieee.org/document/1431558)

Won 1st Prize in the SRC SoC Design Contest. The chip implements our architecture from Hempstead et al., [**ISCA 2005**](https://ieeexplore.ieee.org/xpl/conhome/9793/proceeding?isnumber=30879&pageNumber=2).



 

![ULP-1 wireless sensor network test chip](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/hempstead_isca_2005-150x150.png?itok=z4q7jLkv)

 

 

 

  

 

 

 

### [Voltage Interpolation/Variable Latency FPU test chip, 0.13 µm UMC CMOS](https://ieeexplore.ieee.org/document/4494664)

Measurement results appear in [**Liang et al., ISSCC 2008**](https://ieeexplore.ieee.org/document/4494664) and complement our ISCA 2008 architectural studies.



 

![Voltage interpolation/variable latency FPU test chip](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/liang_isscc_2008-150x150.jpg?itok=RZr40Gw6)

 

 

 

### [A three-level DC-DC converter that demonstrates fast voltage scaling](/publications/fully-integrated-3-level-dc-dc-converter-nanosecond-scale-dvfs)

The converter scales voltage on the order of nanoseconds, enabling fast DVFS. Implemented in UMC 130nm CMOS.

[**A fully-integrated 3-level DC-DC converter for nanosecond-scale DVFS**](/publications/fully-integrated-3-level-dc-dc-converter-nanosecond-scale-dvfs)



 

![Three-level DC-DC converter for nanosecond-scale DVFS](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/wonyoung_jsscc_2012-150x150.png?itok=1sk0A_5R)

 

 

 

### [Prototype near-threshold voltage stacking test chip](https://dl.acm.org/doi/10.1145/2333660.2333746)

The test chip comprises a 3×3 array of power-consuming cores fabricated in MIT Lincoln Lab’s 150nm near-threshold-optimized fully-depleted SOI (FDSOI) process.

[**Evaluation of voltage stacking for near-threshold multicore computing**](https://dl.acm.org/doi/10.1145/2333660.2333746)



 

![Near-threshold voltage stacking test chip](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/sklee_islped_2012-150x150.png?itok=DJ2sKNht)

 

 

 

  

 

 

 

### [A Fully Integrated Battery-Connected Switched-Capacitor 4:1 Voltage Regulator with 70% Peak Efficiency Using Bottom-Plate Charge Recycling](/publications/fully-integrated-battery-connected-switched-capacitor-41-voltage-regulator-70)



 

![Battery-connected switched-capacitor 4:1 voltage regulator](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/tong_cicc_2013-150x150.png?itok=-_7F4HLf)

 

 

 

### [Prototype microrobotic SoC designed for the RoboBee](/publications/supply-noise-resilient-adaptive-clocking-battery-powered-aerial-microrobotic)

The SoC contains a fully integrated, high-efficiency switched-capacitor voltage regulator, a 32-bit ARM Cortex-M0 general-purpose processor with 128 KB of on-chip memory, a programmable voltage-tracking adaptive-frequency clock, and a low-power, high-precision frequency reference.

[**Supply-Noise Resilient Adaptive Clocking for Battery-Powered Aerial Microrobotic System-on-Chip in 40nm CMOS**](/publications/supply-noise-resilient-adaptive-clocking-battery-powered-aerial-microrobotic)



 

   ![Prototype microrobotic SoC designed for the RoboBee](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/zhang_cicc_2013-150x150.png?itok=ISSmgmQ3) 

 

 

 

### [Programmable DNN Classifier for IoT](/publications/143-28nm-soc-12-ghz-568njprediction-sparse-deep-neural-network-engine-01)

Key features:

- Parallelism/reuse: 8-way SIMD, 10X data reuse @ 128b/cycle BW.
- Small data-types: 8-bit weights, -30% energy.
- Sparse activation data: +4X throughput and -4X energy.
- Algorithmic resilience: +50% throughput or -30% energy.

[**A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications**](/publications/143-28nm-soc-12-ghz-568njprediction-sparse-deep-neural-network-engine-01)

[**DNN ENGINE: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications**](/publications/dnn-engine-28-nm-timing-error-tolerant-sparse-deep-neural-network-processor)
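The sparse-activation feature can be illustrated with a toy zero-skipping loop. This is a generic sketch, not the DNN ENGINE datapath; `fc_layer_zero_skip`, `fc_layer_dense`, and the data are hypothetical. Iterating over activations in the outer loop mirrors a design that broadcasts each nonzero activation to all output accumulators, so a zero activation costs no weight reads or multiply-accumulates (MACs).

```python
def fc_layer_zero_skip(activations, weights):
    """Fully connected layer that skips MACs for zero activations.

    weights[j][i] is the weight from input i to output j. Returns (outputs, macs).
    """
    outputs = [0] * len(weights)
    macs = 0
    for i, a in enumerate(activations):
        if a == 0:
            continue  # a zero activation contributes nothing: skip its weight reads and MACs
        for j, row in enumerate(weights):
            outputs[j] += row[i] * a
            macs += 1
    return outputs, macs


def fc_layer_dense(activations, weights):
    """Reference dense computation for comparison."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


acts = [0, 3, 0, 0, 2, 0, 0, 0]           # 6 of 8 activations are zero (e.g., after ReLU)
w = [[1, 2, 3, 4, 5, 6, 7, 8], [-1] * 8]  # small integer weights, as with 8-bit quantization
out, macs = fc_layer_zero_skip(acts, w)
print(out, macs)  # [16, -5] 4
```

Here the skipping loop performs 4 MACs instead of the 16 a dense loop would. Actual throughput and energy gains on silicon depend on activation sparsity and on overheads (such as finding the nonzero entries) that this sketch ignores.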



 

   ![Programmable DNN Classifier for IoT](/sites/g/files/omnuum11281/files/styles/hwp_1_1__360x360_scale/public/vlsiarch/files/img-chip-dnn.png?itok=XE8sv7aO)