#  “Agile” Chip Building (CRAFT, ChipKit, Chip Gallery) 

 




 
  

 

## Agile and Rapid Design of Research Test Chips

 ![agile_chip_building_fig1.png](/sites/g/files/omnuum11281/files/vlsiarch/files/agile_chip_building_fig1.png)

 

Research test chips are the ultimate experiment for demonstrating the true value of novel computer architecture innovations. Reviewers regard them highly as the most honest evaluation of a new hardware proposal. Taping out test chips also has great pedagogical value: it offers insight into the impact of real hardware and microarchitecture details, which are critical in guiding higher-level architecture decisions and trade-offs. Despite these benefits, taping out a test chip remains a challenge for those following this path for the first time. Traditionally, research chips have been time-consuming to design, fabricate, and test, and often error-prone, potentially requiring re-spins to fix problems.

To help lower the entry barrier for chip tape-outs, we have pioneered an open-source framework, CHIPKIT, centered on agile and reusable themes. Emphasizing reuse greatly reduces development cost and at the same time minimizes the opportunity for silicon bugs, freeing the designer to focus on differentiating features. Agile design, in turn, follows a methodology in which changes can be readily implemented late in the design cycle without significant disruption or risk. A full-chip validation methodology, covering the entire design flow, is then adapted onto this system-on-chip scaffold to ensure functional correctness. Following the CHIPKIT framework has allowed a steady stream of new tape-outs (a subset is illustrated in the gallery below) to be developed with very low risk, a high success rate, and design and verification effort reduced by orders of magnitude.



 

##  Select Publications 

 



 


 

### 2022

Thierry Tambe, David Brooks, and Gu-Yeon Wei. 2022. “[Learnings from a HLS-Based High-Productivity Digital VLSI Flow](/publications/learnings-hls-based-high-productivity-digital-vlsi-flow)”. In Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE’22)



 

 




 

 

 

- [Publisher's Version](https://capra.cs.cornell.edu/latte22/paper/6.pdf)
- [PDF: Learnings from a HLS-base...](/sites/g/files/omnuum11281/files/vlsiarch/files/6.pdf)

Abstract: The twilight of Dennard scaling has activated a global trend towards application-based hardware specialization. This trend is currently accelerating due to the surging democratization and deployment of machine learning on mobile and IoT compute platforms. At the...

 

 

 
 

 



### 2020

Glenn Ko, Yuji Chai, Marco Donato, Paul Whatmough, Thierry Tambe, Rob Rutenbar, and Gu-Yeon Wei. 2020. “[A Scalable Bayesian Inference Accelerator for Unsupervised Learning](/publications/scalable-bayesian-inference-accelerator-unsupervised-learning)”. In IEEE Hot Chips 31 Symposium. Palo Alto, CA, USA



 

 




 

 

 

- [Publisher's Version](https://doi.org/10.1109/HCS49909.2020.9220686)
- [PDF: A Scalable Bayesian Infer...](/sites/g/files/omnuum11281/files/vlsiarch/files/a_scalable_bayesian_inference_accelerator_for_unsupervised_learning.pdf)

Abstract: This article consists only of a collection of slides from the author's conference presentation.

 

 

 
 

Glenn Ko, Yuji Chai, Marco Donato, Paul Whatmough, Thierry Tambe, Rob Rutenbar, David Brooks, and Gu-Yeon Wei. 2020. “[A 3mm2 Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception Using Parallel Gibbs Sampling in 16nm](/publications/3mm2-programmable-bayesian-inference-accelerator-unsupervised-machine)”. In IEEE Symposium on VLSI Circuits (VLSI)



 

 




 

 

 

- [Publisher's Version](https://doi.org/10.1109/VLSICircuits18222.2020.9162784)
- [PDF: A 3mm2 Programmable Bayes...](/sites/g/files/omnuum11281/files/vlsiarch/files/a_3mm2_programmable_bayesian_inference_accelerator_for_unsupervised_machine_perception_using_parallel_gibbs_sampling_in_16nm.pdf)

Abstract: This paper describes a 16nm programmable accelerator for unsupervised probabilistic machine perception tasks that performs Bayesian inference on probabilistic models mapped onto a 2D Markov Random Field, using MCMC. Exploiting two degrees of parallelism...

 

 

 
 

 



### 2019

Sae Lee, Paul Whatmough, David Brooks, and Gu-Yeon Wei. 2019. “[A 16-Nm Always-on DNN Processor With Adaptive Clocking and Multi-Cycle Banked SRAMs](/publications/16-nm-always-dnn-processor-adaptive-clocking-and-multi-cycle-banked-srams)”. IEEE Journal of Solid-State Circuits, 54, 7, pp. 1982-1992



 

 




 

 

 

- [Publisher's Version](https://ieeexplore.ieee.org/document/8715387)
- [PDF: A 16-nm always-on DNN pro...](/sites/g/files/omnuum11281/files/vlsiarch/files/a_16-nm_always-on_dnn_processor_with_adaptive_clocking_and_multi-cycle_banked_srams.pdf)

Abstract: Always-on subsystems in mobile/Internet of Things (IoT) SoCs process a variety of real-time sensor data deep neural network (DNN) classification workloads in a heavily constrained energy budget. This can be achieved with robust, low-voltage circuits, and...

 

 

 
 

Paul Whatmough, Sae Lee, Marco Donato, Hsea Hsueh, Sam Xi, Udit Gupta, Lillian Pentecost, Glenn Ko, David Brooks, and Gu-Yeon Wei. 2019. “[A 16nm 25mm2 SoC With a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to EFPGA and Cache-Coherent Accelerators](/publications/16nm-25mm2-soc-545x-flexibility-efficiency-range-dual-core-arm-cortex-a53)”. Symposium on VLSI Circuits



 

 




 

 

 

- [Publisher's Version](https://ieeexplore.ieee.org/abstract/document/8778002/authors#authors)
- [PDF: A 16nm 25mm2 SoC with a 5...](/sites/g/files/omnuum11281/files/vlsiarch/files/smiv.pdf)

Abstract: This paper presents a 25mm^2 SoC in 16nm FinFET technology targeting flexible acceleration of compute intensive kernels in DNN, DSP and security algorithms. The SoC includes an always-on sub-system, a dual-core Arm A53 CPU cluster, an embedded FPGA array...

 

 

 
 

 



### 2018

Sae Lee, Paul Whatmough, Niamh Mulholland, Patrick Hansen, David Brooks, and Gu-Yeon Wei. 2018. “[A Wide Dynamic Range Sparse FC-DNN Processor With Multi-Cycle Banked SRAM Read and Adaptive Clocking in 16nm FinFET](/publications/wide-dynamic-range-sparse-fc-dnn-processor-multi-cycle-banked-sram-read-and)”. ESSCIRC 2018-IEEE 44th European Solid State Circuits Conference



 

 




 

 

 

- [Publisher's Version](https://ieeexplore.ieee.org/abstract/document/8494245)
- [PDF: A wide dynamic range spar...](/sites/g/files/omnuum11281/files/vlsiarch/files/a_wide_dynamic_range_sparse_fc-dnn_processor_with_multi-cycle_banked_sram_read_and_adaptive_clocking_in_16nm_finfet.pdf)

Abstract: Always-on classifiers for sensor data require a very wide operating range to support a variety of real-time workloads and must operate robustly at low supply voltages. We present a 16nm always-on wake-up controller with a fully-connected (FC) Deep Neural...

 

 

 
 

 



 

 

 

[See all project publications](https://prod-vlsiarch.drupalsites.harvard.edu/publications?f%5B0%5D=bibcite_reference_hwp_c_project123456%3A172620)