#  Speech and NLP 

 




   ![speech_and_nlp.jpg](/sites/g/files/omnuum11281/files/styles/hwp_1_1__960x960_scale/public/vlsiarch/files/speech_and_nlp.jpg?itok=kvjO-RcE) 

 

We explore efficient computation for natural language processing (NLP) and speech recognition applications from both software and hardware perspectives. On the software side, our solutions generally optimize neural network (NN) structures and compress model parameters. For hardware acceleration, we exploit specialized datapaths, controllers, and memory systems to enable massively parallel NN computation, reduce DRAM accesses, and support customized datatypes and coding schemes. By considering software and hardware together, we develop various NN compression schemes and implement accelerators that run different NLP models flexibly and efficiently.
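One widely used way to compress model parameters is magnitude pruning: the smallest-magnitude weights are set to zero, and the survivors are stored in a sparse format so the hardware fetches fewer values from DRAM. The following is a minimal pure-Python sketch under that assumption. The function names (`magnitude_prune`, `to_csr`, `csr_matvec`), the toy matrix, and the choice of compressed sparse row (CSR) storage are hypothetical illustrations, not the specific compression schemes used in our designs.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` (a list of rows) with the smallest-magnitude
    fraction `sparsity` of entries set to zero (hypothetical helper)."""
    rows, cols = len(weights), len(weights[0])
    # Rank every (row, col) position by |weight|, smallest first.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    k = int(rows * cols * sparsity)  # number of weights to prune
    pruned = [list(row) for row in weights]
    for r, c in order[:k]:
        pruned[r][c] = 0.0
    return pruned


def to_csr(matrix):
    """Pack a dense matrix into CSR arrays: nonzero values, their column
    indices, and per-row offsets into those two arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for c, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(c)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, reading only the stored nonzero weights."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for i in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[i] * x[col_idx[i]]
        y.append(acc)
    return y


W = [[0.9, -0.05, 0.3],
     [0.01, -0.7, 0.2]]
W_pruned = magnitude_prune(W, sparsity=0.5)  # keeps 3 of the 6 weights
vals, cols, ptr = to_csr(W_pruned)
print(W_pruned)  # [[0.9, 0.0, 0.3], [0.0, -0.7, 0.0]]
print(csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0]))
```

The column-index and row-pointer arrays add storage overhead of their own, so in a sketch like this the memory-traffic savings depend on how sparse the pruned matrix actually is.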

The picture shows an example design flow for an NLP accelerator. We first study the computational behavior of the target seq2seq (RNN) model and explore how to map model weights and schedule computations effectively on customized hardware. We then choose 8-bit floating point as the hardware datatype and design a datapath that directly accelerates matrix-vector multiplication. To reduce on-chip data movement, we also implement customized interconnections based on the data dependencies of RNN models. The hardware is written in SystemC and synthesized with the Catapult high-level synthesis (HLS) tool, which shortens design cycles and makes testing easier.
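The page does not specify the bit layout of the 8-bit floating-point datatype. The following sketch assumes a hypothetical layout of 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits, with subnormals and no infinity or NaN codes, so overflow saturates at ±480. It shows how weights could be rounded to such a format before the matrix-vector product that the datapath accelerates. Accumulating in ordinary Python floats is also an assumption, and the names `fp8_encode`, `fp8_decode`, and `fp8_matvec` are illustrative only.

```python
EXP_BITS, MAN_BITS = 4, 3           # assumed 1-4-3 sign/exponent/mantissa layout
BIAS = (1 << (EXP_BITS - 1)) - 1    # exponent bias = 7


def fp8_decode(code):
    """Map an 8-bit code (0..255) to the real value it represents."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = code & ((1 << MAN_BITS) - 1)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (man / (1 << MAN_BITS)) * 2.0 ** (1 - BIAS)
    return sign * (1.0 + man / (1 << MAN_BITS)) * 2.0 ** (exp - BIAS)


# Every representable value, indexed by its 8-bit code.
FP8_TABLE = [fp8_decode(c) for c in range(256)]


def fp8_encode(x):
    """Round x to the nearest representable code (ties go to the lower code).
    Values beyond the largest magnitude (480.0) saturate."""
    return min(range(256), key=lambda c: abs(FP8_TABLE[c] - x))


def fp8_matvec(weights, x):
    """y = W @ x with W rounded to the assumed FP8 format, products summed in float."""
    qw = [[fp8_decode(fp8_encode(w)) for w in row] for row in weights]
    return [sum(w * xi for w, xi in zip(row, x)) for row in qw]


print(fp8_decode(fp8_encode(0.3)))                        # 0.3125
print(fp8_matvec([[1.0, 0.3], [-2.0, 0.5]], [1.0, 2.0]))  # [1.625, -1.0]
```

Searching all 256 codes is slow in software but easy to verify. A hardware encoder would more likely extract the exponent and round the mantissa directly.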



 

##  Select Publications 

 





 

### 2021

Thierry Tambe, En-Yu Yang, Glenn G. Ko, Yuji Chai, Coleman Hooper, Marco Donato, Paul N. Whatmough, Alexander M. Rush, David Brooks, and Gu-Yeon Wei. 2021. “[A 25mm² SoC for IoT Devices With 18ms Noise Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET](/publications/25mm2-soc-iot-devices-18ms-noise-robust-speech-text-latency-bayesian-speech)”. In IEEE International Solid-State Circuits Conference (ISSCC 2021).



 

 


- [Publisher's Version](https://doi.org/10.1109/ISSCC42613.2021.9366062)
- [Presentation slides (PDF)](/sites/g/files/omnuum11281/files/vlsiarch/files/tambe_isscc_2021_presentation.pdf)

Abstract: Automatic speech recognition (ASR) using deep learning is essential for user interfaces on IoT devices. However, previously published ASR chips [4-7] do not consider realistic operating conditions, which are typically noisy and may include more than one...

 

 

 
 

Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul Whatmough, Alexander M. Rush, David Brooks, and Gu-Yeon Wei. 2021. “[EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference](/publications/edgebert-sentence-level-energy-optimizations-latency-aware-multi-task-nlp)”. In IEEE/ACM International Symposium on Microarchitecture (MICRO 2021).



 

 


- [Publisher's Version](https://doi.org/10.48550/arXiv.2011.14203)

Abstract: Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to resource...

 

 

 
 

 



### 2020

Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Reddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020. “[Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference](/publications/algorithm-hardware-co-design-adaptive-floating-point-encodings-resilient-deep)”. In Design Automation Conference (DAC 2020). San Francisco, CA, USA.



 

 


- [Publisher's Version](https://doi.org/10.1109/DAC18072.2020.9218516)
- [PDF](/sites/g/files/omnuum11281/files/vlsiarch/files/b1743_030_5_1594671163.pdf)

Abstract: Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low precision as their shrunken dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction...

 

 

 
 

 



### 2019

Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander Rush, Gu-Yeon Wei, and David Brooks. 2019. “[MASR: A Modular Accelerator for Sparse RNNs](/publications/masr-modular-accelerator-sparse-rnns)”. In International Conference on Parallel Architectures and Compilation Techniques (PACT 2019).



 

 


- [Publisher's Version](https://doi.org/10.48550/arXiv.1908.08976)
- [PDF](/sites/g/files/omnuum11281/files/vlsiarch/files/1908.08976.pdf)

Abstract: Recurrent neural networks (RNNs) are becoming the de facto solution for speech recognition. RNNs exploit long-term temporal relationships in data by applying repeated, learned transformations. Unlike fully-connected (FC) layers with single vector matrix...

 

 

 
 

 



 

 

 

[See all project publications](https://prod-vlsiarch.drupalsites.harvard.edu/publications?f%5B0%5D=bibcite_reference_hwp_c_project123456%3A172614)