Speech and NLP

speech_and_nlp.jpg

We explore efficient computation for natural language processing (NLP) and speech recognition applications from both software and hardware perspectives. On the software side, we optimize neural network (NN) structures and compress model parameters. On the hardware side, we design specialized datapaths, controllers, and memory systems that enable massively parallel NN computation, reduce DRAM accesses, and support customized datatypes and coding schemes. Considering software and hardware together, we develop NN compression schemes and implement accelerators that run different NLP models flexibly and efficiently.

The picture shows an example design flow for an NLP accelerator. We first study the computational behavior of the target seq2seq (RNN) model and work out how to map model weights and schedule computations effectively on customized hardware. We then choose 8-bit floating point as the hardware datatype and design a datapath that directly accelerates matrix-vector multiplication. To reduce on-chip data movement, we also implement customized interconnects based on the data dependencies of RNN models. The hardware is coded in SystemC with the Catapult HLS tool, which shortens design cycles and makes testing easier.
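As a rough illustration of what an 8-bit floating-point matrix-vector multiplication involves, the Python sketch below rounds weights and inputs to a small float format and accumulates the products at higher precision. The bit layout (1 sign, 4 exponent, 3 mantissa bits, bias 7, no codes reserved for infinity or NaN) and the function names `quantize_fp8` and `matvec_fp8` are assumptions made for this example; the text above does not specify the format used in the actual hardware.

```python
import math

# Assumed format for illustration only: 1 sign, 4 exponent, 3 mantissa bits.
EXP_BITS = 4
MAN_BITS = 3
BIAS = 7


def quantize_fp8(x: float) -> float:
    """Round x to the nearest value representable in the assumed 8-bit format."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)

    min_exp = 1 - BIAS                     # smallest normal exponent
    max_exp = (2 ** EXP_BITS - 1) - BIAS   # largest exponent (no inf/NaN codes)

    # frexp gives mag = m * 2**e with 0.5 <= m < 1; shift to the 1.x form.
    _, e = math.frexp(mag)
    e -= 1
    # Values below the normal range share the smallest exponent (subnormals).
    e = max(e, min_exp)

    # Spacing between representable values at this exponent.
    step = 2.0 ** (e - MAN_BITS)
    q = round(mag / step) * step

    # Saturate at the largest representable magnitude.
    max_val = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** max_exp
    return sign * min(q, max_val)


def matvec_fp8(weights, x):
    """Matrix-vector product with operands rounded to the assumed 8-bit format.

    Products are accumulated in ordinary Python floats, standing in for a
    wider accumulator.
    """
    wq = [[quantize_fp8(w) for w in row] for row in weights]
    xq = [quantize_fp8(v) for v in x]
    return [sum(w * v for w, v in zip(row, xq)) for row in wq]


if __name__ == "__main__":
    W = [[1.0, 0.5], [0.3, 2.0]]
    v = [1.0, 2.0]
    print(matvec_fp8(W, v))
```

In this sketch only the operands are rounded; a real datapath would also fix the accumulator width and the output rounding, which are design choices not described here.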

Select Publications

2021

Thierry Tambe, En-Yu Yang, Glenn G. Ko, Yuji Chai, Coleman Hooper, Marco Donato, Paul N. Whatmough, Alexander M. Rush, David Brooks, and Gu-Yeon Wei. 2021. “A 25mm² SoC for IoT Devices With 18ms Noise Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET”. International Solid-State Circuits Conference (ISSCC’21)
Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul Whatmough, Alexander M. Rush, David Brooks, and Gu-Yeon Wei. 2021. “EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference”. IEEE/ACM International Symposium on Microarchitecture (MICRO 2021)

2020

Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Reddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020. “Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference”. In Design Automation Conference (DAC 2020). San Francisco, CA, USA

2019

Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander Rush, Gu-Yeon Wei, and David Brooks. 2019. “MASR: A Modular Accelerator for Sparse RNNs”. In International Conference on Parallel Architectures and Compilation Techniques (PACT 2019)