A 16nm SoC for noise-robust speech recognition via Bayesian denoising and attention-based DNNs.

Presentation Date: Friday, April 2, 2021

This work presents a 16nm SoC that executes a full speech-enhancing ASR pipeline in hardware. Its key contributions are: 1) unsupervised speech denoising using a Markov Source Separation Engine (MSSE), and 2) a reconfigurable accelerator (FlexASR) that demonstrates large-vocabulary sequence-to-sequence (seq2seq) ASR using bidirectional RNNs with attention. The full ASR pipeline pre-processes the incoming speech on an Arm Cortex-A53, then denoises the signal (up to 7.3 dB SDR) in the MSSE accelerator, and finally runs a bidirectional attention-based speech-to-text model on the FlexASR accelerator. The test chip consumes 2.24 mJ of energy per frame while achieving an end-to-end latency of 18 ms, enabling real-time throughput.
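The denoising result above is quoted in dB of SDR (signal-to-distortion ratio). The abstract does not say which SDR variant was used for evaluation, so the following is only a minimal sketch of the basic energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²). The function name `sdr_db` and the synthetic signals are illustrative assumptions; the "denoised" signal here is a stand-in with half the noise amplitude, not output from the MSSE.

```python
import math
import random


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in dB (assumed definition, not
    necessarily the variant used to evaluate the chip)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    distortion_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if distortion_energy == 0.0:
        return math.inf  # estimate matches the reference exactly
    return 10.0 * math.log10(signal_energy / distortion_energy)


# Synthetic example: a clean tone, a noisy copy, and a hypothetical
# "denoised" copy in which the noise amplitude has been halved.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 5.0 * n / 200.0) for n in range(200)]
noise = [rng.gauss(0.0, 0.3) for _ in clean]
noisy = [s + v for s, v in zip(clean, noise)]
denoised = [s + 0.5 * v for s, v in zip(clean, noise)]

# Halving the noise amplitude quarters the distortion energy,
# so the SDR gain is 10*log10(4) = 20*log10(2) dB.
improvement = sdr_db(clean, denoised) - sdr_db(clean, noisy)
print(f"SDR before: {sdr_db(clean, noisy):.2f} dB")
print(f"SDR after:  {sdr_db(clean, denoised):.2f} dB")
print(f"SDR gain:   {improvement:.2f} dB")
```

Source-separation evaluations often use more elaborate SDR definitions (for example, ones that allow for scaling or filtering of the reference), so numbers from this sketch are not directly comparable to the 7.3 dB figure.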