We explore efficient computation for natural language processing (NLP) and speech recognition applications from both software and hardware perspectives. Software solutions generally include optimizing neural network (NN) structures and compressing model parameters. For hardware acceleration, we exploit specialized datapaths, controllers, and memory systems to enable massively parallel NN computation, reduce data accesses to DRAM, and support customized datatypes and coding schemes. Considering software and hardware together, we develop various NN compression schemes and implement accelerators that flexibly and efficiently run different NLP models.
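One common form of model-parameter compression is low-bit quantization. The sketch below (a hypothetical illustration, not our actual compression scheme) shows symmetric 8-bit quantization of a weight vector: the largest-magnitude weight is mapped to the int8 range, and all weights are stored as 8-bit integers plus one shared scale factor.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: symmetric 8-bit quantization of model weights.
// The scale maps the largest-magnitude weight to the int8 range [-127, 127];
// each weight w is approximated as q * scale with q an int8.
std::vector<int8_t> quantize(const std::vector<float>& w, float& scale) {
    float max_abs = 0.f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    scale = max_abs / 127.f;
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(w[i] / scale));
    return q;
}
```

Storing weights this way cuts parameter memory by 4x relative to float32, at the cost of a small, bounded rounding error per weight.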
The figure shows an example design flow for an NLP accelerator. We first study the computational behavior of the target seq2seq (RNN) model and consider how to effectively map model weights and schedule computations on customized hardware. We choose 8-bit floating point as the hardware datatype and design a datapath that directly accelerates matrix-vector multiplication. To reduce on-chip data movement, we also implement customized interconnections based on the data dependencies of RNN models. The hardware is coded in SystemC with the Catapult HLS tool, which shortens design cycles and simplifies testing.
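To make the 8-bit floating-point idea concrete, the sketch below models one plausible encoding (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7); the exact format used in our hardware may differ, so treat the field widths here as assumptions. It also shows the typical accelerator pattern of storing weights in the narrow format while accumulating matrix-vector products in wider precision.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical FP8 format: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
// Subnormals are flushed to zero and out-of-range values clamp to the max
// finite value; this is a behavioral sketch, not the hardware's exact encoding.
uint8_t fp8_encode(float x) {
    uint8_t sign = std::signbit(x) ? 0x80 : 0;
    x = std::fabs(x);
    if (x == 0.f) return sign;
    int e;
    float m = std::frexp(x, &e);       // x = m * 2^e with m in [0.5, 1)
    int exp = e - 1 + 7;               // biased exponent for the 1.m form
    int mant = static_cast<int>(std::lround((2.f * m - 1.f) * 8.f));
    if (mant == 8) { mant = 0; ++exp; }    // rounding carried into the exponent
    if (exp <= 0) return sign;             // underflow: flush to zero
    if (exp > 15) { exp = 15; mant = 7; }  // overflow: clamp to max finite
    return sign | static_cast<uint8_t>(exp << 3) | static_cast<uint8_t>(mant);
}

float fp8_decode(uint8_t b) {
    float sign = (b & 0x80) ? -1.f : 1.f;
    int exp = (b >> 3) & 0xF;
    int mant = b & 0x7;
    if (exp == 0 && mant == 0) return 0.f;
    return sign * std::ldexp(1.f + mant / 8.f, exp - 7);
}

// Matrix-vector multiply with FP8 weights and float32 accumulation,
// mirroring the datapath pattern of narrow storage plus wide accumulators.
std::vector<float> matvec_fp8(const std::vector<uint8_t>& w,  // rows x cols
                              const std::vector<float>& x,
                              size_t rows, size_t cols) {
    std::vector<float> y(rows, 0.f);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            y[r] += fp8_decode(w[r * cols + c]) * x[c];
    return y;
}
```

Powers of two and sums of nearby powers of two (1.0, 0.5, 3.0, ...) round-trip exactly in this format, which is why small test vectors like these are convenient for checking an FP8 datapath.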