%0 Conference Paper
%B IEEE 51st Asilomar Conference on Signals, Systems, and Computers
%D 2017
%T Sub-uJ Deep Neural Networks for Embedded Applications
%A Paul Whatmough
%A Sae Kyu Lee
%A Gu-Yeon Wei
%A David Brooks
%K accelerators
%K dnn
%K iot
%K machine learning
%X To intelligently process sensor data on internet of things (IoT) devices, we require powerful classifiers that can operate at sub-uJ energy levels. Previous work has focused on spiking neural network (SNN) algorithms, which are well suited to VLSI implementation due to the single-bit connections between neurons in the network. In contrast, deep neural networks (DNNs) are not as well suited to hardware implementation, because the compute and storage demands are high. In this paper, we demonstrate that there are a variety of optimizations that can be applied to DNNs to reduce the energy consumption such that they outperform SNNs in terms of energy and accuracy. Six optimizations are surveyed and applied to a SIMD accelerator architecture. The accelerator is implemented in a 28nm SoC test chip. Measurement results demonstrate ~10X aggregate improvement in energy efficiency, with a minimum energy of 0.36uJ/inference at 667MHz clock frequency. Compared to previously published spiking neural network accelerators, we demonstrate an improvement in energy efficiency of more than an order of magnitude, across a wide energy-accuracy trade-off range.
%C Pacific Grove, CA, USA
%G eng
%U https://doi.org/10.1109/ACSSC.2017.8335697