On-Chip Deep Neural Network Storage with Multi-Level eNVM


Marco Donato, Brandon Reagen, Lillian Pentecost, Udit Gupta, David Brooks, and Gu-Yeon Wei. 6/24/2018. “On-Chip Deep Neural Network Storage with Multi-Level eNVM.” In Design Automation Conference (DAC).


One of the biggest performance bottlenecks of today’s neural network (NN) accelerators is off-chip memory access. In this paper, we propose a method that uses multi-level, embedded non-volatile memory (eNVM) to eliminate all off-chip weight accesses. The use of multi-level memory cells increases the probability of faults. Therefore, we co-design the weights and memories such that their properties complement each other and the faults result in no noticeable NN accuracy loss. In the extreme case, the weights in fully connected layers can be stored using a single transistor. With weight pruning and clustering, we show our technique reduces the memory area by over an order of magnitude compared to an SRAM baseline. In the case of VGG16 (130M weights), we are able to store all the weights in 4.9 mm², well within the area allocated to SRAM in modern NN accelerators.
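The abstract describes three interacting mechanisms: clustering weights to a small codebook, packing the resulting indices into multi-level eNVM cells, and tolerating the level faults that denser cells introduce. The sketch below is a minimal illustration of that pipeline, not the authors' implementation: it clusters a toy weight array to four levels (a 2-bit cell), injects adjacent-level faults at an assumed rate of 10⁻³, and reports the resulting weight error. The function names, the cell depth, and the fault model are all illustrative assumptions.

```python
# Minimal sketch of clustered weights stored in faulty multi-level cells.
# NOT the paper's code; the 4-level cell and 1e-3 fault rate are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def cluster_weights(w, n_levels):
    """1-D k-means-style clustering: map each weight to one of n_levels values."""
    # Initialize centroids evenly across the weight range, then refine.
    centroids = np.linspace(w.min(), w.max(), n_levels)
    for _ in range(20):
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    return idx, centroids

def inject_level_faults(idx, n_levels, p_fault):
    """Shift a stored level to an adjacent one with probability p_fault,
    mimicking a multi-level cell read error."""
    faulty = rng.random(idx.shape) < p_fault
    shift = rng.choice([-1, 1], size=idx.shape)
    out = idx.copy()
    out[faulty] = np.clip(idx[faulty] + shift[faulty], 0, n_levels - 1)
    return out

# Toy "layer": 10k weights stored as indices in 4-level (2-bit) cells.
w = rng.normal(0.0, 0.05, size=10_000)
idx, centroids = cluster_weights(w, n_levels=4)
idx_faulty = inject_level_faults(idx, n_levels=4, p_fault=1e-3)
w_restored = centroids[idx_faulty]

print("mean |error|:", np.mean(np.abs(w - w_restored)))
```

Under this kind of model, a fault only moves a weight to a neighboring codebook value, which is one plausible reading of why the paper's co-designed weights can absorb multi-level cell faults with no noticeable accuracy loss.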