MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation

Citation:

Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu-Yeon Wei, and David Brooks. 10/1/2019. “MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation.” In MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 769–781.

Abstract:

Deeply embedded applications require low-power, low-cost hardware that fits within stringent area constraints. Deep learning has many potential uses in these domains, but introduces significant inefficiencies stemming from off-chip DRAM accesses of model weights. Ideally, models would fit entirely on-chip. However, even with compression, the memory requirements of state-of-the-art models make on-chip inference impractical. Because of their increased density, emerging eNVMs are one promising solution. We present MaxNVM, a principled co-design of sparse encodings, protective logic, and fault-prone MLC eNVM technologies (i.e., RRAM and CTT) to enable highly efficient DNN inference. We find that bit reduction techniques (e.g., clustering and sparse compression) increase weight vulnerability to faults. This limits the capabilities of MLC eNVM. To circumvent this limitation, we improve storage density (i.e., bits-per-cell) with minimal overhead using protective logic. Tradeoffs between density and reliability result in a rich design space. We show that by balancing these techniques, the weights of large networks can reasonably fit on-chip. Compared to a naive, single-level-cell eNVM solution, our highly optimized MLC memory systems reduce weight area by up to 29×. We compare our technique against NVDLA, a state-of-the-art industry-grade CNN accelerator, and demonstrate up to 3.2× reduced power and up to 3.5× reduced energy per ResNet50 inference.
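The abstract's claim that sparse compression increases fault vulnerability can be illustrated with a toy experiment. The sketch below is an assumption-laden illustration, not MaxNVM's actual encoding: it uses a simplified relative-index ("gap") sparse format chosen for clarity, and the helpers `encode_sparse`, `decode_sparse`, `flip_bit`, and `count_errors`, as well as the example weights, are hypothetical. It compares one single-bit fault in dense storage against the same fault in a stored index.

```python
from typing import List, Tuple

def encode_sparse(dense: List[int]) -> List[Tuple[int, int]]:
    """Store each nonzero as (value, gap), where gap is the distance
    from the previous nonzero (a simplified relative-index scheme)."""
    pairs: List[Tuple[int, int]] = []
    prev = -1
    for i, v in enumerate(dense):
        if v != 0:
            pairs.append((v, i - prev))
            prev = i
    return pairs

def decode_sparse(pairs: List[Tuple[int, int]], length: int) -> List[int]:
    """Rebuild the dense vector by accumulating gaps; positions that
    fall outside the vector are dropped."""
    dense = [0] * length
    pos = -1
    for v, gap in pairs:
        pos += gap
        if 0 <= pos < length:
            dense[pos] = v
    return dense

def flip_bit(x: int, bit: int) -> int:
    """Model a single-bit storage fault."""
    return x ^ (1 << bit)

def count_errors(a: List[int], b: List[int]) -> int:
    """Number of positions where two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical weights, already quantized to small cluster indices (0 = pruned).
weights = [0, 3, 0, 0, 5, 0, 2, 0, 7, 1, 0, 4]

# Fault in dense storage: only the flipped weight is wrong.
dense_faulty = list(weights)
dense_faulty[4] = flip_bit(dense_faulty[4], 0)
dense_errors = count_errors(weights, dense_faulty)

# Same single-bit fault, but in the first stored gap of the sparse encoding.
pairs = encode_sparse(weights)
value, gap = pairs[0]
pairs[0] = (value, flip_bit(gap, 0))
sparse_errors = count_errors(weights, decode_sparse(pairs, len(weights)))

print("dense single-bit fault ->", dense_errors, "wrong weight(s)")
print("sparse single-bit fault ->", sparse_errors, "wrong weight(s)")
```

In this toy run the dense fault corrupts one weight, while the index fault shifts every later nonzero, so 10 of the 12 positions differ. Error amplification of this kind is one plausible reason to protect index bits more strongly than value bits when packing more bits per MLC cell.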

Harvard Architecture, Circuits and Compilers

Research group of Prof. David Brooks and Prof. Gu-Yeon Wei
