L. Pentecost, A. Hankin, M. Donato, M. Hempstead, G.-Y. Wei, and D. Brooks. 4/2/2022. “NVMExplorer: A Framework for Cross-Stack Comparisons of Embedded Non-Volatile Memories.” In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). Seoul, South Korea. Publisher's VersionAbstract
Repeated off-chip memory accesses to DRAM drive up operating power for data-intensive applications, and SRAM technology scaling and leakage power limits the efficiency of embedded memories. Future on-chip storage will need higher density and energy efficiency, and the actively expanding field of emerging, embeddable non-volatile memory (eNVM) technologies is providing many potential candidates to satisfy this need. Each technology proposal presents distinct trade-offs in terms of density, read, write, and reliability characteristics, and we present a comprehensive framework for navigating and quantifying these design trade-offs alongside realistic system constraints and application-level impacts. This work evaluates eNVM-based storage for a range of application and system contexts including machine learning on the edge, graph analytics, and general purpose cache hierarchy, in addition to describing a freely available (this http URL) set of tools for application experts, system designers, and device experts to better understand, compare, and quantify the next generation of embedded memory solutions.
M. M. Sharifi, L. Pentecost, R. Rajaei, A. Kazemi, Q. Lou, G.-Y. Wei, D. Brooks, K. Ni, X. S. Hu, M. Niemier, and M. Donato. 7/26/2021. “Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories.” In Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED ’21. Boston, MA, USA. Publisher's VersionAbstract
The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device characteristics, programming schemes, and memory array architecture, and explore different design choices to optimize performance, energy, area, and accuracy metrics for critical data-intensive workloads. From our cross-stack design exploration, we find that we can store DNN weights and social network graphs at a density of over 8MB/mm 2 and sub-2ns read access latency without loss in application accuracy.
Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories
Marco Donato, Lillian Pentecost, David Brooks, and Gu Wei. 10/4/2019. “MEMTI: Optimizing On-Chip Nonvolatile Storage for Visual Multitask Inference at the Edge.” IEEE MICRO, 39, 6. Publisher's VersionAbstract
The combination of specialized hardware and embedded nonvolatile memories (eNVM) holds promise for energy-efficient deep neural network (DNN) inference at the edge. However, integrating DNN hardware accelerators with eNVMs still presents several challenges. Multilevel programming is desirable for achieving maximal storage density on chip, but the stochastic nature of eNVM writes makes them prone to errors and further increases the write energy and latency. In this article, we present MEMTI, a memory architecture that leverages a multitask learning technique for maximal reuse of DNN parameters across multiple visual tasks. We show that by retraining and updating only 10% of all DNN parameters, we can achieve efficient model adaptation across a variety of visual inference tasks. The system performance is evaluated by integrating the memory with the open-source NVIDIA deep learning architecture.
MEMTI: Optimizing On-Chip Nonvolatile Storage for Visual Multitask Inference at the Edge
Lillian Pentecost, Marco Donato, Brandon Reagen, Udit Gupta, Siming Ma, Gu Wei, and David Brooks. 10/1/2019. “MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation.” In MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Pp. 769–781. Publisher's VersionAbstract
Deeply embedded applications require low-power, low-cost hardware that fits within stringent area constraints. Deep learning has many potential uses in these domains, but introduces significant inefficiencies stemming from off-chip DRAM accesses of model weights. Ideally, models would fit entirely on-chip. However, even with compression, memory requirements for state-of-the-art mod- els make on-chip inference impractical. Due to increased density, emerging eNVMs are one promising solution. We present MaxNVM, a principled co-design of sparse encodings, protective logic, and fault-prone MLC eNVM technologies (i.e.,RRAM and CTT) to enable highly-efficient DNN inference. We find bit reduction techniques (e.g., clustering and sparse compression) increase weight vulnerability to faults. This limits the capabilities of MLC eNVM. To circumvent this limitation, we improve storage den- sity (i.e., bits-per-cell) with minimal overhead using protective logic. Tradeoffs between density and reliability result in a rich design space. We show that by balancing these techniques, the weights of large networks are able to reasonably fit on-chip. Compared to a naive, single-level-cell eNVM solution, our highly-optimized MLC memory systems reduce weight area by up to 29×. We compare our technique against NVDLA, a state-of-the-art industry-grade CNN accelerator, and demonstrate up to 3.2× reduced power and up to 3.5× reduced energy per ResNet50 inference.
MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation
Marco Donato, Brandon Reagen, Lillian Pentecost, Udit Gupta, David Brooks, and Gu Wei. 6/24/2018. “On-Chip Deep Neural Network Storage with Multi-Level eNVM.” In Design Automation Conference (DAC). Publisher's VersionAbstract
One of the biggest performance bottlenecks of today’s neural network (NN) accelerators is off-chip memory accesses. In this paper, we propose a method to use multi-level, embedded non-volatile memory (eNVM) to eliminate all off-chip weight accesses. The use of multi-level memory cells increases the probability of faults. Therefore, we co-design the weights and memories such that their properties complement each other and the faults result in no noticeable NN accuracy loss. In the extreme case, the weights in fully connected layers can be stored using a single transistor. With weight pruning and clustering, we show our technique reduces the memory area by over an order of magnitude compared to an SRAM baseline. In the case of VGG16 (130M weights), we are able to store all the weights in 4.9 mm2, well within the area allocated to SRAM in modern NN accelerators.
On-Chip Deep Neural Network Storage with Multi-Level eNVM