TY - JOUR
T1 - RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance
JF - MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
Y1 - 2021
A1 - Gupta, Udit
A1 - Hsia, Samuel
A1 - Zhang, Jeff
A1 - Wilkening, Mark
A1 - Pombra, Javin
A1 - Lee, Hsien-Hsin S.
A1 - Wei, Gu-Yeon
A1 - Wu, Carole-Jean
A1 - Brooks, David
AB - Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations requiring specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened via RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to prior-art and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3x and 6x.
UR - https://doi.org/10.48550/arXiv.2105.08820
ER -