Bayesian modeling and inference is a class of machine learning that is useful when data are scarce and prior knowledge about the application allows better conclusions to be drawn. However, Bayesian models often require computing high-dimensional integrals, and finding the posterior distribution can be intractable. One of the most commonly used approximate methods for Bayesian inference is Gibbs sampling, a Markov chain Monte Carlo (MCMC) technique for estimating a target stationary distribution. Gibbs sampling generates posterior samples by iterating through the variables and sampling each one from its conditional distribution with all other variables held fixed. While Gibbs sampling is a popular method for probabilistic graphical models such as Markov Random Fields (MRFs), the plain algorithm is slow because it visits the variables sequentially. In this work, we describe a binary-label MRF Gibbs sampling inference architecture and extend it to a 64-label version capable of running multiple perceptual applications, such as sound source separation and stereo matching. The described accelerator employs chromatic scheduling of variables to parallelize all the conditionally independent variables across 257 samplers, implemented on the FPGA portion of a CPU-FPGA SoC. For a real-time streaming sound source separation task, we show that the hybrid CPU-FPGA implementation is 230x faster than a commercial mobile processor while maintaining the recommended latency of under 50 ms. The 64-label version showed 137x and 679x speedups for binary-label and 64-label MRF Gibbs sampling inference, respectively.
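To make the chromatic scheduling idea concrete, the following is a minimal software sketch, not the accelerator's actual design. It assumes a 4-connected grid MRF with binary labels, a Potts-style pairwise penalty `coupling` for disagreeing neighbors, and per-site energies `unary[i][j][label]`; the function name and all parameters are hypothetical. On such a grid a checkerboard 2-coloring suffices: sites of the same color share no edge, so they are conditionally independent given the other color and could be sampled simultaneously. The inner loops below stand in for the parallel samplers.

```python
import math
import random


def chromatic_gibbs_sweep(labels, unary, coupling, rng):
    """One Gibbs sweep over a binary-label, 4-connected grid MRF.

    Assumed energy: sum of unary[i][j][x_ij] over sites plus `coupling`
    for every pair of neighboring sites with different labels.
    """
    h, w = len(labels), len(labels[0])
    for color in (0, 1):  # checkerboard coloring: color = (i + j) % 2
        # Every site of this color could be updated in parallel, because
        # all of its neighbors belong to the other color and stay fixed.
        for i in range(h):
            for j in range(w):
                if (i + j) % 2 != color:
                    continue
                ones = 0    # neighbors currently labeled 1
                degree = 0  # neighbors inside the grid
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        degree += 1
                        ones += labels[ni][nj]
                # Local energy of each candidate label given the neighbors.
                e0 = unary[i][j][0] + coupling * ones
                e1 = unary[i][j][1] + coupling * (degree - ones)
                # P(x = 1 | neighbors) = exp(-e1) / (exp(-e0) + exp(-e1)).
                p1 = 1.0 / (1.0 + math.exp(e1 - e0))
                labels[i][j] = 1 if rng.random() < p1 else 0
    return labels
```

Repeating the sweep yields a Markov chain whose samples approximate the posterior; graphs with denser connectivity would need more than two colors.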
Many consider compatibility with existing accelerators, mainly GPUs, to be one of the key components of deep learning's success. While GPUs are great at handling the linear algebra kernels commonly found in deep learning, they are not the optimal architecture for unsupervised learning methods such as Bayesian models and inference. As a step towards a better understanding of architectures for probabilistic models, we study Gibbs sampling, one of the most commonly used algorithms for Bayesian inference, with a focus on parallelism that converges to the target distribution and on parameterized components. We propose FlexGibbs, a reconfigurable parallel Gibbs sampling inference accelerator for structured graphs. We designed an architecture optimal for solving Markov Random Field tasks using an array of parallel Gibbs samplers, enabled by chromatic scheduling. We show that for a sound source separation application, FlexGibbs configured on the FPGA fabric of a Xilinx Zynq CPU-FPGA SoC achieved a Gibbs sampling inference speedup of 1048x and a 99.85% reduction in energy compared with running it on an ARM Cortex-A53.
The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually by applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding which complements conventional compression techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that can save space at the cost of introducing random errors. By leveraging the ability of neural networks to tolerate these imperfections and by re-training around the errors, the proposed technique, Weightless, can compress DNN weights by up to 496x with the same model accuracy. This results in up to a 1.51x improvement over the state-of-the-art.
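The space-versus-error trade-off of a Bloomier filter can be illustrated with a small sketch. This is a generic XOR-based construction under stated assumptions, not the Weightless implementation: the names `_probe`, `build_bloomier`, and `lookup`, the segmented table layout, and the use of a seeded `random.Random` as a stand-in hash are all hypothetical choices. Stored keys (for example, indices of non-zero weights) always return their `r`-bit value; any other key returns `None` most of the time, but with probability of roughly 2^(r-t) returns a spurious value, which is the kind of random error the abstract refers to.

```python
import random


def _probe(key, seed, seg, k, t):
    """Hypothetical hash: k cell positions (one per segment) and a t-bit mask."""
    rng = random.Random(f"{seed}:{key}")  # deterministic stand-in for a hash
    positions = [j * seg + rng.randrange(seg) for j in range(k)]
    return positions, rng.getrandbits(t)


def build_bloomier(mapping, m, k=3, t=8, max_tries=50):
    """Encode mapping (key -> value < 2**t) into about m cells of t bits each."""
    seg = m // k
    for seed in range(max_tries):
        info = {key: _probe(key, seed, seg, k, t) for key in mapping}
        cell_keys = {}
        for key, (positions, _) in info.items():
            for p in positions:
                cell_keys.setdefault(p, set()).add(key)
        # Peel: repeatedly take a cell touched by exactly one remaining key.
        order = []
        stack = [p for p, keys in cell_keys.items() if len(keys) == 1]
        while stack:
            p = stack.pop()
            if len(cell_keys[p]) != 1:
                continue
            (key,) = cell_keys[p]
            order.append((key, p))
            for q in info[key][0]:
                cell_keys[q].discard(key)
                if len(cell_keys[q]) == 1:
                    stack.append(q)
        if len(order) != len(mapping):
            continue  # peeling got stuck; retry with a different seed
        # Assign in reverse peel order so earlier assignments stay valid.
        table = [0] * (seg * k)
        for key, p in reversed(order):
            positions, mask = info[key]
            acc = mapping[key] ^ mask
            for q in positions:
                if q != p:
                    acc ^= table[q]
            table[p] = acc
        return table, seed
    raise RuntimeError("no peelable hash assignment found; increase m")


def lookup(table, seed, key, k, t, r):
    """Return the stored r-bit value, or None if the key looks absent."""
    positions, value = _probe(key, seed, len(table) // k, k, t)
    for q in positions:
        value ^= table[q]
    return value if value < (1 << r) else None
```

Choosing `t` larger than `r` lowers the false-positive rate at the cost of wider cells, so the two parameters expose the compression-versus-error trade-off directly.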