Publications by Author: Kevin Brownell

2017
Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy Jones, Gu Wei, and David Brooks. 12/2017. “Automatically accelerating non-numerical programs by architecture-compiler co-design.” Communications of the ACM, 60, 12, Pp. 88–97. Publisher's VersionAbstract
Because of the high cost of communication between processors, compilers that parallelize loops automatically have been forced to skip a large class of loops that are both critical to performance and rich in latent parallelism. HELIX-RC is a compiler/microprocessor co-design that opens those loops to parallelization by decoupling communication from thread execution in conventional multicore architecures. Simulations of HELIX-RC, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.
Automatically accelerating non-numerical programs by architecture-compiler co-design
2016
Gu Wei, David Brooks, Simone Campanoni, Kevin Brownell, and Svilen Kanev. 2016. “Methods and apparatus for parallel processing”.
2014
Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy Jones, Gu Wei, and David Brooks. 6/14/2014. “HELIX-RC: An Architecture-Compiler Co-Design for Automatic Parallelization of Irregular Programs.” In International Symposium on Computer Architecture (ISCA), 3rd ed., 42: Pp. 217–228. Publisher's VersionAbstract
Data dependences in sequential programs limit paralleliza- tion because extracted threads cannot run independently. Although thread-level speculation can avoid the need for precise dependence analysis, communication overheads required to synchronize actual dependences counteract the benefits of parallelization. To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.
HELIX-RC: An Architecture-Compiler Co-Design for Automatic Parallelization of Irregular Programs
Simone Campanoni, Svilen Kanev, Kevin Brownell, Gu Wei, and David Brooks. 2014. “Breaking Cyclic-Multithreading Parallelization with XML Parsing.” In International Workshop on Parallelism in Mobile Platforms (PRISM). Publisher's VersionAbstract
HELIX-RC, a modern re-evaluation of the cyclic-multithreading (CMT) compiler technique, extracts threads from sequential code automatically. As a CMT approach, HELIX-RC gains performance by running iterations of the same loop on different cores in a multicore. It successfully boosts performance for several SPEC CINT benchmarks previously considered unparallelizable. However, this paper shows there are workloads with different characteristics, which even idealized CMT cannot parallelize. We identify how to overcome an inherent limitation of CMT for these workloads. CMT techniques only run iterations of a single loop in parallel at any given time. We propose exploiting parallelism not only within a single loop, but also among multiple loops. We call this execution model Multiple CMT (MCMT), and show that it is crucial for auto-parallelizing a broader class of workloads. To highlight the need for MCMT, we target a workload that is naturally hard for CMT – parsing XML-structured data. We show that even idealized CMT fails on XML parsing. Instead, MCMT extracts speedups up to 3.9x on 4 cores.
Breaking Cyclic-Multithreading Parallelization with XML Parsing
2011
Kevin Brownell, Ali Khan, Gu Wei, and David Brooks. 3/1/2011. “Automating design of voltage interpolation to address process variations.” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 99, Pp. 1–14. Publisher's VersionAbstract

Post-fabrication tuning provides a promising design approach to mitigate the performance and power overheads of process variation in advanced fabrication technologies. This paper explores design considerations and VLSI-CAD support for a recently proposed post-fabrication tuning knob called voltage interpolation. Successful implementation of this technique requires examination of the design tradeoffs between circuit tuning range and static power overheads within the synthesis flow of the design process, in addition to the implications of place and route. Results from the exploration of the scheme for a 64-core chip-multiprocessor machine using industrial-grade design blocks show that the scheme can be used to mitigate overhead arising from random and correlated within-die process variations. A design using voltage interpolation can match the nominal delay target with a 16% power cost, or for the same power budget, incur only a 13% delay overhead after variations.

Automating design of voltage interpolation to address process variations
2009
Kevin Brownell, Ali Khan, David Brooks, and Gu Wei. 3/16/2009. “Place and route considerations for voltage interpolated designs.” In Quality of Electronic Design, 3/16/2009. ISQED 3/16/2009. Quality Electronic Design, Pp. 594–600. IEEE. Publisher's VersionAbstract

Voltage interpolation is a promising post fabrication technique for combating the effects of process variations. The benefits of voltage interpolation are well understood. Its implementation in a VLSI-CAD flow has been considered through the synthesis stage. In this paper we study the implications of place and route on voltage interpolation. We evaluate multiple placement strategies, and conclude that a hybridization of forced placement and cluster boxing techniques results in minimum overhead.

Place and route considerations for voltage interpolated designs
2008
Kevin Brownell, Gu Wei, and David Brooks. 3/2008. “Evaluation of voltage interpolation to address process variations.” In Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design, Pp. 529–536. IEEE Press.Abstract

Abstract — Post-fabrication tuning provides a promising design approach to mitigate the performance and power overheads of process variation in advanced fabrication technologies. This paper explores design considerations and VLSI-CAD support for a recently proposed postfabrication tuning knob called voltage interpolation. The paper discusses design tradeoffs between circuit tuning range and static power overheads that can be performed within the synthesis flow of the design process. The paper explores the scheme for a 64-core chip-multiprocessor machine using industrial-grade design blocks and shows that the scheme can be used to mitigate overhead arising from random and correlated within-die process variations. The analysis shows that the scheme can match the nominal delay target with a 10 % power cost, or for the same power budget, incur only a 9 % delay overhead after variations. I.

Evaluation of voltage interpolation to address process variations