Robotics applications have hard time constraints and heavy computational burdens that can greatly benefit from domain-specific hardware accelerators. For the latency-critical problem of robot motion planning and control, there exists a performance gap of at least an order of magnitude between joint actuator response rates and state-of-the-art software solutions. Hardware acceleration can close this gap, but it is essential to define automated hardware design flows to keep the design process agile as applications and robot platforms evolve. To address this challenge, we introduce robomorphic computing: a methodology to transform robot morphology into a customized hardware accelerator morphology. We (i) present this design methodology, using robot topology and structure to exploit parallelism and matrix sparsity patterns in accelerator hardware; (ii) use the methodology to generate a parameterized accelerator design for the gradient of rigid body dynamics, a key kernel in motion planning; (iii) evaluate FPGA and synthesized ASIC implementations of this accelerator for an industrial manipulator robot; and (iv) describe how the design can be automatically customized for other robot models. Our FPGA accelerator achieves speedups of 8× and 86× over CPU and GPU when executing a single dynamics gradient computation. It maintains speedups of 1.9× to 2.9× over CPU and GPU, including computation and I/O round-trip latency, when deployed as a coprocessor to a host CPU for processing multiple dynamics gradient computations. ASIC synthesis indicates an additional 7.2× speedup for single computation latency. We describe how this principled approach generalizes to more complex robot platforms, such as quadrupeds and humanoids, as well as to other computational kernels in robotics, outlining a path forward for future robomorphic computing accelerators.
Computational efficiency is a critical constraint for a variety of cutting-edge real-time applications. In this work, we identify an opportunity to speed up the end-to-end runtime of two such compute bound applications by incorporating approximate linear algebra techniques. Particularly, we apply approximate matrix multiplication to artificial Neural Networks (NNs) for image classification and to the robotics problem of Distributed Simultaneous Localization and Mapping (DSLAM). Expanding upon recent sampling-based Monte Carlo approximation strategies for matrix multiplication, we develop updated theoretical bounds, and an adaptive error prediction strategy. We then apply these techniques in the context of NNs and DSLAM increasing the speed of both applications by 15-20% while maintaining a 97% classification accuracy for NNs running on the MNIST dataset and keeping the average robot position error under 1 meter (vs 0.32 meters for the exact solution). However, both applications experience variance in their results. This suggests that Monte Carlo matrix multiplication may be an effective technique to reduce the memory and computational burden of certain algorithms when used carefully, but more research is needed before these techniques can be widely used in practice.
The marketplace for general-purpose microprocessors offers hundreds of functionally similar models, differing by traits like frequency, core count, cache size, memory bandwidth, and power consumption. Their performance depends not only on microarchitecture, but also on the nature of the workloads being executed. Given a set of intended workloads, the consumer needs both performance and price information to make rational buying decisions. Many benchmark suites have been developed to measure processor performance, and their results for large collections of CPUs are often publicly available. However, repositories of benchmark results are not always helpful when consumers need performance data for new processors or new workloads. Moreover, the aggregate scores for benchmark suites designed to cover a broad spectrum of workload types can be misleading. To address these problems, we have developed a deep neural network (DNN) model, and we have used it to learn the relationship between the specifications of Intel CPUs and their performance on the SPEC CPU2006 and Geekbench 3 benchmark suites. We show that we can generate useful predictions for new processors and new workloads. We also cross-predict the two benchmark suites and compare their performance scores. The results quantify the self-similarity of these suites for the first time in the literature. This work should discourage consumers from basing purchasing decisions exclusively on Geekbench 3, and it should encourage academics to evaluate research using more diverse workloads than the SPEC CPU suites alone.
We demonstrate a fully integrated system-on-chip (SoC) optimized for insect-scale flapping-wing pico-aerial vehicles. The SoC is able to meet the stringent weight, power, and real-time performance demands of autonomous flight for a bee-sized robot. The entire integrated system with embedded voltage regulation, data conversion, clock generation, as well as both general-purpose and accelerated computing units, weighs less than 3 mg after die thinning. It is self-contained and can be powered directly off of a lithium battery. Measured results show open-loop wing flapping controlled by the SoC and improved energy efficiency through the use of hardware acceleration and supply resilience through the use of adaptive clocking.
This paper describes a power electronics unit (PEU) for an insect-scale flapping-wing robot. Three power saving techniques used in the actuator driver of the PEU — envelope tracking, dynamic common mode, and charge sharing — reduce power consumption while retaining weight benefits of an inductor-less linear driver. A pair of actuator driver ICs energize four 15nF capacitor loads, which represent the piezoelectric actuators of a flapping-wing robot. The PEU consumes 290mW, which translates to 37% lower power compared to a design without these power saving techniques.
We demonstrate a battery-powered multi-chip system optimized for insect-scale flapping wing robots that meets the tight weight limit and real-time performance demands of autonomous flight. Measured results show open-loop wing flapping driven by a power electronics unit and energy efficiency improvements via hardware acceleration.
Efficient actuation control of flapping-wing microrobots requires a low-power frequency reference with good absolute accuracy. To meet this requirement, we designed a fully-integrated 10MHz relaxation oscillator in a 40nm CMOS process. By adaptively biasing the continuous-time comparator, we are able to achieve a power consumption of 20μW, a 68% reduction to the conventional fixed bias design. A built-in self-calibration controller enables fast post-fabrication calibration of the clock frequency. Measurements show a frequency drift of 1.2% as the battery voltage changes from 3V to 4.1V.
A battery-powered aerial microrobotic System-on-Chip (SoC) has stringent weight and power budgets, which requires fully-integrated solutions for both clock generation and voltage regulation. Supply-noise resilience is important yet challenging for such SoC systems due to a non-constant battery discharge profile and load current variability. This paper proposes an adaptive-frequency clocking scheme that can tolerate supply noise and improve performance when implemented with an integrated voltage regulator (IVR). Measurements from a `brain' SoC, implemented in 40nm CMOS, demonstrate 2× performance improvement with adaptive-frequency clocking over conventional fixed-frequency clocking. Combining adaptive-frequency clocking with open-loop IVR extends error-free operation to a wider battery voltage range (2.8 to 3.8V) with higher average performance.
Recent years have seen an increased interest in Micro Air Vehicles (MAVs) with applications ranging from search-and-rescue to mimicking insect behavior. MAVs have several challenging design requirements that impact processor design. These include real time processing demands and severe power/weight budgets. In this paper, we describe the characteristics of MAV applications and propose hardware acceleration to improve the power, performance, and portability of MAV system designs.