Efficient computing for autonomous navigation using algorithm-and-hardware co-design

Zhengdong Zhang

Electrical Engineering & Computer Science (EECS)
Energy-Efficient Multimedia Systems Group (EEMS)


Abstract

Autonomous navigation of drones and robots requires several key functions, such as visual perception, localization, mapping, and path planning. The state-of-the-art algorithms for these functions typically have high computational complexity and require powerful computers to deliver real-time throughput. As a result, deploying them on miniature robots and drones with limited battery capacity and computational power can be challenging.

This thesis proposes several forms of algorithm-and-hardware co-design that enable these algorithms to be computed on embedded platforms that can be mounted on small drones and vehicles. The defense will focus on one of these tasks, robotic exploration, which studies how to rapidly reduce uncertainty about the environment and thereby shorten the exploration time. Unfortunately, principled algorithms built on rigorous information-theoretic metrics, such as maximizing the Shannon mutual information along the exploration path, are extremely demanding computationally.
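For context, the information-theoretic objective mentioned above is commonly written as follows. This is the standard form used in mutual-information-based exploration, shown for illustration rather than quoted from the thesis; the symbols M (occupancy map), z (range measurement), and x (sensor pose) are our notation.

```latex
% Standard MI-based exploration objective (illustrative notation):
% choose the candidate trajectory x_{1:T} whose future range
% measurements z_{1:T} are expected to be most informative about
% the occupancy map M.
\[
  x_{1:T}^{*} \;=\; \arg\max_{x_{1:T}} \; I\bigl(M;\, z_{1:T} \mid x_{1:T}\bigr),
  \qquad
  I(M; z) \;=\; H(M) \;-\; H(M \mid z),
\]
```

Here H denotes Shannon entropy; evaluating I(M; z) for every candidate beam along every candidate trajectory is the expensive step that the rest of the defense addresses.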

We first present a novel algorithm called Fast computation of Shannon Mutual Information (FSMI), which computes the Shannon mutual information between prospective range measurements and the environment, a widely used principled objective for guiding the search for efficient exploration trajectories. FSMI avoids the numerical integration required by previously known algorithms. It also introduces a few approximation techniques, such as Gaussian truncation and the tabulation of recurring computations. As a result, FSMI is three orders of magnitude faster than the previous state-of-the-art algorithm for computing Shannon mutual information.
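To make the per-beam quantity concrete, below is a minimal Python sketch for the idealized noise-free case, in which the mutual information between the map and a single beam reduces to the entropy of the measurement outcome (which cell, if any, the beam returns from). This is only an illustration of the quantity being computed: the actual FSMI algorithm additionally handles a noisy sensor model analytically, and the function name and interface here are hypothetical.

```python
import math

def beam_entropy_cells(occ):
    """Noise-free mutual information I(M; Z) = H(Z) between the map M and a
    single range measurement Z, given the occupancy probabilities `occ` of
    the cells the beam traverses, in traversal order.  Illustrative sketch
    only; FSMI itself handles a noisy sensor model analytically."""
    H = 0.0
    Q = 1.0                      # probability that every earlier cell was free
    for o in occ:
        p_hit = Q * o            # beam returns from this cell
        if p_hit > 0.0:
            H -= p_hit * math.log(p_hit)
        Q *= (1.0 - o)           # beam passes through this cell
    if Q > 0.0:                  # beam exits the map without a return
        H -= Q * math.log(Q)
    return H
```

With sensor noise, each measurement outcome couples nearby cells through the noise kernel, and it is this case that FSMI evaluates analytically, with truncation and table lookups, instead of by numerical integration.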

Second, we extend the FSMI algorithm to compute mutual information in a compressed 3D environment represented by an OctoMap. Instead of decompressing the input and then running FSMI, the new algorithm operates directly on the compressed input, which is significantly shorter, yielding an order-of-magnitude acceleration of the FSMI computation in typical 3D environments.
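Continuing the noise-free illustration above, the sketch below shows why a compressed view of the beam helps: a single OctoMap leaf can cover many consecutive cells with the same occupancy probability, and the contribution of such a run can be accumulated in closed form using geometric sums, so the work scales with the number of runs rather than the number of cells. This is a hypothetical illustration of the run-length idea, not the compressed-FSMI algorithm from the thesis.

```python
import math

def beam_entropy_runs(runs):
    """Noise-free per-beam entropy H(Z), accumulated per run of identical
    occupancy rather than per cell.

    `runs` is a list of (o, k) pairs: k consecutive cells along the beam,
    each with occupancy probability o (for example, one OctoMap leaf that
    covers k cells at the traversal resolution).  Illustration only; not
    the compressed-FSMI algorithm from the thesis."""
    H = 0.0
    Q = 1.0                       # probability the beam has missed every cell so far
    for o, k in runs:
        if Q <= 0.0:              # an earlier run was certainly occupied; nothing left
            break
        if o <= 0.0:              # certainly-free run: no hit possible, Q unchanged
            continue
        r = 1.0 - o
        if r <= 0.0:              # certainly-occupied cell: the beam stops here
            H -= Q * math.log(Q)  # log(1) = 0, so this is safe when Q = 1
            Q = 0.0
            continue
        # Hit probabilities inside the run: p_m = Q * o * r^m for m = 0..k-1.
        # Their entropy contribution -sum_m p_m log p_m has a closed form in
        # the geometric sums S1 = sum r^m and S2 = sum m r^m over m = 0..k-1.
        S1 = (1.0 - r**k) / o
        S2 = r * (1.0 - k * r**(k - 1) + (k - 1) * r**k) / (o * o)
        H -= Q * o * (math.log(Q * o) * S1 + math.log(r) * S2)
        Q *= r**k                 # beam misses the whole run with probability r^k
    if Q > 0.0:                   # "no return": the beam exits the map
        H -= Q * math.log(Q)
    return H
```

For example, beam_entropy_runs([(0.05, 64), (0.5, 1)]) covers 65 cells' worth of map in two loop iterations and agrees with beam_entropy_cells on the expanded cell list.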

Finally, we present an FPGA architecture designed for the FSMI algorithm. The design features a novel memory banking scheme that increases memory bandwidth so that multiple FSMI cores can run in parallel while maintaining high utilization. A novel arbiter resolves memory read conflicts among the cores within a single clock cycle. The final FPGA design achieves more than 100x higher throughput than a CPU while consuming less than one tenth of the power.
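As a rough software analogy for why banking helps (this is a generic toy model, not the banking pattern or arbiter designed in the thesis), the sketch below interleaves map cells across banks by address and grants at most one read per bank per cycle with a fixed-priority arbiter; cores whose current addresses fall in different banks are served in the same cycle, which is what raises the effective memory bandwidth.

```python
def simulate_banked_reads(request_streams, num_banks=16):
    """Toy cycle-level model of bank-interleaved occupancy-map storage.

    request_streams[c] is the ordered list of cell addresses core c reads.
    Cells are interleaved across num_banks banks by address modulo num_banks;
    each bank serves one read per cycle, and a fixed-priority arbiter (lower
    core index wins) stalls the losers of a conflict.  Generic illustration
    only; not the banking scheme or arbiter from the thesis."""
    ptrs = [0] * len(request_streams)      # next request index per core
    cycles = 0
    while any(p < len(s) for p, s in zip(ptrs, request_streams)):
        granted_banks = set()
        for c, stream in enumerate(request_streams):
            if ptrs[c] >= len(stream):
                continue                   # this core has finished its stream
            bank = stream[ptrs[c]] % num_banks
            if bank not in granted_banks:  # bank still free this cycle: grant
                granted_banks.add(bank)
                ptrs[c] += 1
            # else: bank conflict, core stalls and retries next cycle
        cycles += 1
    return cycles
```

In this model, cores whose beams mostly touch different banks finish in roughly max(stream length) cycles, whereas a single-ported memory would need the sum of the stream lengths; a real design additionally needs the conflict resolution itself to fit within one clock cycle, which is the role of the arbiter described above.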