SCIENCE CHINA Information Sciences, Volume 62, Issue 11: 212102(2019) https://doi.org/10.1007/s11432-019-9932-3

## Accelerating DNN-based 3D point cloud processing for mobile computing

• Accepted Jun 3, 2019
• Published Sep 19, 2019
### Abstract

3D point cloud data, produced by 3D sensors such as LIDAR and stereo cameras, have been widely deployed by industry leaders such as Google, Uber, Tesla, and Mobileye for mobile robotic applications such as autonomous driving and humanoid robots. Point cloud data contain reliable depth information and can therefore provide accurate location and shape characteristics for scene understanding tasks such as object recognition and semantic segmentation. However, deep neural networks (DNNs) that directly consume point cloud data are particularly computation-intensive, because they must not only perform multiplication-and-accumulation (MAC) operations but also search for neighbors in the irregular 3D point cloud data. Such workloads exceed the real-time capabilities of general-purpose processors as the scales of both point cloud data and DNNs grow from application to application. We present the first accelerator architecture that dynamically configures the hardware on-the-fly to match the computation of both neighbor point search and MAC operations in point-based DNNs. To facilitate neighbor point search and reduce computation costs, a grid-based algorithm is introduced that searches for neighbor points within a local region of grids. Evaluation results on scene recognition and segmentation tasks show that the proposed design achieves 16.4$\times$ higher performance than an NVIDIA Tesla K40 GPU baseline while saving 99.95% of its energy in point cloud scene understanding applications.

### References

[1] Gallardo N, Gamez N, Rad P, et al. Autonomous decision making for a driver-less car. In: Proceedings of IEEE System of Systems Engineering Conference (SoSE), Waikoloa, 2017. 1--6.

[2] Lin S C, Zhang Y, Hsu C H, et al. The architectural implications of autonomous driving: constraints and acceleration. In: Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, 2018. 751--766.

[3] Kuindersma S, Deits R, Fallon M. Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot. Auton Robot, 2016, 40: 429--455.

[4] Wang X J, Zhou Y F, Pan X, et al. A robust 3D point cloud skeleton extraction method (in Chinese). Sci Sin Inform, 2017, 47: 832--845.

[5] Qi C R, Su H, Mo K, et al. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 1: 4.

[6] Qi C R, Yi L, Su H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of Neural Information Processing Systems, 2017. 5099--5108.

[7] Vazou N, Seidel E L, Jhala R. Refinement types for Haskell. SIGPLAN Not, 2014, 49: 269--282.

[8] Chen Y H, Emer J, Sze V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. In: Proceedings of ACM SIGARCH Computer Architecture News, 2016. 367--379.

[9] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, 2012. 1097--1105.

[10] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770--778.

[11] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of IEEE International Conference on Computer Vision, 2015. 945--953.

[12] Arsalan Soltani A, Huang H, Wu J, et al. Synthesizing 3D shapes via modeling multi-view depth maps and silhouettes with deep generative networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1511--1519.

[13] Qi C R, Su H, Niessner M, et al. Volumetric and multi-view CNNs for object classification on 3D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 5648--5656.

[14] Zhou Y, Tuzel O. VoxelNet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 4490--4499.

[15] Hua B S, Tran M K, Yeung S K. Pointwise convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 984--993.

[16] Song L, Wang Y, Han Y, et al. C-Brain: a deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In: Proceedings of Design Automation Conference (DAC), 2016. 1--6.

[17] Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1912--1920.

[18] Armeni I, Sener O, Zamir A R, et al. 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 1534--1543.

[19] Muralimanohar N, Balasubramonian R, Jouppi N P. CACTI 6.0: a tool to model large caches. HP Laboratories, 2009. 22--31.

• Figure 1

(Color online) Illustrative example of point-based DNNs for point cloud data. (a) An instance of mobile robotic applications; (b) a point-based DNN.

• Figure 2

(Color online) Neighbor pixels/points. (a) Neighbor pixels are regular in conventional Conv layers. (b) Irregular neighbors within a radius $r$ in a convolution-like layer. (c) Irregular neighbors within a kernel size $K$ in a pointwise layer. Both (b) and (c) are illustrated for a 2D example and can be smoothly extended to 3D metric space according to the formulations in (b) and (c), respectively.
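The two irregular neighbor criteria of Figure 2 can be stated as simple membership predicates. The sketch below is illustrative only (the helper names `in_ball` and `in_kernel` are not from the paper) and uses the 2D formulation of the caption: a radius-$r$ ball for the convolution-like layer and a $K \times K$ kernel box for the pointwise layer.

```python
import math

def in_ball(p, q, r):
    """Figure 2(b) criterion: q neighbors center p if its
    Euclidean distance to p is smaller than the radius r."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) < r

def in_kernel(p, q, K):
    """Figure 2(c) criterion: q neighbors p if it falls inside
    the K-by-K kernel box centered at p (per-axis check)."""
    return abs(p[0] - q[0]) <= K / 2 and abs(p[1] - q[1]) <= K / 2
```

Extending either predicate to 3D metric space only adds a $z$-axis term to the distance or box check, which is why the caption notes the 2D illustration carries over smoothly.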

•

Algorithm 1 Grid-based neighbor point search

//Initialization

Inputs: $P$: input points; $g$: a grid size;

Find the minimum/maximum boundaries of $P$: $\langle~Mn_x,Mx_x~\rangle,\langle~Mn_y,Mx_y\rangle$;

Build grids based on $P$, $Mn$/$Mx$, and $g$;

Store the grids of points, and the start address and count of each grid;

//Retrieval

Inputs: $G$: grid-based points; $\langle~S,~C\rangle$: start address and count of grids; $p$($p_x$,$p_y$): a center point; $g$: a grid size; $r$: a radius size; $K$($K_x$,$K_y$): a kernel size;

Outputs: out: neighbor results (out = 0); cnt: their count (cnt = 0);

if (Ball Query) then

for $x_i=(-r/g)$ : $(r/g)$; $y_i=(-r/g)$ : $(r/g)$ do

$p_{gx}=p_x+g\cdot~x_i$, $p_{gy}=p_y+g\cdot~y_i$;

if $Mn_x~\leq~p_{gx}~\leq~Mx_x$ and $Mn_y~\leq~p_{gy}~\leq~Mx_y$ then

addr = $(p_{gx}-Mn_x)/g~+((p_{gy}-Mn_y)/g)\cdot(Mx_x-Mn_x)/g$; $t\_{\rm~out}$ = Retrieve(addr, $S$, $C$, $G$);

for $i$ = 1 : Count($t\_{\rm~out}$) do

if $\Vert~p-t\_{\rm~out}(i)\Vert<r$ then

out $+=t\_{\rm~out}(i)$; cnt++;

end if

end for

end if

end for

else if (Pointwise) then

for $k_x$ = 1 : $K_x$; $k_y$ = 1 : $K_y$ do

$p_{gx}=p_x-K_x/2+k_x$, $p_{gy}=p_y-K_y/2+k_y$;

for $x_i=(-1/(2g))$ : $(1/(2g))$, $y_i=(-1/(2g))$ : $(1/(2g))$ do

if $Mn_x~\leq~p_{gx}+x_i\cdot~g~\leq~Mx_x$ and $Mn_y~\leq~p_{gy}+y_i\cdot~g~\leq~Mx_y$ then

addr = $(p_{gx}+x_i\cdot~g-Mn_x)/g+((p_{gy}+y_i\cdot~g-Mn_y)/g)\cdot(Mx_x-Mn_x)/g$; $t\_{\rm~out}$ = Retrieve(addr, $S$, $C$, $G$);

for $i$ = 1 : Count($t\_{\rm~out}$) do

if $\vert~p-t\_{\rm~out}(i)\vert~\leq~K/2$ then

out $+=t\_{\rm~out}(i)$; cnt++;

end if

end for

end if

end for

end for

end if
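A minimal Python sketch of the ball-query path of Algorithm 1 follows, restricted to 2D for brevity. The function names (`build_grid`, `ball_query`) and the dictionary-based grid storage are illustrative assumptions; the accelerator instead stores per-grid start addresses and counts ($S$, $C$) in on-chip memory. The key idea is the same: bucket points into cells of size $g$ once, then answer a radius-$r$ query by scanning only the cells overlapping the ball rather than all input points.

```python
import math
from collections import defaultdict

def build_grid(points, g):
    """Initialization: bucket each 2D point into a square cell of size g,
    keyed by integer cell coordinates relative to the minimum boundary."""
    mn_x = min(p[0] for p in points)
    mn_y = min(p[1] for p in points)
    grid = defaultdict(list)
    for (x, y) in points:
        cell = (int((x - mn_x) // g), int((y - mn_y) // g))
        grid[cell].append((x, y))
    return grid, (mn_x, mn_y)

def ball_query(grid, origin, g, p, r):
    """Retrieval (ball query): scan only cells that can overlap the
    radius-r ball around center p, then filter by exact distance."""
    mn_x, mn_y = origin
    cx = int((p[0] - mn_x) // g)
    cy = int((p[1] - mn_y) // g)
    span = int(math.ceil(r / g))          # cells to scan per axis
    out = []
    for dx in range(-span, span + 1):
        for dy in range(-span, span + 1):
            for q in grid.get((cx + dx, cy + dy), ()):
                if math.hypot(p[0] - q[0], p[1] - q[1]) < r:
                    out.append(q)
    return out, len(out)

points = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (0.55, 0.5)]
grid, origin = build_grid(points, g=0.25)
neighbors, cnt = ball_query(grid, origin, g=0.25, p=(0.15, 0.12), r=0.2)
```

With the grid in place, each query touches $O((2r/g+1)^2)$ cells instead of every point, which is the source of the computation savings the paper attributes to the grid-based search.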

• Table 1   Characteristics of benchmarks

| Description | Network abbreviation | Total layers | Neighbor search layers | Input points | Dataset |
|---|---|---|---|---|---|
| PointNet in scene recognition [5] | PN$_{-}$r | 7 | 0 | 1024 | ModelNet40 [17] |
| PointNet in semantic segmentation [5] | PN$_{-}$s | 8 | 0 | 2048 | ShapeNet [17] |
| PointNet++ in scene recognition [6] | PNpp$_{-}$r | 7 | 2 | 1024 | ModelNet40 |
| PointNet++ in semantic segmentation [6] | PNpp$_{-}$s | 10 | 2 | 2048 | ShapeNet |
| Pointwise CNN in scene recognition [15] | Pw$_{-}$r | 6 | 4 | 2048 | ModelNet40 |
| Pointwise CNN in semantic segmentation [15] | Pw$_{-}$s | 5 | 5 | 4096 | S3DIS [18] |
• Table 2   Normalized energy consumption compared with GPU

| | PN$_{-}$r | PN$_{-}$s | PNpp$_{-}$r | PNpp$_{-}$s | Pw$_{-}$r | Pw$_{-}$s | Gmean |
|---|---|---|---|---|---|---|---|
| GPU | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| PointPU | 0.1% | 0.1% | 0.1% | 6.3E$-$5 | 0.1% | 1.5E$-$5 | 0.05% |
• Table 3   Detailed characteristics of PointPU against the GPU baseline

| Platform | NVIDIA Tesla K40 | PointPU |
|---|---|---|
| Technology | 28 nm | 45 nm, 1.1 V |
| Frequency (MHz) | 745 | 700 |
| Average power (W) | 89 | 0.726 |
| Area | – | 2.67 mm$^{2}$ |


Copyright 2020 Science China Press Co., Ltd.