SCIENCE CHINA Information Sciences, Volume 62, Issue 11: 212102(2019) https://doi.org/10.1007/s11432-019-9932-3

Accelerating DNN-based 3D point cloud processing for mobile computing

  • Received Apr 13, 2019
  • Accepted Jun 3, 2019
  • Published Sep 19, 2019


3D point cloud data, produced by 3D sensors such as LIDAR and stereo cameras, have been widely deployed by industry leaders such as Google, Uber, Tesla, and Mobileye in mobile robotic applications such as autonomous driving and humanoid robots. Because point cloud data carry reliable depth information, they provide accurate location and shape characteristics for scene understanding tasks such as object recognition and semantic segmentation. However, deep neural networks (DNNs) that directly consume point cloud data are particularly computation-intensive: they must not only perform multiplication-and-accumulation (MAC) operations but also search for neighbors in the irregular 3D point cloud data. As the scales of both the point cloud data and the DNNs grow from application to application, this workload exceeds what general-purpose processors can deliver in real time. We present the first accelerator architecture that dynamically configures the hardware on the fly to match the computation of both neighbor point search and MAC operations in point-based DNNs. To speed up neighbor point search and reduce its computation cost, we introduce a grid-based algorithm that searches for neighbor points only within a local region of grids. Evaluation results on scene recognition and segmentation tasks show that the proposed design achieves 16.4$\times$ higher performance and consumes 99.95% less energy than an NVIDIA Tesla K40 GPU baseline in point cloud scene understanding applications.
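The grid-based neighbor search described above can be illustrated with a minimal 2D sketch. It is not the authors' implementation: the function names `build_grid`, `ball_query`, and `brute_force` are hypothetical, and a Python dictionary keyed by cell index stands in for the start-address/count storage an accelerator would use. The sketch assumes square cells of side `g` and a ball query with strict radius `r`.

```python
import math
from collections import defaultdict


def build_grid(points, g):
    """Bucket 2D points into square cells of side g (hypothetical helper)."""
    mn_x = min(p[0] for p in points)
    mn_y = min(p[1] for p in points)
    grid = defaultdict(list)
    for p in points:
        cell = (int((p[0] - mn_x) // g), int((p[1] - mn_y) // g))
        grid[cell].append(p)
    return grid, (mn_x, mn_y)


def ball_query(grid, origin, g, center, r):
    """Return points closer than r to center, scanning only nearby cells."""
    mn_x, mn_y = origin
    cx = int((center[0] - mn_x) // g)
    cy = int((center[1] - mn_y) // g)
    reach = math.ceil(r / g)  # how many cells the radius can span per axis
    out = []
    for ix in range(cx - reach, cx + reach + 1):
        for iy in range(cy - reach, cy + reach + 1):
            # .get avoids creating empty cells in the defaultdict
            for q in grid.get((ix, iy), ()):
                if math.dist(center, q) < r:
                    out.append(q)
    return out


def brute_force(points, center, r):
    """Reference search that tests every point; used only for comparison."""
    return [q for q in points if math.dist(center, q) < r]


points = [(0.0, 0.0), (0.4, 0.1), (1.5, 1.5), (3.0, 0.2)]
grid, origin = build_grid(points, g=1.0)
print(ball_query(grid, origin, 1.0, (0.2, 0.0), r=0.5))
# prints [(0.0, 0.0), (0.4, 0.1)]
```

The grid version only computes distances for points in the cells that the radius can reach, whereas the brute-force version computes a distance for every input point, which is the cost the grid is meant to avoid.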


[1] Gallardo N, Gamez N, Rad P, et al. Autonomous decision making for a driver-less car. In: Proceedings of IEEE System of Systems Engineering Conference (SoSE), Waikoloa, 2017. 1--6.

[2] Lin S C, Zhang Y, Hsu C H, et al. The architectural implications of autonomous driving: constraints and acceleration. In: Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, 2018. 751--766.

[3] Kuindersma S, Deits R, Fallon M. Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Auton Robot, 2016, 40: 429--455.

[4] Wang X J, Zhou Y F, Pan X, et al. A robust 3D point cloud skeleton extraction method (in Chinese). Sci Sin Inform, 2017, 47: 832--845.

[5] Qi C R, Su H, Mo K, et al. Pointnet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. 1: 4.

[6] Qi C R, Yi L, Su H, et al. Pointnet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of Neural Information Processing Systems, 2017. 5099--5108.

[7] Vazou N, Seidel E L, Jhala R. Refinement types for Haskell. SIGPLAN Not, 2014, 49: 269--282.

[8] Chen Y H, Emer J, Sze V. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. In: Proceedings of ACM SIGARCH Computer Architecture News, 2016. 367--379.

[9] Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems, 2012. 1097--1105.

[10] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 770--778.

[11] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of IEEE International Conference on Computer Vision, 2015. 945--953.

[12] Arsalan Soltani A, Huang H, Wu J, et al. Synthesizing 3D shapes via modeling multi-view depth maps and silhouettes with deep generative networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. 1511--1519.

[13] Qi C R, Su H, Niessner M, et al. Volumetric and multi-view cnns for object classification on 3D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 5648--5656.

[14] Zhou Y, Tuzel O. Voxelnet: end-to-end learning for point cloud based 3D object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 4490--4499.

[15] Hua B S, Tran M K, Yeung S K. Pointwise convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018. 984--993.

[16] Song L, Wang Y, Han Y, et al. C-brain: a deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization. In: Proceedings of Design Automation Conference (DAC), 2016. 1--6.

[17] Wu Z, Song S, Khosla A, et al. 3D shapenets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. 1912--1920.

[18] Armeni I, Sener O, Zamir A R, et al. 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. 1534--1543.

[19] Muralimanohar N, Balasubramonian R, Jouppi N P. CACTI 6.0: a tool to model large caches. HP Laboratories, 2009, 22--31.

  • Figure 1

    (Color online) Illustrative example of point-based DNNs for point cloud data. (a) An instance of mobile robotic applications; (b) a point-based DNN.

  • Figure 2

    (Color online) Neighbor pixels/points. (a) Neighbor pixels are regular in conventional Conv layers. (b) Irregular neighbors within a radius $r$ in a convolution-like layer. (c) Irregular neighbors within a kernel size $K$ in a pointwise layer. Both (b) and (c) are illustrated for a 2D example and can be smoothly extended to 3D metric space according to the formulations in (b) and (c), respectively.


    Algorithm 1 Grid-based neighbor point search

    Inputs: $P$: input points; $g$: a grid size;
    Find the minimum/maximum boundaries of $P$ ($\langle Mn_x, Mx_x \rangle, \langle Mn_y, Mx_y \rangle$);
    Build grids based on $P$, $Mn$/$Mx$, and $g$;
    Store grids of points, the start address and count of each grid;

    Inputs: $G$: grid-based points; $\langle S, C \rangle$: start address and count of grids; $p$($p_x$, $p_y$): a center point; $g$: a grid size; $r$: a radius size; $K$($K_x$, $K_y$): a kernel size;
    Outputs: out: neighbor results (out = 0); cnt: their count (cnt = 0);
    if (Ball Query) then
      for $x_i = -r/g : r/g$; $y_i = -r/g : r/g$
        $p_{gx} = p_x + g \cdot x_i$, $p_{gy} = p_y + g \cdot y_i$;
        if $Mn_x \leq p_{gx} \leq Mx_x$, $Mn_y \leq p_{gy} \leq Mx_y$ then
          addr = $(p_{gx} - Mn_x)/g + ((p_{gy} - Mn_y)/g) \cdot (Mx_x - Mn_x)/g$;
          $t\_{\rm out}$ = Retrieve(addr, $S$, $C$, $G$);
          for $i = 1$ : Count($t\_{\rm out}$)
            if $\Vert p - t\_{\rm out}(i) \Vert < r$ then
              out += $t\_{\rm out}(i)$; cnt++;
            end if
          end for
        end if
      end for
    else if (Pointwise) then
      for $k_x = 1 : K_x$; $k_y = 1 : K_y$
        $p_{gx} = p_x - K_x/2 + k_x$, $p_{gy} = p_y - K_y/2 + k_y$;
        for $x_i = -1/(2g) : 1/(2g)$; $y_i = -1/(2g) : 1/(2g)$
          if $Mn_x \leq p_{gx} + x_i \cdot g \leq Mx_x$, $Mn_y \leq p_{gy} + y_i \cdot g \leq Mx_y$ then
            addr = $(p_{gx} + x_i \cdot g - Mn_x)/g + ((p_{gy} + y_i \cdot g - Mn_y)/g) \cdot (Mx_x - Mn_x)/g$;
            $t\_{\rm out}$ = Retrieve(addr, $S$, $C$, $G$);
            out += $t\_{\rm out}$; cnt += Count($t\_{\rm out}$);
            for $i = 1$ : Count($t\_{\rm out}$)
              if $\vert p - t\_{\rm out}(i) \vert \leq K/2$ then
                out += $t\_{\rm out}(i)$; cnt++;
              end if
            end for
          end if
        end for
      end for
    end if

  • Table 1   Characteristics of benchmarks
    Description | Network abbreviation | Total layers | Neighbor search layers | Input points | Dataset
    PointNet in scene recognition [5] | PN_r | 7 | 0 | 1024 | ModelNet40 [17]
    PointNet in semantic segmentation [5] | PN_s | 8 | 0 | 2048 | ShapeNet [17]
    PointNet++ in scene recognition [6] | PNpp_r | 7 | 2 | 1024 | ModelNet40
    PointNet++ in semantic segmentation [6] | PNpp_s | 10 | 2 | 2048 | ShapeNet
    Pointwise CNN in scene recognition [15] | Pw_r | 6 | 4 | 2048 | ModelNet40
    Pointwise CNN in semantic segmentation [15] | Pw_s | 5 | 5 | 4096 | S3DIS [18]
  • Table 2   Normalized energy consumption compared with GPU
    Platform | PN_r | PN_s | PNpp_r | PNpp_s | Pw_r | Pw_s | Gmean
    GPU | 1 | 1 | 1 | 1 | 1 | 1 | 1
    PointPU | 0.1% | 0.1% | 0.1% | 6.3E-5 | 0.1% | 1.5E-5 | 0.05%
  • Table 3   Detailed characteristics of PointPU compared with the GPU baseline
    Platform | NVIDIA Tesla K40 | PointPU
    Technology | 28 nm | 45 nm, 1.1 V
    Frequency (MHz) | 745 | 700
    Average power (W) | 89 | 0.726
    Area | - | 2.67 mm$^{2}$
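
The reported figures can be cross-checked with a short calculation. The sketch below is a rough check, not a result from the paper: it assumes energy equals average power times runtime, and that the 16.4$\times$ performance gain from the abstract applies uniformly as a runtime ratio. The function name `normalized_energy` is hypothetical.

```python
gpu_power_w = 89.0       # average power of the NVIDIA Tesla K40 (Table 3)
pointpu_power_w = 0.726  # average power of PointPU (Table 3)
speedup = 16.4           # PointPU performance relative to the GPU (abstract)


def normalized_energy(p_acc, p_gpu, speedup):
    """Energy of the accelerator relative to the GPU.

    Assumes energy = power * time and time_acc = time_gpu / speedup.
    """
    return (p_acc / p_gpu) / speedup


ratio = normalized_energy(pointpu_power_w, gpu_power_w, speedup)
print(f"{ratio:.2%}")  # prints 0.05%
```

Under these assumptions the estimate agrees with the 0.05% geometric-mean normalized energy and the 99.95% energy reduction quoted for PointPU.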
