(JBE Vol. 24, No. 4, July 2019) (Special Paper)
https://doi.org/10.5909/jbe.2019.24.4.580
ISSN 2287-9137 (Online), ISSN 1226-7953 (Print)

Camera and LiDAR Sensor Fusion for Improving Object Detection

Jongseo Lee a), Mangyu Kim b), and Hakil Kim b)

Abstract

This paper focuses on improving object detection performance with a camera and a LiDAR on an autonomous vehicle platform by fusing the objects detected by each sensor through a late-fusion approach. For object detection with the camera, the one-stage YOLOv3 model is employed, and the distance to each detected object is estimated using a perspective transform matrix. Object detection with the LiDAR is based on K-means clustering. The camera and LiDAR are calibrated with PnP-RANSAC to obtain the rotation and translation matrices between the two sensors. For sensor fusion, the intersection over union (IoU) on the image plane and the distance and angle in world coordinates are estimated, and these three attributes (IoU, distance, and angle) are fused using logistic regression. The evaluation shows that the proposed sensor fusion improves object detection performance by 5% compared to using a single sensor.

Keywords: Object detection, deep learning, sensor fusion, camera-LiDAR calibration

a) The Department of Future Vehicle Engineering, Inha University
b) The Department of Information and Communication Engineering, Inha University
Corresponding Author: Hakil Kim, E-mail: hikim@inha.ac.kr, Tel: +82-32-860-7385, ORCID: https://orcid.org/0000-0003-4232-3804
An earlier version of this paper was presented at IPIU 2019. This work was supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE) (N0002428, The Competency Development Program for Industry Specialist).
Manuscript received May 7, 2019; revised July 17, 2019; accepted July 17, 2019.
Copyright © 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved. This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
I. Introduction

Object detection has advanced rapidly with deep learning. For camera images, convolutional neural network (CNN) based detectors such as PeleeNet [1], RetinaNet [2], YOLOv3 [3], and MobileNet [4] are widely used, and for LiDAR point clouds, networks such as PointSeg [5] and PIXOR [6] detect objects end-to-end. Beyond single-sensor detection, deep fusion combines the two modalities inside a single network: Ming Liang et al. project LiDAR features into bird's-eye-view (BEV) space and fuse them with image features for 3D detection [7], and MV3D [8] fuses multiple views of the point cloud with the camera image. Late fusion instead combines the outputs of independent detectors; Pan Wei et al. fuse a CNN image detector and a support vector machine (SVM) by combining their scores with fuzzy logic [9].

As summarized in Table 1, each sensor has complementary strengths and weaknesses depending on the task and the environment, so no single sensor is sufficient for reliable perception on an autonomous vehicle.

Table 1. Comparison of sensor performance by environment
(criteria: object detection, object classification, distance estimation, range of visibility, functionality in bad weather, functionality in bad lighting; sensors compared: camera, LiDAR, radar)

Fig. 1. Camera and LiDAR sensor mount position

This paper therefore fuses the camera and the LiDAR, mounted as shown in Fig. 1, so that the weaknesses of one sensor are compensated by the other, and proposes a late-fusion method that combines the detection results of the two sensors.

II. Related Work

1. Object detection

Object detection networks are commonly divided into one-stage networks such as YOLO, SSD [10], and RetinaNet, and two-stage networks such as the R-CNN family [11]. One-stage networks are faster, while two-stage networks are generally more accurate.
Fig. 2. Object tracking process using CNN with Hungarian data association [14]

A one-stage detector performs localization and classification simultaneously in a single network pass. YOLO, a representative one-stage detector, divides the image features into a sparse grid and performs classification and bounding-box regression for each grid cell. SSD refines this idea by regressing offsets from predefined anchor boxes. RetinaNet replaces the standard cross-entropy loss with a focal loss that down-weights easy examples, addressing the extreme foreground-background imbalance of one-stage training and narrowing the accuracy gap between one-stage and two-stage detectors.

Two-stage detectors separate region proposal from bounding-box regression and classification. R-CNN generates region proposals and runs a CNN on each of them, which requires one forward pass per proposal; Fast R-CNN [12] removes this redundancy with Region of Interest Pooling (RoIPool), which shares a single forward pass over the whole image and pools the features of each subregion. Fast R-CNN still relies on selective search for region proposals, so Faster R-CNN [13] replaces it with a Region Proposal Network that slides a small window over the shared convolutional feature map and predicts anchor-box based bounding boxes with objectness scores, reusing the forward-pass features. The R-CNN family is more accurate but slower than one-stage detectors.

For tracking, detected objects can be associated across frames by combining CNN-based appearance features with Hungarian data association, as illustrated in Fig. 2 [14]. In this work a one-stage detector is adopted for the camera, since real-time performance is required on the vehicle.
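Since the focal loss is central to how RetinaNet makes a one-stage detector competitive, a minimal sketch is given below. This is not code from the paper; it is a standard binary focal loss written in PyTorch, with the alpha = 0.25 and gamma = 2 defaults taken from the RetinaNet paper [2]:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss from RetinaNet [2]: FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t)."""
    # Per-example binary cross entropy (the loss that focal loss modulates).
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)              # prob. of the true class
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)  # class balancing term
    # (1 - p_t)^gamma: well-classified (easy) examples contribute almost nothing.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Illustrative usage on dummy logits and labels.
logits = torch.tensor([2.5, -1.0, 0.3])
labels = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, labels).item())
```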
2. Camera-LiDAR calibration

To fuse the two sensors, a measurement from one sensor must be expressible in the coordinate system of the other. Extrinsic calibration estimates this geometric relationship: the rotation and translation matrices between the camera and the LiDAR (a code sketch of this estimation is given at the end of this section). Given the calibration, a point in world (LiDAR) coordinates is projected onto the image plane by Eq. (1), where s is a scale factor, p_c in Eq. (2) is the point in pixel coordinates, K is the camera intrinsic matrix, R and T are the rotation and translation matrices between the two sensors, and p_w in Eq. (3) is the point in world coordinates:

    s p_c = K [R | T] p_w        (1)
    p_c = [u, v, 1]^T            (2)
    p_w = [x, y, z, 1]^T         (3)

3. Point cloud clustering

A LiDAR point cloud is an unordered set of points, so the points belonging to each object must be grouped into clusters before classification. K-means clustering partitions the cloud points into K groups by repeatedly assigning each point to its nearest cluster center and updating the centers; it is fast, but the number of clusters K must be chosen in advance. Hierarchical clustering does not require K to be fixed beforehand but is computationally more expensive, so this work uses K-means.

4. Sensor fusion

As illustrated in Fig. 3, sensor fusion schemes are commonly divided by the level at which the raw data of the sensors is combined: early fusion, late fusion, and deep fusion.
- Early fusion: the raw data is fused before any detection. For a camera and a LiDAR, the RGB image and the point cloud are combined at the feature level, for example as RGB-D data.
- Late fusion: each sensor is processed independently and only the resulting detections are fused, using methods such as a Bayesian classifier, global nearest neighbor, a fuzzy logic model, or Dempster-Shafer theory.
- Deep fusion: like early fusion it consumes the raw data, but the fusion is performed inside a neural network, combining features from both sensors across layers.
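As referenced in subsection 2, the extrinsic parameters can be estimated from 3D-2D point correspondences with PnP-RANSAC. The sketch below, using OpenCV, is illustrative only: the intrinsics and correspondences are synthetic, whereas in practice the 2D points would be picked from the image and the 3D points from the LiDAR scan of a calibration target.

```python
import numpy as np
import cv2

# Minimal sketch of PnP-RANSAC extrinsic calibration and the Eq. (1) projection.
# All numbers below are illustrative, not values from the paper.

K = np.array([[1000.0, 0.0, 640.0],        # camera intrinsic matrix
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                         # assume negligible lens distortion

# Synthetic ground-truth extrinsics (LiDAR -> camera) used to generate data.
rvec_true = np.array([0.01, -0.02, 0.005])
tvec_true = np.array([0.1, -0.3, 0.0])

rng = np.random.default_rng(0)
# Synthetic 3D points in front of the camera (z forward, meters).
pts_lidar = rng.uniform([-5.0, -2.0, 5.0], [5.0, 2.0, 30.0], size=(40, 3))

# Project with Eq. (1): s * p_c = K [R | T] p_w.
pts_img, _ = cv2.projectPoints(pts_lidar, rvec_true, tvec_true, K, dist)
pts_img = pts_img.reshape(-1, 2)

# Recover R and T from the 3D-2D correspondences with PnP-RANSAC.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_lidar, pts_img, K, dist)
R, _ = cv2.Rodrigues(rvec)                 # rotation matrix from Rodrigues vector
print(ok, rvec.ravel(), tvec.ravel())      # should match the synthetic extrinsics
```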
Fig. 3. Architectures of different sensor fusion schemes

Early fusion and deep fusion must move and process the raw data of every sensor, which demands high bandwidth and computing power; late fusion combines only compact detection results and is therefore much lighter to run on vehicle hardware.

5. Logistic regression

Logistic regression models the relationship between input variables X and a binary output Y, producing an output between 0 and 1 that can be read as a probability. The odds ratio is defined in Eq. (4); setting its logarithm equal to a linear model z as in Eq. (5) and rearranging through Eq. (6) yields the sigmoid form of Eq. (7):

    odds_ratio = P / (1 - P)     (4)
    ln(odds_ratio) = z           (5)
    (1 - P) / P = e^(-z)         (6)
    P = 1 / (1 + e^(-z))         (7)

III. Proposed Method

1. Overview

The proposed system follows the late-fusion approach: each sensor produces detections from its own raw data, and the detections are then fused. Three attributes are used to decide whether a camera detection and a LiDAR detection correspond to the same object: the IoU on the image plane, and the distance and angle in 3D world coordinates. The IoU is computed between the 2D bounding box obtained by projecting the LiDAR 3D bounding box onto the image plane and the bounding box detected by the camera. Because the IoU alone can be ambiguous, for instance when objects at different depths overlap on the image plane, the distance and angle are fused together with the IoU (a code sketch of this fusion follows below).
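The following is a minimal sketch of how the three attributes could be fused with plain logistic regression, i.e., Eqs. (4)-(7) fitted on labeled match/non-match pairs. The training data here is synthetic and purely illustrative; the paper's final model is the small fully connected network described later (Fig. 11).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fusing (IoU, distance difference, angle difference) with logistic regression.
# Synthetic training data; the paper trains on real camera-LiDAR detection pairs.
rng = np.random.default_rng(1)
n = 200
# True matches: high IoU, small distance/angle differences.
pos = np.column_stack([rng.uniform(0.5, 1.0, n),
                       rng.normal(0.3, 0.2, n),
                       rng.normal(1.0, 0.5, n)])
# False matches: low IoU, large differences.
neg = np.column_stack([rng.uniform(0.0, 0.4, n),
                       rng.normal(3.0, 1.0, n),
                       rng.normal(8.0, 2.0, n)])
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(n), np.zeros(n)])

clf = LogisticRegression().fit(X, y)
# predict_proba evaluates Eq. (7): P = 1 / (1 + e^(-z)) with z = w.x + b.
p_match = clf.predict_proba([[0.7, 0.5, 1.2]])[0, 1]
print(f"match probability: {p_match:.3f}")
```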
2. Object detection using the camera

For camera-based detection, the one-stage YOLOv3 model is used, trained on the COCO dataset. YOLOv3 reaches about 30 FPS on a Pascal Titan X; among the available hardware (an Nvidia Jetson Xavier and an Nvidia GTX 1080 Ti), the experiments in this paper run on the GTX 1080 Ti. Fig. 4 shows the overall structure of the proposed fusion system.

Fig. 4. Overview of the proposed fusion system

3. Distance estimation on the image

The camera alone cannot measure the distance to a detected object directly, so the x, y, z world coordinates of each detection are estimated from its image position with a perspective transform matrix. To compute the matrix, pairs of corresponding points are measured as shown in Fig. 5: reference points whose world coordinates are taken from the LiDAR point cloud, together with their pixel coordinates in the image (a code sketch of the resulting pixel-to-ground mapping follows below). The estimated coordinates are then corrected using the rotation and translation matrices from the calibration, as described next.

Fig. 5. Example of measuring sensor values used in matrix calculations
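A minimal sketch of the pixel-to-ground mapping with a perspective transform matrix, as referenced in subsection 3. The four point pairs below are invented for illustration; in the paper they are measured from the LiDAR as in Fig. 5.

```python
import numpy as np
import cv2

# Pixel coordinates of four reference points on the road surface (illustrative).
img_pts = np.float32([[620, 700], [660, 700], [600, 500], [680, 500]])
# Their measured world coordinates on the ground plane (meters, illustrative).
world_pts = np.float32([[5.0, -0.5], [5.0, 0.5], [20.0, -2.0], [20.0, 2.0]])

# 3x3 perspective transform from the image plane to the ground plane.
H = cv2.getPerspectiveTransform(img_pts, world_pts)

def pixel_to_world(u, v):
    """Map a pixel (e.g., the bottom center of a detection box) to ground (x, y)."""
    p = np.array([u, v, 1.0])
    x, y, w = H @ p
    return x / w, y / w          # homogeneous normalization

print(pixel_to_world(640, 600))  # e.g., a detected vehicle's box bottom center
```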
Fig. 6. Distance estimation and correction on images

The distance estimated with the perspective transform matrix is corrected using the rotation and translation matrices obtained from the calibration. The estimated x, y, z coordinates are projected back onto the image, and the corrections Δx and Δy of Eqs. (8) and (9) are applied to the estimated x and y:

    Δx = (v_origin - v_estimated) / v_size     (8)
    Δy = (u_origin - u_estimated) / u_size     (9)

Here Δx and Δy are the corrections to the estimated x and y world coordinates; u_size and v_size are the image dimensions; (u_origin, v_origin) is the pixel position of the detected object; and (u_estimated, v_estimated) is the projection of the estimated x, y, z coordinates, with x_estimated and y_estimated being the coordinates to be corrected. Fig. 6 compares the projection obtained with the perspective transform matrix to the corrected projection.

4. Object detection using LiDAR

Fig. 7 shows the object detection process using the LiDAR. Because the raw point cloud is large, points outside the region of interest, roughly 10 m to either side and 30 m ahead, are removed first. The remaining cloud points are then grouped with K-means clustering, and each cluster is treated as an object candidate (a code sketch follows below).

Fig. 7. Object detection process using LiDAR
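A minimal sketch of the LiDAR pipeline just described: range filtering followed by K-means clustering. The synthetic scan and the choice K = 3 are illustrative, and the sketch omits the outlier handling a real pipeline would need.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Three synthetic "vehicles" plus scattered background points (x, y, z in meters).
objects = [rng.normal(c, 0.5, size=(100, 3))
           for c in ([8, -2, 0], [15, 1, 0], [25, 3, 0])]
noise = rng.uniform([-10, -15, -1], [60, 15, 2], size=(300, 3))
cloud = np.vstack(objects + [noise])

# Keep points within roughly 30 m ahead and 10 m laterally (the region of interest).
mask = (cloud[:, 0] > 0) & (cloud[:, 0] < 30) & (np.abs(cloud[:, 1]) < 10)
roi = cloud[mask]

# Group the remaining points into K clusters; each cluster is an object candidate.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(roi)
for k in range(3):
    pts = roi[labels == k]
    print(f"cluster {k}: {len(pts)} points, centroid {pts.mean(axis=0).round(2)}")
```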
Fig. 8. Arrangement of point clouds according to vehicle location

As shown in Fig. 8, the arrangement of the points belonging to an object changes with its position relative to the vehicle: nearby objects return dense, partial surfaces, while distant objects return only sparse points. Fig. 9 defines the distance and angle in world coordinates that are used in the fusion stage.

Fig. 9. Definition of distance and angle in world coordinates

5. Camera-LiDAR fusion

Detections from the two sensors are matched using three attributes: the IoU on the image plane and the distance and angle in world coordinates. For the IoU, the 3D bounding box of each LiDAR object is projected onto the image plane to obtain a 2D bounding box, as shown in Fig. 10, and this box is compared with the 2D bounding box detected by the camera (a code sketch of the IoU follows below). The IoU alone can be ambiguous, because objects at different depths may produce overlapping boxes on the image plane; therefore the x, y, z centers of the bounding boxes are also compared through the distance and angle defined in Fig. 9.

Fig. 10. Projection of a detected object onto the image plane using LiDAR
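A minimal sketch of the image-plane IoU between a camera box and a projected LiDAR box; the (x1, y1, x2, y2) box format and the example values are illustrative.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned 2D boxes given as (x1, y1, x2, y2).

    box_a could be the camera detection; box_b the 2D box obtained by
    projecting the LiDAR 3D bounding box onto the image plane (Fig. 10).
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)      # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((100, 100, 200, 220), (120, 110, 210, 230)))     # illustrative boxes
```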
Fig. 11. Layers used for logistic regression

The projected LiDAR bounding box is compared with the camera bounding box through the IoU, and together with the distance and angle it forms the input to the fusion model shown in Fig. 11. The model consists of three fully connected layers and outputs a value between 0 and 1, the probability that the two detections are the same object (a sketch of such a network follows at the end of this section).

IV. Experimental Results

The experiments were carried out on a Hyundai i30 platform equipped with a SEKONIX SF3324-10X camera (120° field of view) and an Ouster OS1-64 LiDAR. Processing ran on an Intel(R) Core(TM) i7-7700K CPU @ 4.20 GHz with an Nvidia GTX 1080 Ti. The sensors were integrated through the Robot Operating System (ROS) [15], and the test drives were recorded and replayed as ROS bag files. Fig. 12 shows a qualitative result: camera-only detection (left), LiDAR-only detection (middle), and the fused result (right).

Fig. 12. Camera object detection (left), LiDAR object detection (middle), sensor fusion result (right)
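A minimal sketch of the Fig. 11 fusion network, assuming three inputs (IoU, distance, angle) and a sigmoid output as in Eq. (7). The hidden width of 16 is an assumption; the paper does not state the layer sizes.

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    """Three fully connected layers over (IoU, distance, angle), as in Fig. 11.

    The hidden width (16) is assumed, not taken from the paper.
    """
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # Sigmoid maps the score to a 0-1 match probability, Eq. (7).
        return torch.sigmoid(self.net(x))

model = FusionNet()
features = torch.tensor([[0.72, 0.4, 1.5]])  # (IoU, Δdistance, Δangle), illustrative
print(model(features).item())
```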
Fig. 13. Case of missed detection

Table 2. Performance analysis of camera and LiDAR detection and fusion

          precision   recall
Camera    0.901       0.901
LiDAR     0.824       0.800
Fusion    0.952       0.925

The evaluation was performed on the recorded bag files, which contain 2,316 annotated objects, with precision and recall computed as in Eqs. (10) and (11):

    precision = TP / (TP + FP)     (10)
    recall = TP / (TP + FN)        (11)

As Table 2 shows, fusion improves precision by 5% and recall by 2.2% over the better single sensor (a short computation sketch follows below). The camera alone missed about 230 of the 2,316 objects; Fig. 13 shows a representative case of such a missed detection.

In terms of processing speed, the camera delivers raw images at 30 fps and the LiDAR delivers point clouds at 20 Hz. The image detection pipeline runs at up to 61 fps, while the point cloud pipeline runs at 18 Hz, so the fused output is bounded by the LiDAR side at 18 Hz, close to the sensor's native rate.
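A short sketch of Eqs. (10)-(11). Only the 2,316-object total comes from the paper; the TP/FP counts below are hypothetical, chosen so that the result reproduces the camera row of Table 2.

```python
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)   # Eq. (10)
    recall = tp / (tp + fn)      # Eq. (11)
    return precision, recall

total = 2316                     # annotated objects in the test bags
tp, fp = 2086, 229               # hypothetical counts (2316 - 2086 = 230 misses)
p, r = precision_recall(tp, fp, fn=total - tp)
print(f"precision={p:.3f} recall={r:.3f}")   # both approx. 0.901, as in Table 2
```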
V. Conclusion

This paper proposed a late-fusion method that combines camera and LiDAR detections through the IoU on the image plane and the distance and angle in world coordinates, fused with logistic regression. The experiments show that fusing the two sensors improves both precision and recall over either sensor alone while running close to the LiDAR's native rate. Future work will extend the approach toward deep fusion operating directly on the raw data of both sensors.

References

[1] Robert J. Wang, Xiang Li, and Charles X. Ling, "Pelee: A Real-Time Object Detection System on Mobile Devices," Advances in Neural Information Processing Systems, Montreal, Canada, pp. 1963-1972, 2018.
[2] Tsung-Yi Lin, Priya Goyal, Ross Girshick, et al., "Focal Loss for Dense Object Detection," Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 2980-2988, 2017.
[3] Joseph Redmon and Ali Farhadi, "YOLOv3: An Incremental Improvement," arXiv preprint arXiv:1804.02767, 2018.
[4] Mark Sandler, Andrew Howard, Menglong Zhu, et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 4510-4520, 2018.
[5] Yuan Wang, Tianyue Shi, Peng Yun, et al., "PointSeg: Real-Time Semantic Segmentation Based on 3D LiDAR Point Cloud," arXiv preprint arXiv:1807.06288, 2018.
[6] Bin Yang, Wenjie Luo, and Raquel Urtasun, "PIXOR: Real-time 3D Object Detection from Point Clouds," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 7652-7660, 2018.
[7] Ming Liang, Bin Yang, Shenlong Wang, et al., "Deep Continuous Fusion for Multi-Sensor 3D Object Detection," Proceedings of the European Conference on Computer Vision, Munich, Germany, pp. 641-656, 2018.
[8] Xiaozhi Chen, Huimin Ma, Ji Wan, et al., "Multi-View 3D Object Detection Network for Autonomous Driving," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, pp. 1907-1915, 2017.
[9] Pan Wei, Lucas Cagle, Tasmia Reza, et al., "LiDAR and Camera Detection Fusion in a Real-Time Industrial Multi-Sensor Collision Avoidance System," Electronics, Vol. 7, No. 84, 2018.
[10] Wei Liu, Dragomir Anguelov, Dumitru Erhan, et al., "SSD: Single Shot MultiBox Detector," Proceedings of the European Conference on Computer Vision, Amsterdam, Netherlands, pp. 21-37, 2016.
[11] Ross Girshick, Jeff Donahue, Trevor Darrell, et al., "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, Ohio, pp. 580-587, 2014.
[12] Ross Girshick, "Fast R-CNN," Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, pp. 1440-1448, 2015.
[13] Shaoqing Ren, Kaiming He, Ross Girshick, et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Advances in Neural Information Processing Systems, Montreal, Canada, pp. 91-99, 2015.
[14] Liu Mingjie, Cheng Bin Jin, Xuenan Cui, et al., "Online Multiple Object Tracking Using Confidence Score-Based Appearance Model Learning and Hierarchical Data Association," IET Computer Vision, Vol. 13, No. 3, pp. 312-318, April 2019.
[15] Morgan Quigley, Brian Gerkey, Ken Conley, et al., "ROS: An Open-Source Robot Operating System," ICRA Workshop on Open Source Software, Vol. 3, No. 3.2, p. 5, 2009.

Jongseo Lee: B.S., 2017; M.S. student since 2018; ORCID: https://orcid.org/0000-0003-3812-7675
Mangyu Kim: B.S., 2017; graduate student since 2017; ORCID: https://orcid.org/0000-0003-4813-4182
Hakil Kim: B.S., 1983; M.S., Purdue University, 1985; Ph.D., Purdue University, 1990; with Inha University since 1990; ORCID: https://orcid.org/0000-0003-4232-3804