(Regular Paper) JBE Vol. 24, No. 3, May 2019
https://doi.org/10.5909/jbe.2019.24.3.495
ISSN 2287-9137 (Online) ISSN 1226-7953 (Print)

Object Tracking Method using Deep Learning and Kalman Filter

Gicheol Kim a), Sohee Son a), Minseop Kim a), Jinwoo Jeon b), Injae Lee b), Jihun Cha b), and Haechul Choi a)

Abstract

Typical algorithms of deep learning include CNN (Convolutional Neural Networks), which are mainly used for image recognition, and RNN (Recurrent Neural Networks), which are mainly used for speech recognition and natural language processing. Among them, CNN learns the filters that generate feature maps automatically from data, and its excellent performance has made it the mainstream approach in image recognition. Since then, various algorithms such as R-CNN have appeared to improve object detection performance, and algorithms such as YOLO (You Only Look Once) and SSD (Single Shot Multi-box Detector) have been proposed more recently. However, because these deep learning-based detection algorithms operate on individual still images, stable object tracking and detection in video requires a separate tracking capability. Therefore, this paper proposes a method that combines a Kalman filter with a deep learning-based detection network to improve object tracking and detection performance in video. The detection network is YOLO v2, which is capable of real-time processing; the proposed method achieves a 7.7% IoU improvement over the plain YOLO v2 network and a processing speed of 20 fps on FHD images.

Keyword : YOLO, Kalman filter, Object tracking, CNN, Deep learning

Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
I. Introduction

With advances in ICT (Information and Communications Technologies), unmanned aerial vehicles have rapidly come into commercial and private use, and the market for them is forecast to keep growing [1][2]. As drones proliferate, so does the need for systems that can detect and track them, and vision-based approaches to detecting aircraft in flight have been studied for navigation and collision avoidance [3][4].

Deep learning-based detectors built on CNNs (Convolutional Neural Networks) [5][6][7] now dominate image recognition, and one-stage networks such as YOLO (You Only Look Once) v2 [8][9] can process 40-90 fps on a GPU, making real-time detection practical. However, such networks detect objects independently in each still image, so detection can fail intermittently in video. This paper therefore proposes a method that combines the YOLO v2 detection network with a Kalman filter [10][11][12][13]: when detection fails in a frame, the filter's prediction maintains the object's position, improving tracking stability.

The rest of this paper is organized as follows. Section II reviews CNN-based object detection networks. Section III describes the Kalman filter and the proposed method, Section IV presents experimental results, and Section V concludes the paper.

a) Hanbat National University
b) ETRI (Electronics and Telecommunications Research Institute)
Corresponding Author : Haechul Choi, E-mail: choihc@hanbat.ac.kr, Tel: +82-42-821-1149, ORCID: http://orcid.org/0000-0002-7594-0828
This research was supported by a grant from the Police Science and Technology R&D Program funded by the Korean National Police Agency [No. 19PCRD-C139687-03, Development and Field Demonstration Test of Surveillance System using radar and EO/IR for detecting illegal Flight of UAVs].
Manuscript received April 15, 2019; Revised May 24, 2019; Accepted May 24, 2019.
II. Deep Learning-based Object Detection

Deep learning [14][15] has a long history: neural network research that began in the 1950s stagnated when limitations of the single-layer perceptron, such as the XOR problem, became apparent [16], and was later revived with techniques such as the RBM (Restricted Boltzmann Machine) [17]. A key strength of CNNs is automatic feature extraction: instead of relying on hand-designed features, the filters that generate feature maps are learned directly from data, which has made CNNs the mainstream approach in image recognition.

The first widely adopted CNN-based object detector was R-CNN (Region-CNN), introduced in 2014 [18]. R-CNN replaced detectors built on low-level hand-crafted features such as SIFT (Scale Invariant Feature Transform) [19], HOG (Histogram of Oriented Gradient) [20], optical flow [21], and Haar-like features [22], improving detection accuracy substantially. Because R-CNN is slow, successors such as SPP-Net [23], Fast R-CNN [24], and Faster R-CNN [25] were proposed to speed up its region-proposal pipeline. More recently, one-stage detectors such as SSD (Single Shot Multi-box Detector) [26], YOLO [8], and YOLO v2 [9] perform detection in a single network pass, enabling real-time processing. This paper uses YOLO v2 as the detection network; its structure is shown in Table 1.

Table 1. The detection system network

Layer  Type       Filters  Size/Stride  Input         Output
0      conv       32       3x3/1        416x416x3     416x416x32
1      max                 2x2/2        416x416x32    208x208x32
2      conv       64       3x3/1        208x208x32    208x208x64
3      max                 2x2/2        208x208x64    104x104x64
4      conv       128      3x3/1        104x104x64    104x104x128
5      conv       64       1x1/1        104x104x128   104x104x64
6      conv       128      3x3/1        104x104x64    104x104x128
7      max                 2x2/2        104x104x128   52x52x128
8      conv       256      3x3/1        52x52x128     52x52x256
9      conv       128      1x1/1        52x52x256     52x52x128
10     conv       256      3x3/1        52x52x128     52x52x256
11     max                 2x2/2        52x52x256     26x26x256
12     conv       512      3x3/1        26x26x256     26x26x512
13     conv       256      1x1/1        26x26x512     26x26x256
14     conv       512      3x3/1        26x26x256     26x26x512
15     conv       256      1x1/1        26x26x512     26x26x256
16     conv       512      3x3/1        26x26x256     26x26x512
17     max                 2x2/2        26x26x512     13x13x512
18     conv       1024     3x3/1        13x13x512     13x13x1024
19     conv       512      1x1/1        13x13x1024    13x13x512
20     conv       1024     3x3/1        13x13x512     13x13x1024
21     conv       512      1x1/1        13x13x1024    13x13x512
22     conv       1024     3x3/1        13x13x512     13x13x1024
23     conv       1024     3x3/1        13x13x1024    13x13x1024
24     conv       1024     3x3/1        13x13x1024    13x13x1024
25     route 16
26     conv       64       1x1/1        26x26x512     26x26x64
27     reorg               /2           26x26x64      13x13x256
28     route 27 24
29     conv       1024     3x3/1        13x13x1280    13x13x1024
30     conv       30       1x1/1        13x13x1024    13x13x30
31     detection
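The final 13x13x30 tensor in Table 1 corresponds to a 13x13 grid in which each cell predicts 5 anchor boxes with 6 values each (4 box offsets, an objectness score, and one class score: 5 x 6 = 30). A minimal NumPy sketch of decoding one anchor's slice under the standard YOLO v2 box parameterization follows; the function name and the anchor sizes passed in are illustrative assumptions, not values from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(pred, cx, cy, anchor_w, anchor_h, grid=13):
    """Decode one anchor's 6 values at grid cell (cx, cy) into a
    normalized box (center x, center y, width, height) plus an
    objectness confidence. pred[5] is the single class score, which
    for a one-class detector does not change the ranking."""
    tx, ty, tw, th, to = pred[:5]
    bx = (cx + sigmoid(tx)) / grid       # box center x in [0, 1]
    by = (cy + sigmoid(ty)) / grid       # box center y in [0, 1]
    bw = anchor_w * np.exp(tw) / grid    # width relative to image
    bh = anchor_h * np.exp(th) / grid    # height relative to image
    conf = sigmoid(to)                   # objectness confidence
    return (bx, by, bw, bh), conf
```

Decoding all raw values of a cell with zeros, for instance, yields a box centered on that cell with the anchor's size and a confidence of 0.5.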
The network of Table 1 divides the input image into a 13x13 grid and predicts bounding boxes and confidence scores for every cell in a single forward pass, which is what makes real-time processing possible.

III. Kalman Filter and Proposed Method

The Kalman filter estimates the state of a linear dynamic system recursively from noisy measurements by repeating two steps, a time update (prediction) and a measurement update (correction). Fig. 1 shows the overall flow.

In the time update, the a priori state estimate x̂⁻_k and error covariance P⁻_k are predicted from the previous estimates:

  x̂⁻_k = A x̂_(k-1) + B u_k    (1)
  P⁻_k = A P_(k-1) Aᵀ + Q      (2)

where A is the state transition matrix, B the control input matrix, u_k the control input, and Q the process noise covariance.

Fig. 1. Overall flowchart of Kalman filter
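The two steps above can be sketched in a few lines of NumPy. The following is a minimal constant-velocity filter tracking a 2-D position; the matrix values (dt, Q, R) are illustrative assumptions, not the tuning used in the paper:

```python
import numpy as np

class Kalman:
    """Constant-velocity Kalman filter with state [x, y, vx, vy]."""
    def __init__(self, dt=1.0):
        self.x = np.zeros(4)                  # state estimate
        self.P = np.eye(4)                    # error covariance
        self.A = np.eye(4)                    # state transition matrix
        self.A[0, 2] = self.A[1, 3] = dt      # position += velocity * dt
        self.H = np.eye(2, 4)                 # measure position only
        self.Q = np.eye(4) * 1e-2             # process noise covariance
        self.R = np.eye(2) * 1e-1             # measurement noise covariance

    def predict(self):
        """Time update: project state and covariance forward."""
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Measurement update: compute the gain, correct the state."""
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Fed a stream of repeated position measurements, the estimate converges to the measured position while the velocity components settle toward zero.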
In the measurement update, the Kalman gain K_k is computed from the predicted error covariance P⁻_k, the measurement matrix H, and the measurement noise covariance R:

  K_k = P⁻_k Hᵀ (H P⁻_k Hᵀ + R)⁻¹    (3)

The a priori estimate is then corrected with the measurement z_k, and the error covariance is updated:

  x̂_k = x̂⁻_k + K_k (z_k − H x̂⁻_k)   (4)
  P_k = (I − K_k H) P⁻_k              (5)

The Kalman gain lies between 0 and 1: when it is close to 0 the filter trusts its prediction more than the measurement, and when it is close to 1 it trusts the measurement more. By alternating the prediction of (1) and (2) with the correction of (3)-(5), the filter refines its state estimate frame by frame.

The proposed method combines the CNN-based YOLO v2 detection network with this Kalman filter. In each frame, YOLO v2 detects the target object and the detected bounding box is used as the Kalman filter's measurement; the overall structure of the system is shown in Fig. 2.

Fig. 2. System flowchart
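The detect-then-correct loop can be sketched as follows. The detector and filter interfaces here are illustrative assumptions about the pipeline; only the 0.4 confidence threshold comes from the paper:

```python
# Hypothetical glue code: combine a per-frame detector with a Kalman
# filter so tracking survives frames where detection fails.
CONF_THRESH = 0.4  # detection confidence threshold used in the paper

def track(frames, detector, kalman):
    """detector(frame) -> (position, confidence), position may be None;
    kalman exposes predict() and update(z), each returning an estimate."""
    results = []
    for frame in frames:
        predicted = kalman.predict()               # time update every frame
        position, conf = detector(frame)
        if position is not None and conf >= CONF_THRESH:
            results.append(kalman.update(position))  # correct with detection
        else:
            results.append(predicted)              # fall back to prediction
    return results
```

The key design point is that predict() runs on every frame, so when the detector returns nothing (or a low-confidence box), the filter's forward projection still yields a position for that frame.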
Fig. 3. Flowchart of proposed method

The detailed procedure is shown in Fig. 3. YOLO v2 outputs a confidence score for each detection, and a detection is accepted only when its confidence is 0.4 or higher. When a frame produces an accepted detection, the detected bounding box is used as the measurement in the Kalman filter's correction step. When detection fails, the position predicted by the Kalman filter is used as the tracking result for that frame, so the object continues to be tracked through intermittent detection failures.

IV. Experimental Results

The network was trained on 4,000 images, and 500 frames were used for evaluation. The test data are FHD videos of a DJI Phantom 4 Professional drone; Fig. 4 shows examples of the training data. The experiments ran on Ubuntu 16.04 with an Intel i7-6850K 3.6 GHz CPU, 32 GB RAM, and an NVIDIA GeForce GTX

Fig. 4. Examples of training data
1080 Ti GPU, using Python 3.5.5, TensorFlow 1.10.0 [28], and OpenCV 3.4.0 [29].

Detection accuracy is evaluated with IoU (Intersection over Union), the ratio of the overlap area to the union area of the detected bounding box and the ground-truth box. Fig. 5 illustrates the IoU calculation and boxes at various IoU values; a detection with an IoU of at least 0.5 is considered correct.

Fig. 5. Calculation of IoU and evaluation of the various IoU values

Fig. 6 shows the IoU results of the plain YOLO v2 network: the average IoU is 0.595 at a processing speed of 24 fps on FHD (Full High Definition) video. Fig. 7 compares the IoU of the proposed method with that of YOLO v2 over the test sequence. Of the 500 test frames, YOLO v2 fails to detect the object in 75 frames, whereas the proposed method fails in only 36.

Fig. 6. IoU results of YOLO v2 network
Fig. 7. Comparison of proposed method and YOLO v2 with IoU results
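The IoU metric used above can be computed as follows; this is the standard definition, with the corner-coordinate box format (x1, y1, x2, y2) being an assumption since the paper does not specify one:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih                               # overlap area (0 if disjoint)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one pixel in each direction overlap in a 1x1 region, giving an IoU of 1/7, which would be counted as a detection failure under the 0.5 criterion.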
Fig. 8. Examples of detection failure (YOLO v2) and tracking success (Proposed method)

The proposed method achieves an average IoU of 0.672 at 20 fps on FHD video. Fig. 8 shows frames in which YOLO v2 alone fails to detect the drone while the proposed method continues tracking it through the Kalman filter's prediction.

V. Conclusion

This paper proposed a method that combines a Kalman filter with the CNN-based YOLO v2 detection network to improve object detection and tracking in video. When the detection network fails on a frame, the position predicted by the Kalman filter is used instead, so tracking continues through intermittent detection failures. In the experiments, the proposed method improved IoU by 7.7% over the plain YOLO v2 network. On FHD
video, the proposed method runs at 20 fps, only 4 fps slower than YOLO v2 alone. Future work includes expanding the training DB and improving the detection and tracking of small objects at long range (on the order of 1 km).

(References)
[1] Teal Group, "2014 Market Profile and Forecast," World Unmanned Aerial Vehicle Systems, 2014.
[2] Y. Choi and H. Ahn, "Drone's current and technology development trends and prospects," The World of Electricity, vol. 64, no. 12, pp. 20-25, 2015.
[3] E. N. Johnson, A. J. Calise, Y. Watanabe, J. Ha, and J. C. Neidhoefer, "Real-Time Vision-Based Relative Aircraft Navigation," Journal of Aerospace Computing, Information, and Communication, vol. 4, pp. 707-738, 2007.
[4] J. Lai, L. Mejias, and J. J. Ford, "Airborne Vision-Based Collision-Detection System," Journal of Field Robotics, vol. 28, no. 2, pp. 137-157, 2011.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[6] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[7] S. Gidaris and N. Komodakis, "Object detection via a multi-region and semantic segmentation-aware CNN model," Proceedings of the IEEE International Conference on Computer Vision, 2015.
[8] J. Redmon et al., "You only look once: Unified, real-time object detection," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[9] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[10] R. G. Brown and P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering, vol. 3, Wiley, New York, 1992.
[11] B. Ristic, S. Arulampalam, and N. Gordon, "Beyond the Kalman filter," IEEE Aerospace and Electronic Systems Magazine, vol. 19, no. 7, pp. 37-38, 2004.
[12] S. Haykin, Kalman Filtering and Neural Networks, vol. 47, John Wiley & Sons, 2004.
[13] N. Peterfreund,
"Robust tracking of position and velocity with Kalman snakes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 6, pp. 564-569, 1999.
[14] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[15] L. Deng and D. Yu, "Deep learning: Methods and applications," Foundations and Trends in Signal Processing, vol. 7, no. 3-4, pp. 197-387, 2014.
[16] V. Mayer-Schönberger and K. Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think, Houghton Mifflin Harcourt, 2013.
[17] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[18] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[19] D. G. Lowe, "Object recognition from local scale-invariant features," Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, IEEE, 1999.
[20] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, IEEE, 2005.
[21] J.-Y. Bouguet, "Pyramidal implementation of the affine Lucas Kanade feature tracker: Description of the algorithm," Intel Corporation, vol. 5, no. 1-10, p. 4, 2001.
[22] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," Proceedings of the International Conference on Image Processing, vol. 1, IEEE, 2002.
[23] K. He et al., "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[24] R. Girshick, "Fast R-CNN," Proceedings of the IEEE International
Conference on Computer Vision, 2015.
[25] S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, 2015.
[26] W. Liu et al., "SSD: Single shot multibox detector," European Conference on Computer Vision, Springer, Cham, 2016.
[27] M. Everingham et al., "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
[28] M. Abadi et al., "TensorFlow: A system for large-scale machine learning," 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016.
[29] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, O'Reilly Media, Inc., 2008.

- Gicheol Kim : ORCID: https://orcid.org/0000-0002-2091-4841
- Sohee Son : ORCID: http://orcid.org/0000-0003-2499-492x
- Minseop Kim : ORCID: https://orcid.org/0000-0003-4877-6388
- Jinwoo Jeon : ORCID: https://orcid.org/0000-0001-9934-1187
- Injae Lee : ORCID: https://orcid.org/0000-0002-1975-1838
- Jihun Cha : ORCID: http://orcid.org/0000-0002-5257-014x
- Haechul Choi : ORCID: http://orcid.org/0000-0002-7594-0828