(Special Paper)
JBE Vol. 23, No. 2, March 2018
https://doi.org/10.5909/jbe.2018.23.2.246
ISSN 2287-9137 (Online)   ISSN 1226-7953 (Print)

CNN-Based Hand Gesture Recognition for Wearable Applications

Hyeon-Chul Moon a), Anna Yang a), and Jae-Gon Kim a)

Abstract

Hand gestures are attracting attention as a NUI (Natural User Interface) for wearable devices such as smart glasses. Recently, to support efficient media consumption in IoT (Internet of Things) and wearable environments, the standardization of IoMT (Internet of Media Things) is in progress in MPEG. In IoMT, it is assumed that hand gesture detection and recognition are performed on separate devices, and the standard thus provides an interoperable interface between these modules. Meanwhile, deep learning-based hand gesture recognition techniques have recently been actively studied to improve recognition performance. In this paper, we propose a CNN (Convolutional Neural Network)-based method of hand gesture recognition for applications such as media consumption on wearable devices, which is one of the use cases of IoMT. The proposed method detects the hand contour from stereo images acquired by smart glasses using depth and color information, constructs data sets to train the CNN, and then recognizes gestures from input hand contour images. Experimental results show that the proposed method achieves an average hand gesture recognition rate of 95%.

Keywords: MPEG-IoMT, Hand Gesture, Hand Contour, CNN, Gesture Recognition

a) School of Electronics and Information Engineering, Korea Aerospace University
Corresponding Author: Jae-Gon Kim
E-mail: jgkim@kau.ac.kr
Tel: +82-2-300-0414
ORCID: http://orcid.org/0000-0003-3686-4786

This work was supported by the National Standards Technology Promotion Program of the Korean Agency for Technology and Standards (KATS) grant funded by the Ministry of Trade, Industry and Energy (MOTIE, Korea) (10077958). Parts of this work have been published in the 2017 Fall Conference of the Korean Institute of Broadcasting and Media Engineers.

Manuscript received January 12, 2018; Revised February 5, 2018; Accepted February 14, 2018.

Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved. This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
I. Introduction

Hand gestures are attracting attention as a natural user interface (NUI) for wearable devices such as smart glasses. Recently, to support efficient media consumption in IoT (Internet of Things) and wearable environments, MPEG has been standardizing IoMT (Internet of Media Things), and hand gesture-based media consumption on wearable devices is considered one of its use cases. Hand gesture-based NUI techniques for wearable applications have been studied in [1][2][3]. Gesture recognition itself has been widely surveyed [4][5], and deep learning-based approaches have recently been actively studied to improve recognition performance [5][6].

In this paper, we propose a CNN (Convolutional Neural Network)-based hand gesture recognition method for wearable applications in the MPEG IoMT environment. In IoMT, hand gesture detection and recognition are assumed to be performed on separate devices connected through an interoperable API (Application Programming Interface) [5]. This paper extends our preliminary CNN-based work presented in [6].

The rest of this paper is organized as follows. Section II describes the hand gesture use case in IoMT, and Section III presents the proposed CNN-based hand gesture recognition method. Section IV gives experimental results, and Section V concludes the paper.

II. Hand Gesture Use Case in IoMT

Fig. 1 shows a scenario of hand gesture-based wearable applications in IoMT [2][3], in which hand gestures serve as an NUI for consuming media on smart glasses. The IoMT architecture [8] consists of users, MThings (Media Things), and Processing Units (PUs), together with the interfaces between them.

Fig. 1. A scenario of hand gesture-based wearable applications

In Fig. 1, the detection module detects the hand region from the images captured by the smart glasses [1]. In IoMT, the detection and recognition modules may run on different PUs, and the standard therefore provides an interoperable PU-to-PU API between them. The data exchanged through this API, such as the detected hand contour, is described in XML.
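To make the module interface concrete, the following Python sketch serializes a detected hand contour into a simple XML message that a detection PU could pass to a recognition PU. The element and attribute names are purely hypothetical illustrations, not the normative MPEG-IoMT data format, which is defined in the working draft [8].

# Hypothetical sketch only: these element/attribute names are NOT the
# normative MPEG-IoMT schema; they just illustrate the kind of XML
# payload a detection PU might pass to a recognition PU over the API.
import xml.etree.ElementTree as ET

def contour_to_xml(points, width, height):
    """Wrap a detected hand contour (list of (x, y) pixels) in XML."""
    root = ET.Element("HandContour", width=str(width), height=str(height))
    for x, y in points:
        ET.SubElement(root, "Point", x=str(x), y=str(y))
    return ET.tostring(root, encoding="unicode")

# Example: a toy three-point contour from a 640x480 stereo frame.
print(contour_to_xml([(120, 200), (130, 210), (125, 220)], 640, 480))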
III. CNN-Based Hand Gesture Recognition

1. Hand contour detection

As shown in Fig. 2, the hand region is segmented from the stereo images captured by the smart glasses using both color (RGB) and depth information. Assuming that the hand is located within about 30~50 cm of the camera, depth thresholding separates the hand from the background, and morphology operations remove the remaining noise. The contour of the segmented hand region is then extracted; Fig. 3 shows an example of a detected hand contour [1][2][3].

Fig. 2. Procedure of hand contour detection
Fig. 3. An example of the detected hand contour

2. CNN structure

The CNN takes the detected hand contour image as input and performs feature extraction and classification. It consists of convolution, pooling, and fully-connected (FC) layers [9]. Fig. 4 shows the structure of the proposed CNN. Layers C1 and C2 are convolution layers with 5x5 kernels and a stride of 1. Layers S1 and S2 are 2x2 max pooling layers; as illustrated in Fig. 5, max pooling outputs the maximum value within each 2x2 region. Layers F1 and F2 are fully-connected layers. Fig. 6 illustrates the classification process of the fully-connected layers: the final layer produces ten outputs, P1~P10, one per gesture class, which are converted into class probabilities by a softmax function.
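The layer configuration described above can be summarized in code. The following sketch uses tf.keras purely for illustration (the experiments in Section IV use R-based frameworks); the input resolution (64x64, single channel), the filter counts of C1 and C2, and the width of F1 are assumptions, since the paper does not state them.

# Sketch of the described C1-S1-C2-S2-F1-F2 structure in tf.keras.
# Assumed (not given in the paper): 64x64 single-channel contour input,
# 32/64 filters in C1/C2, and 128 units in F1.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gesture_cnn(input_shape=(64, 64, 1), num_classes=10):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=5, strides=1, padding="same",
                      activation="relu", name="C1"),   # 5x5 conv, stride 1
        layers.MaxPooling2D(pool_size=2, name="S1"),   # 2x2 max pooling
        layers.Conv2D(64, kernel_size=5, strides=1, padding="same",
                      activation="relu", name="C2"),
        layers.MaxPooling2D(pool_size=2, name="S2"),
        layers.Flatten(),
        layers.Dense(128, activation="relu", name="F1"),
        # F2 outputs P1~P10, turned into class probabilities by softmax.
        layers.Dense(num_classes, activation="softmax", name="F2"),
    ])

model = build_gesture_cnn()
model.summary()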
Fig. 4. Proposed CNN structure

For training, each contour image is labeled with a one-hot vector in which the element for the true gesture class is 1 and the remaining nine elements are 0. The network is trained by gradient descent with momentum, where the loss function measures the discrepancy between the network output and the label [10].

Fig. 5. Example of max pooling
Fig. 6. Classification process of the fully-connected layer

3. CNN training

The weights of the CNN are updated iteratively as follows:

v_{t+1} = \mu v_t - \epsilon \nabla f(\theta_t)    (1)
\theta_{t+1} = \theta_t + v_{t+1}    (2)

In Eq. (1), \nabla f(\theta_t) is the gradient of the loss function with respect to the weights \theta_t at step t, \epsilon is the learning rate, and \mu is the momentum coefficient; the velocity v_{t+1} accumulates the current gradient on top of the previous update direction. In Eq. (2), the weights at step t+1 are obtained by adding this velocity to the current weights. Applying momentum in this way accelerates convergence of the training [10].
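As a numerical illustration of Eqs. (1) and (2), the following NumPy sketch applies the momentum update to a toy one-dimensional quadratic loss; the learning rate and momentum coefficient are illustrative values, not those used in the paper.

# Toy demonstration of Eqs. (1) and (2) on f(theta) = 0.5 * theta^2,
# whose gradient is simply theta. The values of epsilon (learning rate)
# and mu (momentum coefficient) are illustrative only.
import numpy as np

theta = np.array([4.0])    # initial weight
v = np.zeros_like(theta)   # initial velocity
epsilon, mu = 0.1, 0.9     # learning rate, momentum coefficient

for t in range(100):
    grad = theta                  # gradient of the loss at theta_t
    v = mu * v - epsilon * grad   # Eq. (1): velocity update
    theta = theta + v             # Eq. (2): weight update

print(theta)  # converges toward the minimizer theta = 0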
4. Deep learning framework

Representative deep learning frameworks include TensorFlow, Theano, MXNet, and Keras [11]. In this work, R-based software was used, and three R-based frameworks, MxnetR, Deepnet, and H2O, were compared.

IV. Experimental Results

Fig. 7 shows the ten hand gestures used in the experiments. For these ten gestures, 6,000 images were collected from five subjects and 1,000 images from three other subjects. Table 1 compares the recognition accuracy of the three R-based frameworks; since MxnetR gives the best accuracy, the experiments reported in Tables 2 and 3 were performed with MxnetR.

Table 1. Recognition accuracy comparisons among deep learning frameworks

Framework    Accuracy (%)
MxnetR       95
Deepnet      94.7
H2O          94.4

Table 2 shows the recognition accuracy according to the size of the training data set and the application of momentum. As the data set grows from 2,800 to 7,000 images, the accuracy improves from 77.2% to 94.7%, and applying momentum with the 7,000-image data set improves the accuracy by a further 0.3%, reaching 95%.

Table 2. Recognition accuracy comparisons according to the size of data set and the application of momentum

Methods                                   Accuracy (%)
Data set = 2,800, momentum not applied    77.2
Data set = 5,600, momentum not applied    88.4
Data set = 7,000, momentum not applied    94.7
Data set = 7,000, momentum applied        95

Fig. 7. A set of hand gestures used in the experiments

With the best configuration of Table 1, recognition was evaluated on 1,000 test images, and Table 3 shows the resulting accuracy for each gesture. 'Three' shows the highest accuracy (98.3%) and 'Rice' the lowest (91.7%); the per-gesture accuracies differ by 0.5~6.6 percentage points across classes, and the average over the ten gestures is 95%.
Table 3. Recognition accuracy of each gesture

Label            Accuracy (%)
1 (One)          93.1
2 (Two)          96.1
3 (Three)        98.3
4 (Four)         95.6
5 (Five)         95.2
6 (Okay)         94.2
7 (Promise)      96.1
8 (Rice)         91.7
9 (Scissor)      93.7
10 (Victory)     94.6
Average accuracy = 95%

Table 4 compares the recognition accuracy of the proposed method with that of an existing CNN-based hand gesture recognition method [12]. The existing method [12] achieves 93.8% accuracy, so the proposed method improves the accuracy by 1.2 percentage points.

Table 4. Comparison of recognition accuracy with the existing method

Method                   Accuracy (%)
Existing method [12]     93.8
Proposed method          95

V. Conclusion

In this paper, we proposed a CNN-based hand gesture recognition method for wearable applications such as media consumption on smart glasses, which is one of the use cases of MPEG IoMT. The proposed method detects the hand contour from stereo images using depth and color information and recognizes gestures from the contour images with a CNN. Experimental results show an average recognition rate of 95%, suggesting that the proposed method can support gesture-based interfaces in the IoMT environment.

References

[1] A. Yang, S. Chun, H. Ko, and J.-G. Kim, "Hand gesture description for wearable applications in M-IoTW," ISO/IEC JTC1/SC29/WG11 M38526, Geneva, Switzerland, May 2016.
[2] A. Yang, S. Chun, and J.-G. Kim, "Detection and recognition of hand gesture for wearable applications in IoMTW," in Proc. ICACT 2017, pp. 598-601, Feb. 2017.
[3] A. Yang, S. Chun, and J.-G. Kim, "Detection and recognition of hand gesture for wearable applications in IoMTW," ICACT Trans. Advanced Communications Technology (TACT), vol. 6, no. 5, pp. 1046-1053, Sep. 2017.
[4] S. Mitra and T. Acharya, "Gesture recognition: A survey," IEEE Trans. Syst., Man, Cybern. C, vol. 37, no. 3, pp. 311-324, 2007.
[5] S. Byun, S. Lee, G. Kim, and S. Han, "Gesture recognition with wearable device based on deep learning," Broadcasting and Media Magazine, vol. 22, no. 1, pp. 58-66, Jan. 2017.
[6] H. Moon, A. Yang, S. Chun, and J.-G. Kim, "CNN-based hand gesture recognition for wearable applications," The Korean Institute of Broadcast and Media Engineers Conference, Seoul, Korea, pp. 58-59, 2017.
[7] Y. LeCun, K. Kavukcuoglu, and C. Farabet, "Convolutional networks and applications in vision," in Proc. ISCAS 2010.
[8] M. Mitrea, "Working Draft 2.0 of ISO/IEC 23093-1 IoMT Architecture," ISO/IEC JTC1/SC29/WG11 N17094, Torino, July 2017.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[10] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in Proc. Int. Conf. Machine Learning (ICML), pp. 1139-1147, Feb. 2013.
[11] Y. Lee and P. Moon, "A comparison and analysis of deep learning frameworks," J. Korea Institute of Electron. Communi. Science (KIECS), vol. 12, no. 1, pp. 115-122, Feb. 2017.
[12] M. Han et al., "Visual hand gesture recognition with convolution neural network," in Proc. Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2016.
Hyeon-Chul Moon
- ORCID: http://orcid.org/0000-0002-1672-2345

Anna Yang
- ORCID: https://orcid.org/0000-0003-4957-9589
- Research interests include IoT.

Jae-Gon Kim
- Feb. 1992: M.S., KAIST
- Ph.D., KAIST
- Mar. 1992 ~ Feb. 2007: ETRI
- Dec. 2015 ~ Jan. 2016: Visiting Scholar, UC San Diego
- Sep. 2007 ~ present: Professor, School of Electronics and Information Engineering, Korea Aerospace University
- ORCID: http://orcid.org/0000-0003-3686-4786
- Research interests include UHD media.