(Special Paper) JBE Vol. 22, No. 2, March 2017
https://doi.org/10.5909/jbe.2017.22.2.162
ISSN 2287-9137 (Online) ISSN 1226-7953 (Print)

Facial Expression Classification Using Deep Convolutional Neural Network

In-kyu Choi a), Hyok Song b), Sangyong Lee a), and Jisang Yoo a)

Abstract

In this paper, we propose facial expression recognition using a CNN (Convolutional Neural Network), one of the deep learning technologies. To overcome the disadvantages of existing facial expression databases, several databases are combined. From them, we construct a data-set of six facial expressions: 'expressionless', 'happiness', 'sadness', 'angry', 'surprise', and 'disgust'. Pre-processing and data augmentation techniques are applied to improve training efficiency and classification performance. Starting from an existing CNN structure, the optimal structure that best expresses the features of the six facial expressions is found by adjusting the number of feature maps of the convolutional layers and the number of fully-connected layer nodes. Experimental results show that the proposed scheme achieves the highest classification performance of 96.88% while taking the least time to pass through the CNN structure compared to other models.

Keyword : Convolutional neural network, face expression, data augmentation, data-set

a) Department of Electrical Engineering, Kwangwoon University
b) Department of Electronic Engineering
Corresponding Author : Jisang Yoo, E-mail: jsyoo@kw.ac.kr, Tel: +82-2-940-5112, ORCID: http://orcid.org/0000-0002-3766-9854
Manuscript received January 10, 2017; Revised February 28, 2017; Accepted March 20, 2017.
Copyright 2017 Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
I. Introduction

Facial expressions are one of the most direct means by which humans convey emotion, and automatic facial expression recognition is an important component of Human-Computer Interaction (HCI) applications. In recent years, deep learning based on deep neural networks, and in particular the convolutional neural network (CNN), has driven rapid progress in visual recognition. Since a CNN first won the ILSVRC (ImageNet Large Scale Visual Recognition Competition) in 2012, performance has improved every year; in 2015, ResNet from Microsoft Research achieved a top-5 error of only 3.54% on the 1000-class task [1]. CNN-based face recognition has likewise reached an accuracy of 97.25%, close to human performance (97.53%).

A CNN consists of convolutional layers followed by fully-connected layers: the convolutional layers extract features from the input image, and the fully-connected layers classify the extracted features. For facial expression recognition, CNNs trained on ILSVRC-scale data-sets have been applied [2], and CNNs have been combined with facial landmark information [3]. Other approaches use a CNN directly [4] or a Convolutional Autoencoder (CAE) [5]. However, most of this work relies on small data-sets such as CK+ and JAFFE, which limits how well the trained CNN generalizes. The Kaggle FER2013 data-set is larger but suffers from labeling and image-quality problems. We therefore construct a new data-set by collecting images from nine databases: the 10k US Adult Faces Database [6], the Indian Movie Face database (IMFDB) [7], Cohn-Kanade AU-Coded Facial Expression (CK+) [8], the Chicago Face Database [9], the ESRC 3D Face Database [10], the Amsterdam Dynamic Facial Expression Set (ADFES) [11], the Karolinska Directed Emotional Faces (KDEF) [12], the EU-Emotion Stimulus Set [13], and the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP) [14]. Pre-processing and data augmentation techniques are then applied to the collected data-set. Fig. 1 shows the structure of AlexNet [15], which consists of convolutional layers and fully-connected layers and is the starting point of the proposed structure.
Fig. 1. AlexNet structure [15]

The rest of this paper is organized as follows. Section II describes the construction of the facial expression data-set and the pre-processing and data augmentation techniques. Section III presents the proposed CNN structure, and Section IV the experimental results. Section V concludes the paper.

II. Data-set construction

1. Problems of existing databases

In 2013, Kaggle hosted the 'Facial Expression Recognition Challenge', whose FER2013 data-set contains about 37,000 images labeled with seven expressions [16]. The images, however, are only 48x48 pixels in size, and many of them are blurred or do not contain a proper face at all, which makes the data-set unsuitable for training a CNN directly. Fig. 2 shows examples of such images. For this reason, we instead collect images from nine other databases.

Fig. 2. Examples of images classified as incorrect faces in the FER2013 data-set
The nine databases used to construct the data-set are as follows.

- 10k US Adult Faces Database: contains 10,168 natural face photographs, 2,222 of which carry additional annotations.
- Indian Movie Face database (IMFDB): 34,512 images of 100 Indian actors collected from movies, labeled with seven expression classes.
- Cohn-Kanade AU-Coded Facial Expression (CK+): 593 image sequences from 123 subjects aged 18 to 30; 309 sequences carry validated expression labels.
- Chicago Face Database: 597 high-resolution images of individuals aged 17 to 65, with expression variants provided for 158 of them.
- ESRC 3D Face Database: face images of 45 male and 54 female subjects, each with four expression variants.
- Amsterdam Dynamic Facial Expression Set (ADFES): expression images of 22 models, 10 Mediterranean and 12 Northern-European, displaying nine expressions.
- Karolinska Directed Emotional Faces (KDEF): 4,900 images of 70 individuals (35 men and 35 women) aged 20 to 30, photographed from five angles (-90, -45, 0, +45, +90 degrees) with seven expressions.
- EU-Emotion Stimulus Set: expression images of 19 actors aged 10 to 70.
- Warsaw Set of Emotional Facial Expression Pictures (WSEFEP): expression images of 30 individuals covering seven expressions.

From these databases, images of the six expressions 'neutral', 'happy', 'sad', 'angry', 'surprise', and 'disgust' were collected. Table 1 shows the number of images per expression in the collected data-set.

Table 1. Number of images per facial expression of collected data-set

Neutral [NE]  Happy [HA]  Sad [SA]  Angry [AN]  Surprise [SU]  Disgust [DI]  Total
1,000         1,008       465       553         569            501           4,096

2. Pre-processing and data augmentation

Face regions are first detected with the Haar-feature-based detector of Viola and Jones [17], and the detected regions are cut out and converted to one-channel grayscale images. Fig. 3 shows the result.

Fig. 3. The result of converting the cut-out face region image into a black-and-white image
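The pre-processing step above (cropping the detected face region and converting it to a 1-channel image) can be sketched as follows. This is a minimal illustration on plain nested lists: the bounding box is assumed to come from a Haar-cascade detector as in [17], the grayscale conversion uses the standard ITU-R BT.601 luma weights, and the function names are ours, not from the paper.

```python
def crop_face(image, box):
    """Cut out the detected face region.

    image: nested list of (R, G, B) pixels; box: (x, y, w, h) as a
    Haar-cascade face detector would return it (assumed here)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def to_grayscale(image):
    """Convert an RGB face crop to a 1-channel image using the
    ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b)
             for (r, g, b) in row]
            for row in image]
```

In practice, a library face detector supplies the bounding box and the crop is also resized to the fixed network input size before training.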
Because the collected data-set is small relative to the capacity of a CNN, over-fitting must be suppressed, and data augmentation is applied for this purpose. Each pre-processed face image is mirrored horizontally, and the original and mirrored images are each rotated by 5, 10, and 15 degrees in both directions, so that every image yields 14 training images [5]. Fig. 4 shows the result.

Fig. 4. The result of applying the data augmentation technique

III. Proposed CNN structure

Instead of designing a CNN from scratch, we start from AlexNet and search for the structure best suited to the six-expression classification task. AlexNet was designed for the 1000-class ImageNet task, so its capacity is excessive for six classes. As ZFNet [18] showed on the ImageNet data-set, the performance of a given CNN can be improved by adjusting its structural hyper-parameters; here we adjust the number of feature maps of each convolutional layer and the number of nodes of each fully-connected layer.

For a convolutional layer with K x K filters, M_{i-1} input feature maps, and M_i output feature maps, the number of learnable parameters is K^2 * M_{i-1} * M_i, and the amount of computation is proportional to this value multiplied by the size of the output feature map. Reducing the number of feature maps therefore reduces both the model size and the processing time. Fig. 5 illustrates this relationship.

Fig. 5. Computational relationship between consecutive convolutional layers

Following AlexNet, the network consists of five convolutional layers (C1-C5) and three fully-connected layers (FC6-FC8); FC8 is the output layer, whose size is fixed by the number of classes. In AlexNet, the numbers of feature maps and nodes are (96, 256, 384, 384, 256, 4096, 4096, 1000).
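The parameter-count relationship above can be made concrete with a small sketch that sums K^2 * M_{i-1} * M_i over a stack of convolutional layers. The kernel sizes (11, 5, 3, 3, 3) are an assumption carried over from AlexNet (the paper adjusts feature-map counts, not kernels), the 1-channel default input follows the grayscale pre-processing, and biases and fully-connected layers are ignored for brevity.

```python
# AlexNet-style kernel sizes for C1-C5 (assumed; the paper does not
# restate them when varying the feature-map counts).
KERNELS = (11, 5, 3, 3, 3)


def conv_params(feature_maps, in_channels=1):
    """Learnable parameters of the convolutional stack: the sum of
    K*K*M_in*M_out over C1-C5 (biases ignored)."""
    total, m_in = 0, in_channels
    for k, m_out in zip(KERNELS, feature_maps):
        total += k * k * m_in * m_out
        m_in = m_out
    return total


# AlexNet's convolutional stack (3-channel input) vs. the 1/4-reduced
# configuration on 1-channel grayscale input.
alexnet_conv = conv_params((96, 256, 384, 384, 256), in_channels=3)
quarter_conv = conv_params((24, 64, 96, 96, 128))
```

Under these assumptions, `alexnet_conv` comes to 3,745,824 parameters while the reduced configuration needs only 290,136, an order-of-magnitude saving that carries over to the computation per image.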
First, models in which the numbers of feature maps and nodes of AlexNet are reduced to 1/2 and to 1/4 were trained and compared. As Table 2 shows, the 1/4 model (24, 64, 96, 96, 128, 1024, 1024) achieves a slightly higher recognition rate than the original AlexNet configuration with far fewer parameters, so it is taken as the first reference model. Throughout, a configuration is written as the numbers of feature maps of C1-C5 followed by the numbers of nodes of FC6 and FC7.

Table 2. Candidate model configurations and recognition rates

C1  C2   C3   C4   C5   FC6   FC7   Recognition rate (%)
96  256  384  384  256  4096  4096  95.1
48  128  192  192  256  2048  2048  95.6
24  64   96   96   128  1024  1024  95.6

Table 3 shows the recognition rate when the feature maps or nodes of each layer of the first reference model are reduced one at a time. Every reduction lowers the recognition rate; in particular, halving FC6 and FC7 to 512 nodes drops it to 94.0%, so FC6 and FC7 are kept at 1024 nodes.

Table 3. Structures that reduce the number of channels and nodes in the first reference model and the corresponding recognition rates

C1  C2  C3  C4  C5   FC6   FC7   Recognition rate (%)
24  64  96  96  128  1024  1024  95.6
12  64  96  96  128  1024  1024  94.3
24  32  96  96  128  1024  1024  95.1
24  64  48  96  128  1024  1024  94.3
24  64  96  48  128  1024  1024  94.5
24  64  96  96  64   1024  1024  94.8
24  64  96  96  128  512   512   94.0

Since no layer of the first reference model can be reduced further without loss, a second reference model (36, 96, 144, 144, 128, 1024, 1024), obtained by increasing the convolutional feature maps of the first reference model by a factor of 3/2, is examined next. Table 4 shows the recognition rate when each layer of this model is reduced in turn; reducing C4 from 144 to 96 gives the highest rate of 96.9%, yielding the structure (36, 96, 144, 96, 128, 1024, 1024).

Table 4. Structures that reduce the number of channels and nodes in the second reference model and the corresponding recognition rates

C1  C2  C3   C4   C5   FC6   FC7   Recognition rate (%)
36  96  144  144  128  1024  1024  96.1
24  96  144  144  128  1024  1024  95.6
36  64  144  144  128  1024  1024  94.5
36  96  96   144  128  1024  1024  94.3
36  96  144  96   128  1024  1024  96.9
36  96  144  144  64   1024  1024  95.6

Finally, taking (36, 96, 144, 96, 128, 1024, 1024) as the third reference model, C4, C5, FC6, and FC7 are fixed and the feature maps of C1-C3 are reduced in turn.
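The layer-wise search above can be viewed as a greedy procedure: starting from a reference configuration, each layer is shrunk in turn, every variant is scored, and the best-scoring configuration becomes the next reference. A minimal sketch, with the training-and-evaluation step abstracted into a caller-supplied `evaluate` function (hypothetical; in the paper each variant is actually trained and tested on the data-set):

```python
def search_step(reference, evaluate, factor=0.5):
    """One step of the greedy layer-wise search: shrink each entry of
    the (C1..C5, FC6, FC7) configuration in turn, score the reference
    and every variant with `evaluate`, and return the best-scoring
    configuration."""
    candidates = [list(reference)]
    for i in range(len(reference)):
        variant = list(reference)
        variant[i] = int(variant[i] * factor)
        candidates.append(variant)
    return max(candidates, key=evaluate)
```

With a real `evaluate` (train the variant, measure test accuracy), repeating `search_step` until the reference itself wins reproduces the stopping condition of Tables 3-5: the search ends when every single-layer reduction lowers the recognition rate.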
Table 5. Structures that reduce the number of channels in the third reference model and the corresponding recognition rates

C1  C2  C3   C4  C5   FC6   FC7   Recognition rate (%)
36  96  144  96  128  1024  1024  96.9
24  96  144  96  128  1024  1024  94.8
36  64  144  96  128  1024  1024  95.3
36  96  96   96  128  1024  1024  96.1

As Table 5 shows, reducing any of C1-C3 also lowers the recognition rate, so (36, 96, 144, 96, 128, 1024, 1024) is selected as the final structure. Unlike the original AlexNet, whose convolutional layers are split across two GPUs, the proposed network runs on a single GPU. Fig. 6 shows the proposed optimal structure.

Fig. 6. The proposed optimal structure

IV. Experimental results

The proposed network was implemented with the Theano tool on a GeForce GTX 980 Ti GPU. The collected data-set was divided into training and test sets at a ratio of 9:1, and the training set was expanded by the mirroring and rotation (-15, -10, -5, +5, +10, +15 degrees) augmentation of Section II. Training used mini-batch stochastic gradient descent with a batch size of 128 for 60 epochs; the learning rate started at 0.01 and was reduced to 1/10 at epochs 20 and 40. Table 6 shows the effect of the pre-processing and data augmentation techniques on classification accuracy; 1-channel grayscale input with augmentation performs best.

Table 6. Effects of data pre-processing and augmentation techniques

Preprocessing method                       Accuracy (%)
1-channel gray image                       88.80
3-channel color image                      88.92
1-channel gray image + data augmentation   96.88
3-channel color image + data augmentation  95.33

For comparison, AlexNet, VGGNet (11-layer) [19], OverFeat (fast model) [20], and a network of inception modules in the style of GoogLeNet [21] were trained on the same data-set, with their 1000-node output layers replaced by 6-node layers.
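The learning-rate schedule above (0.01, divided by 10 at epochs 20 and 40) can be sketched as a simple step function; the function and parameter names are ours.

```python
def learning_rate(epoch, base_lr=0.01, drops=(20, 40), factor=0.1):
    """Step schedule used for training: start at base_lr and multiply
    by `factor` at each epoch listed in `drops`."""
    lr = base_lr
    for drop in drops:
        if epoch >= drop:
            lr *= factor
    return lr
```

During training, the value returned for the current epoch is fed to the SGD update each mini-batch; over 60 epochs this gives three plateaus at 0.01, 0.001, and 0.0001.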
Because VGGNet could not be trained with a batch size of 128 owing to GPU memory limits, a batch size of 32 was used for all models when measuring the training and test times in Table 7.

Table 7. Learning and testing time for each model (batch: 32)

Model             Training time (sec/batch)  Test time (sec/batch)
AlexNet           0.107                      0.023
OverFeat          0.194                      0.040
VGGNet            0.597                      0.141
Inception Module  0.111                      0.033
Proposed          0.031                      0.008

The proposed structure passes through the network in the least time among the compared models. Table 8 shows the recognition rate of each model; the proposed structure also achieves the highest rate.

Table 8. Recognition rate for each model

Model             Recognition rate (%)
AlexNet           95.05
OverFeat          95.83
VGGNet            96.35
Inception Module  95.83
Proposed          96.88

Tables 9 to 13 show the confusion matrix of each model on the test set; each entry is the percentage of images of the true expression (row) classified as the predicted expression (column). In the proposed structure, 'angry' shows the lowest rate at 92.73%, being confused mainly with 'happy', 'sad', and 'disgust'.

Table 9. The confusion matrix of the proposed structure (%)

     NE     HA      SA     AN     SU      DI
NE   93.18  3.41    1.14   2.27   0.00    0.00
HA   0.00   100.00  0.00   0.00   0.00    0.00
SA   4.35   0.00    95.65  0.00   0.00    0.00
AN   0.00   1.82    1.82   92.73  0.00    3.64
SU   0.00   0.00    0.00   0.00   100.00  0.00
DI   0.00   0.00    0.00   0.00   0.00    100.00

Table 10. The confusion matrix of the AlexNet (%)

     NE     HA      SA     AN     SU      DI
NE   95.45  2.27    1.14   1.14   0.00    0.00
HA   0.00   100.00  0.00   0.00   0.00    0.00
SA   2.17   2.17    95.65  0.00   0.00    0.00
AN   1.82   3.64    1.82   83.64  0.00    9.09
SU   0.00   0.00    0.00   0.00   100.00  0.00
DI   0.00   2.00    2.00   4.00   0.00    92.00
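The confusion matrices in Tables 9 to 13 are row-normalized percentages: each row shows how the test images of one true expression were distributed over the predicted classes. A minimal sketch of the tally (the label strings are illustrative):

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Row-normalized confusion matrix in percent: entry [t][p] is the
    share of samples of true class t that were predicted as class p."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    matrix = {}
    for t in classes:
        total = sum(counts[t].values())
        matrix[t] = {p: 100.0 * counts[t][p] / total if total else 0.0
                     for p in classes}
    return matrix
```

Because the per-class test counts differ (Table 1), the rows are normalized independently; a 100.00 entry means every test image of that expression was classified correctly.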
Table 11. The confusion matrix of the VGGNet (%)

     NE     HA     SA     AN     SU      DI
NE   98.86  1.14   0.00   0.00   0.00    0.00
HA   1.14   97.73  0.00   0.00   1.14    0.00
SA   0.00   0.00   97.83  2.17   0.00    0.00
AN   1.82   1.82   1.82   92.73  0.00    1.82
SU   0.00   0.00   0.00   0.00   100.00  0.00
DI   0.00   4.00   6.00   2.00   0.00    88.00

Table 12. The confusion matrix of the OverFeat (%)

     NE     HA     SA     AN     SU     DI
NE   93.18  0.00   2.27   4.55   0.00   0.00
HA   1.14   98.86  0.00   0.00   0.00   0.00
SA   0.00   2.17   95.65  2.17   0.00   0.00
AN   1.85   1.85   1.85   92.59  0.00   1.85
SU   0.00   0.00   0.00   1.75   98.25  0.00
DI   0.00   0.00   0.00   2.00   0.00   98.00

Table 13. The confusion matrix of the Inception module (%)

     NE     HA      SA     AN     SU      DI
NE   94.32  3.41    1.14   1.14   0.00    0.00
HA   0.00   100.00  0.00   0.00   0.00    0.00
SA   6.52   0.00    93.48  0.00   0.00    0.00
AN   7.27   1.82    1.82   87.27  0.00    1.82
SU   0.00   0.00    0.00   0.00   100.00  0.00
DI   0.00   2.00    0.00   0.00   0.00    98.00

V. Conclusion

In this paper, a facial expression data-set covering six expressions was constructed from multiple public face databases, and pre-processing and data augmentation techniques were applied to it, with 1-channel grayscale input and augmentation giving the best accuracy. Starting from AlexNet, an optimal CNN structure for facial expression classification was found by adjusting the number of feature maps of the convolutional layers and the number of nodes of the fully-connected layers. The proposed structure achieved the highest recognition rate of 96.88% while requiring the least processing time among the compared models.

References

[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.
[2] A. Mollahosseini, D. Chan, and M. H. Mahoor, "Going deeper in facial expression recognition using deep neural networks," Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, IEEE, 2016.
[3] H. Jung, et al., "Joint fine-tuning in deep neural networks for facial expression recognition," Proceedings of the IEEE International Conference on Computer Vision, 2015.
[4] A. T. Lopes, E. de Aguiar, and T. Oliveira-Santos, "A facial expression recognition system using convolutional networks," Graphics, Patterns and Images (SIBGRAPI), 2015 28th SIBGRAPI Conference on, IEEE, 2015.
[5] D. Hamester, P. Barros, and S. Wermter, "Face expression recognition with a 2-channel convolutional neural network," Neural Networks (IJCNN), 2015 International Joint Conference on, IEEE, 2015.
[6] W.
Bainbridge, P. Isola, and A. Oliva, "The intrinsic memorability of face photographs," Journal of Experimental Psychology: General, 142(4):1323-1334, 2013.
[7] S. Setty, et al., "Indian Movie Face Database: A Benchmark for Face Recognition Under Wide Variation," In NCVPRIPG, 2013.
[8] P. Lucey, J. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in Proceedings of the IEEE Workshop on CVPR for Human Communicative Behavior Analysis, 2010.
[9] D. S. Ma, J. Correll, and B. Wittenbrink, "The Chicago Face Database: A Free Stimulus Set of Faces and Norming Data," Behavior Research Methods, 47, 1122-1135, 2015.
[10] ESRC 3D Face Database, http://pics.stir.ac.uk/esrc/
[11] J. Van der Schalk, S. T. Hawk, A. H. Fischer, and B. J. Doosje, "Moving faces, looking places: The Amsterdam Dynamic Facial Expressions Set (ADFES)," Emotion, 11, 907-920, DOI: 10.1037/a0023853, 2011.
[12] D. Lundqvist, A. Flykt, and A. Öhman, "The Karolinska Directed Emotional Faces - KDEF," CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, ISBN 91-630-7164-9, 1998.
[13] H. O'Reilly, D. Pigat, S. Fridenson, S. Berggren, S. Tal, O. Golan, S. Bölte, S. Baron-Cohen, and D. Lundqvist, "The EU-Emotion Stimulus Set: A Validation Study," Behavior Research Methods, DOI: 10.3758/s13428-015-0601-4, 2015.
[14] M. Olszanowski, G. Pochwatko, K. Kuklinski, M. Scibor-Rylski, P. Lewinski, and R. K. Ohme, "Warsaw Set of Emotional Facial Expression Pictures: A validation study of facial display photographs," Front. Psychol., 5:1516, DOI: 10.3389/fpsyg.2014.01516, 2015.
[15] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[16] Learn facial expressions from an image, https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data
[17] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Computer Vision and Pattern Recognition, 2001.
[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," In European Conference on Computer Vision, Springer International Publishing, pp. 818-833, September 2014.
[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," In Proc. International Conference on Learning Representations, http://arxiv.org/abs/1409.1556, 2014.
[20] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y.
LeCun, "OverFeat: Integrated recognition, localization and detection using convolutional networks," In Proc. ICLR, 2014.
[21] P. Burkert, F. Trier, M. Z. Afzal, A. Dengel, and M. Liwicki, "DeXpression: Deep convolutional neural network for expression recognition," CoRR, abs/1509.05371, 2015.

In-kyu Choi
- Feb. 2014: B.S. degree
- Feb. 2016: M.S. degree
- Mar. 2016 ~ present: Ph.D. student
- ORCID: http://orcid.org/0000-0002-4239-1762

Hyok Song
- Feb. 1999: B.S. degree
- Feb. 2001: M.S. degree
- Feb. 2013: Ph.D. degree
- ORCID: http://orcid.org/0000-0003-0376-9467

Sangyong Lee
- 1987: B.S. degree
- 2001: MBA
- 2001 ~ 2008: BSI
- 2007 ~ 2015: CTO & COO, CJ HelloVision
- ORCID: http://orcid.org/0000-0003-0210-3591
- Research interests: Digital Media Center, Smart Home, Cloud Broadcasting Platform
Jisang Yoo
- Feb. 1985: B.S. degree
- Feb. 1987: M.S. degree
- May 1993: Ph.D. degree, EE, Purdue Univ.
- Sep. 1997 ~ present: Professor, Kwangwoon University
- ORCID: http://orcid.org/0000-0002-3766-9854