(Special Paper) JBE Vol. 22, No. 2, March 2017

Facial Expression Classification Using Deep Convolutional Neural Network

In-kyu Choi a), Hyok Song b), Sangyong Lee a), and Jisang Yoo a)

Abstract

In this paper, we propose a facial expression recognition method using a CNN (Convolutional Neural Network), one of the deep learning technologies. To overcome the disadvantages of existing facial expression databases, several databases are used together. In the proposed technique, we construct a data-set of six facial expressions: 'expressionless', 'happiness', 'sadness', 'anger', 'surprise', and 'disgust'. Pre-processing and data augmentation are also applied to improve learning efficiency and classification performance. Starting from an existing CNN structure, the CNN structure that best expresses the features of the six facial expressions is found by adjusting the number of feature maps in the convolutional layers and the number of nodes in the fully-connected layers. Experimental results show that the proposed scheme achieves the highest classification performance, 96.88%, while taking the least time to pass through the CNN structure compared with other models.

Keyword : Convolutional neural network, facial expression, data augmentation, data-set

a) Department of Electrical Engineering, KwangWoon University
b) Department of Electronic Engineering
Corresponding Author : Jisang Yoo, jsyoo@kw.ac.kr
Manuscript received January 10, 2017; Revised February 28, 2017; Accepted March 20.
Copyright 2017 Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND license, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.

Facial expression recognition is studied in fields such as human-computer interaction (HCI). Recent work is based on deep learning, that is, on deep neural networks, and in particular on CNNs (convolutional neural networks). A CNN won ILSVRC (ImageNet Large Scale Visual Recognition Competition) 2012, and in 2015 ResNet from Microsoft Research reached a top-5 error of 3.54% over 1000 classes [1]. A CNN result of 97.25% (97.53%) is also cited [...].

A CNN consists of convolutional layers and fully-connected layers. The convolutional layers extract features from the image, and the fully-connected layers classify them.

Among CNN-based facial expression studies, [2] builds on a network from ILSVRC and [3] uses landmarks; their limitations concern the data-set and the landmarks, respectively. [4] applies a CNN [...], and [5] combines a CNN with a Convolutional Autoencoder (CAE). These studies were evaluated on data-sets such as CK+ and JAFFE.

Instead of relying on the Kaggle FER2013 data-set, this paper combines nine data-sets into one data-set: 10k US Adult Faces Database [6], Indian Movie Face database (IMFDB) [7], Cohn-Kanade AU-Coded Facial Expression (CK+) [8], Chicago Face Database [9], ESRC 3D Face Database [10], Amsterdam Dynamic Facial Expression Set (ADFES) [11], Karolinska Directed Emotional Faces (KDEF) [12], EU-Emotion Stimulus Set [13], and Warsaw Set of Emotional Facial Expression Pictures (WSEFEP) [14]. The data-set is divided into six facial expressions.

Fig. 1 shows the structure of AlexNet [15], which is composed of convolutional layers and fully-connected layers.
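
Fig. 1. AlexNet structure [15]

The following sections describe the data-set, the CNN structure, and the experiments.

Fig. 2 shows a problem of the FER2013 data-set. For this reason, the data-set in this paper is built from nine other data-sets instead of the FER2013 data-set.

Training a CNN requires a large data-set. The data-set of the Kaggle 'Facial Expression Recognition Challenge' (FER2013 data-set) contains 37,000 images in 7 classes [16]. Its images are 48x48 pixels, and blur is a concern when they are used as CNN input [...].

Fig. 2. Examples of images classified as incorrect faces in the FER2013 data-set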

The data-sets used are as follows.

10k US Adult Faces Database: 10,168 face images, of which 2,222 are annotated.
Indian Movie Face database: [...],512 images, labeled with 7 expressions.
Cohn-Kanade AU-Coded Facial Expression: [...]
Chicago Face Database: [...]
ESRC 3D Face Database: [...]
Amsterdam Dynamic Facial Expression Set: 10 [...] 12 [...]
Karolinska Directed Emotional Faces: 4,900 images taken at angles of -90, -45, 0, +45, and +90 degrees, labeled with 7 expressions.
EU-Emotion Stimulus Set: [...]
Warsaw Set of Emotional Facial Expression Pictures: 30 subjects [...]

From these data-sets, images of the six expressions were collected [...]. Table 1 shows the number of images per facial expression in the collected data-set.

Table 1. Number of images per facial expression of collected data-set
Neutral [NE]   Happy [HA]   Sad [SA]   Angry [AN]   Surprise [SU]   Disgust [DI]   Total
1,000          1,[...]      [...]      [...]        [...]           [...]          [...]

The collected images are pre-processed before augmentation. The face region is detected with Haar-like features [17], cut out, and converted into a black-and-white image, as shown in Fig. 3.

Fig. 3. Result of converting the cut-out face region image into a black-and-white image
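
The pre-processing step above (cutting out a face region and converting it to a single-channel image) can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar-based detector is not reproduced, so the bounding box `box` is a hypothetical detector output, the function names `crop_face` and `to_gray` are illustrative, and the BT.601 luminance weights are an assumption, since the paper's conversion formula is not available.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
Box = Tuple[int, int, int, int]   # (x, y, width, height); hypothetical detector output


def crop_face(image: List[List[Pixel]], box: Box) -> List[List[Pixel]]:
    """Cut the rectangular face region out of a row-major RGB image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def to_gray(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB to one luminance channel (BT.601 weights, an assumption)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


# Tiny 4x4 synthetic image: red background with a white 2x2 "face" block.
red, white = (255, 0, 0), (255, 255, 255)
image = [[red] * 4 for _ in range(4)]
for yy in (1, 2):
    for xx in (1, 2):
        image[yy][xx] = white

face = to_gray(crop_face(image, (1, 1, 2, 2)))
print(face)
```

In practice the box would come from a face detector and the crop would be resized to the CNN input size; both steps are left out here to keep the sketch self-contained.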
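
Data augmentation is applied to prevent over-fitting of the CNN. The data-set is augmented [...] 5, 10 [...] as in [5]. Fig. 4 shows the result.

Fig. 4. The result of applying the data augmentation technique

ZFNet improved on AlexNet for the ImageNet data-set [18] [...] CNN convolution [...].

The proposed CNN is based on AlexNet. The number of feature maps in each convolutional layer and the number of nodes in each fully-connected layer of AlexNet are adjusted. Like ZFNet [18], the first convolutional layer of AlexNet is also changed (11 to 5, 4 [...]). Fig. 5 shows the computational relationship between consecutive convolutional layers: the number of learnable parameters and the amount of computation of a convolutional layer depend on the feature maps of the preceding layer.

Fig. 5. Computational relationship between consecutive convolutional layers (number of learnable parameters, amount of computation)

The convolutional layers of the CNN are denoted as the C1-C5 layers and the fully-connected layers as the FC6-FC8 layers. In this notation, AlexNet is (96, 256, 384, 384, 256, 4096, 4096, 1000).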
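
The relationship between consecutive convolutional layers can be made concrete with a small sketch. It rests on the standard count of k x k x C_in x C_out weights per convolutional layer, so reducing the feature maps of two consecutive layers by a factor s reduces the weights of the second layer by roughly s squared. The kernel sizes are those of the original AlexNet paper; treating every layer as an ordinary (ungrouped) convolution, ignoring biases, and the helper name `conv_weights` are assumptions of this sketch, not details taken from the paper.

```python
from typing import List, Sequence

# Kernel sizes of C1-C5 as in the AlexNet paper. Treating every layer as an
# ordinary (ungrouped) convolution is a simplifying assumption of this sketch.
KERNELS = (11, 5, 3, 3, 3)


def conv_weights(channels: Sequence[int], in_channels: int = 3) -> List[int]:
    """Weights per convolutional layer: k * k * C_in * C_out (biases ignored)."""
    counts = []
    c_in = in_channels
    for k, c_out in zip(KERNELS, channels):
        counts.append(k * k * c_in * c_out)
        c_in = c_out  # the next layer consumes this layer's feature maps
    return counts


alexnet = (96, 256, 384, 384, 256)       # C1-C5 of AlexNet, from the text
half = tuple(c // 2 for c in alexnet)    # every layer reduced to 1/2

full_counts = conv_weights(alexnet)
half_counts = conv_weights(half)
for name, a, b in zip(("C1", "C2", "C3", "C4", "C5"), full_counts, half_counts):
    print(f"{name}: {a:>9,d} -> {b:>9,d}  (ratio {b / a:.2f})")
```

Only C1 shrinks by 1/2, because its input (the image) is unchanged; C2-C5 shrink by 1/4, since both their input and output feature maps are halved.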

Candidate models are built by reducing the AlexNet values to 1/2 and 1/4 [...]. Table 2 lists the candidate models and their recognition rates: (24, 64, 96, 96, 128, 1024, 1024) [...] fully-connected [...].

Table 2. Candidate model configuration and recognition rate
C1   C2   C3   C4   C5   FC6   FC7   Recognition rate (%)
[...]

Table 3. Structures that reduce the number of channels and nodes in the first reference model, and the corresponding recognition rates
C1   C2   C3   C4   C5   FC6   FC7   Recognition rate (%)
[...]

[...] (48, 128, 192, 192, 256, 4096, 4096), (36, 96, 144, 96, 128, 1024, 1024) [...] 2/3 [...] 1/2 [...]. In Table 3, [...] FC6 [...] C4 [...] 96 [...] (36, 96, 144, 96, 128, 1024, 1024). Table 4 [...].

Table 4. Structures that reduce the number of channels and nodes in the second reference model, and the corresponding recognition rates
C1   C2   C3   C4   C5   FC6   FC7   Recognition rate (%)
[...]

[...] C4, C5, FC6, and FC7 [...] C1-C3 [...] (36, 96, 144, 96, 128, 1024, 1024).

Table 5. Structures that reduce the number of channels and nodes in the third reference model, and the corresponding recognition rates
C1   C2   C3   C4   C5   FC6   FC7   Recognition rate (%)
[...]

AlexNet divides its convolutional layers across 2 GPUs [...]. (-15, -10, -5, +5, +10, +15) [...]. The proposed optimal structure is (36, 96, 144, 96, 128, 1024, 1024), shown in Fig. 6.

Fig. 6. The proposed optimal structure

The experiments were run on a GeForce GTX 980 Ti GPU with the Theano tool. The data-set was divided into training and test sets at a ratio of 9:[...]. Training used batch stochastic gradient descent for 60 epochs with a learning rate of 0.01, which was multiplied by 1/[...] at epochs 20 and 40.

Table 6. Effects of data preprocessing and augmentation techniques
Preprocessing method                           Accuracy (%)
1-channel gray image                           [...]
3-channel color image                          [...]
1-channel gray image + data augmentation       [...]
3-channel color image + data augmentation      [...]

The proposed CNN is compared with AlexNet, VGGNet (11-layer) [19], OverFeat (fast model) [20], and the inception module of GoogLeNet [21].
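
The training schedule above (60 epochs, learning rate 0.01, reductions at epochs 20 and 40) can be sketched as a step schedule. The reduction factor is truncated in the source, so `decay=0.1` below is only a placeholder assumption, as are the function name `step_learning_rate`, the 0-based epoch counting, and the choice to apply each reduction at the start of the milestone epoch.

```python
def step_learning_rate(epoch: int,
                       base_lr: float = 0.01,
                       milestones=(20, 40),
                       decay: float = 0.1) -> float:
    """Learning rate for a 0-based epoch under a step schedule.

    base_lr and milestones follow the text; decay=0.1 is a placeholder
    assumption, because the reduction factor is truncated in the source.
    """
    drops = sum(1 for m in milestones if epoch >= m)  # milestones already passed
    return base_lr * decay ** drops


# One learning rate per epoch for the 60-epoch run described in the text.
schedule = [step_learning_rate(e) for e in range(60)]
print(schedule[0], schedule[20], schedule[40])
```

Keeping the factor as a parameter makes it easy to substitute the paper's actual value once it is known.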

Taking VGGNet into account, all models are run with a batch size of 32 [...].

Table 7. Learning and testing time for each model (batch : 32)
Model              Training time (sec / batch)   Test time (sec / batch)
AlexNet            [...]                         [...]
OverFeat           [...]                         [...]
VGGNet             [...]                         [...]
Inception Module   [...]                         [...]
Proposed           [...]                         [...]

Table 8. Recognition rate for each model
Model              Recognition rate (%)
AlexNet            [...]
OverFeat           [...]
VGGNet             [...]
Inception Module   [...]
Proposed           96.88

The results on the test data-set are also given as confusion matrices, with values in % [...].

Table 9. The confusion matrix of the proposed structure (%)
Rows and columns: NE, HA, SA, AN, SU, DI
[...]

Table 10. The confusion matrix of the AlexNet (%)
Rows and columns: NE, HA, SA, AN, SU, DI
[...]
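
The confusion matrices in this section are expressed in percent. A minimal sketch of how such a matrix could be computed from true and predicted labels follows. The row/column orientation (rows as true labels, columns as predictions, each row normalized to 100%) is an assumption, since the orientation used in the paper is not recoverable, and the sample labels are made up for illustration; they are not the paper's results.

```python
from typing import List, Sequence

LABELS = ("NE", "HA", "SA", "AN", "SU", "DI")


def confusion_matrix_percent(true: Sequence[str],
                             pred: Sequence[str]) -> List[List[float]]:
    """Row-normalized confusion matrix: row = true label, column = prediction."""
    index = {label: i for i, label in enumerate(LABELS)}
    counts = [[0] * len(LABELS) for _ in LABELS]
    for t, p in zip(true, pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no samples yields a row of zeros instead of dividing by 0.
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


# Made-up labels for illustration only; these are not the paper's results.
true = ["NE", "NE", "HA", "HA", "SA", "AN", "SU", "DI"]
pred = ["NE", "SA", "HA", "HA", "SA", "AN", "SU", "DI"]
matrix = confusion_matrix_percent(true, pred)
for label, row in zip(LABELS, matrix):
    print(label, " ".join(f"{v:6.1f}" for v in row))
```

The diagonal of the matrix gives the per-expression recognition rate, and the off-diagonal entries show which expressions are confused with each other.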

Table 11. The confusion matrix of the VGGNet (%)
Rows and columns: NE, HA, SA, AN, SU, DI
[...]

Table 12. The confusion matrix of the OverFeat (%)
Rows and columns: NE, HA, SA, AN, SU, DI
[...]

Table 13. The confusion matrix of the Inception module (%)
Rows and columns: NE, HA, SA, AN, SU, DI
[...]

In this paper, a data-set was built from several data-sets, and a CNN was optimized by adjusting its convolutional layers and fully-connected layers [...] CNN [...].

References

[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint.
[2] A. Mollahosseini, D. Chan, and M. H. Mahoor, "Going deeper in facial expression recognition using deep neural networks," in Proc. IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.
[3] H. Jung et al., "Joint fine-tuning in deep neural networks for facial expression recognition," in Proc. IEEE International Conference on Computer Vision.
[4] A. T. Lopes, E. de Aguiar, and T. Oliveira-Santos, "A facial expression recognition system using convolutional networks," in Proc. SIBGRAPI Conference on Graphics, Patterns and Images, IEEE.
[5] D. Hamester, P. Barros, and S. Wermter, "Face expression recognition with a 2-channel convolutional neural network," in Proc. International Joint Conference on Neural Networks (IJCNN), IEEE, 2015.
[6] W. Bainbridge, P. Isola, and A. Oliva, "The intrinsic memorability of face photographs," Journal of Experimental Psychology: General, 142(4).
[7] S. Setty et al., "Indian Movie Face Database: A Benchmark for Face Recognition Under Wide Variation," in NCVPRIPG.
[8] P. Lucey, J. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I.
Matthews, "The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression," in Proc. IEEE Workshop on CVPR for Human Communicative Behavior Analysis.
[9] D. S. Ma, J. Correll, and B. Wittenbrink, "The Chicago Face Database: A Free Stimulus Set of Faces and Norming Data," Behavior Research Methods, 47.
[10] ESRC 3D Face Database.
[11] J. van der Schalk, S. T. Hawk, A. H. Fischer, and B. J. Doosje, "Moving faces, looking places: The Amsterdam Dynamic Facial Expressions Set (ADFES)," Emotion, 11.
[12] D. Lundqvist, A. Flykt, and A. Öhman, "The Karolinska Directed Emotional Faces - KDEF," CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, 1998.
[13] H. O'Reilly, D. Pigat, S. Fridenson, S. Berggren, S. Tal, O. Golan, S. Bölte, S. Baron-Cohen, and D. Lundqvist, "The EU-Emotion Stimulus Set: A Validation Study," Behavior Research Methods.
[14] M. Olszanowski, G. Pochwatko, K. Kuklinski, M. Scibor-Rylski, P. Lewinski, and R. K. Ohme, "Warsaw Set of Emotional Facial Expression Pictures: A validation study of facial display photographs," Front. Psychol., 5:1516.
[15] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems.
[16] "Learn facial expressions from an image," c/challenges-in-representation-learning-facial-expression-recognitionchallenge/data
[17] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Computer Vision and Pattern Recognition.
[18] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European Conference on Computer Vision, Springer International Publishing, September.
[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. International Conference on Learning Representations, 2014.
[20] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks," in Proc. ICLR.
[21] P. Burkert, F. Trier, M. Z. Afzal, A. Dengel, and M. Liwicki,
"DeXpression: Deep convolutional neural network for expression recognition," CoRR.