(JBE Vol. 23, No. 5, September 2018) (Special Paper)
https://doi.org/10.5909/jbe.2018.23.5.614
ISSN 2287-9137 (Online), ISSN 1226-7953 (Print)

Depth Image Restoration Using Generative Adversarial Network

John Junyeop Nah a), Chang Hun Sim a), and In Kyu Park a)

Abstract

This paper proposes a method of restoring a corrupted depth image captured by a depth camera through unsupervised learning using a generative adversarial network (GAN). The proposed method generates restored face depth images using a 3D morphable model convolutional neural network (3DMM CNN) with the large-scale CelebFaces Attributes (CelebA) and FaceWarehouse datasets for training a deep convolutional generative adversarial network (DCGAN). The generator and discriminator are trained in a minimax game using the Wasserstein distance as the loss function. The DCGAN then restores the corrupted regions of captured facial depth images by performing an additional learning procedure that uses the trained generator and a new loss function.

Keywords: Deep learning, generative adversarial network, depth image, depth camera, restoration

a) Inha University, Department of Information and Communication Engineering
Corresponding Author: In Kyu Park
E-mail: pik@inha.ac.kr, Tel: +82-32-860-9190, ORCID: https://orcid.org/0000-0003-4774-7841
This work was presented in part at IPIU 2018.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1A2B4014731).
Manuscript received May 8, 2018; Revised July 13, 2018; Accepted July 16, 2018.
Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.
I. Introduction

Time-of-flight (ToF) depth cameras inevitably produce corrupted depth images: the captured depth maps contain holes and missing regions. Hole-filling methods have been proposed to repair such depth maps [1][2], and inpainting techniques originally developed for RGB images have also been applied to depth images [3].

DCGAN [4] has been applied successfully to semantic inpainting of RGB images [5]. The original DCGAN, however, is trained with a loss based on the Jensen-Shannon divergence, which is a weak measure for comparing distributions; in this work it is replaced with the Wasserstein distance [6]. DCGAN combines the GAN framework [7] with a convolutional neural network (CNN) architecture [4]. To train the DCGAN, 23,500 facial depth images are obtained with 3DMM CNN [8] from the CelebA dataset and from the FaceWarehouse dataset [9].

The proposed method restores corrupted facial depth images, without paired ground truth, by training a DCGAN with the Wasserstein distance and then optimizing a latent vector with a new loss function. The remainder of this paper is organized as follows. Section II describes the training data. Section III presents the DCGAN training and the restoration procedure. Section IV reports experimental results, and Section V concludes the paper.

II. Training Data

1. Facial depth images for training

3DMM CNN [8] is a CNN that regresses the principal component analysis (PCA) coefficients of the Basel face model (BFM) [10] to reconstruct a 3D face from a single image. Using it, 15,000 3D faces are obtained from CelebA, and 8,500 are taken from FaceWarehouse [9]. From these 23,500 3D faces, depth images are rendered as shown in Fig. 1, and their values are normalized to the range (-1, 1).
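The normalization at the end of this step can be sketched in a few lines. This is a minimal illustration; the function name and the assumption that raw depth values arrive in millimeters are mine, not the paper's:

```python
import numpy as np

def normalize_depth(depth, d_min=None, d_max=None):
    """Linearly map a raw depth map to the (-1, 1) range used for training."""
    d_min = depth.min() if d_min is None else d_min
    d_max = depth.max() if d_max is None else d_max
    scaled = (depth - d_min) / (d_max - d_min)  # map to [0, 1]
    return scaled * 2.0 - 1.0                   # shift to [-1, 1]

# Example: a raw 64x64 depth map with values in [200, 1500] (e.g. millimeters)
raw = np.random.uniform(200.0, 1500.0, size=(64, 64))
norm = normalize_depth(raw)
```

The inverse mapping (for visualizing generated images) is simply `(norm + 1) / 2 * (d_max - d_min) + d_min`.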
Fig. 1. Depth images used in the minimax game. (a) Depth image extracted using 3DMM CNN and the CelebA dataset, (b) Depth image from the FaceWarehouse dataset

2. Facial depth images captured by a depth camera

Facial depth images are captured at distances between 20 cm and 150 cm with an Intel RealSense SR300 depth camera [11]. As shown in Fig. 2, the captured images contain corrupted regions.

Fig. 2. Facial depth image captured by the depth camera

III. Network Training and Restoration

1. Training the DCGAN

The DCGAN is trained on the 23,500 depth images described above, normalized to [-1, 1], through the minimax game between the generator and the discriminator [7], as shown in Fig. 3. The original DCGAN minimizes the Jensen-Shannon divergence through the objective

min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]   (1)

In this work, the loss function is replaced with the Wasserstein distance [6], which yields the result shown in Fig. 4(b).

Fig. 3. Training procedure using the minimax game
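Switching from the original GAN loss to the Wasserstein loss amounts to using raw critic scores instead of log-probabilities. The following framework-agnostic sketch shows only the loss computation with NumPy stand-ins; it is not the paper's TensorFlow code, and weight clipping and the training loop are omitted:

```python
import numpy as np

def wgan_losses(critic_real, critic_fake):
    """Wasserstein losses from raw (unbounded) critic scores.

    critic_real: critic scores D(x) on real depth images, shape (batch,)
    critic_fake: critic scores D(G(z)) on generated images, shape (batch,)
    """
    # The critic maximizes E[D(x)] - E[D(G(z))]; negate to get a loss to minimize.
    critic_loss = -(np.mean(critic_real) - np.mean(critic_fake))
    # The generator minimizes -E[D(G(z))].
    gen_loss = -np.mean(critic_fake)
    return critic_loss, gen_loss

c_loss, g_loss = wgan_losses(np.array([1.0, 2.0]), np.array([0.5, -0.5]))
```

Unlike the log-based objective in (1), these losses stay informative even when the critic separates real and generated samples sharply.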
Fig. 4. Comparison of generated facial depth images after performing the minimax game with two different loss functions. (a) Original DCGAN, (b) DCGAN with Wasserstein distance (learning rate: 0.0002, iteration count: 10,000)

The Wasserstein distance between the real distribution P_r and the generated distribution P_g is

W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x, y) ~ γ}[||x - y||],

where Π(P_r, P_g) is the set of joint probability distributions whose marginals are P_r and P_g. In practice, it is estimated through its lower bound given by the Kantorovich-Rubinstein duality. Unlike the Jensen-Shannon divergence, the Wasserstein distance is a proper metric that provides useful gradients even when the two distributions barely overlap [6].

2. Depth image restoration using the trained generator

After training, the corrupted regions of a captured depth image are restored with the trained generator, following the semantic inpainting approach of [5], which is related to inpainting based on internal statistics [3]. A binary mask M marks the corrupted pixels with 0 and the valid pixels with 1. The prior loss keeps the generated image on the learned face manifold:

L_p(z) = log(1 - D(G(z)))   (2)

Since the depth values are normalized to [-1, 1], the contextual loss measures the difference between the generated image and the captured image on the valid pixels:

L_c(z) = ||M ⊙ (G(z) - y)||_1   (3)

where G and D are the generator and discriminator of the trained GAN [7], y is the corrupted depth image, and ⊙ denotes element-wise multiplication. Combining (2) and (3) with a regularization weight λ gives the total loss

L(z) = L_c(z) + λ L_p(z)   (4)

Fig. 5. Depth image restoration using the trained generator
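The combined restoration loss of equations (2)-(4), a contextual term on the valid pixels plus a log prior from the discriminator, can be sketched as follows. The function name is hypothetical, and the scalar d_score stands in for a discriminator forward pass:

```python
import numpy as np

def restoration_loss(g_z, y, mask, d_score, lam=0.1):
    """Total restoration loss: contextual term + weighted prior term.

    g_z:     generator output G(z), values in [-1, 1]
    y:       corrupted depth image, same shape as g_z
    mask:    binary mask (1 = valid pixel, 0 = corrupted pixel)
    d_score: discriminator output D(G(z)) in (0, 1)
    lam:     regularization weight (0.1 in Table 1)
    """
    contextual = np.sum(np.abs(mask * (g_z - y)))  # L1 distance on valid pixels
    prior = np.log(1.0 - d_score)                  # keeps G(z) on the face manifold
    return contextual + lam * prior

# A generator output that matches y on all valid pixels: only the prior term remains.
g = np.zeros((4, 4))
y = np.zeros((4, 4))
m = np.ones((4, 4))
loss = restoration_loss(g, y, m, d_score=0.5)
```

Note that the mask zeroes out the corrupted pixels, so the contextual term never penalizes the generator for what it paints inside the holes.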
The optimal latent vector is found by minimizing (4) with gradient descent:

ẑ = arg min_z L(z)   (5)

The final restored depth image is then obtained by keeping the valid pixels of the captured image and filling the corrupted region with the generator output:

x̂ = M ⊙ y + (1 - M) ⊙ G(ẑ)   (6)

IV. Experimental Results

The proposed method is implemented in TensorFlow and runs on an NVIDIA GeForce GTX 1080 Ti GPU. The DCGAN is trained with five different learning rates using the hyperparameters listed in Table 1. Fig. 6 shows the facial depth images generated by the trained generator after 25,000 iterations for each learning rate (cf. Fig. 4(b), which was trained for 10,000 iterations). Fig. 7 shows the eye, nose, and mouth regions used in the restoration experiments.

Table 1. Hyperparameters used in the minimax game and restoration

  Hyperparameter              Value
  Iteration number            25,000
  Batch size                  64
  Regularization weight (λ)   0.1
  Learning rate               0.0001, 0.00012, 0.00015, 0.00017, 0.0002
  Optimization function       Adam optimizer

Fig. 6. Facial depth images generated by the trained generator using five different learning rates. (a) 0.0001, (b) 0.00012, (c) 0.00015, (d) 0.00017, (e) 0.0002

Fig. 7. Eye, nose, and mouth regions of the obtained facial depth image
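The final composition step, keeping the valid pixels of the captured image and filling the holes with the generator output, can be sketched directly from the mask. This is a minimal illustration with a hypothetical helper name:

```python
import numpy as np

def blend_restoration(y, g_z_hat, mask):
    """Keep valid pixels of the captured image y; fill holes (mask == 0)
    with the corresponding pixels of the generator output G(z_hat)."""
    return mask * y + (1.0 - mask) * g_z_hat

y = np.full((2, 2), 0.5)      # captured depth image
g = np.full((2, 2), -0.25)    # generator output for the optimized latent vector
m = np.array([[1.0, 0.0],     # two valid pixels, two holes
              [0.0, 1.0]])
out = blend_restoration(y, g, m)
```

Blending this way guarantees that the restoration never alters pixels that were measured correctly by the camera.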
Fig. 7 illustrates how binary masks are applied to the eyes, nose, and mouth to create synthetic test images. Measured from the top of the face region, the eye region spans 30% ~ 45% of the face height (red), the nose region 35% ~ 55% (green), and the mouth region 57% ~ 75% (blue). For each part, corrupted depth images are generated by fixing the height of the binary mask and varying its width.

Fig. 8 shows the restoration results after applying binary masks of several width ratios to the eye, nose, and mouth regions described in Fig. 7.

Fig. 8. Results of depth image restoration after corrupting the original depth image with binary masks of different areas. (a) Original depth image, (b), (c), (d) Restoration results after masking 40%, 50%, and 60% of the eye-region width, (e), (f), (g) Restoration results after masking 20%, 30%, and 40% of the nose-region width, (h), (i), (j) Restoration results after masking 20%, 30%, and 40% of the mouth-region width (learning rate used in the minimax game: 0.00017, learning rate used in restoration: 0.00015)
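The mask generation described above can be sketched as follows. The row ranges come from the text; horizontal centering of the mask is my assumption, since the paper only states that the width is varied:

```python
import numpy as np

# Vertical extent of each facial part, as a fraction of the face-region height
# (eyes 30%-45%, nose 35%-55%, mouth 57%-75%, as in the text).
PART_ROWS = {"eyes": (0.30, 0.45), "nose": (0.35, 0.55), "mouth": (0.57, 0.75)}

def corruption_mask(h, w, part, width_ratio):
    """Binary mask: 0 inside the removed rectangle, 1 elsewhere.

    The rectangle's height is fixed by the part's row range; its width is
    width_ratio * w, centered horizontally (centering is an assumption)."""
    top, bottom = PART_ROWS[part]
    r0, r1 = int(top * h), int(bottom * h)
    half = int(width_ratio * w / 2)
    c0, c1 = w // 2 - half, w // 2 + half
    mask = np.ones((h, w))
    mask[r0:r1, c0:c1] = 0.0
    return mask

m = corruption_mask(100, 100, "eyes", 0.40)  # 40%-width eye mask
```

Multiplying a clean depth image by such a mask produces the synthetic corrupted inputs used in Fig. 8.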
The eye region is masked at 40%, 50%, and 60% of its width, and the nose and mouth regions at 20%, 30%, and 40%. The learning rate is 0.00017 for the minimax game and 0.00015 for the restoration. Table 2 reports the PSNR between the restored images in Fig. 8 and the original depth image. The PSNR values are similar across the three regions and corruption ratios, indicating that the trained generator restores each facial part consistently.

Table 2. Comparison of restoration results and original depth images for each part

  Part                   Eyes                 Nose                 Mouth
  Corruption ratio (%)   40     50     60     20     30     40     20     30     40
  PSNR (dB)              29.34  30.06  30.11  31.83  31.70  30.39  29.03  29.15  29.60

Fig. 9 shows the restoration of a real corrupted facial depth image. When the face is captured at a distance of 30 cm to 40 cm, the pixels where the depth cannot be measured have a value of 0, and the proposed method fills these corrupted regions.

Fig. 9. Restoration result on a real corrupted facial depth image. (a) RGB image, (b) Obtained corrupted facial depth image, (c) Restoration result on the corrupted part

V. Conclusion

This paper proposed a method for restoring corrupted facial depth images by training a DCGAN with the Wasserstein distance and then restoring the corrupted regions with the trained generator and a new loss function.

References

[1] K. Xu, J. Zhou, and Z. Wang, "A method of hole-filling for the depth map generated by Kinect with moving objects detection," Proceeding of IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp. 1-5, June 2012.
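The PSNR values in Table 2 can be computed as follows. The data range of 2.0 assumes depth values normalized to [-1, 1]; this is my assumption, as the paper does not state the range used for the evaluation:

```python
import numpy as np

def psnr(reference, restored, data_range=2.0):
    """PSNR in dB between a reference and a restored depth image.

    data_range=2.0 assumes values normalized to [-1, 1] (an assumption)."""
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 0.02)  # a uniform error of 0.02 over the whole image
value = psnr(a, b)
```

Higher PSNR means the restored image is closer to the original; identical images give infinite PSNR.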
[2] L. Feng, L.-M. Po, X. Xu, K.-H. Ng, C.-H. Cheung, and K.-W. Cheung, "An adaptive background biased depth map hole-filling method for Kinect," Proceeding of IEEE Industrial Electronics Society, pp. 2366-2371, November 2013.
[3] S. Ikehata, J. Cho, and K. Aizawa, "Depth map inpainting and super-resolution based on internal statistics of geometry and appearance," Proceeding of IEEE International Conference on Image Processing, pp. 938-942, September 2013.
[4] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," Proceeding of International Conference on Learning Representations, May 2016.
[5] R. A. Yeh, C. Chen, T. Yian Lim, A. G. Schwing, M. Hasegawa-Johnson, and M. N. Do, "Semantic image inpainting with deep generative models," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, July 2017.
[6] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein GAN," Proceeding of International Conference on Machine Learning, vol. 70, pp. 214-223, August 2017.
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," Proceeding of Advances in Neural Information Processing Systems, December 2014.
[8] A. T. Tran, T. Hassner, I. Masi, and G. Medioni, "Regressing robust and discriminative 3D morphable models with a very deep neural network," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, July 2017.
[9] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou, "FaceWarehouse: a 3D facial expression database for visual computing," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 3, pp. 413-425, March 2014.
[10] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter, "A 3D face model for pose and illumination invariant face recognition," Proceeding of IEEE International Conference on Advanced Video and Signal Based Surveillance, October 2009.
[11] Intel RealSense Camera SR300, https://software.intel.com/sites/default/files/managed/0c/ec/realsense-sr300-product-datasheet-rev-1-0.pdf (accessed August 13, 2018).

Author Biographies

John Junyeop Nah
- Feb. 2018: B.S., Department of Information and Communication Engineering, Inha University
- Mar. 2018 - present: M.S. student, Department of Information and Communication Engineering, Inha University
- ORCID: http://orcid.org/0000-0002-2576-5240
- Research interests: computer vision, deep learning, GPGPU

Chang Hun Sim
- Feb. 2018: B.S., Department of Information and Communication Engineering, Inha University
- ORCID: http://orcid.org/0000-0002-1367-1896
- Research interests: computer vision, deep learning, GPGPU

In Kyu Park
- Feb. 1995: B.S., Department of Control and Instrumentation Engineering, Seoul National University
- Feb. 1997: M.S., Department of Control and Instrumentation Engineering, Seoul National University
- Aug. 2001: Ph.D., School of Electrical Engineering and Computer Science, Seoul National University
- Sep. 2001 - Mar. 2004: Senior Researcher, Multimedia Lab, Samsung Advanced Institute of Technology
- Jan. 2007 - Feb. 2008: Visiting Researcher, Mitsubishi Electric Research Laboratories (MERL)
- Sep. 2014 - Aug. 2015: Visiting Associate Professor, MIT Media Lab
- Jul. 2018 - Feb. 2019: Visiting Researcher, University of California San Diego (UCSD)
- Mar. 2004 - present: Professor, Department of Information and Communication Engineering, Inha University
- ORCID: http://orcid.org/0000-0003-4774-7841
- Research interests: computer graphics and vision (image-based 3D reconstruction, augmented reality, computational photography), GPGPU