Hyo-Jin Cho et al.: Audio High-Band Coding based on Autoencoder with Side Information (Special Paper), JBE Vol. 24, No. 3, May 2019

JBE Vol. 24, No. 3, May 2019 (Special Paper)
https://doi.org/10.5909/jbe.2019.24.3.387
ISSN 2287-9137 (Online) ISSN 1226-7953 (Print)

Audio High-Band Coding based on Autoencoder with Side Information

Hyo-Jin Cho a), Seong-Hyeon Shin a), Seung Kwon Beack b), Taejin Lee b), and Hochong Park a)

Abstract

In this study, a new method of audio high-band coding based on an autoencoder with side information is proposed. The proposed method operates in the MDCT domain and improves performance by using additional side information consisting of the previous and current low bands, unlike a conventional autoencoder, whose only input is the information to be encoded. Moreover, because the side information lies in the time-frequency domain, the high-band coder can exploit temporal characteristics of the signal. In the proposed method, the encoder transmits a 4-dimensional latent vector computed by the autoencoder and a gain variable, using 12 bits for each frame. The decoder reconstructs the high band by applying the decoded low bands of the previous and current frames, together with the transmitted information, to the autoencoder. Subjective evaluation confirms that the proposed method provides performance equivalent to SBR at approximately half the bit rate of SBR.

Keyword: autoencoder, neural network, audio high-band coding, side information

a) Dept. of Electronics Engineering, Kwangwoon University
b) Electronics and Telecommunications Research Institute
Corresponding Author: Hochong Park, E-mail: hcpark@kw.ac.kr, Tel: +82-2-940-5104, ORCID: https://orcid.org/0000-0003-1600-6610
The present Research has been conducted by the Research Grant of Kwangwoon University in 2018 and by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. 2017-0-00072002, Development of audio/video coding and light field media fundamental technologies for ultra realistic tera-media). Manuscript received March 15, 2019; Revised April 30, 2019; Accepted April 30, 2019. Copyright 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved. This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND (http://creativecommons.org/licenses/by-nc-nd/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.

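Audio coding reduces the bit rate of an audio signal while maintaining perceptual quality [1]. At low bit rates, the high band is usually coded with a small number of parameters instead of being coded directly, and the decoder reconstructs it with the help of the decoded low band [2]. A representative method is spectral band replication (SBR) [2]. SBR operates in the quadrature mirror filter (QMF) domain and describes the high band by its time-frequency envelope and tonality. Because SBR requires QMF analysis and synthesis, high-band coding that avoids the QMF domain has also been studied [3]. Neural networks have been applied to bandwidth extension as well [4,5], using recurrent neural networks and convolutional neural networks (CNN).

This paper proposes a high-band coding method based on an autoencoder [6] that takes side information in the time-frequency domain as an additional input. The method operates in the MDCT (modified discrete cosine transform) domain, whereas SBR operates in the QMF domain. In subjective evaluation it provides performance equivalent to SBR at approximately 1/2 of the SBR bit rate.

Fig. 1 shows the basic structure of an autoencoder, a neural network built from hidden layers.

Fig. 1. Basic structure of autoencoder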

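The output of the middle hidden layer is the latent vector. The encoding network computes the latent vector from the input, and the decoding network reconstructs the input from the latent vector.

The signal is processed in frames of 1024 samples with 50% overlap, so a 2048-point MDCT yields 1024 MDCT coefficients per frame. The coded bandwidth is 14.25 kHz, and the high band covers 9.75 ~ 14.25 kHz. At a sampling rate of 48 kHz, the full band corresponds to 608 MDCT coefficients, of which 192 belong to the high band.

Fig. 2 shows the structure of the proposed autoencoder, and Table 1 lists the details of each network. The encoding network for the high band takes the 192 high-band MDCT coefficients and computes a 4-dimensional latent vector X with a 3-layer FCN (fully-connected network) [7]. The side information consists of the MDCT coefficients in 3.75 ~ 9.75 kHz of the previous 7 frames and the current frame, arranged as an 8 × 256 two-dimensional (2D) input; the band below 3.75 kHz is not included. The encoding network for the side information computes a 10-dimensional latent vector Y with 3 layers of 2D CNN followed by 1 layer that flattens the CNN output and applies an FCN [7]. The latent vectors X and Y are concatenated into a 14-dimensional latent vector, which is the input to the decoding network.

Fig. 2. Structure of the proposed autoencoder

Table 1. Detail of network structure in the proposed method

Encoding network for high band
layer   function                 output dim.
in      high-band MDCT coeff.    192
1       FCN, GLU                 96
2       FCN, GLU                 24
3       FCN, sigmoid             4
out     latent vector            4

Encoding network for side information
layer   function                 output dim.     filters   kernel   stride
in      side-info. MDCT coeff.   8 × 256         -         -        -
1       2D CNN, GLU              4 × 128 × 32    32        [5,5]    [2,2]
2       2D CNN, GLU              2 × 64 × 64     64        [5,5]    [2,2]
3       2D CNN, GLU              1 × 32 × 128    128       [5,5]    [2,2]
4       flatten, FCN, sigmoid    10              -         -        -
out     latent vector            10              -         -        -

Decoding network
layer   function                 output dim.
in      latent vector            14
1       FCN, GLU                 32
2       FCN, GLU                 96
3       FCN, sigmoid             192
out     high-band MDCT coeff.    192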
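The coefficient counts and layer shapes quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the use of "same" padding in the strided convolutions is an assumption, since Table 1 lists only kernel and stride.

```python
import math

SAMPLE_RATE_HZ = 48_000   # sampling rate stated in the text
COEFFS_PER_FRAME = 1024   # MDCT coefficients per frame (2048-point MDCT)


def band_to_bins(low_hz, high_hz):
    """Number of MDCT coefficients between two band edges."""
    hz_per_bin = (SAMPLE_RATE_HZ / 2) / COEFFS_PER_FRAME  # 23.4375 Hz per coefficient
    return round(high_hz / hz_per_bin) - round(low_hz / hz_per_bin)


def conv_out_shape(shape, filters, stride=(2, 2)):
    """Output (frames, bins, channels) of a strided 2D convolution.

    Assumes 'same' padding, so each spatial size becomes ceil(size / stride).
    """
    frames, bins, _ = shape
    return (math.ceil(frames / stride[0]), math.ceil(bins / stride[1]), filters)


full_band = band_to_bins(0, 14_250)       # coded bandwidth
high_band = band_to_bins(9_750, 14_250)   # input of the high-band encoder
side_band = band_to_bins(3_750, 9_750)    # bins per frame in the side information

# Trace the side-information encoder: 8 frames x side_band bins, 1 channel.
shape = (8, side_band, 1)
trace = [shape]
for filters in (32, 64, 128):             # filter counts from Table 1
    shape = conv_out_shape(shape, filters)
    trace.append(shape)
flattened = shape[0] * shape[1] * shape[2]  # size fed to the final FCN

print(full_band, high_band, side_band)
print(trace, flattened)
```

Under these assumptions the trace reproduces the output dimensions in Table 1, and the flattened CNN output that the last FCN reduces to the 10-dimensional Y has 4096 values.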

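The latent vector Y of the side information has 10 dimensions. In each network, the last layer uses the sigmoid activation, and the other layers use the GLU (gated linear unit) shown in Fig. 3 [8]. In the GLU, the output of the previous layer, h_{t-1}, is transformed with the weight W and bias b into z, and z passes through tanh and sigmoid to give the GLU output h_t. The GLU thus combines the tanh and sigmoid functions.

Fig. 3. GLU structure

Of the two latent vectors, only X is transmitted, so X must be quantized. Fig. 4 shows the distribution of the 4-dimensional latent vector X as 2D scatter diagrams. X is quantized with 8 bits, using a codebook obtained with the k-means algorithm.

As shown in Fig. 2, the autoencoder takes the 192 high-band MDCT coefficients and the 8 × 256 side information as input and outputs 192 values. The networks are trained with the ADAM optimizer [9].

Fig. 4. 2D scatter diagram of latent vector X

The autoencoder handles the magnitudes of the MDCT coefficients, and the signs of the MDCT coefficients are processed separately.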
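The gated activation of Fig. 3 can be sketched as follows. This is an illustrative sketch only: the text says that z passes through tanh and sigmoid, but how z is divided between the two branches is not recoverable, so splitting z into two halves is an assumption, and the function names are hypothetical. For reference, the GLU of [8] gates a linear branch with a sigmoid; the variant below gates a tanh branch instead.

```python
import math


def sigmoid(v):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, h_prev):
    """z = W h_prev + b, with W given as a list of rows."""
    return [sum(w * h for w, h in zip(row, h_prev)) + b
            for row, b in zip(weights, bias)]


def gated_unit(weights, bias, h_prev):
    """Gated activation: tanh branch multiplied by a sigmoid gate.

    Assumption: the first half of z feeds tanh and the second half feeds
    sigmoid, so W must have twice as many rows as the layer has outputs.
    """
    z = affine(weights, bias, h_prev)
    half = len(z) // 2
    return [math.tanh(a) * sigmoid(g) for a, g in zip(z[:half], z[half:])]


# Tiny example: 2 inputs, 2 outputs, so W has 4 rows.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b = [1.0, -1.0, 0.0, 0.0]
h_t = gated_unit(W, b, [0.3, -0.2])
print(h_t)
```

Because the sigmoid gate lies in (0, 1) and tanh lies in (-1, 1), every output of this unit is bounded in (-1, 1).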

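The magnitudes of the MDCT coefficients are normalized to the range 0 ~ 1 before they are input to the autoencoder. The normalization uses a gain G computed from the high-band MDCT coefficients, and the decoder multiplies the network output by G to restore the MDCT magnitudes. G is quantized with 4 bits, using k-means.

In summary, for the 192 high-band MDCT coefficients of each frame, the encoder transmits G with 4 bits and the 4-dimensional latent vector X with 8 bits. The total is 12 bits per frame, which corresponds to 0.56 kbps.

The decoder computes the 10-dimensional latent vector Y from the decoded low-band MDCT coefficients of the previous 7 frames and the current frame. Y is combined with the decoded 4-dimensional X into the 14-dimensional input of the decoding network, and the network output is scaled by G to give the high-band MDCT magnitudes. For the signs of the high-band MDCT coefficients, the signs of low-band MDCT coefficients are used, following intelligent gap filling (IGF) [3].

The training data comprise the VCTK (voice cloning toolkit) corpus [10], the RWC (real world computing) music database [11], and additional material (57 in total). For evaluation, 12 USAC (unified speech and audio coding) test items were used, 4 for each of the 3 categories speech, speech-over-music (SoM), and music [12].

The low band is coded by USAC at 48 kbps [13]. The MDCT coefficients decoded by the 48 kbps USAC up to 9.75 kHz serve as the low band. The method is applied to frames coded with the long window.

SBR is used for comparison. In the SBR configuration the core band extends to 10.125 kHz [13], and SBR codes 10.125 ~ 14.25 kHz. The bit rate of SBR is 1.08 kbps, approximately 2 times that of the proposed method. The low band for SBR is also coded by the 48 kbps USAC.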
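The bit rates quoted above follow from the frame rate. The sketch below reproduces the arithmetic; the constant and function names are hypothetical, and it assumes that each frame advances by 1024 samples, which is what 50% overlap of a 2048-point MDCT implies.

```python
SAMPLE_RATE_HZ = 48_000   # sampling rate stated in the text
HOP_SAMPLES = 1024        # assumed frame advance (50% overlap of a 2048-point MDCT)
LATENT_BITS = 8           # bits for the 4-dimensional latent vector X
GAIN_BITS = 4             # bits for the gain G
SBR_KBPS = 1.08           # SBR bit rate reported in the text


def bit_rate_kbps(bits_per_frame):
    """Bit rate in kbps for a fixed number of bits per frame."""
    frames_per_second = SAMPLE_RATE_HZ / HOP_SAMPLES  # 46.875 frames/s
    return bits_per_frame * frames_per_second / 1000


proposed_kbps = bit_rate_kbps(LATENT_BITS + GAIN_BITS)
ratio_to_sbr = proposed_kbps / SBR_KBPS

print(f"proposed: {proposed_kbps:.4f} kbps, ratio to SBR: {ratio_to_sbr:.2f}")
```

With 12 bits per frame this gives 0.5625 kbps, which the paper rounds to 0.56 kbps, and a ratio of roughly 0.52 to the SBR bit rate, consistent with the stated "approximately 1/2".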

Fig. 5 shows spectrograms of the original signal, the signal decoded by the proposed method, and the signal decoded by SBR. The proposed method reconstructs the high band at approximately 1/2 of the SBR bit rate.

Fig. 5. Spectrogram of test data (a) original, (b) decoded signal by proposed method and (c) decoded signal by SBR

Subjective quality was evaluated with a MUSHRA test that included a 3.5 kHz low-pass anchor [14]. Listeners graded each signal on a scale of 0 ~ 100 divided into 5 quality intervals. Fig. 6 shows the results with 95% confidence intervals. The proposed method provides performance equivalent to SBR while using approximately 1/2 of the SBR bit rate.

Fig. 6. Result of MUSHRA test

This paper proposed an audio high-band coding method based on an autoencoder with side information. The method transmits a 4-dimensional latent vector and a gain for each frame and operates at 0.56 kbps. Subjective evaluation confirmed performance equivalent to SBR at approximately 1/2 of the SBR bit rate.

References

[1] ISO/IEC 11172-3, "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3," 1993.
[2] M. Dietz, L. Liljeryd, K. Kjörling, and O. Kunz, "Spectral band replication, a novel approach in audio coding," 112th Conv. Audio Eng. Soc., May 2002.
[3] C. R. Helmrich, et al., "Spectral envelope reconstruction via IGF for audio transform coding," Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Brisbane, Australia, pp. 389-393, 2015.
[4] L. Jiang, R. Hu, X. Wang, W. Tu, and M. Zhang, "Nonlinear prediction with deep recurrent neural networks for non-blind audio bandwidth extension," China Communication, vol. 15, no. 1, pp. 72-85, Jan. 2018.
[5] K. Schmidt and B. Edler, "Blind bandwidth extension based on convolutional and recurrent deep neural networks," Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Calgary, Canada, pp. 5444-5448, 2018.
[6] G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, 313.5786, pp. 504-507, 2006.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, 521.7553, pp. 436-444, 2015.
[8] Y. N. Dauphin, et al., "Language modeling with gated convolutional networks," Proc. of the 34th Int. Conf. on Machine Learning, vol. 70, Sydney, Australia, pp. 933-941, 2017.
[9] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," Proc. of Int. Conf. on Learning Representation, San Diego, USA, 2015.
[10] C. Veaux, et al., "Superseded-CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit," 2016.
[11] M. Goto, "Development of the RWC music database," Proc. of Int. Congress on Acoustics, vol. 1, pp. 553-556, April 2004.
[12] ISO/IEC JTC1/SC29/WG11 N9927, "Workplan for subjective testing of Unified Speech and Audio Coding proposals," April 2008.
[13] S. Beack, et al., "Single-mode-based Unified Speech and Audio Coding by extending the linear prediction domain coding mode," ETRI Journal, vol. 39, no. 3, pp. 310-318, 2017.
[14] ITU-R BS.1534-3, "Method for the subjective assessment of intermediate quality level of audio systems," 2015.