(JBE Vol. 20, No. 2, March 2015) (Special Paper)
http://dx.doi.org/10.5909/jbe.2015.20.2.224 ISSN 2287-9137 (Online) ISSN 1226-7953 (Print)

SIMD Instruction-based Fast HEVC RExt Decoder

Jung-Soo Mok a), Yong-Jo Ahn a), Hochan Ryu a), and Donggyu Sim a)

Abstract
In this paper, we introduce a fast decoding method that uses SIMD (Single Instruction Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several HEVC RExt tools, namely intra prediction, interpolation, inverse quantization, inverse transform, and clipping, are well suited to SIMD acceleration. Taking the increased bit depth of RExt into account, we accelerate the intra prediction, interpolation, inverse quantization, inverse transform, and clipping modules with SSE (Streaming SIMD Extensions) instructions. In addition, we propose efficient implementations of the interpolation filter, inverse quantization, and clipping modules using AVX2 (Advanced Vector Extensions 2) instructions, which operate on 256-bit registers. The proposed methods were evaluated on an in-house HEVC RExt decoder developed based on HM 16.0. Experimental results show that the developed RExt decoder reduces the decoding time by 12% on average compared with the conventional sequential implementation.

Keywords: HEVC, RExt, HEVC Range Extensions, SIMD, Parallelization

a) Dept. of Computer Engineering, Kwangwoon University
Corresponding Author: Donggyu Sim, E-mail: dgsim@kw.ac.kr, Tel: +82-2-941-6470, ORCID: http://orcid.org/0000-0002-2794-9932
This work was supported in 2014 by the National Research Foundation of Korea (NRF-2014R1A2A1A11052210).
Manuscript received January 21, 2015; revised March 24, 2015; accepted March 24, 2015.
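(Jung-Soo Mok et al.: SIMD Instruction-based Fast HEVC RExt Decoder)

I. Introduction
The JCT-VC (Joint Collaborative Team on Video Coding), formed by ITU-T VCEG and ISO/IEC MPEG (Moving Picture Experts Group), developed HEVC (High Efficiency Video Coding), which provides about twice the compression efficiency of H.264/AVC [1][2]. HEVC version 1, which supports YUV 4:2:0 video with bit depths of up to 10 bits, was completed in January 2013 [3]. After version 1, the HEVC extensions SHVC (Scalable HEVC), MV-HEVC (Multi-view HEVC), and HEVC RExt (HEVC Range Extensions) were developed. HEVC RExt supports bit depths of up to 16 bits and the YUV 4:0:0, 4:2:0, 4:2:2, and 4:4:4 chroma formats [4]. HEVC RExt is built on the HEVC version 1 coding tools, with additional tools for RExt [5]. The decoding complexity of HEVC version 1 has been analyzed [6], and fast HEVC decoders based on parallel processing have been proposed [7][8][9]; however, these studies target HEVC version 1 rather than RExt.

Decoders can be accelerated with SIMD (Single Instruction Multiple Data) instructions on the CPU, or with multithreading on multi-core CPUs and GPUs. This paper applies SIMD instructions to RExt, which is part of HEVC version 2. To accelerate the HEVC RExt decoder, we use the MMX (MultiMedia eXtensions), SSE (Streaming SIMD Extensions), and AVX2 (Advanced Vector Extensions 2) instruction sets.

The rest of this paper is organized as follows. Section II describes the acceleration of HEVC RExt with MMX and SSE. Section III presents the AVX2-based acceleration. Section IV compares the performance of MMX/SSE and AVX2, and Section V concludes the paper.

II. SIMD Acceleration of HEVC RExt with MMX and SSE
This section describes the HEVC RExt modules accelerated with MMX and SSE. The SIMD-accelerated modules of the HEVC RExt decoder are intra prediction, interpolation, inverse quantization, and inverse transform.

1. Intra prediction
HEVC intra prediction consists of the DC mode, the PLANAR mode, and 33 Angular modes, each of which is implemented with SIMD instructions.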
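Fig. 1. Intra PLANAR mode prediction method

The DC mode is implemented with the SIMD movdqu instruction. The PLANAR mode predicts each sample from the neighboring reference samples as in Eq. (1) and Fig. 1, where N is the size of the TU (Transform Unit) and R(x, y) denotes a reference sample:

$$P(x,y) = \Big((N-1-x)\,R(-1,y) + (x+1)\,R(N,-1) + (N-1-y)\,R(x,-1) + (y+1)\,R(-1,N) + N\Big) \gg \big(\log_2 N + 1\big) \qquad (1)$$

Fig. 2 shows the SIMD implementation of Eq. (1) for a 4×4 PLANAR block. The reference samples are arranged with punpcklwd and punpckhwd and multiplied by the position-dependent weights with pmullw, as in steps (S4) to (S9) of the figure. The weighted terms are then summed with paddw and right-shifted.

Fig. 2. Intra PLANAR prediction using SSE instruction set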
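A scalar reference for Eq. (1), together with a DC fill that writes eight 16-bit samples per unaligned 128-bit store (movdqu, exposed as the SSE2 intrinsic `_mm_storeu_si128`), can be sketched as follows. The function names and the `top`/`left` array layout are assumptions for illustration, HEVC's DC edge filter for small luma blocks is omitted, and the inner loop over `x` is the part that the pmullw/paddw code of Fig. 2 processes in parallel.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical reference layout:
   top[x]  = R(x, -1) for x = 0..N  (top[N] is the top-right sample)
   left[y] = R(-1, y) for y = 0..N  (left[N] is the bottom-left sample) */
static void intra_planar(const uint16_t *top, const uint16_t *left,
                         uint16_t *dst, int stride, int log2n)
{
    assert(log2n >= 2 && log2n <= 5); /* 4x4 .. 32x32 TUs */
    const int n = 1 << log2n;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = (uint16_t)(((n - 1 - x) * left[y] + (x + 1) * top[n] +
                                              (n - 1 - y) * top[x] + (y + 1) * left[n] + n)
                                             >> (log2n + 1));
}

/* DC value: rounded mean of the N top and N left reference samples. */
static uint16_t intra_dc_value(const uint16_t *top, const uint16_t *left, int log2n)
{
    const int n = 1 << log2n;
    int sum = n; /* rounding offset */
    for (int i = 0; i < n; i++)
        sum += top[i] + left[i];
    return (uint16_t)(sum >> (log2n + 1));
}

/* Fill an N x N block with the DC value (HEVC's DC edge filter is omitted). */
static void intra_dc_fill(uint16_t *dst, int stride, int n, uint16_t dc)
{
#if defined(__SSE2__)
    if (n % 8 == 0) {
        const __m128i v = _mm_set1_epi16((short)dc);
        for (int y = 0; y < n; y++)
            for (int x = 0; x < n; x += 8) /* eight 16-bit samples per movdqu store */
                _mm_storeu_si128((__m128i *)(dst + y * stride + x), v);
        return;
    }
#endif
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = dc;
}
```

With only the top-right sample set to 64 in a 4×4 block, every row of the planar prediction ramps from 8 to 32, because the result depends only on the (x + 1) weight of R(N, -1).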
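The Angular modes predict each sample with Eq. (2), where the reference index i_Idx and the weighting factor i_Fact are computed with Eqs. (3) and (4) from the prediction angle A of the mode (shown for vertical modes; horizontal modes swap the roles of x and y):

$$P(x,y) = \big((32-i_{Fact})\,\mathrm{ref}[x+i_{Idx}+1] + i_{Fact}\,\mathrm{ref}[x+i_{Idx}+2] + 16\big) \gg 5 \qquad (2)$$

$$i_{Idx} = \big((y+1)\,A\big) \gg 5 \qquad (3)$$

$$i_{Fact} = \big((y+1)\,A\big) \mathbin{\&} 31 \qquad (4)$$

Fig. 3 shows Angular mode 2 as an example. For mode 2, A = 32, so i_Fact from Eq. (4) is always 0 and Eq. (2) reduces to copying reference samples. The reference samples are therefore rearranged with pshuflw, pshufhw, and pshufd, as in Fig. 3, to produce the prediction of Eq. (2) for mode 2.

Fig. 3. Mode 2 of Intra Angular prediction using SSE instruction set

2. Interpolation
HEVC motion compensation uses DCT (Discrete Cosine Transform)-based interpolation filters with 8 taps for luma and 4 taps for chroma, applied as separable one-dimensional filters. The SIMD implementation follows [10]. As shown in Fig. 4, four horizontally interpolated samples are computed from 11 input samples with the 8-tap filter. Because the filter coefficients sum to 64, filtering 10-bit samples can produce intermediate values beyond the signed 16-bit range before the normalization shift; the products are therefore accumulated in 32 bits with pmaddwd, summed with phaddd, and then packed back to 16 bits.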
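The horizontal step can be sketched as follows, assuming 16-bit input samples (bit depths low enough that samples fit in `int16_t`) and the HEVC luma filter taps for the 1/4, 1/2, and 3/4 positions. Each output uses one `_mm_madd_epi16` (pmaddwd) over the eight taps. To stay within baseline SSE2, the four partial sums are reduced with shuffles rather than phaddd (an SSSE3 instruction), and one output is produced per iteration instead of the four-output arrangement of Fig. 4. The function name and the `shift` parameter are illustrative, not the authors' code.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* HEVC luma 8-tap filters for the 1/4, 1/2 and 3/4 sample positions (each sums to 64). */
static const int16_t kLumaFilter[3][8] = {
    {-1, 4, -10, 58, 17, -5, 1, 0},
    {-1, 4, -11, 40, 40, -11, 4, -1},
    {0, 1, -5, 17, 58, -10, 4, -1},
};

/* Horizontal luma interpolation of `width` samples.
   src points 3 samples to the left of the first integer position, so
   output x uses src[x .. x+7]; results are kept in 32 bits and shifted. */
static void interp_h_luma(const int16_t *src, int32_t *dst, int width, int frac, int shift)
{
    assert(frac >= 1 && frac <= 3);
    const int16_t *c = kLumaFilter[frac - 1];
#if defined(__SSE2__)
    const __m128i coef = _mm_loadu_si128((const __m128i *)c);
    for (int x = 0; x < width; x++) {
        __m128i s = _mm_loadu_si128((const __m128i *)(src + x));
        __m128i p = _mm_madd_epi16(s, coef);               /* pmaddwd: four 32-bit pair sums */
        p = _mm_add_epi32(p, _mm_shuffle_epi32(p, 0x4E)); /* add the swapped 64-bit halves */
        p = _mm_add_epi32(p, _mm_shuffle_epi32(p, 0xB1)); /* add the swapped 32-bit neighbours */
        dst[x] = _mm_cvtsi128_si32(p) >> shift;
    }
#else
    for (int x = 0; x < width; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 8; k++)
            sum += c[k] * src[x + k];
        dst[x] = sum >> shift;
    }
#endif
}
```

On a ramp input src[k] = k, the half-sample filter returns 64 · x + 224 before the shift, i.e. the midpoint x + 3.5 scaled by the filter gain of 64, which is why 32-bit accumulation is kept until the shift.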
Fig. 4. Horizontal interpolation using SSE instruction set
Fig. 5. Vertical interpolation using SSE instruction set

Fig. 5 shows vertical interpolation, in which eight samples are filtered at a time.
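The samples are unpacked with punpcklwd and punpckhwd before filtering, and the 32-bit results are packed into 16 bits with packssdw.

3. Inverse quantization
HEVC RExt uses a URQ (Uniform Reconstruction Quantizer). As in Eq. (5), a quantized level C is reconstructed by multiplying it by the quantization step size Qstep. In the integer implementation, the scaling factor is selected by QP%6, which ranges from 0 to 5 (Eq. (6)), and because Qstep doubles every time QP increases by 6, the power-of-two part of the multiplication is implemented as a shift, as in Eq. (7). Fig. 6 shows the SIMD implementation of inverse quantization. Because the intermediate products require 32 bits, each 128-bit register holds only four values.

Fig. 6. 4×4 inverse quantization using SSE instruction set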
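Assuming flat scaling lists (m = 16), no extended-precision processing, and the HEVC level scales {40, 45, 51, 57, 64, 72} indexed by QP%6, the dequantization of Eq. (7) reduces to `(level * scale + offset) >> shift` with `scale = levelScale[QP%6] << (QP/6)` and `shift = bdShift - 4`, where bdShift = BitDepth + log2(N) - 5 and the factor m = 16 has been folded into the shift. The sketch below is an illustration under those assumptions. Instead of pmulld (SSE4.1), it forms the 32-bit products with the SSE2 pair `_mm_mullo_epi16`/`_mm_mulhi_epi16`, which requires the scale to fit in 16 bits (QP ≤ 57 here). psrad and packssdw appear as `_mm_sra_epi32` and `_mm_packs_epi32`, whose saturation performs the clip to the 16-bit coefficient range. The function name `dequant` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* HEVC levelScale[] indexed by QP % 6. */
static const int kLevelScale[6] = {40, 45, 51, 57, 64, 72};

static int16_t sat16(int32_t v)
{
    return (int16_t)(v < -32768 ? -32768 : (v > 32767 ? 32767 : v));
}

/* Flat-scaling dequantization of n levels of an N x N TU (N = 1 << log2TbSize). */
static void dequant(const int16_t *level, int16_t *coef, int n,
                    int qp, int bitDepth, int log2TbSize)
{
    const int scale = kLevelScale[qp % 6] << (qp / 6);
    const int shift = bitDepth + log2TbSize - 5 - 4; /* bdShift minus log2(m = 16) */
    assert(shift >= 1 && scale <= 32767);           /* scale must fit in int16 */
    const int32_t add = 1 << (shift - 1);
    int i = 0;
#if defined(__SSE2__)
    const __m128i vs = _mm_set1_epi16((short)scale);
    const __m128i vadd = _mm_set1_epi32(add);
    const __m128i vsh = _mm_cvtsi32_si128(shift);
    for (; i + 8 <= n; i += 8) {
        __m128i l = _mm_loadu_si128((const __m128i *)(level + i));
        __m128i lo = _mm_mullo_epi16(l, vs);     /* low 16 bits of each product */
        __m128i hi = _mm_mulhi_epi16(l, vs);     /* high 16 bits of each product */
        __m128i p0 = _mm_unpacklo_epi16(lo, hi); /* 32-bit products 0..3 */
        __m128i p1 = _mm_unpackhi_epi16(lo, hi); /* 32-bit products 4..7 */
        p0 = _mm_sra_epi32(_mm_add_epi32(p0, vadd), vsh); /* paddd + psrad */
        p1 = _mm_sra_epi32(_mm_add_epi32(p1, vadd), vsh);
        _mm_storeu_si128((__m128i *)(coef + i), _mm_packs_epi32(p0, p1)); /* packssdw saturates */
    }
#endif
    for (; i < n; i++) /* scalar tail, or the whole block without SSE2 */
        coef[i] = sat16((level[i] * scale + add) >> shift);
}
```

For example, with 8-bit video, a 4×4 TU, and QP 0, scale = 40 and shift = 1, so a level of 1 becomes (40 + 1) >> 1 = 20, the same value the unfolded form (1 · 16 · 40 + 16) >> 5 gives.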
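With SSE, four 32-bit values are processed at once: pmulld multiplies the levels by the scaling factor, paddd adds the rounding offset, and psrad performs the arithmetic right shift. packssdw then packs the inverse-quantized coefficients into 16 bits.

4. Inverse transform
HEVC applies DCT (Discrete Cosine Transform)-based integer transforms to TUs from 4×4 to 32×32, and a DST (Discrete Sine Transform)-based transform to 4×4 intra luma TUs. The decoder therefore performs the IDCT (Inverse Discrete Cosine Transform) and the IDST (Inverse Discrete Sine Transform) as the inverse transform, which the HEVC reference software implements with partial butterflies. Fast transform algorithms such as Chen's [11] rely on butterfly structures, which are difficult to implement efficiently with SIMD instructions [12]. Therefore, each 1-D inverse transform is computed as a matrix multiplication with pmaddwd and phaddd, as shown in Fig. 7.

Fig. 7. 4×4 inverse transform using SSE instruction set

III. AVX2-based SIMD Acceleration
This section describes the SIMD acceleration of the HEVC RExt decoder with AVX2. Because HEVC RExt handles sample bit depths above 8 bits (up to 16 bits), samples are processed in 16-bit or wider elements, which limits the number of samples that an MMX or SSE register can hold.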
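AVX2 extends the 128-bit x86 MMX/SSE SIMD instructions to 256-bit registers, which doubles the parallelism of SSE [13]. In the HEVC RExt decoder, inverse quantization, interpolation, and clipping are implemented with AVX2, while intra prediction and the inverse transform use SSE. SSE operations such as multiplication, addition, and pack act on the whole 128-bit register, which makes them straightforward to apply to HEVC. AVX2 provides the same multiplication, addition, and pack operations, but it divides a 256-bit register into two 128-bit lanes, bits 0~127 and bits 128~255, and most instructions operate within each lane. Fig. 8 compares the pack operation of SSE and AVX2. SSE pack combines two 64-bit results, one from each source register, into a single 128-bit register. AVX2 pack also produces 64-bit results from each source, but it does so separately in each 128-bit lane, so the 64-bit results of the two sources are interleaved across the 256-bit register. Because AVX2 pack therefore orders data differently from SSE pack, additional AVX2 permutation instructions are needed to restore the expected order, and the AVX2 implementations must be designed with this behavior in mind.

Fig. 8. Pack and unpack of AVX2 and SSE instruction set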
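The lane-wise behavior of the AVX2 pack in Fig. 8 can be modeled in portable C, which makes the interleaving easy to check without AVX2 hardware. The three functions below are hypothetical emulations of `_mm_packs_epi32` (packssdw), `_mm256_packs_epi32` (vpackssdw), and `_mm256_permute4x64_epi64(x, 0xD8)` (vpermq). The vpermq fix-up is one common way to restore the order and is an assumption of this sketch; the paper itself uses vperm2i128 in vertical interpolation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int16_t sat16(int32_t v)
{
    return (int16_t)(v < -32768 ? -32768 : (v > 32767 ? 32767 : v));
}

/* Model of SSE packssdw (_mm_packs_epi32): a[0..3] then b[0..3], saturated to int16. */
static void pack_sse(const int32_t a[4], const int32_t b[4], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = sat16(a[i]);
        out[4 + i] = sat16(b[i]);
    }
}

/* Model of AVX2 vpackssdw (_mm256_packs_epi32): the same pack, done per 128-bit lane. */
static void pack_avx2(const int32_t a[8], const int32_t b[8], int16_t out[16])
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 4; i++) {
            out[8 * lane + i] = sat16(a[4 * lane + i]);
            out[8 * lane + 4 + i] = sat16(b[4 * lane + i]);
        }
}

/* Model of vpermq with imm 0xD8 (_mm256_permute4x64_epi64(x, 0xD8)):
   the four 64-bit groups are reordered as 0, 2, 1, 3. */
static void permute_0xd8(const int16_t in[16], int16_t out[16])
{
    static const int order[4] = {0, 2, 1, 3};
    for (int q = 0; q < 4; q++)
        memcpy(out + 4 * q, in + 4 * order[q], 4 * sizeof(int16_t));
}
```

With a = 0..7 and b = 8..15, `pack_avx2` yields 0 1 2 3 8 9 10 11 4 5 6 7 12 13 14 15, which is the interleaved order of Fig. 8, and `permute_0xd8` restores 0..15.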
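Interpolation is implemented with AVX2 instead of SSE, and the AVX2-based interpolation reduces the interpolation time by 19% on average. Figs. 9 and 10 show horizontal and vertical interpolation with AVX2, respectively. In horizontal interpolation (Fig. 9), vpmaddwd multiplies the input samples by the filter coefficients and adds adjacent products.

Fig. 9. Horizontal interpolation using AVX2 instruction set
Fig. 10. Vertical interpolation using AVX2 instruction set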
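vphaddd then produces eight 32-bit sums in a 256-bit register, which are packed into 16 bits. Because the pack operates within each 128-bit lane, the 64-bit groups of the 256-bit result are interleaved. In vertical interpolation (Fig. 10), vperm2i128 rearranges the 128-bit lanes so that eight interpolated samples are stored in order.

Inverse quantization converts 16-bit coefficients into 32-bit intermediate values and back into 16 bits, so a 128-bit SSE register holds only four 32-bit values, whereas a 256-bit AVX2 register holds eight. As shown in Fig. 11, vpmulld, vpaddd, and vpsrad perform the inverse quantization with AVX2, and vpackssdw packs the results into 16 bits, so that 16 coefficients of a TU are inverse-quantized at a time.

Fig. 11. 8×8 TU inverse quantization using AVX2 instruction set

Fig. 12 shows clipping with AVX2: vpaddw adds the prediction and the residual, and vpminsw and vpmaxsw clip the sum to the valid sample range. Compared with SSE, twice as many samples are processed per instruction.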
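The reconstruction-and-clipping step can be sketched with SSE2 intrinsics as follows, assuming samples and residuals stored as `int16_t` and bit depths of at most 15. The sketch uses the saturating `_mm_adds_epi16` (paddsw) instead of the wrapping paddw/vpaddw named in the text, so an out-of-range sum cannot wrap around before `_mm_max_epi16`/`_mm_min_epi16` (pmaxsw/pminsw) clip it. An AVX2 version would use the `_mm256_` forms of the same intrinsics and process 16 samples per iteration. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* rec[i] = Clip3(0, (1 << bitDepth) - 1, pred[i] + resi[i]) for n samples. */
static void reconstruct_clip(const int16_t *pred, const int16_t *resi,
                             int16_t *rec, int n, int bitDepth)
{
    assert(bitDepth >= 1 && bitDepth <= 15); /* samples must fit in int16_t */
    const int maxVal = (1 << bitDepth) - 1;
    int i = 0;
#if defined(__SSE2__)
    const __m128i vmin = _mm_setzero_si128();
    const __m128i vmax = _mm_set1_epi16((short)maxVal);
    for (; i + 8 <= n; i += 8) {
        __m128i p = _mm_loadu_si128((const __m128i *)(pred + i));
        __m128i r = _mm_loadu_si128((const __m128i *)(resi + i));
        __m128i s = _mm_adds_epi16(p, r);                /* paddsw: saturating add */
        s = _mm_min_epi16(_mm_max_epi16(s, vmin), vmax); /* pmaxsw, then pminsw */
        _mm_storeu_si128((__m128i *)(rec + i), s);
    }
#endif
    for (; i < n; i++) { /* scalar tail, or the whole block without SSE2 */
        int v = pred[i] + resi[i];
        rec[i] = (int16_t)(v < 0 ? 0 : (v > maxVal ? maxVal : v));
    }
}
```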
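Fig. 12. Clipping using AVX2 instruction set

IV. Experimental Results
This section evaluates the SIMD-based HEVC RExt decoder. The proposed methods were implemented in an HEVC RExt decoder written in ANSI C based on HM 16.0 and tested in the environment listed in Table 1. The test sequences are 10-bit YUV 4:2:2 sequences from the HEVC RExt common test conditions [14]. Performance is measured with the ATS (Average Time Saving) of Eq. (8), which compares the decoding time with SIMD against the decoding time without SIMD.

Table 1. Test environment for SIMD instruction-based decoding
Component      Description
CPU            Intel Core i7 4770K (Haswell)
Clock speed    3.5 GHz
Memory         32 GB
OS             MS Windows 7 (64-bit)
Compiler       Intel C++ 13.0

Table 2 shows the results of the HEVC RExt decoder using SSE and AVX2. With SSE, the intra prediction and inverse transform times are reduced by 38% and 47% on average, respectively. With AVX2, the interpolation, inverse quantization, and clipping times are reduced by 19%, 80%, and 44%, respectively. The total decoding time is reduced by 12% on average. Table 3 shows the time reduction of the AVX2 implementations compared with the SSE implementations.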
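For reference, a common form of the ATS measure of Eq. (8), assuming T_ref is the decoding time of the sequential (non-SIMD) decoder and T_SIMD is the decoding time with SIMD for the same module or for the whole decoder, is:

$$\mathrm{ATS} = \frac{T_{\mathrm{ref}} - T_{\mathrm{SIMD}}}{T_{\mathrm{ref}}} \times 100\ (\%)$$

Under this definition, a module that takes 100 ms without SIMD and 88 ms with SIMD has an ATS of 12%.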
Table 2. Experimental results using SSE and AVX2 instruction sets (ATS, %). IP: intra prediction (SSE); INT: interpolation (AVX2); IT: inverse transform (SSE); IQ: inverse quantization (AVX2); CLIP: clipping (AVX2); TOTAL: total decoding.
Sequence       QP    IP       INT      IT       IQ       CLIP     TOTAL
EBUHorse       22    29.11    16.91    49.63    77.85    43.27    10.96
               27    33.89    21.20    47.68    79.91    38.73    10.52
               32    39.66    22.40    46.18    79.45    40.56    10.04
               37    39.20    20.16    46.15    81.60    39.85    9.57
               Avg.  35.46    20.17    47.41    79.70    40.60    10.27
EBUKidsSoccer  22    29.79    16.29    50.25    79.93    42.83    12.77
               27    28.34    17.87    49.49    73.22    40.48    11.44
               32    31.96    19.31    50.79    73.97    37.15    10.90
               37    34.56    19.72    45.51    76.71    39.30    10.65
               Avg.  31.16    18.30    49.01    75.96    39.94    11.44
Kimono         22    43.64    19.31    46.29    85.42    49.20    16.29
               27    48.40    19.02    49.15    88.34    47.11    14.76
               32    49.56    19.00    41.56    88.60    45.12    13.73
               37    52.05    19.41    43.03    87.45    45.76    13.14
               Avg.  48.41    19.19    45.01    87.45    46.80    14.48
Seeking        22    34.73    17.18    48.75    76.74    54.81    13.98
               27    38.02    18.03    49.46    79.57    51.08    13.87
               32    42.21    19.49    48.94    82.36    47.60    13.45
               37    43.67    19.96    44.38    83.34    48.62    13.46
               Avg.  39.66    18.67    47.89    80.50    50.53    13.69
Total average        38.67    19.08    47.33    80.90    44.47    12.47

Table 3. Time reduction ratio of the AVX2 instruction set compared with the SSE instruction set (ATS, %). Uni: uni-prediction; Bi: bi-prediction.
Module                              ATS (%)
Interpolation, horizontal (Uni)     26.56
Interpolation, horizontal (Bi)      23.85
Interpolation, vertical (Uni)       43.80
Interpolation, vertical (Bi)        44.36
Inverse quantization                22.44
Clipping                            48.03

With SSE and AVX2, the total decoding time is reduced by 12%. As Table 3 shows, AVX2 further reduces the time of interpolation, inverse quantization, and clipping compared with SSE.

V. Conclusion
This paper proposed a method that accelerates the HEVC RExt decoder with SIMD instructions. The intra prediction, interpolation, inverse quantization, and inverse transform modules of HEVC RExt were implemented with SIMD instructions.

References
[1] B. Bross, W. Han, G. Sullivan, J. Ohm, and T. Wiegand, "High Efficiency Video Coding (HEVC) Text Specification Draft 10," document JCTVC-L1003_v34, Geneva, CH, Jan. 2013.
[2] B. Li, G. J. Sullivan, and J. Xu, "Comparison of Compression Performance of HEVC Draft 10 with AVC High Profile," document JCTVC-M0329, Incheon, Korea, Apr. 2013.
[3] G. J. Sullivan, J. Ohm, W. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. on CSVT, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[4] J. Boyce, J. Chen, Y. Chen, D. Flynn, M. M. Hannuksela, M. Naccari, C. Rosewarne, K. Sharman, J. Sole, G. J. Sullivan, T. Suzuki, G. Tech, Y.-K. Wang, K. Wegner, and Y. Ye, "Draft high efficiency video coding (HEVC) version 2, combined format range extensions (RExt), scalability (SHVC), and multi-view (MV-HEVC) extensions," document JCTVC-R1013, Sapporo, JP, July 2014.
[5] C. Rosewarne, K. Sharman, M. Naccari, and G. Sullivan, "HEVC Range extensions test model 6 encoder description," document JCTVC-P1013, San Jose, US, Jan. 2014.
[6] Y.J. Ahn, W.J. Han, and D.G. Sim, "Study of decoder complexity for HEVC and AVC standards based on tool-by-tool comparison," SPIE Applications of Digital Image Processing XXXV, Proceedings of SPIE, vol. 8499, pp. 8499-32, San Diego, USA, Aug. 2012.
[7] C. Chi, M. Alvarez-Mesa, J. Lucas, B. Juurlink, and T. Schierl, "Parallel HEVC decoding on multi- and many-core architectures," Journal of Signal Processing Systems, vol. 71, no. 3, pp. 247-260, June 2013.
[8] H. Jo and D. Sim, "Hybrid Parallelization for HEVC Decoder," Image and Signal Processing (CISP), vol. 1, pp. 170-175, Dec. 2013.
[9] C. C. Chi, M. Alvarez-Mesa, B. Bross, B. Juurlink, and T. Schierl, "SIMD acceleration for HEVC decoding," IEEE Trans. on CSVT, no. 99, Oct. 2014.
[10] T. Hwang, Y. Ahn, J. Ryu, and D. Sim, "Optimized Implementation of Interpolation Filter for HEVC Encoder," Journal of The Institute of Electronics Engineers of Korea, vol. 50, no. 10, pp. 199-203, Oct. 2013.
[11] W.-H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. on Commun., vol. 25, no. 9, pp. 1004-1009, Sep. 1977.
[12] T. Hwang, Y. Ahn, and D. Sim, "SIMD instruction-based HEVC encoder optimization," IPIU, Feb. 2013.
[13] Intel, "Intel Advanced Vector Extensions Programming Reference," Technical Report 319433-011, Intel, June 2011.
[14] D. Flynn and C. Rosewarne, "Common test condition and software reference configurations for HEVC range extensions," document JCTVC-L1006_v2, Geneva, Jan. 2013.

Author biographies
- Feb. 2014; Mar. 2014 ~ present. ORCID: http://orcid.org/0000-0003-0707-6553
- Feb. 2010; Feb. 2012; Mar. 2012 ~ present. ORCID: http://orcid.org/0000-0002-0012-0905
- Feb. 2013; Feb. 2015. ORCID: http://orcid.org/0000-0001-7004-7451
- Donggyu Sim: Feb. 1993; Feb. 1995; Feb. 1999; Mar. 1999 ~ Aug. 2000; Sep. 2000 ~ Mar. 2002; Apr. 2002 ~ Feb. 2005: Senior Research Engineer, University of Washington; Mar. 2005 ~ present. ORCID: http://orcid.org/0000-0002-2794-9932