Representation, Encoding and Intermediate View Interpolation Methods for Multi-view Video Using Layered Depth Images

Multi-view video is a collection of videos that capture the same scene from different viewpoints. When multi-view video is acquired with multiple cameras, scenes can be generated at arbitrary view positions. Users can therefore change their viewpoint freely and perceive depth through view interaction. Multi-view video can be used in a variety of applications, including three-dimensional TV (3DTV), free-viewpoint TV, and immersive broadcasting. However, the amount of multi-view video data increases linearly with the number of cameras, so an effective framework is needed to represent, process, and display it.

In this paper, we propose a system that represents, encodes, and reconstructs multi-view video based on an image-based representation, specifically the layered depth image (LDI). The proposed framework represents the various kinds of information contained in multi-view video hierarchically, based on LDI. We then reduce the large amount of multi-view video data to a manageable size with an encoding technique and reconstruct the original viewpoints. Finally, we generate intermediate images using an LDI view interpolation method.

Keywords: Multi-view video coding, Image-based representation, Layered depth image, View interpolation
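The LDI idea can be illustrated with a small data structure. Each pixel location keeps several (color, depth) samples along its line of sight, ordered front to back. The following is a minimal sketch only.

- The class names are hypothetical.
- The redundancy test, which uses the threshold `merge_eps`, is an assumption chosen for this example.
- The paper's actual data layout and merging rule are not recoverable from this text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DepthPixel:
    """One sample along a line of sight: its color and its depth."""
    color: tuple[int, int, int]
    depth: float


@dataclass
class LayeredDepthImage:
    """Hypothetical LDI sketch: each (x, y) keeps a front-to-back list of depth pixels.

    merge_eps is an assumed threshold: a new sample whose depth is within merge_eps
    of an existing layer at the same pixel is treated as redundant and not stored.
    """
    width: int
    height: int
    merge_eps: float = 1e-2
    layers: dict[tuple[int, int], list[DepthPixel]] = field(default_factory=dict)

    def insert(self, x: int, y: int, color: tuple[int, int, int], depth: float) -> bool:
        """Add a sample; return False if it duplicates an existing layer."""
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise ValueError("pixel lies outside the LDI")
        stack = self.layers.setdefault((x, y), [])
        for p in stack:
            if abs(p.depth - depth) < self.merge_eps:
                return False  # same surface already seen, e.g. from another camera
        stack.append(DepthPixel(color, depth))
        stack.sort(key=lambda p: p.depth)  # keep layers ordered front to back
        return True

    def num_layers(self, x: int, y: int) -> int:
        """Number of depth layers stored at pixel (x, y)."""
        return len(self.layers.get((x, y), []))
```

Storing a surface point only once, even when several cameras see it, is one plausible way an LDI keeps multi-view data compact. Occluded surfaces still survive as deeper layers.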
Figure: Projection of a 3D point X(x, y, z) onto image point x1(u1, v1) of camera C1 at (x1, y1, z1) and image point x2(u2, v2) of camera C2 at (x2, y2, z2).
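The cameras $C_1$ and $C_2$ are described by the matrices

$$C_1 = V_1 \cdot P_1 \cdot A_1, \qquad C_2 = V_2 \cdot P_2 \cdot A_2 \tag{1}$$

where $V$, $P$ and $A$ are the viewport, projection and affine matrices, respectively.

A pixel $x_1(u_1, v_1)$ in $C_1$ is mapped to $x_2(u_2, v_2)$ in $C_2$ by the transformation $T_{1,2} = C_2 \cdot C_1^{-1}$. This transformation takes $(x_1, y_1, z_1)$ in the frame of $C_1$ to $(x_2, y_2, z_2)$ in the frame of $C_2$:

$$\begin{bmatrix} x_2 w_2 \\ y_2 w_2 \\ z_2 w_2 \\ w_2 \end{bmatrix} = T_{1,2} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \\ 1 \end{bmatrix} \tag{2}$$

A pixel $(x_1, y_1)$ with depth $z_1$ is therefore warped to $(x_2, y_2)$ after division by $w_2$.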
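Eq. (2) can be sketched in a few lines under the following assumptions:

- Each camera matrix is available as an invertible $4 \times 4$ matrix, stored as nested Python lists.
- The helper names (`mat_vec`, `mat_mul`, `inverse4`, `warp`) are hypothetical.
- Gauss-Jordan elimination is used only to keep the example dependency-free; it is not taken from the paper.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a length-4 vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def mat_mul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)] for r in range(4)]


def inverse4(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    aug = [[float(x) for x in row] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("camera matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def warp(c1, c2, x1, y1, z1):
    """Eq. (2): map pixel (x1, y1) with depth z1 in view 1 to (x2, y2, z2) in view 2."""
    t12 = mat_mul(c2, inverse4(c1))              # T_{1,2} = C_2 * C_1^{-1}
    xw, yw, zw, w2 = mat_vec(t12, [x1, y1, z1, 1.0])
    return xw / w2, yw / w2, zw / w2             # divide out the homogeneous w_2
```

In practice $T_{1,2}$ would be computed once per camera pair and reused for every pixel. It is recomputed here only to keep `warp` self-contained.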
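The camera matrix $C$ is the product of the intrinsic matrix $A$ and the extrinsic matrix $E$:

$$C = A \cdot E = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \end{bmatrix} \tag{3}$$

$$A = \begin{bmatrix} \alpha_x & s & x_0 \\ 0 & \alpha_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E = \begin{bmatrix} R_{11} & R_{12} & R_{13} & T_1 \\ R_{21} & R_{22} & R_{23} & T_2 \\ R_{31} & R_{32} & R_{33} & T_3 \end{bmatrix} \tag{4}$$

Here $A$ is not to be confused with the affine matrix of Eq. (1). In $A$:

- $\alpha_x$ and $\alpha_y$ are the scale factors (focal lengths in pixels) along $x$ and $y$;
- $s$ is the skew;
- $(x_0, y_0)$ is the principal point.

$E$ consists of the rotation $R$ and the translation $T$. Appending the row $[0\ 0\ 0\ 1]$ to $C$ gives the $4 \times 4$ matrix

$$C = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & c_{34} \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{5}$$

Camera matrices of this form are used for the cameras $C_0, C_1, \ldots, C_7$.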
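Eqs. (3) to (5) can be traced concretely. The sketch below assembles the $4 \times 4$ matrix of Eq. (5) from the intrinsic parameters of Eq. (4) and a rotation $R$ and translation $T$. It then projects a world point.

Because the last row of $A$ is $[0\ 0\ 1]$, the third homogeneous coordinate equals the point's depth in the camera frame. The function names and the parameter values in the usage check are illustrative assumptions, not the calibration of the cameras used in the paper.

```python
def intrinsic_matrix(ax, ay, s, x0, y0):
    """A in Eq. (4): scale factors ax, ay, skew s, principal point (x0, y0)."""
    return [[ax, s, x0], [0.0, ay, y0], [0.0, 0.0, 1.0]]


def camera_matrix(a, r, t):
    """Eq. (5): C = A E (Eq. (3)) extended with the row [0 0 0 1]."""
    e = [r[i] + [t[i]] for i in range(3)]                    # E = [R | T], 3x4
    c = [[sum(a[i][k] * e[k][j] for k in range(3)) for j in range(4)]
         for i in range(3)]                                   # 3x4 product A E
    return c + [[0.0, 0.0, 0.0, 1.0]]


def project(c, point):
    """Project world point (X, Y, Z) with the 4x4 C; return pixel (u, v) and depth."""
    x, y, z = point
    hx, hy, w, _ = [sum(row[k] * val for k, val in enumerate((x, y, z, 1.0)))
                    for row in c]
    return hx / w, hy / w, w                                  # w is the camera-frame depth
```

With $\alpha_x = \alpha_y = 1000$, $(x_0, y_0) = (320, 240)$, $R = I$ and $T = 0$, the point $(0.1, 0.2, 2)$ projects to pixel $(370, 340)$ at depth 2. The appended row keeps $C$ square, which is presumably what allows $C_1^{-1}$ to be formed in Eq. (2).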
… H.264/AVC … 8 …
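For two cameras $C_i$ and $C_j$, the camera $C_{ij}$ of an intermediate view at position $\alpha$ ($0 \le \alpha \le 1$) is obtained by linear interpolation:

$$C_{ij} = \alpha C_i + (1 - \alpha) C_j \tag{6}$$

The rotation between the two views is described by

$$N_{ij} = N_i \times N_j, \qquad R(\theta), \quad \theta = \cos^{-1}(N_i \cdot N_j) \tag{7}$$

Here $N_i$ and $N_j$ are the normal vectors of cameras $i$ and $j$, and $N_{ij}$ is the rotation axis. $R(\theta)$ is the rotation by the angle $\theta$ about $N_{ij}$.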
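Eq. (6) blends two camera matrices entry by entry. Eq. (7) gives the axis and angle between the two cameras' normals. The sketch below implements both, together with Rodrigues' rotation formula, under these assumptions:

- The normals are unit vectors.
- How the paper actually applies $R(\theta)$ is not recoverable from this text.
- `intermediate_normal` is a hypothetical use of it. It turns $N_i$ toward $N_j$ by $(1 - \alpha)\theta$, so that $\alpha = 1$ and $\alpha = 0$ reproduce cameras $i$ and $j$, matching the weighting of Eq. (6).

```python
import math


def interpolate_camera(ci, cj, alpha):
    """Eq. (6): C_ij = alpha * C_i + (1 - alpha) * C_j, entry by entry."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(ri, rj)]
            for ri, rj in zip(ci, cj)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotation_axis_angle(ni, nj):
    """Eq. (7): axis N_ij = N_i x N_j and angle theta = arccos(N_i . N_j) for unit normals."""
    theta = math.acos(max(-1.0, min(1.0, dot(ni, nj))))  # clamp rounding error
    return cross(ni, nj), theta


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about axis (normalised here) by angle radians."""
    norm = math.sqrt(dot(axis, axis))
    if norm == 0.0:
        return list(v)  # parallel normals: no rotation needed
    k = [c / norm for c in axis]
    kxv, kdv = cross(k, v), dot(k, v)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [v[m] * cos_a + kxv[m] * sin_a + k[m] * kdv * (1.0 - cos_a) for m in range(3)]


def intermediate_normal(ni, nj, alpha):
    """Hypothetical use of R(theta): rotate N_i toward N_j by (1 - alpha) * theta."""
    axis, theta = rotation_axis_angle(ni, nj)
    return rotate(ni, axis, (1.0 - alpha) * theta)
```

A plausible reason a separate rotation term is needed is that linearly blending the entries of two rotation matrices, as Eq. (6) does, does not in general yield a rotation matrix.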