

Monoscopic to Stereoscopic video conversion of YouTube videos

Speakers: Debargha Mukherjee and Chen Wu, Google, Inc.


YouTube holds one of the largest stereoscopic 3D video corpora in the world. Since 2009 YouTube has supported ingestion and transmittal of 3D content. In 2011, such support was standardized to accept uploads from compliant 3D cameras and smartphones and to enable seamless playback of transcoded videos in 3D on compliant stereoscopic displays and smartphones. The YouTube 3D channel was also launched in 2011 and received many accolades from 3D enthusiasts. In late 2011, YouTube released a beta service to convert a regular monoscopic video into stereoscopic 3D with a one-click option. To further accelerate the growth of available 3D content, this service was recently broadened to support automatic conversion of a subset of high-definition videos uploaded to YouTube. By leveraging the power of parallel video processing on Google’s cloud infrastructure, a sophisticated algorithm derives a per-pixel depth map, followed by depth-image-based rendering of the requisite left and right views. The conversion algorithm further leverages existing stereoscopic videos in the vast YouTube corpus to learn depth statistics in relation to image features, which are then applied to monoscopic videos to generate approximate per-pixel depth maps. To keep the depth map consistent with the objects in a scene, a novel but computationally inexpensive refinement mechanism is applied to these approximate depth maps before the left and right views are rendered. In this talk, we will take the audience through 3D support on YouTube in general, as well as our 2D-to-3D conversion service.
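The pipeline the abstract describes — estimate a per-pixel depth map, then synthesize left and right views by depth-image-based rendering (DIBR) — can be illustrated with a minimal sketch. The function below is a toy illustration, not YouTube's implementation: it shifts each pixel of a grayscale image horizontally by a disparity proportional to its nearness, paints far pixels before near ones so that near pixels occlude them, and fills disocclusion holes from the nearest pixel to the left. The function name and the `max_disparity` parameter are assumptions introduced for this sketch.

```python
import numpy as np

def dibr_views(image, depth, max_disparity=8):
    """Render toy left/right views from a 2-D grayscale image and a
    per-pixel depth map of the same shape.

    Each pixel is shifted horizontally by a disparity proportional to
    (1 - normalized depth), so nearer pixels shift more than distant
    ones. Far pixels are painted first so near pixels overwrite them
    (a crude z-order), and remaining holes from disocclusion are
    filled from the nearest written pixel to the left on the same row.
    """
    h, w = depth.shape
    d = depth.astype(np.float64)
    # Normalize depth to [0, 1]; guard against a constant depth map.
    rng = d.max() - d.min()
    norm = (d - d.min()) / rng if rng > 0 else np.zeros_like(d)
    # Near pixels (small depth) get large integer disparity.
    disp = np.rint(max_disparity * (1.0 - norm)).astype(int)

    def warp(sign):
        out = np.zeros_like(image)
        written = np.zeros((h, w), dtype=bool)
        # Visit pixels from farthest to nearest so near pixels win.
        order = np.argsort(-d, axis=None)
        for idx in order:
            y, x = divmod(idx, w)
            nx = x + sign * disp[y, x]
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
                written[y, nx] = True
        # Simple hole filling: copy from the left neighbor.
        for y in range(h):
            for x in range(1, w):
                if not written[y, x]:
                    out[y, x] = out[y, x - 1]
        return out

    # Opposite horizontal shifts produce the two eye views.
    return warp(-1), warp(+1)
```

Running it on a flat scene with one near column shows the expected behavior: the near pixel shifts left in the left view and right in the right view, occluding the background it lands on. A production renderer would add sub-pixel warping and much better inpainting of disoccluded regions.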


Debargha Mukherjee received his B. Tech. degree in E&ECE from the Indian Institute of Technology, Kharagpur in 1993, and his M.S. and Ph.D. degrees in ECE from the University of California, Santa Barbara in 1995 and 1999 respectively. Between 1999 and 2010, he was with Hewlett-Packard Laboratories in Palo Alto, USA, as a scientist conducting research on image and video compression and communication. Since 2010 he has been with Google Inc., where, as a backend video transcoding/processing specialist, he has been in charge of incorporating 3D backend support into YouTube, including 3D conversion.

Debargha’s areas of interest include video and image coding, processing, and communication, as well as signal processing and information theory. He has published more than 70 papers in leading international conferences and journals, and holds 19 US patents in these areas. He is a Senior Member of the IEEE, and is currently serving as Associate Editor of the IEEE Transactions on Image Processing and the SPIE Journal of Electronic Imaging. In 2009, he served as the Technical Co-chair of the Second International Conference on Immersive Telecommunications. He was the recipient of the IEEE student paper award at IEEE ICIP in Chicago, USA, in 1998, and the co-recipient of the Top 10% paper award at the IEEE MMSP workshop in Brazil in 2010.

Chen Wu received her M.S. and Ph.D. degrees in EE from Stanford University in 2007 and 2011 respectively, and her B.S. from Tsinghua University, China, in 2005. Since 2011 she has been working at YouTube on backend video transcoding and processing. Chen’s interests include image and video processing, computer vision, and related fields. She has published more than 30 papers in international conferences and journals.

Holographic video and how it might become part of the 3D ecosystem

The exploration of holographic television — in which a diffractive display creates 3D images with all the perceptual cues, including focus and motion parallax — is nearly as old as the display hologram itself. In this talk I provide a technical and historical overview of holographic TV and look at how recent developments in consumer capture, compression, computation, and display could enable holographic TV to quickly become part of the 3D content and display ecosystem.

V. Michael Bove, Jr. holds an S.B.E.E., an S.M. in Visual Studies, and a Ph.D. in Media Technology, all from the Massachusetts Institute of Technology, where he is currently head of the Object-Based Media Group at the Media Laboratory, and co-directs the Center for Future Storytelling and the consumer electronics working group CE2.0. He is the author or co-author of over 60 journal or conference papers on digital television systems, video processing hardware/software design, multimedia, scene modeling, visual display technologies, and optics. He holds patents on inventions relating to video recording, hardcopy, interactive television, holography, and medical imaging, and has been a member of several professional and government committees. He is co-author with the late Stephen A. Benton of the book Holographic Imaging (Wiley, 2008). He is on the Board of Editors of the Journal of the Society of Motion Picture and Television Engineers, and served as associate editor of Optical Engineering. He served as general chair of the 2006 IEEE Consumer Communications and Networking Conference (CCNC’06), will co-chair the 2012 International Symposium on Display Holography, and is a member of the Board of Governors of the National Academy of Media Arts and Sciences.