Temporal View Synthesis of Dynamic Scenes through
3D Object Motion Estimation with Multi-Plane Images

IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2022

Nagabhushan Somraj, Pranali Sancheti and Rajiv Soundararajan

Indian Institute of Science


Technical Talks on this work

ISMAR 2022 prerecorded video:
18-Oct-2022: In ISMAR 2022. [Video]
01-Feb-2022: In IISc Student Research Seminar Series, 2022. [Video]

Sample comparison videos with other competing methods

Play the videos in fullscreen mode for the best view.

City02 - Seq00 - Single Frame Prediction; Video at 30fps

City02 - Seq00 - Multi Frame Prediction; Video at 30fps

Shaman3 - Albedo - Multi Frame Prediction; Video at 1fps

Abstract

The challenge of graphically rendering high-frame-rate videos on low-compute devices can be addressed through periodic prediction of future frames to enhance the user experience in virtual reality applications. This is studied through the problem of temporal view synthesis (TVS), where the goal is to predict the next frames of a video given the previous frames and the head poses of the previous and the next frames. In this work, we consider the TVS of dynamic scenes in which both the user and objects are moving. We design a framework that decouples the motion into user and object motion to effectively use the available user motion while predicting the next frames. We predict the motion of objects by isolating and estimating the 3D object motion in the past frames and then extrapolating it. We employ multi-plane images (MPIs) as a 3D representation of the scenes and model the object motion as the 3D displacement between the corresponding points in the MPI representation. To handle the sparsity in MPIs while estimating the motion, we incorporate partial convolutions and masked correlation layers to estimate corresponding points. The predicted object motion is then integrated with the given user or camera motion to generate the next frame. Using a disocclusion infilling module, we synthesize the regions uncovered due to the camera and object motion. We develop a new synthetic dataset for TVS of dynamic scenes consisting of 800 videos at full HD resolution. We show through experiments on our dataset and the MPI Sintel dataset that our model outperforms all the competing methods in the literature.
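The abstract represents each frame as a multi-plane image (MPI): a stack of fronto-parallel RGBA planes at fixed depths. A common way to render an MPI back into an ordinary image is front-to-back alpha ("over") compositing, where each plane contributes its color weighted by its own alpha and by the transmittance left over from the nearer planes. The sketch below shows this generic rule for a single pixel. It is an illustration under stated assumptions, not code from DeCOMPnet: the function name `composite_mpi` and the nearest-plane-first ordering are our own choices.

```python
def composite_mpi(colors, alphas):
    """Render one pixel of an MPI by front-to-back 'over' compositing.

    Assumed layout (hypothetical): colors is a list of per-plane RGB tuples and
    alphas a list of per-plane alpha values in [0, 1], both ordered nearest plane first.
    """
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer planes
    for rgb, a in zip(colors, alphas):
        weight = a * transmittance  # this plane's visible contribution
        for k in range(3):
            out[k] += weight * rgb[k]
        transmittance *= 1.0 - a  # nearer planes occlude everything behind them
    return tuple(out)


# A half-transparent red plane in front of an opaque blue plane.
print(composite_mpi([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [0.5, 1.0]))
```

Because most plane pixels in a typical MPI have zero alpha, the per-plane content is sparse, which is the property the abstract's partial convolutions and masked correlation layers are meant to handle.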

Temporal view synthesis for frame rate upsampling of dynamic videos

Alternate frames are graphically rendered, and the intermediate frames are predicted using temporal view synthesis.

[Summary figure]
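In this setting, each predicted frame lies beyond the most recent rendered frame, so object motion observed in past frames has to be carried forward in time. The abstract states that DeCOMPnet estimates 3D object motion in the past frames and then extrapolates it; the exact extrapolation rule is not given on this page. The sketch below assumes the simplest option, constant 3D velocity, with the displacement scaled by the ratio of time gaps. The function `extrapolate_point` and its parameters are hypothetical.

```python
def extrapolate_point(p_prev, p_curr, dt_past=1.0, dt_future=1.0):
    """Linearly extrapolate a 3D point under an assumed constant-velocity model.

    p_prev, p_curr: (x, y, z) positions of the same object point in two past frames.
    dt_past: time (in frames) between those two past frames.
    dt_future: time (in frames) from the current frame to the frame being predicted.
    """
    scale = dt_future / dt_past
    # p_next = p_curr + velocity * dt_future, with velocity = (p_curr - p_prev) / dt_past
    return tuple(c + scale * (c - p) for p, c in zip(p_prev, p_curr))


# Hypothetical schedule: frames 0, 2, 4 rendered; frame 5 predicted from frames 2 and 4.
print(extrapolate_point((0.0, 0.0, 5.0), (2.0, 0.0, 5.0), dt_past=2.0, dt_future=1.0))
```

Under the alternate-frame scheme, if the two most recent rendered frames are two frame intervals apart and the target is one interval ahead, `dt_past=2` and `dt_future=1`; which past frames the method actually uses is an assumption here.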

DeCOMPnet - Model Block Diagram
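Of the components named in the abstract, partial convolutions are the piece most specific to sparse MPIs. A partial convolution (Liu et al., 2018, introduced for image inpainting) evaluates the kernel only over valid inputs, rescales the response by the ratio of window size to the number of valid inputs, and marks an output as valid if at least one input in its window was valid. The loop-based, single-channel sketch below illustrates that rule; it is not the DeCOMPnet layer, and `partial_conv2d` and its argument layout are hypothetical.

```python
def partial_conv2d(image, mask, kernel, bias=0.0):
    """Single-channel partial convolution with 'same' zero padding and stride 1.

    image, mask: H x W nested lists; mask[i][j] is 1 where the input is valid.
    kernel: k x k nested list with odd k (applied as cross-correlation, as in CNNs).
    Returns (output, updated_mask).
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    window_size = k * k
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            valid = 0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    # Skip padded positions and masked-out (empty) inputs.
                    if 0 <= y < h and 0 <= x < w and mask[y][x]:
                        acc += kernel[di][dj] * image[y][x]
                        valid += 1
            if valid > 0:
                # Rescale so the response does not shrink when part of the window is empty.
                out[i][j] = acc * window_size / valid + bias
                new_mask[i][j] = 1
    return out, new_mask
```

With a mean-filter kernel, a constant image with holes is reproduced exactly at every output whose window sees at least one valid pixel, whereas an ordinary convolution would darken pixels near the holes.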

IISc VEED-Dynamic Database

The database consists of 200 diverse indoor and outdoor scenes (see samples below). We use Blender to render the videos, and we obtain the blend files for the scenes mainly from Blend Swap and TurboSquid. Four different camera trajectories are added to each scene, giving a total of 800 videos. Motion is added either to pre-existing objects in the scene or to new objects that are added and animated. The videos are rendered at full HD resolution (1920 × 1080) at 30 fps, and each video contains 12 frames.
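The figures in this description combine into a few aggregate sizes. The snippet below is plain arithmetic on the stated numbers, assuming every video has exactly 12 frames; the constant and function names are our own.

```python
# Figures stated in the dataset description.
NUM_SCENES = 200
TRAJECTORIES_PER_SCENE = 4
FRAMES_PER_VIDEO = 12
FPS = 30
WIDTH, HEIGHT = 1920, 1080


def dataset_totals():
    """Derive aggregate dataset sizes from the per-scene and per-video figures."""
    num_videos = NUM_SCENES * TRAJECTORIES_PER_SCENE
    return {
        "videos": num_videos,
        "frames": num_videos * FRAMES_PER_VIDEO,
        "seconds_per_video": FRAMES_PER_VIDEO / FPS,
        "pixels_per_frame": WIDTH * HEIGHT,
    }


if __name__ == "__main__":
    for name, value in dataset_totals().items():
        print(f"{name}: {value}")
```

Each clip therefore spans 0.4 s, and the full set contains 9,600 frames of about 2.07 megapixels each.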

Samples

[Sample frames: Kitchen, Bedroom, Skyscraper, Forest]

Download

Link: OneDrive
The above link contains the following data:

Sample data
Train and test sets with RGB-D data, camera intrinsics and extrinsics. This data is password-protected; please fill out this form to get the password.
Links to the original blend files.
README file that describes the data format
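RGB-D frames plus camera intrinsics and extrinsics are enough to compute how static content moves under camera (user) motion alone, which is the motion the abstract says is given rather than predicted. The standard pinhole recipe is to back-project a pixel with its depth, move the 3D point from the source camera to the destination camera, and project it again. The sketch below assumes a pinhole intrinsics matrix, 4x4 world-to-camera extrinsics, and z-depth along the optical axis; these conventions are assumptions, and the README in the download describes the actual data format, which may differ (for example camera-to-world matrices or Blender's camera axes). `reproject_pixel` is a hypothetical name.

```python
def reproject_pixel(u, v, depth, K, T_src, T_dst):
    """Map pixel (u, v) with known depth from a source camera into a destination camera.

    K: 3x3 intrinsics [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] as nested lists.
    T_src, T_dst: 4x4 world-to-camera extrinsics (assumed convention).
    depth: z-depth of the pixel in the source camera.
    Returns (u', v', z') in the destination camera.
    """
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    # Back-project to a 3D point in source-camera coordinates.
    p_cam = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    # Source camera -> world: p_cam = R p_world + t, so p_world = R^T (p_cam - t).
    R = [row[:3] for row in T_src[:3]]
    t = [row[3] for row in T_src[:3]]
    diff = [p_cam[i] - t[i] for i in range(3)]
    p_world = [sum(R[j][i] * diff[j] for j in range(3)) for i in range(3)]
    # World -> destination camera.
    p_dst = [sum(T_dst[i][j] * p_world[j] for j in range(3)) + T_dst[i][3] for i in range(3)]
    z = p_dst[2]
    # Perspective projection with the same intrinsics.
    return fx * p_dst[0] / z + cx, fy * p_dst[1] / z + cy, z
```

For a moving object, this camera-induced motion would still need to be combined with the object's own predicted 3D displacement before projecting.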

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Qualitative Results

Visualization of outputs of various stages of our framework

Qualitative Comparisons on IISc VEED-Dynamic Database

Qualitative Comparisons on MPI-Sintel Database

Flow estimation with and without MPI
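The abstract attributes part of the MPI-based motion estimation to masked correlation layers. The idea is that empty positions of a sparse MPI plane carry no content, so an ordinary correlation can match them spuriously; a masked correlation only scores a candidate displacement when both endpoints are valid. The sketch below illustrates this with a plain dot-product cost averaged over channels and `None` for invalid pairs. The exact normalization and cost used in DeCOMPnet are not stated here, and `masked_correlation` and `best_displacement` are hypothetical names.

```python
def masked_correlation(feat1, feat2, mask1, mask2, max_disp):
    """Cost volume that only compares positions valid in both feature maps.

    feat1, feat2: H x W x C nested lists of feature vectors.
    mask1, mask2: H x W nested lists, 1 where the feature is valid.
    Returns {(dy, dx): H x W grid}, with None where the pair is invalid.
    """
    h, w = len(feat1), len(feat1[0])
    channels = len(feat1[0][0])
    volume = {}
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            grid = [[None] * w for _ in range(h)]
            for i in range(h):
                for j in range(w):
                    y, x = i + dy, j + dx
                    if not (0 <= y < h and 0 <= x < w):
                        continue  # displacement leaves the image
                    if not (mask1[i][j] and mask2[y][x]):
                        continue  # one side is empty: no meaningful match
                    a, b = feat1[i][j], feat2[y][x]
                    grid[i][j] = sum(p * q for p, q in zip(a, b)) / channels
            volume[(dy, dx)] = grid
    return volume


def best_displacement(volume, i, j):
    """Return the displacement with the highest valid cost at pixel (i, j), or None."""
    candidates = [(grid[i][j], d) for d, grid in volume.items() if grid[i][j] is not None]
    return max(candidates)[1] if candidates else None
```

In a one-row example where a masked-out feature in the second map happens to have a large dot product with the query, the unmasked maximum would pick that empty position, while the masked version returns the displacement to the genuinely valid match.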

Citation

If you use our work, please cite our paper:

Nagabhushan Somraj, Pranali Sancheti and Rajiv Soundararajan, "Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images", In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pp. 817-826, Oct 2022, doi: 10.1109/ISMAR55827.2022.00100.

Bibtex:

@inproceedings{somraj2022decompnet,
title = {Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images},
author = {Somraj, Nagabhushan and Sancheti, Pranali and Soundararajan, Rajiv},
booktitle = {IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
pages = {817--826},
month = {October},
year = {2022},
doi = {10.1109/ISMAR55827.2022.00100}
}
