My current research interests lie broadly in image and video processing, computer vision, and the application of machine learning to these areas.
Neural Radiance Fields (NeRF) can synthesize photo-realistic views of a scene from novel viewpoints, but they require hundreds of images of the scene from different viewpoints. Their performance degrades significantly when only a few input images are available. The goal of sparse-input NeRF is to guide training so that it converges to plausible solutions even with very few input views.
The goal of temporal view synthesis (TVS) is to synthesize future frames of a video given the past frames, where the camera poses of both the past and future frames are known. TVS can be applied to both static and dynamic scenes. Its applications include frame-rate upsampling of graphically rendered videos on low-compute devices and video streaming for remote presence applications with low transmission bandwidth.
In the past, I have worked on video quality assessment and video prediction models.
Many applications need to assess the quality of videos. Where humans are the end users, the most accurate way to evaluate video quality is a subjective study. Since subjective studies are cumbersome and difficult to scale, it is desirable to have an objective measure of quality that correlates well with human judgement. Video quality assessment (VQA) deals with designing objective measures to predict the quality of a given video.
Video prediction refers to the problem of predicting future frames given a few past frames. It has gained traction with the recent success of deep learning and generative models, and has applications in video representation learning, anomaly detection, video compression, self-driving cars, robotic path planning, and more.