Google AI has revealed a new deep learning-based approach that can estimate depth from videos in which both the camera and the subjects are in motion.
As humans, we are quite good at making sense of the 3D world from its 2D projections, but the task is much harder for machines. The goal here is to develop a mechanism that achieves 3D understanding by computationally recovering geometry and depth from 2D images.
The problem is that computers struggle when both the camera and the objects in a scene are moving. A freely moving camera combined with moving objects confuses traditional algorithms, which assume the same object can be observed from more than one viewpoint at the same time, enabling triangulation. That assumption requires either a multi-camera array or a scene in which all objects stay stationary while a single camera moves through it.
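To make the triangulation assumption concrete, here is a minimal sketch of classic two-view (DLT) triangulation: given a static 3D point seen from two known camera poses, its position can be recovered from the two 2D observations. This is illustrative only, not Google's code; the camera matrices and the point are made-up values, and the approach breaks down exactly when the point moves between the two views.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D image coordinates of the same (static!) point.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, etc.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the null vector of A, i.e. the last row of V^T.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean coordinates

# Two hypothetical cameras: one at the origin, one shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])   # an arbitrary static 3D point

# Project the point into each view (perspective division).
h1 = P1 @ np.append(X_true, 1.0)
x1 = h1[:2] / h1[2]
h2 = P2 @ np.append(X_true, 1.0)
x2 = h2[:2] / h2[2]

print(triangulate(P1, P2, x1, x2))   # recovers approximately [0.5, 0.2, 4.0]
```

If the point had moved between the two exposures, the two rays would no longer intersect at its true location, which is precisely why videos with both a moving camera and moving people defeat this classical pipeline.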
The Google AI team used 2,000 “Mannequin Challenge” YouTube videos to teach an AI model. These videos feature groups of people posing as if frozen, like characters in The Matrix, while a camera person moves through and records the scene. By learning priors on human pose and shape from this data, the trained model can now produce dense depth predictions for videos in which both the camera and the people are moving, without relying on direct 3D triangulation.
You can learn more about the research here.