A curious diffusion model from Google Research.
Google Research has presented 3DiM – its diffusion model for 3D novel view synthesis from a single image. Simply put, it takes one image of a scene and generates images of that scene from new viewpoints – unlike NVIDIA's GET3D, which outputs explicit 3D meshes.
The core of 3DiM is an image-to-image diffusion model, which lets the researchers avoid the difficulties of designing and training architectures that jointly model multiple frames, and enables training on datasets that have only two views per scene.
3DiM takes one reference view and a relative pose as input and generates a novel view via diffusion. From there, 3DiM can render a complete, 3D-consistent scene by producing the output frames autoregressively. According to the creators, 3DiMs are geometry-free, do not rely on hyper-networks or test-time optimization for novel view synthesis, and allow a single model to easily scale to a large number of scenes.
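To make that generation loop concrete, here is a minimal Python sketch of how a pose-conditional, autoregressive sampling procedure of this kind could look. All names, shapes, and the dummy denoiser below are illustrative assumptions, not the actual 3DiM code or API.

```python
# Hypothetical sketch of 3DiM-style autoregressive novel view synthesis.
# Every function and constant here is an illustrative placeholder.

import numpy as np

IMG_SHAPE = (64, 64, 3)   # assumed working resolution
NUM_DIFFUSION_STEPS = 64  # assumed number of denoising steps


def denoise_step(noisy_view, cond_view, relative_pose, t):
    """Placeholder for one step of a pose-conditional image-to-image
    denoiser; a real model would be a trained network that takes the
    conditioning view and relative camera pose as extra inputs."""
    # Dummy update: nudge the noisy image toward the conditioning view.
    alpha = 1.0 / NUM_DIFFUSION_STEPS
    return (1 - alpha) * noisy_view + alpha * cond_view


def sample_novel_view(cond_view, relative_pose, rng):
    """Run the reverse diffusion process to synthesize one novel view,
    conditioned on a single reference view and a relative pose."""
    view = rng.standard_normal(IMG_SHAPE)  # start from pure noise
    for t in reversed(range(NUM_DIFFUSION_STEPS)):
        view = denoise_step(view, cond_view, relative_pose, t)
    return view


def render_scene(reference_view, target_poses, rng):
    """Autoregressive generation: each new frame is conditioned on the
    reference or a previously generated frame, so the outputs stay
    approximately consistent with one another."""
    frames = [reference_view]
    for pose in target_poses:
        # Condition on a randomly chosen earlier frame (a simplification
        # of conditioning on the full generation history).
        cond = frames[rng.integers(len(frames))]
        frames.append(sample_novel_view(cond, pose, rng))
    return frames[1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random(IMG_SHAPE)      # stand-in for the input photo
    poses = [np.eye(4) for _ in range(8)]  # stand-in relative poses
    novel_views = render_scene(reference, poses, rng)
    print(f"generated {len(novel_views)} novel views")
```

The key design point the sketch tries to convey is that only a single image-to-image model is ever trained; multi-view consistency comes from chaining its outputs at sampling time rather than from jointly modeling many frames.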
The researchers claim that 3DiM's generated videos achieve higher fidelity compared to prior work on the SRN ShapeNet dataset while being approximately 3D consistent.