Depth Anything is a collaborative work from TikTok, The University of Hong Kong, Zhejiang Lab, and Zhejiang University.
Image credit: The University of Hong Kong et al.
Researchers from TikTok, The University of Hong Kong, Zhejiang Lab, and Zhejiang University presented Depth Anything, a new image-based depth estimation method that might make video editing easier.
Trained on 1.5 million labeled and 62 million unlabeled images, it provides impressive Monocular Depth Estimation (MDE) foundation models; a quick inference sketch follows below.
Image credit: The University of Hong Kong et al.
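For readers who want to try the model, here is a minimal inference sketch (not from the article) that runs a Depth Anything checkpoint through Hugging Face's "depth-estimation" pipeline. The checkpoint id "LiheYoung/depth-anything-small-hf" is an assumption; check the project page for the officially released weights.

```python
# Minimal sketch: monocular depth estimation with a Depth Anything checkpoint
# via the Hugging Face Transformers pipeline. The model id below is assumed.
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline(
    task="depth-estimation",
    model="LiheYoung/depth-anything-small-hf",  # assumed checkpoint name
)

image = Image.open("example.jpg")           # any ordinary RGB photo
result = depth_estimator(image)

result["depth"].save("example_depth.png")   # depth map rendered as a PIL image
print(result["predicted_depth"].shape)      # raw per-pixel depth tensor
```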
The creators want to build "a simple yet powerful foundation model dealing with any images under any circumstances" without pursuing novel technical modules.
"We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos."
You can find more examples, the code, and training data on the project's page.
Blender Guru seems to approve of the tool.