
Researchers Unveil a New 3D-Aware One-Shot Head Reenactment Method

The new method lets users animate character photos using live-action footage.

A group of researchers has introduced VOODOO 3D, a novel 3D-aware one-shot head reenactment method built on a fully volumetric neural disentanglement framework that accounts for both the source appearance and the driver's expressions. The method operates in real time and can animate character photos using live-action footage, producing high-fidelity results suitable for 3D teleconferencing systems based on holographic displays.

According to the team, their approach is based on a neural self-supervised disentanglement method, which involves transforming both the source image and the driver video frame into a shared 3D volumetric representation based on tri-planes. This representation can then be manipulated freely using expression tri-planes derived from the driving images and can be rendered from any view using neural radiance fields.
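To make the tri-plane idea concrete, here is a minimal sketch of how such a representation is typically queried: a 3D point is projected onto three axis-aligned feature planes (XY, XZ, YZ), each plane is sampled bilinearly, and the features are summed. This is an illustrative NumPy implementation of the general tri-plane technique, not the authors' code; all function names and shapes are assumptions.

```python
import numpy as np

def sample_plane(plane, u, v):
    """Bilinearly sample a feature plane of shape (C, H, W) at
    normalized coordinates u, v in [0, 1]."""
    C, H, W = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[:, y0, x0]
            + wx * (1 - wy) * plane[:, y0, x1]
            + (1 - wx) * wy * plane[:, y1, x0]
            + wx * wy * plane[:, y1, x1])

def query_triplane(planes, point):
    """Query a tri-plane at a 3D point in [0, 1]^3: project the point
    onto the XY, XZ, and YZ planes, sample each, and sum the features.
    The result is the feature vector a NeRF-style decoder would turn
    into color and density."""
    plane_xy, plane_xz, plane_yz = planes
    x, y, z = point
    return (sample_plane(plane_xy, x, y)
            + sample_plane(plane_xz, x, z)
            + sample_plane(plane_yz, y, z))
```

Because the three planes are just feature images, they can be manipulated directly, which is what makes adding expression residuals to them (as described below in the team's pipeline) straightforward.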

The disentanglement process is achieved through self-supervised learning on a large in-the-wild video dataset. Furthermore, the team has introduced a highly effective fine-tuning approach to enhance the generalizability of the 3D lifting, utilizing the same real-world data.

"Our head reenactment pipeline consists of three stages: 3D Lifting, Volumetric Disentanglement, and Tri-plane Rendering," commented the team. "Given a pair of source and driver images, we first frontalize them using a pre-trained but fine-tuned tri-plane-based 3D lifting module. This driver alignment step is crucial and allows our model to disentangle the expressions from the head pose, which prevents overfitting.

"Then, the frontalized faces are fed into two separate convolutional encoders to extract the face features Fs and Fd. These extracted features are concatenated with the ones extracted from the tri-planes of the source, and all are fed together into several transformer blocks to produce the expression tri-plane residual, which is added to the tri-planes of the source image. The final target image can be rendered from the new tri-planes using a pre-trained tri-plane renderer with the driver's pose."
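The three-stage flow the team describes can be sketched as follows. This is a structural outline only: each module below is a shape-preserving dummy standing in for the real pre-trained networks (3D lifting, CNN encoders, transformer blocks, tri-plane renderer), and all names and tensor shapes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy stand-ins for the pre-trained networks described in the article.
def lift_and_frontalize(image):
    """3D Lifting stage: return tri-planes (3 planes, 32 channels,
    64x64) and a frontalized face. Both are placeholders here."""
    return rng.standard_normal((3, 32, 64, 64)), image

def encode_face(face):
    """Convolutional encoder stand-in: frontalized face -> feature tokens."""
    return face.reshape(-1, 64)[:16]

def transformer_blocks(tokens):
    """Transformer stand-in: tokens -> expression tri-plane residual."""
    return np.zeros((3, 32, 64, 64)) + tokens.mean() * 0.01

def render_triplanes(planes, pose):
    """Tri-plane renderer stand-in: tri-planes + camera pose -> image."""
    return planes.mean(axis=(0, 1))

def reenact(source_img, driver_img, driver_pose):
    # Stage 1: lift and frontalize both source and driver.
    src_planes, src_face = lift_and_frontalize(source_img)
    _, drv_face = lift_and_frontalize(driver_img)
    # Stage 2: encode faces (Fs, Fd), concatenate with source
    # tri-plane tokens, and predict the expression residual.
    Fs, Fd = encode_face(src_face), encode_face(drv_face)
    plane_tokens = src_planes.reshape(-1, 64)[:16]
    tokens = np.concatenate([Fs, Fd, plane_tokens], axis=0)
    residual = transformer_blocks(tokens)
    target_planes = src_planes + residual  # residual added to source planes
    # Stage 3: render the updated tri-planes from the driver's pose.
    return render_triplanes(target_planes, driver_pose)
```

The key design point visible even in this skeleton is that expressions are injected as a residual on the source tri-planes, so identity (the planes) and expression (the residual) remain disentangled up to the final render.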

Learn more about VOODOO3D here and don't forget to join our 80 Level Talent platform and our Telegram channel, follow us on Instagram, Twitter, and LinkedIn, where we share breakdowns, the latest news, awesome artworks, and more.
