
Researchers Present a New Speech-to-3D Animation Framework

DiffPoseTalk is based on a diffusion model combined with a style encoder, enabling it to produce stylistic 3D facial animations.

A team of researchers from Tsinghua University, Beijing Jiaotong University, and Tianjin University has recently shared a research paper introducing DiffPoseTalk, a new AI-based speech-driven framework for stylistic 3D facial animation and head pose generation. Built on a diffusion model combined with a style encoder that extracts style embeddings from short reference videos, the model can generate realistic-looking 3D facial animations from speech, a shape template, and reference styles as inputs.
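At a high level, generation works like a conditional diffusion sampler: the denoiser is repeatedly applied to noisy facial-motion coefficients while being conditioned on the speech features, the style embedding, and the speaker's shape template. The sketch below is a hypothetical, heavily simplified illustration of that flow; the function names, shapes, and update rule are assumptions for illustration, not the authors' code.

```python
import numpy as np

def extract_style_embedding(reference_motion: np.ndarray) -> np.ndarray:
    """Stand-in for the style encoder: maps a short reference motion clip
    (frames x coeffs) to a fixed-size style embedding."""
    return reference_motion.mean(axis=0)

def denoise_step(noisy_coeffs, t, speech_feats, style_emb, shape_template):
    """Placeholder for the learned denoising network. The real model would be
    a network conditioned on speech, style, and shape; here we only nudge the
    sample toward the shape template so the loop runs end to end."""
    return noisy_coeffs + 0.1 * (shape_template - noisy_coeffs)

def generate_animation(speech_feats, shape_template, reference_motion, steps=50):
    """Toy conditional diffusion sampling loop (illustrative only)."""
    style_emb = extract_style_embedding(reference_motion)
    num_frames = speech_feats.shape[0]
    coeffs = np.random.randn(num_frames, shape_template.shape[0])
    for t in reversed(range(steps)):
        coeffs = denoise_step(coeffs, t, speech_feats, style_emb, shape_template)
    return coeffs  # per-frame facial-motion coefficients driving the animation

# Dummy inputs: 120 frames of speech features, a 64-dim shape template,
# and a 30-frame reference clip for the style encoder.
anim = generate_animation(np.random.randn(120, 128), np.zeros(64), np.random.randn(30, 64))
print(anim.shape)  # (120, 64)
```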

"During inference, we employ classifier-free guidance to guide the generation process based on the speech and style," commented the team. "We extend this to include the generation of head poses, thereby enhancing user perception."

"Additionally, we address the shortage of scanned 3D talking face data by training our model on reconstructed 3DMM parameters from a high-quality, in-the-wild audio-visual dataset. Our extensive experiments and user study demonstrate that our approach outperforms state-of-the-art methods."

Learn more about DiffPoseTalk here and don't forget to join our 80 Level Talent platform and our Telegram channel, follow us on Instagram, Twitter, and LinkedIn, where we share breakdowns, the latest news, awesome artworks, and more.
