EMO: Emote Portrait Alive will make anyone sing your favorite song.
Would you like to hear Audrey Hepburn singing an Ed Sheeran song? Well, you can with EMO, a method that makes portraits talk and sing.
It takes a single image and an audio clip as input and combines them into a video. The avatar doesn't just open its mouth mindlessly: it shows appropriate emotions and moves its head, and judging by the examples showcased by the authors from the Institute for Intelligent Computing and Alibaba Group, the results look pretty realistic.
Image credit: Linrui Tian et al.
"In the initial stage, termed Frames Encoding, the ReferenceNet is deployed to extract features from the reference image and motion frames. Subsequently, during the Diffusion Process stage, a pretrained audio encoder processes the audio embedding. The facial region mask is integrated with multi-frame noise to govern the generation of facial imagery. This is followed by the employment of the Backbone Network to facilitate the denoising operation. Within the Backbone Network, two forms of attention mechanisms are applied: Reference-Attention and Audio-Attention. These mechanisms are essential for preserving the character's identity and modulating the character's movements, respectively. Additionally, Temporal Modules are utilized to manipulate the temporal dimension, and adjust the velocity of motion."
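Both Reference-Attention and Audio-Attention are forms of attention, where features of the frames being generated "look up" information in another source: the reference image (to keep the character's identity) or the audio (to drive the movements). The toy sketch below shows only the generic scaled dot-product cross-attention idea in plain Python. It is an illustration under our own assumptions, not EMO's code: the real layers work on large learned feature tensors, and the way EMO combines reference features with the generated frames may differ from plain cross-attention.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention.

    Each query vector (e.g. a feature of a frame being generated) is compared
    with every key vector (e.g. reference-image or audio features); the
    resulting weights mix the matching value vectors.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output vector per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy features, two numbers per token.
frame_features = [[1.0, 0.0], [0.0, 1.0]]
reference_features = [[1.0, 0.0], [0.5, 0.5]]
audio_features = [[0.2, 0.8]]

identity_context = cross_attention(frame_features, reference_features, reference_features)
audio_context = cross_attention(frame_features, audio_features, audio_features)
```

With a single audio token, every frame feature receives that token's values in full, which is why `audio_context` simply repeats `[0.2, 0.8]` for each query.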
The method generates videos of any duration depending on the length of the input audio and works on images of various styles: realistic, stylized, anime, and so on. It recognizes tonal variations and can make avatars speak different languages. They can even rap if the sounds are clear enough.
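One common way to get videos of arbitrary length is to derive the frame count from the audio duration and generate the video clip by clip, feeding the last frames of the previous clip back in as context, similar to the "motion frames" mentioned in the quote above. The sketch below only plans such a schedule. The frame rate, clip length, and number of motion frames are hypothetical values we picked for illustration, not figures from the EMO paper.

```python
import math
from dataclasses import dataclass

@dataclass
class Clip:
    start: int           # index of the first frame generated in this clip
    end: int             # one past the last frame index of this clip
    motion_frames: list  # indices of earlier frames reused as context

def plan_clips(audio_seconds, fps=25, clip_len=12, n_motion=4):
    """Split a video whose length follows the audio into fixed-size clips.

    fps, clip_len and n_motion are hypothetical defaults, not EMO's settings.
    """
    total_frames = math.ceil(audio_seconds * fps)
    clips = []
    for start in range(0, total_frames, clip_len):
        end = min(start + clip_len, total_frames)
        # The last n_motion frames before this clip (none for the first clip).
        context = list(range(max(0, start - n_motion), start))
        clips.append(Clip(start, end, context))
    return clips

schedule = plan_clips(2.0)
```

Two seconds of audio at the assumed 25 fps gives 50 frames, planned as five clips, with the last one covering only the final two frames. A longer audio file simply produces more clips.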
However, it's not perfect yet: the characters struggle with finer movements, and their tongues sometimes don't quite follow the sounds. This is especially noticeable in the example set to Jennie's "Solo".
If you want to learn more about the technology, find the project here. Also, join our 80 Level Talent platform and our Telegram channel, and follow us on Instagram, Twitter, and LinkedIn, where we share breakdowns, the latest news, awesome artworks, and more.