Learning to Listen: Generating Facial Motion Sequences

This neural network can model the listener's response based on the speaker's facial motion and audio.

Have a look at Learning to Listen – a "framework for modeling interactional communication in dyadic conversations." The network takes the speaker's audio and facial motion, along with the listener's past motion, and generates multiple plausible facial motion sequences for the listener's response, synchronized with the speaker.

The researchers aimed to model the conversational dynamics between a speaker and a listener. They introduced "a novel motion VQ-VAE that allows us to output nondeterministic listener motion sequences in an autoregressive manner." The approach generates realistic, synchronous, and diverse listener motion sequences that outperform the previous state of the art.
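To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of that mechanism – not the authors' implementation. A toy codebook stands in for the motion VQ-VAE, a GRU stands in for the actual autoregressive predictor, and every module name, dimension, and hyperparameter below is invented for illustration. The point is simply that sampling discrete motion codes conditioned on a fused speaker/listener context produces a different listener motion sequence on every run, which is where the non-determinism comes from.

```python
# Hypothetical sketch of VQ-quantized, autoregressive listener-motion sampling.
# Not the paper's code; all names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionVQ(nn.Module):
    """Toy VQ layer: maps continuous motion features to nearest codebook entries."""
    def __init__(self, num_codes=256, code_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def quantize(self, z):                               # z: (batch, time, code_dim)
        # Squared distance from each frame to every codebook vector.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        idx = dists.argmin(dim=-1)                       # (batch, time) discrete tokens
        return idx, self.codebook(idx)                   # tokens and quantized vectors


class ListenerPredictor(nn.Module):
    """Predicts the next listener-motion code from previous codes and context."""
    def __init__(self, num_codes=256, code_dim=64, ctx_dim=128):
        super().__init__()
        self.token_emb = nn.Embedding(num_codes, code_dim)
        self.rnn = nn.GRU(code_dim + ctx_dim, 256, batch_first=True)
        self.head = nn.Linear(256, num_codes)

    def forward(self, prev_tokens, context):             # context: (batch, time, ctx_dim)
        x = torch.cat([self.token_emb(prev_tokens), context], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h)                              # logits over the next code


def sample_listener_codes(predictor, context, steps, temperature=1.0):
    """Sample a code sequence step by step; repeated runs give diverse motions."""
    batch = context.size(0)
    tokens = torch.zeros(batch, 1, dtype=torch.long)     # assumed start token id 0
    for _ in range(steps):
        logits = predictor(tokens, context[:, : tokens.size(1)])[:, -1]
        probs = F.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1)                # stochastic draw, not argmax
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens[:, 1:]                                 # drop the start token


if __name__ == "__main__":
    # Stand-in for fused speaker audio + speaker motion + listener-past features.
    context = torch.randn(1, 32, 128)
    predictor, vq = ListenerPredictor(), MotionVQ()

    # At training time, real listener motion would be quantized into codes:
    train_codes, _ = vq.quantize(torch.randn(1, 32, 64))

    # At inference time, codes are sampled and decoded back into motion features.
    codes = sample_listener_codes(predictor, context, steps=32)
    listener_motion = vq.codebook(codes)
    print(listener_motion.shape)                         # torch.Size([1, 32, 64])
```

Because the predictor outputs a distribution over codes rather than a single motion frame, sampling it autoregressively is one straightforward way to obtain several distinct but plausible listener responses to the same speaker input.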

You can find the research and more examples here.

Where do you think this method could be used? Share your ideas and don't forget to join our new Reddit page and our new Telegram channel, and follow us on Instagram and Twitter, where we are sharing breakdowns, the latest news, awesome artworks, and more.
