Learning to Listen: Generating Facial Motion Sequences

This neural network can model the listener's response based on the speaker's facial motion and audio.

Have a look at Learning to Listen – a "framework for modeling interactional communication in dyadic conversations." The network takes the speaker's audio and facial motion, along with the listener's past motion, and generates multiple plausible facial motion sequences for the listener's response, synchronized with the speaker.

The researchers aimed to model the conversational dynamics between a speaker and a listener. They introduced "a novel motion VQ-VAE that allows us to output nondeterministic listener motion sequences in an autoregressive manner." The approach generates realistic, synchronous, and diverse listener motion sequences that outperform the previous state of the art.
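To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of that mechanism – not the authors' implementation. A toy codebook stands in for the motion VQ-VAE, a GRU stands in for the actual autoregressive predictor, and every module name, dimension, and hyperparameter below is invented for illustration. The point is simply that sampling discrete motion codes conditioned on a fused speaker/listener context produces a different listener motion sequence on every run, which is where the non-determinism comes from.

```python
# Hypothetical sketch of VQ-quantized, autoregressive listener-motion sampling.
# Not the paper's code; all names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionVQ(nn.Module):
    """Toy VQ layer: maps continuous motion features to nearest codebook entries."""
    def __init__(self, num_codes=256, code_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def quantize(self, z):                               # z: (batch, time, code_dim)
        # Squared distance from each frame to every codebook vector.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        idx = dists.argmin(dim=-1)                       # (batch, time) discrete tokens
        return idx, self.codebook(idx)                   # tokens and quantized vectors


class ListenerPredictor(nn.Module):
    """Predicts the next listener-motion code from previous codes and context."""
    def __init__(self, num_codes=256, code_dim=64, ctx_dim=128):
        super().__init__()
        self.token_emb = nn.Embedding(num_codes, code_dim)
        self.rnn = nn.GRU(code_dim + ctx_dim, 256, batch_first=True)
        self.head = nn.Linear(256, num_codes)

    def forward(self, prev_tokens, context):             # context: (batch, time, ctx_dim)
        x = torch.cat([self.token_emb(prev_tokens), context], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h)                              # logits over the next code


def sample_listener_codes(predictor, context, steps, temperature=1.0):
    """Sample a code sequence step by step; repeated runs give diverse motions."""
    batch = context.size(0)
    tokens = torch.zeros(batch, 1, dtype=torch.long)     # assumed start token id 0
    for _ in range(steps):
        logits = predictor(tokens, context[:, : tokens.size(1)])[:, -1]
        probs = F.softmax(logits / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1)                # stochastic draw, not argmax
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens[:, 1:]                                 # drop the start token


if __name__ == "__main__":
    # Stand-in for fused speaker audio + speaker motion + listener-past features.
    context = torch.randn(1, 32, 128)
    predictor, vq = ListenerPredictor(), MotionVQ()

    # At training time, real listener motion would be quantized into codes:
    train_codes, _ = vq.quantize(torch.randn(1, 32, 64))

    # At inference time, codes are sampled and decoded back into motion features.
    codes = sample_listener_codes(predictor, context, steps=32)
    listener_motion = vq.codebook(codes)
    print(listener_motion.shape)                         # torch.Size([1, 32, 64])
```

Because the predictor outputs a distribution over codes rather than a single motion frame, sampling it autoregressively is one straightforward way to obtain several distinct but plausible listener responses to the same speaker input.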

You can find the research and more examples here.

Where do you think this method could be used? Share your ideas and don't forget to join our new Reddit page and our new Telegram channel, and follow us on Instagram and Twitter, where we are sharing breakdowns, the latest news, awesome artworks, and more.
