07 August 2025

Face-to-Face with AI: How Speech Graphics is Pioneering Human-to-AI Interactions

Learn how Speech Graphics, industry leaders in the field of audio-driven facial animation, enable face-to-face conversations between humans and artificial intelligence.

Imagine picking up a microphone and seeing an avatar of yourself on a screen speaking the words that are coming out of your mouth in real time. As you speak into the microphone, your avatar moves and reacts emotionally to the sound of your voice, while perfectly lip-synching your words.

Now, enter another avatar: an NPC controlled by ChatGPT. It reacts to the sound of your voice and speaks back to you, also with perfect lip sync and with emotional and nonverbal cues. The conversation is tight, with no noticeable delays in the AI's responses and very little lag between your voice and your avatar. Together, you discuss any topic that the LLM is primed for – in this case, how to escape an alien invasion while wearing red shirts!

Speech Graphics demonstrated this remarkable experience on the show floor at GDC 2025, where visitors to the booth were invited to slip into the AI-sphere and take control of the human side of the conversation.

Visitors found seeing how their voice could affect the events unfolding on the screen so entertaining and immersive that they would go on talking for minutes before realizing how far down the rabbit hole they had gone! The scenario was set on a spaceship during an alien attack, with a riff on the unlucky red-shirt crew members and whether they should go fight or run and hide.

On display was the latest technology from Speech Graphics, including SG Com, its real-time, audio-driven character animation system, and the company's Rapport platform, which combines multiple cloud-based interactive technologies into a single service – in this case, marshalling OpenAI's ChatGPT, Whisper ASR, and ElevenLabs emotional voices in combination with SG Com animation. The experience was produced in Unreal Engine using MetaHuman character rigs, all driven via Speech Graphics/Rapport plugins.
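
To make the flow of a single conversational turn concrete, here is a minimal Python sketch of how speech recognition, an LLM, a synthetic voice, and audio-driven animation might be chained together. It is an illustration only, not the Rapport or SG Com API: the `ConversationPipeline` class, the `Turn` record, and every stub function are hypothetical names invented for this example, and the stubs merely stand in for the real services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    """Everything produced during one user -> AI exchange."""
    user_text: str
    reply_text: str
    reply_audio: bytes
    animation_frames: List[Dict[str, float]]


@dataclass
class ConversationPipeline:
    """Hypothetical orchestration of the four stages named in the article."""
    transcribe: Callable[[bytes], str]                 # speech recognition (ASR)
    generate_reply: Callable[[str], str]               # LLM response
    synthesize: Callable[[str], bytes]                 # text-to-speech voice
    animate: Callable[[bytes], List[Dict[str, float]]]  # audio-driven animation

    def run_turn(self, mic_audio: bytes) -> Turn:
        user_text = self.transcribe(mic_audio)
        reply_text = self.generate_reply(user_text)
        reply_audio = self.synthesize(reply_text)
        # The animation stage sees only audio, never text or video.
        frames = self.animate(reply_audio)
        return Turn(user_text, reply_text, reply_audio, frames)


# Stub stages (assumptions for illustration; no real service is called).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return f"Reply to: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def fake_animate(audio: bytes) -> List[Dict[str, float]]:
    # One made-up "jaw_open" value per byte of audio.
    return [{"jaw_open": (b % 10) / 10} for b in audio]


pipeline = ConversationPipeline(
    transcribe=fake_transcribe,
    generate_reply=fake_reply,
    synthesize=fake_synthesize,
    animate=fake_animate,
)
turn = pipeline.run_turn(b"Should we fight or hide?")
print(turn.reply_text)
```

A real-time system would presumably stream audio between stages rather than wait for each one to finish; the sketch runs them in sequence only to keep the data flow easy to follow.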

This scenario is just one example. Speech Graphics technology can work with any type of character rig, any art style, and any language, so you could just as easily engineer a conversation between two anime characters speaking Japanese. It can animate non-human characters as well, with any additional anatomy such as antennae (for those aliens). The technology can also plug into the Unity engine.

As game developers begin to explore the role of AI in games, this demo provides a visceral vision of what is possible in terms of automating content creation and giving users direct control over the experience, using nothing but their own voice. (The technology uses no cameras, only voice, so privacy is protected.)

Speech Graphics' audio-driven technology automates both speaking and listening behaviors. It not only drives accurate lip sync but also detects emotional content in the voice (e.g., positive vs. negative) and acoustic events such as laughter, grunts, and even breathing – all of which help drive animation of facial expressions and body language. The goal is always to create the illusion that the animated character is genuinely the source of the sound you hear, by matching its behavior perfectly to the cues in the voice. The underlying muscle-dynamic models keep the movements of the character natural and believable, and because these are real-time models, behavior can be interrupted at any moment with minimal latency, with the character adapting seamlessly to new motion goals.

The linguistic expertise of the team developing the technology helps it work well with any language, even fictional ones. When the speaker switches languages, there is no need to tell the system which one to expect. On the back end, the Rapport platform delivers a wide array of LLMs, voices, and other services to integrate into the experience, with an easy-to-use web interface. Adding the plugin to a game or application is easy and can be done in Unreal Blueprints.

With this groundbreaking fusion of audio-driven animation, emotional AI, and real-time interaction, Speech Graphics isn't just enhancing virtual conversations – it's redefining them. The result is a deeply immersive, fully embodied AI experience that feels less like talking to a machine and more like sharing a scene with a living, breathing character. Whether you're building games, virtual assistants, or entirely new kinds of narrative experiences, this technology opens the door to a new era of interactive storytelling.
