The company is working on a neural network-based technology that will add real-time emotions to Roblox avatars.
The Roblox team has published a new blog post describing a technology it is developing to add real-time emotions to in-game avatars. The post outlines a deep learning framework that regresses facial animation controls from video, with the goal of making avatar interactions natural and believable. According to the team, this technology is a crucial milestone in Roblox’s march towards making the metaverse a part of people’s daily lives.
The developers write that the approach is built on the Facial Action Coding System (FACS), which defines a set of controls, based on facial muscle placement, for deforming a 3D face mesh. The idea is for the deep learning-based method to take a video as input and output a set of FACS weights for each frame. To achieve this, the team uses a two-stage architecture: face detection followed by FACS regression.
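To make the idea of FACS weights driving a mesh more concrete, here is a minimal sketch of a linear blend of per-control vertex offsets, a common way to apply such weights. It is not taken from the Roblox post: the function `deform_mesh`, the control names `jawDrop` and `lipCornerPuller`, and the tiny two-vertex mesh are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def deform_mesh(
    neutral: List[Vec3],
    deltas: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Return the neutral vertices plus the weighted sum of per-control offsets."""
    out = [list(v) for v in neutral]
    for name, w in weights.items():
        w = min(1.0, max(0.0, w))  # assumption: weights are clamped to [0, 1]
        for i, (dx, dy, dz) in enumerate(deltas[name]):
            out[i][0] += w * dx
            out[i][1] += w * dy
            out[i][2] += w * dz
    return [(v[0], v[1], v[2]) for v in out]


# Hypothetical two-vertex "mesh" with one offset per vertex for each control.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jawDrop": [(0.0, -1.0, 0.0), (0.0, -0.5, 0.0)],
    "lipCornerPuller": [(0.25, 0.25, 0.0), (0.0, 0.0, 0.0)],
}

# One set of weights like this would be predicted for every video frame.
frame_weights = {"jawDrop": 0.5, "lipCornerPuller": 1.0}
deformed = deform_mesh(neutral, deltas, frame_weights)
```

In this picture, the network's job is only to produce `frame_weights` for each frame; the avatar rig then turns those numbers into a facial pose.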
To achieve the best performance, the team implemented a fast variant of the relatively well-known MTCNN face detection algorithm, tuned for their specific use case. Once a face is detected, the MTCNN implementation runs only the final O-Net stage on successive frames, resulting in an average 10x speed-up.
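The detect-once, refine-afterwards control flow can be sketched roughly as follows. This is an illustration under stated assumptions, not Roblox's implementation: `FastFaceTracker`, the two callables standing in for the full cascade and the final O-Net stage, and the `min_score` fallback rule are hypothetical, and the stand-in detectors return fixed values instead of running real networks.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


class FastFaceTracker:
    """Run the full cascade until a face is found, then only the final stage."""

    def __init__(
        self,
        full_cascade: Callable[[object], Optional[Box]],
        final_stage: Callable[[object, Box], Tuple[Box, float]],
        min_score: float = 0.5,  # assumption: below this the face counts as lost
    ) -> None:
        self.full_cascade = full_cascade
        self.final_stage = final_stage
        self.min_score = min_score
        self.last_box: Optional[Box] = None

    def step(self, frame: object) -> Optional[Box]:
        if self.last_box is not None:
            # Cheap path: refine the previous box with the final stage only.
            box, score = self.final_stage(frame, self.last_box)
            if score >= self.min_score:
                self.last_box = box
                return box
            self.last_box = None  # face lost, fall back to the full cascade
        # Expensive path: run every stage of the detector.
        self.last_box = self.full_cascade(frame)
        return self.last_box


# Stand-in detectors that only count how often each path is taken.
calls = {"full": 0, "final": 0}


def fake_full_cascade(frame: object) -> Optional[Box]:
    calls["full"] += 1
    return (10, 10, 50, 50)


def fake_final_stage(frame: object, box: Box) -> Tuple[Box, float]:
    calls["final"] += 1
    return box, 0.9


tracker = FastFaceTracker(fake_full_cascade, fake_final_stage)
boxes = [tracker.step(frame) for frame in range(5)]
```

With the face held steadily across five frames, the expensive path runs once and the cheap path handles the rest, which is where a speed-up of the kind the team reports would come from.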
"Our FACS regression architecture uses a multitask setup which co-trains landmarks and FACS weights using a shared backbone (known as the encoder) as a feature extractor," comments the team. "This setup allows us to augment the FACS weights learned from synthetic animation sequences with real images that capture the subtleties of facial expression. The FACS regression sub-network that is trained alongside the landmarks regressor uses causal convolutions; these convolutions operate on features over time as opposed to convolutions that only operate on spatial features as can be found in the encoder. This allows the model to learn temporal aspects of facial animations and makes it less sensitive to inconsistencies such as jitter."