The team's previous model, Codec Avatars 2.0, required a complex capture rig.
In May, we shared a story on Meta's Codec Avatars 2.0, the company's approach to generating VR avatars using advanced machine learning techniques. The tech uses cameras mounted in the headset to observe the person's eyes and mouth and virtually recreate even subtle movements, such as moving the eyebrows, squinting, and scrunching the nose.
Generating an individual Codec Avatar with the previous version required a specialized capture rig called MUGSY with 171 high-resolution cameras. The latest version, however, can generate an avatar from a scan made with a smartphone that has a front-facing depth sensor. For example, you can use any iPhone model that comes with FaceID. Users pan the phone around their face while holding a neutral expression, then scan again while replicating 65 facial expressions.
The team states that this scanning process takes three and a half minutes on average, but generating the avatar in full detail then requires six hours on a machine with four high-end GPUs. In a shipping product, that processing would presumably run on cloud GPUs.
How did they replace a 171-camera rig with a single phone, you might ask. The new research relies on a Universal Prior Model (UPM) 'hypernetwork', a neural network that generates the weights for another neural network. The researchers say they trained the UPM by scanning the faces of 255 diverse individuals with the help of an advanced capture rig.
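To make the "network that generates the weights for another network" idea concrete, here is a minimal sketch in plain Python. It is not Meta's architecture: the class name `HyperNetwork`, the single linear layer, the tiny dimensions, and the "identity code" vectors are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import random


def make_matrix(rows, cols, rng):
    """Create a rows x cols matrix of small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class HyperNetwork:
    """Hypothetical hypernetwork: maps a conditioning vector to the
    weights and bias of a small target network (one linear layer)."""

    def __init__(self, cond_dim, target_in, target_out, seed=0):
        rng = random.Random(seed)
        self.target_in = target_in
        self.target_out = target_out
        n_params = target_in * target_out + target_out  # weights + bias
        # The hypernetwork's own parameters; these are what training would fit.
        self.proj = make_matrix(n_params, cond_dim, rng)

    def generate(self, cond):
        """Produce (weights, bias) for the target network from `cond`."""
        flat = matvec(self.proj, cond)
        n_weights = self.target_in * self.target_out
        weights = [
            flat[i * self.target_in:(i + 1) * self.target_in]
            for i in range(self.target_out)
        ]
        bias = flat[n_weights:]
        return weights, bias


def target_forward(weights, bias, x):
    """Run the target network using the generated parameters."""
    return [y + b for y, b in zip(matvec(weights, x), bias)]


# Assumed stand-ins for per-person identity codes (e.g. derived from a scan).
person_a = [1.0, 0.0, 0.5, -0.5]
person_b = [0.0, 1.0, -0.5, 0.5]

hyper = HyperNetwork(cond_dim=4, target_in=3, target_out=2)
weights_a, bias_a = hyper.generate(person_a)
weights_b, bias_b = hyper.generate(person_b)

# The same input run through two different generated networks.
expression = [0.2, -0.1, 0.7]
print(target_forward(weights_a, bias_a, expression))
print(target_forward(weights_b, bias_b, expression))
```

The point of the design is that the shared hypernetwork is trained once on many people, and each new person then only needs a conditioning input to get their own set of target-network weights, rather than a model trained from scratch.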
The researchers noted the current system has trouble dealing with glasses and long hair. It is also limited to the head, not the rest of the body. Learn more about the model here.