Colin Urquhart, Co-Founder and CEO of DI4D, shared the creative process behind the Xbox Series X/S Power Your Dreams trailer and talked about facial capture technologies.
About the Company
My name is Colin Urquhart, and I am a Co-Founder and CEO at DI4D – we specialize in super-high-fidelity facial capture. DI4D was founded close to 20 years ago in Glasgow, UK by myself and Douglas Green, and since then, our technology has been used on a broad range of blockbuster movies, video games, and for research purposes. We also now provide our services in LA, where I’m based.
We’ve been part of some amazing projects recently. Most notably, DI4D’s facial capture technology was used in the production of the newly released movie Venom: Let There Be Carnage, as well as Codemasters’ F1 2021 video game. Working with REALTIME, we had the opportunity to use our new PURE4D pipeline on F1 2021, driving the animation of multiple digital doubles.
The Xbox Project
We had already worked with MPC on previous projects such as Blade Runner 2049 and the official trailer for The Last of Us Part II, so the team was familiar with our technology. They wanted facial capture data at the highest fidelity possible to create an accurate digital double of Daniel Kaluuya and chose to bring us in for the Xbox Series X/S Power Your Dreams trailer. The data needed to be faithful to Daniel’s likeness and performance to drive a convincing end result, so our goal was to capture high-quality data and then process that data efficiently. We’re always aiming for photoreal likeness.
Daniel Kaluuya was captured with two systems, the seated DI4D PRO system, which captures the highest fidelity data, and a stereo, head-mounted camera system used to capture the final performance. The DI4D PRO provided data to MPC to build a character animation rig, with the performance data then used to drive the final facial animation.
The advantage of combining data from both the PRO System and a stereo-photogrammetry head-mounted camera is increased fidelity. While the PRO system can provide incredible facial performance quality, it requires an actor to be seated and, therefore, stationary. A head-mounted camera can’t replicate the same level of quality on its own, but it does allow the actor to perform freely and naturally in a space, making the data more faithful to a real performance. Combined, these two sets of data can create incredible results!
The Process of Capturing Facial Expressions
Capturing facial expressions with the PRO system is actually incredibly easy. Because it’s video capture, we can track multiple expressions in a very short take. The PRO system itself comprises nine synchronized 12-megapixel machine vision cameras and uses standard video lighting without the need for markers, make-up, or structured light projection. This means actors don’t need any preparation – it’s as simple as sitting down and acting! We also show the actors a reference video of the facial expressions to follow.
The PRO system requires the actor to stay in a single physical position during capture. This works perfectly for capturing facial expressions and for capturing performances in less dynamic scenes. For example, we used the PRO system to capture the facial performance of Ian Hanmore for the CG short film, Home.
Preparing Captured Data
DI4D uses its own proprietary processing pipeline which is the result of almost twenty years of advanced research and development. While we didn’t create any particularly new technology for this project for MPC, we are constantly refining our in-house workflow and adapting it to individual project requirements.
Our pipeline starts by using our proprietary stereo-photogrammetry technology to reconstruct a 3D scan per frame from the synchronized video data captured with the DI4D PRO system. The client typically also provides us with a retopologized, high-resolution 3D mesh of the actor. We scan through the DI4D PRO data to find a frame in which the expression of the actor matches as closely as possible their expression in the static scan. Next, we precisely transfer the topology from the retopologized scan mesh across to that specific frame of 4D data. Finally, we track this mesh very accurately through the rest of the 4D sequence data. Our ability to precisely track all of the vertices on the 3D surface of the face as it deforms with each dynamic expression is what makes our solution so unique.
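The steps above can be sketched in code. This is a minimal, illustrative Python sketch only – the data structures, the simple per-vertex distance metric, and all function names are assumptions made for illustration; DI4D's actual stereo-photogrammetry reconstruction and tracking are proprietary and far more sophisticated.

```python
# Illustrative sketch of the 4D pipeline: (1) per-frame scans arrive as lists
# of 3D points, (2) find the frame best matching the static scan's expression,
# (3) attach the retopologized mesh there, (4) carry it through the sequence.

def frame_distance(scan, reference):
    """Mean Euclidean distance between two scans with corresponding points
    (an assumed, simplified similarity metric)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in zip(scan, reference)
    ) / len(scan)

def find_anchor_frame(sequence, static_scan):
    """Step 2: pick the 4D frame whose expression best matches the static scan."""
    return min(range(len(sequence)),
               key=lambda i: frame_distance(sequence[i], static_scan))

def track_sequence(sequence, static_scan, mesh_faces):
    """Steps 3-4: fix the mesh topology at the anchor frame, then reuse that
    same vertex correspondence for every frame, so only positions change."""
    anchor = find_anchor_frame(sequence, static_scan)
    # A real tracker follows each vertex across the deforming surface; here
    # every frame simply pairs its reconstructed positions with the topology.
    return anchor, [{"vertices": frame, "faces": mesh_faces} for frame in sequence]
```

The key property the sketch preserves is that the face topology is established once, at the best-matching frame, and stays consistent across the whole sequence, which is what makes per-vertex correspondence over time possible.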
Making the Eyes Look Believable
Often, the eyes in animation aren’t tracked exactly – this is something we have started developing ourselves, but it wasn’t something we provided for MPC. The CG models for the eyes are usually built separately and animated later. This is mostly due to practicality.
Eye-tracking is required when the eye movement in the performance needs to be reproduced accurately, such as when two actors are looking at each other in a scene. But often, animators will want to redirect the gaze to account for animated elements in the environment that weren’t present during the facial performance capture session.
However, one of the most technically challenging things that we do provide is tracking the eyelids, making sure they track across the eyeball. This is where the most cleanup is required because the eyelashes can block the tracking. It’s also easy for the eyeball to penetrate the eyelids during tracking, which doesn’t look very good. We dedicate a lot of focus to ensuring that the eyelids are accurate.
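The penetration problem described above can be illustrated with a tiny Python sketch. Modeling the eyeball as a sphere and projecting offending vertices back to its surface is an assumption made purely for illustration, and all names here are hypothetical; it is not DI4D's actual cleanup tooling.

```python
# Illustrative eyelid cleanup: any eyelid vertex that has slipped inside the
# eyeball (modeled as a sphere) is pushed radially back onto the sphere surface.

def fix_eyelid_penetration(lid_vertices, eye_center, eye_radius, eps=1e-9):
    """Return eyelid vertices with eyeball penetration resolved by projecting
    interior vertices onto the sphere (center, radius) along the radial line."""
    fixed = []
    for v in lid_vertices:
        d = [a - b for a, b in zip(v, eye_center)]
        dist = sum(c * c for c in d) ** 0.5
        if dist < eye_radius:  # vertex is inside the eyeball: penetration
            scale = eye_radius / max(dist, eps)
            v = tuple(b + c * scale for b, c in zip(eye_center, d))
        fixed.append(tuple(v))
    return fixed
```

In practice a tracker would also need the eyelashes handled separately, since – as noted above – they occlude the surface the tracker is trying to follow.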
Development of Facial Capture Technologies
As with many areas of production, facial animation has developed at a rapid pace, with continual innovation over recent years. Much of this has been driven by evolving end-user technology. The latest game consoles and engines, for example, can display more detailed facial animation than ever before. With our recently launched PURE4D, game studios are now able to deliver high-fidelity facial animation at scale – faithfully recreating actor performances in the engine.
Over the next few years, facial capture may become more democratized – we’ve already seen smartphones used routinely in some pipelines. The challenge, though, of delivering a believable, nuanced, and realistic character performance remains a high bar – and that’s something that is likely to require deeper knowledge and expertise in the space for the time being.