Colin Urquhart, Co-Founder and CEO of DI4D, talked about the company, their large film projects, DI4D PRO and HMC systems and workflow, and more.
About the Company
Colin Urquhart: DI4D’s mission is to deliver the most true-to-life, performance-driven facial animation possible for high-end video games, cinematics, and blockbuster movie projects. Our proprietary, industry-leading technology captures the finest details in the expressions and emotions of an actor’s facial performance, which allows us to really bring their CG counterparts to life. Unlike many traditional facial capture solutions, our approach is to create realistic facial animation by capturing the subtleties and nuances of a real-life actor’s facial performance with life-like precision. The result is a true, photo-real digital double.
At the core of DI4D, there’s a team of industry experts who have been leaders in the field of 3D and 4D facial capture for over 20 years. Prior to forming DI4D as a spin-out company in January 2003, co-founder Douglas Green and I were researchers at the Universities of Glasgow and Edinburgh. We originally formed the company to exploit our research and innovations in stereo photogrammetry-based 3D capture technology, which was derived from my own PhD research.
DI4D initially pioneered the use of photogrammetry to capture the 3D shape and appearance of real-life people to create realistic video game versions of themselves. Back then we called them “virtual clones”, but now they have become known as digital doubles. Early on, we developed demonstrations of virtual clones of ourselves in GTA2 and Unreal Tournament 2, but video game graphics at the time were not detailed enough to do our virtual clones justice. Later, as we moved into the PS3 / Xbox 360 generation of consoles, our tech was adopted by video game companies like EA, who used it to capture star player likenesses for FIFA. Our technology has always been ahead of its time, waiting for game technology to catch up. We’re now entering another exciting console generation transition.
We've also been pioneering the use of stereo photogrammetry and optical flow tracking for 4D facial performance capture since 2005. Not only does 4D capture produce much higher fidelity data than traditional facial mocap, but it also eliminates the need for any markers, makeup, or structured light. We initially provided 4D facial capture systems for research use in fields such as psychology and facial surgery, before focussing increasingly on entertainment applications such as movie VFX and pre-rendered cinematics. We are now seeing a rapidly increasing level of interest in the use of 4D facial capture for next-gen in-game animation. We've expanded from the early days of just Dug and me in a tiny office in Glasgow to a team of twenty highly skilled engineers, motion editors, and support staff across Glasgow, Scotland, and Los Angeles, California.
What Projects Have You Worked On?
Charlie and the Chocolate Factory (2005) was the first project we worked on with film industry talent, testing 4D facial performance capture of the actor Deep Roy for the movie's Oompa Loompa characters. This was a great test for our technology early on and it made a significant impact on our development going forward. We spent the next five years refining our technology for use by several leading research customers – their continued adoption is a great testament to the accuracy of our system. Later, in 2011, we provided Axis Animation with facial performance capture for all the characters in their E3 launch trailer for Dead Island, which went viral.
In 2014 we worked on the movie La belle et la bête (Beauty and the Beast), capturing the facial performance of Vincent Cassel for his CG Beast character – we tracked every detail of his facial performance with the DI4D PRO system. It was incredible work from Vincent, and it demonstrates how important the acting performance itself is to facial capture.
Shortly afterwards, we captured the facial performance of Oscar-nominated actress Angela Bassett – the facial capture data was used to drive her character's facial expressions and performance in the successful Tom Clancy's Rainbow Six Siege video game trailer, created by Blur Studio. Both the La belle et la bête and Rainbow Six Siege projects still stand up today as great examples of high-fidelity facial performances.
Remedy Entertainment made extensive use of our DI4D PRO and DI4D HMC systems for their award-winning video game Quantum Break, published by Microsoft Studios. Quantum Break was notable for fusing live-action footage with video game action and made extensive use of in-game digital doubles of lead actors Shawn Ashmore, Aidan Gillen, Dominic Monaghan, Lance Reddick, and Courtney Hope. With the same actors appearing in the live-action and as digital doubles in the video game, it was important that the facial animation was as true-to-life as possible.
As well as being one of the first projects to use DI4D for in-game facial animation, Quantum Break was also one of the first to make use of an early version of our Head-Mounted Camera system. More recently, we provided HMC-based 4D facial performance capture for the cinematic cut scenes in Call of Duty: Modern Warfare with Infinity Ward and Blur Studio.
One of our highest-profile movie projects to date was Blade Runner 2049 with MPC and Warner Bros. – the movie won the 2017 Academy Award for Best Visual Effects. Rachael’s return in Blade Runner 2049 was one of the most technically challenging VFX scenes in the film, and her appearance was kept strictly under wraps until it was released. To help MPC achieve the required degree of photo-realism, our team developed the capability to track even higher resolution meshes than we had previously, allowing us to attain an unprecedented level of fidelity that could resolve the subtlest of facial expressions and neck motion.
We've shot many movie projects with high profile actors that were slated for release in 2020. However, Bloodshot, for which we shot Vin Diesel’s facial performance, is the only one to have actually made it onto cinema screens so far. Watch this space, there’s more to come!
How DI4D PRO and HMC Systems Work
Our DI4D PRO system comprises nine synchronised 12-megapixel machine vision cameras that capture a seated actor’s facial performance, producing the highest-fidelity colour 4D facial performance data possible. The cameras are arranged as three stereo pairs of greyscale cameras, each pair with an additional colour camera. Our process uses standard video lighting and doesn’t require markers, make-up, or structured light projection. We currently operate two DI4D PRO systems, one based at our Los Angeles studio and the other at our office in Glasgow, Scotland, UK. Both DI4D PRO systems are highly mobile and are used regularly for on-location shoots, e.g. on or near set, beside a mocap stage, or in an audio recording booth.
Our new DI4D HMC system allows high-fidelity 4D facial performance capture simultaneously from multiple actors, who can move around freely without inhibiting their acting performance. It is a wireless, helmet-mounted stereo greyscale camera system that draws on all the years of experience we have gained using our earlier HMC and third-party systems. We have put a huge amount of design effort into ensuring that the DI4D HMC is as lightweight and comfortable as possible, while still recording high-quality stereo video data and maintaining the high degree of camera stability that is essential to obtaining good-quality 4D data. For shoots that require dynamic facial capture with free movement of actors, the DI4D HMC is the ideal solution. It’s often used on a motion capture stage in conjunction with body motion capture and audio recording to allow simultaneous full performance capture from several actors.
Our 4D processing pipeline is similar whether the data is captured with a DI4D PRO or DI4D HMC system. First, our proprietary stereo photogrammetry software is used to reconstruct a dense 3D scan of the actor’s face in every frame of the captured performance. This means that we know the shape of the actor’s face in every frame and do not need to infer it from sparse data, such as markers or prominent facial features, as other systems have to do. Second, our proprietary optical flow tracking software is used to track every vertex in 3D through the reconstructed 3D scan sequence, ensuring a very high degree of consistency and detail. The resulting facial animation is generally provided as pointcache data. It is important to note that this pointcache animation does not require a traditional facial animation rig and, as a result, the realism of the animation is not limited by the quality of a rig. However, the pointcache animation can then be solved onto an animation rig if required.
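The two stages above can be sketched in broad strokes. The snippet below is an illustrative sketch, not DI4D's proprietary code: it shows the standard geometric idea behind stage one – linear (DLT) triangulation of a point seen by a calibrated stereo pair – and how the stage-two output, a fixed-topology pointcache, is commonly represented. All camera parameters, point coordinates, and array sizes here are invented for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from a calibrated stereo pair.
    P1, P2 are 3x4 camera projection matrices; x1, x2 are 2D image points."""
    # Each observation x = P X (homogeneous) gives two linear constraints on X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous solution
    X = Vt[-1]
    return X[:3] / X[3]           # dehomogenise

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy pinhole cameras: shared intrinsics, second offset along x (the baseline).
K = np.array([[1000.0, 0, 640], [0, 1000.0, 480], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X_true = np.array([0.05, -0.02, 1.2])
X_rec = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
# X_rec recovers X_true up to numerical precision in this noiseless setup.

# A "pointcache" is simply per-frame vertex positions over one fixed topology:
num_frames, num_verts = 300, 40000    # illustrative sizes only
pointcache = np.zeros((num_frames, num_verts, 3), dtype=np.float32)
```

Because every frame stores positions for the same vertices, downstream tools only need the mesh topology once plus the per-frame positions – no facial rig is involved at this stage.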
Optimization of Raw Data
The DI4D PRO system does capture a large amount of uncompressed video sequence data: approximately 1TB for every 2.5 minutes of capture. The DI4D HMC uses per frame compression and lower resolution cameras to capture more manageable amounts of data. A whole day of DI4D HMC video capture is usually much less than 1TB.
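As a sanity check, a back-of-envelope calculation reproduces the "1TB per 2.5 minutes" figure under some plausible assumptions. The frame rate and per-pixel storage below are guesses for illustration, not published DI4D specifications.

```python
# Back-of-envelope check of ~1 TB per 2.5 minutes of DI4D PRO capture.
cameras = 9                # nine synchronised machine vision cameras
pixels = 12e6              # 12 megapixels per camera per frame
bytes_per_pixel = 2        # assumed: 10-12 bit sensor data stored as 16-bit
fps = 30                   # assumed capture rate
seconds = 2.5 * 60

total_bytes = cameras * pixels * bytes_per_pixel * fps * seconds
print(total_bytes / 1e12)  # ≈ 0.97 TB, consistent with "approximately 1TB"
```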
When we process the captured video sequence data to produce a raw 3D scan per frame, this can result in a large amount of data. However, after we use optical flow tracking to track the customer’s mesh through the 3D scan sequence, the resulting pointcache animation, with fixed mesh topology, is surprisingly lightweight! We can also apply a final stage of processing to “solve” the pointcache animation to a simple rig comprising blend shapes pulled from the pointcache data. This results in an extremely compact data set, which still reproduces the actor’s exact facial performance very faithfully.
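The "solve" step can be illustrated with a standard least-squares fit: given a blend shape basis, recover per-frame weights that reproduce the dense pointcache. This is a minimal sketch on synthetic data – DI4D's actual solver, basis construction, and mesh sizes are not public, and the sizes below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
num_verts, num_shapes, num_frames = 5000, 50, 200   # illustrative sizes only

# Synthetic stand-ins: a neutral face and a basis of blend shape deltas.
neutral = rng.standard_normal(num_verts * 3)
B = rng.standard_normal((num_verts * 3, num_shapes))

# A dense pointcache generated from known weights, so recovery can be checked.
true_w = rng.uniform(0.0, 1.0, (num_shapes, num_frames))
frames = neutral[:, None] + B @ true_w

# Least-squares solve: compact per-frame weights from dense per-frame vertices.
w, *_ = np.linalg.lstsq(B, frames - neutral[:, None], rcond=None)

# Per-frame storage drops from num_verts*3 floats to num_shapes floats
# (a 300x reduction at these sizes), while reproducing the performance.
```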
Both pointcache animation and blend shape rig animation are readily ingested by all major animation packages and game engines, making them highly compatible for diverse projects.
What Direction Is the Industry Moving In?
Video game graphics have become steadily more realistic with every release, and this trend will be accelerated by the launch of the new generation of more powerful video game consoles. However, this creates a key challenge for video game developers: how to generate ever more detailed and realistic content while staying within time and cost budgets. For example, it’s becoming increasingly difficult to use hand animation or re-targeting solutions to clear an ever-higher bar for realism in facial animation.
The solution to obtaining the required level of realism is increasingly to capture assets from real life rather than trying to create them artistically. This is fuelling the growth of digital doubles in video games, and with it the use of motion capture and facial performance capture to animate those digital doubles.
In parallel, with advancements in powerful graphics hardware, the increasing graphical sophistication of video game engines is blurring the traditional distinction between pre-rendered cinematics and in-game graphics rendered in real time. The same techniques and assets are increasingly being used for both types of content, and digital doubles and high-end facial performance capture solutions, such as DI4D, are progressively being adopted for in-game animation too.
We believe that facial animation for human characters in video games will become more and more like traditional cinematography, with the actors’ performances being captured and applied directly to their digital doubles. This will drive an increased emphasis on good acting performance and good direction, with less requirement for hand animation and “tweaks”. With our years of experience at the forefront of 4D facial capture of actors for movies and pre-rendered animation, DI4D is well placed to lead this digital double revolution into video games.