Cory Strassburger, a co-founder of the VR studio Kite & Lightning, talked about using the iPhone X to create facial animations.
Intro
Why iPhone X
Capturing Process
– Next I copy the text files from the iPhone X to the desktop via USB.
– The captured data needs to be reformatted for importing into Maya, so I wrote a simple desktop app to do this. It takes the chosen text file(s) and converts them to Maya .anim files. This converter still needs a lot of work, because the .anim files require a lot of specific information about the names of your blendshapes, frame rate, etc. I ended up hardwiring that data to my specific setup, but if I rename a blendshape or blendshape node I have to change the data. It's a bit of a pain, though that will vanish once I add some features to the converter. (A rough sketch of this conversion step follows this list.)
– Lastly, I import the .anim file into Maya (via the Shape Editor) and voilà, your Maya character is mimicking what you saw on the iPhone during capture. Of course, this data could drive any character, assuming it has the same blendshape set. If not, you could manually connect the animation data to any channel you want.
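To make the conversion step concrete, here is a minimal Swift sketch of what such a capture-to-.anim converter could look like. It is not Kite & Lightning's actual app: the file paths, node name, time unit, and the exact .anim header and curve fields are assumptions and would need to match your own rig and capture format — this is exactly the rig-specific hard-wiring described above. It assumes the capture was saved as a CSV-style text file with a header row of blendshape names followed by one row of weights per frame.

```swift
import Foundation

do {
    // Hypothetical paths and names -- adjust to your own setup.
    let captureURL = URL(fileURLWithPath: "/path/to/capture.txt")
    let outputURL  = URL(fileURLWithPath: "/path/to/capture.anim")
    let blendShapeNode = "blendShape1"   // assumed blendShape node name in Maya

    // Parse the capture: header row of shape names, then one row of weights per frame.
    let raw = try String(contentsOf: captureURL, encoding: .utf8)
    var lines = raw.split(separator: "\n").map(String.init)
    let shapeNames = lines.removeFirst().split(separator: ",").map(String.init)

    // weights[shapeIndex] = weight value per frame
    var weights = Array(repeating: [Double](), count: shapeNames.count)
    for line in lines {
        let values = line.split(separator: ",").compactMap { Double($0) }
        for (i, v) in values.enumerated() where i < shapeNames.count {
            weights[i].append(v)
        }
    }

    // .anim header. "ntscf" is Maya's 60 fps time unit -- assuming a 60 fps capture.
    var anim = """
    animVersion 1.1;
    timeUnit ntscf;
    linearUnit cm;
    angularUnit deg;
    startTime 0;
    endTime \(max(0, lines.count - 1));

    """

    // One anim curve per blendshape weight. The attribute path below is an
    // assumption: it must match how the weights are aliased on your blendShape
    // node, which is where the rig-specific hard-wiring lives.
    for (i, name) in shapeNames.enumerated() {
        anim += "anim weight.\(name) \(name) \(blendShapeNode) 0 0 \(i);\n"
        anim += "animData {\n  input time;\n  output unitless;\n  weighted 0;\n"
        anim += "  preInfinity constant;\n  postInfinity constant;\n  keys {\n"
        for (frame, value) in weights[i].enumerated() {
            anim += "    \(frame) \(value) linear linear 1 1 0;\n"
        }
        anim += "  }\n}\n"
    }

    try anim.write(to: outputURL, atomically: true, encoding: .utf8)
} catch {
    print("Conversion failed: \(error)")
}
```

The resulting file can then be brought into Maya with the standard .anim import, as described in the last step above.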
The whole process is so fast: just whip out the phone, plug in a mic, and start recording, and importing into Maya is just a couple of clicks. The beauty for me is the way my facial rig is set up now: I can theoretically import the data, clean up any artifacts pretty quickly (as there are so few), and then, with the core part of my facial rig, animate quickly on top of the data to add more expressivity. I just got all this working, so I haven't tested that theory yet. Another big improvement that's not shown in either test is that the jaw articulation is now way more physically accurate. In Unity, for quick testing, I just linked the lower jaw, teeth, and tongue to the "JawOpen" blendshape, which is a hack and not the way your mouth works. However, with my actual facial rig the jaw responds to the mouth movements more accurately, so the next tests should reflect that.
Performance
Haha, it is exactly the same data that's driving the emojis (well, assuming Apple didn't keep any special treats locked up for themselves), though it's all exposed via ARKit, so no real hacking is needed. I'm no expert in facial animation tech, but from my experience I've found that facial tracking technology that uses a depth camera, like the iPhone X or Faceshift (which Apple bought to create these features), can generate a more accurate facial model than one using a mono 2D camera. From my observations it's not a dramatic difference, but I usually see it in the puckering and mouth articulations where the lips protrude. In 2D, I think it's hard to tell exactly what the mouth is doing without that extra depth data. I imagine the 2D algorithm has to cleverly make those decisions based on the mouth shape along with what the other facial shapes are doing.
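For reference, those blendshape coefficients are exposed through ARKit's public face-tracking API. A minimal Swift sketch of reading them might look like the following; the class and logging are illustrative, not the studio's capture app, but ARFaceTrackingConfiguration and ARFaceAnchor.blendShapes are the real API:

```swift
import ARKit

// Reads the same per-frame blendshape weights that drive Animoji.
final class FaceCaptureSession: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        // Face tracking requires a TrueDepth camera (iPhone X and later).
        guard ARFaceTrackingConfiguration.isSupported else { return }
        session.delegate = self
        session.run(ARFaceTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let faceAnchor as ARFaceAnchor in anchors {
            // ~50 coefficients, each in the 0...1 range, e.g. .jawOpen, .mouthSmileLeft.
            let jawOpen = faceAnchor.blendShapes[.jawOpen]?.floatValue ?? 0
            let smileLeft = faceAnchor.blendShapes[.mouthSmileLeft]?.floatValue ?? 0
            print("jawOpen: \(jawOpen)  mouthSmileLeft: \(smileLeft)")
        }
    }
}
```

A capture app would write these values out per frame (for example, to the text files mentioned earlier) instead of printing them.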
One interesting discovery I made is that there is a distance sweet spot where the depth data is most clean and accurate. Once I got within this range, the stability improved quite a bit. I also discovered that light contamination is a factor. I'm pretty sure the iPhone X is generating its depth by capturing structured light, which requires projecting an infrared pattern onto the face. And though the projection is infrared, I noticed the light coming from my computer monitor would introduce noise, as would any other hard light sources in my house. I thought a black room would be best, but that was wrong too. It seems like flat, neutral lighting works best, but I need to dig into the science a little to understand it better.
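If you want to inspect the raw TrueDepth frames yourself to see how distance and lighting affect the data, ARKit also exposes the depth buffers it works from via ARFrame.capturedDepthData. A minimal sketch, with the class name and logging purely illustrative:

```swift
import ARKit
import CoreVideo

// Logs basic info about each TrueDepth frame during face tracking, which is
// handy for eyeballing how distance and ambient light affect depth quality.
final class DepthInspector: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        guard ARFaceTrackingConfiguration.isSupported else { return }
        session.delegate = self
        session.run(ARFaceTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // Depth frames arrive at a lower rate than the color frames.
        guard let depth = frame.capturedDepthData else { return }
        let map = depth.depthDataMap               // CVPixelBuffer of per-pixel depth
        let width = CVPixelBufferGetWidth(map)
        let height = CVPixelBufferGetHeight(map)
        let quality = depth.depthDataQuality == .high ? "high" : "low"
        print("depth frame \(width)x\(height), quality: \(quality)")
    }
}
```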
Content for Games
However, in one sense the iPhone X is just a portable depth camera, and they've incorporated the Faceshift tech in order to interpret the depth camera data into useful data to drive the blendshapes (or emojis!). And it's the Faceshift tech that's made it easy for someone like me to quickly get results (interpreting depth data into usable facial capture is not trivial). The problem with building a program around this is that one day Apple might just change that underlying tech, and it would kind of break your capture program. You could just access the depth data and write your own algorithms for interpreting it, but does it need to run on a $1,000 mobile phone? A desktop depth camera is pretty cheap, and there are still some unknowns about capturing with an iPhone X in real production situations.
I do think we can get some real use out of this tech right now (ghetto style) and actually use it in our game and for marketing content and shorts. I also think there is a lot more room for improving on the last test I did with rig and blendshape improvements, so once I've maxed out what the iPhone X data can do, it will be a better gauge for me. I also want to test out Faceware Live and see how it does relative to my needs. If I can get better quality than what the iPhone X can produce and it's fast, I will definitely go that route. I'd also love to see what this data can do with something like the Snapper rig!
Ultimately, I hope AR and Apple's involvement will trigger more development in facial capture. Sadly, we might have emojis and a pile of poo to thank, but who cares anyway!
Cory Strassburger, Co-Founder — Kite & Lightning.
Interview conducted by Kirill Tokarev.