Slava Smirnov and Gleb Sterkin discussed using their machine learning-based software to optimize the animation process in the CG world, compared other solutions for animation, and talked about the financial cost and time of production.
Please introduce yourself. What do you do? What companies have you worked for? How did you get into CG and machine learning?
Slava: I was doing some digital art and animation in Cinema4D back in the early 2000s. Later on, I switched to managing digital development for BBDO and DDB (the world's best advertising agencies). Those dozens of projects for global brands taught me a lot about working with creatives. Making ideas come true is the best experience I could get as a head of digital production. Yet, it was utterly painful to see global brands killing the most ambitious ideas. So I turned to more fundamental stuff: machine learning, math, and the Python programming language. This brought me to the greatest AI company for gamers (GOSU.ai), where I led B2B R&D for 2 years. This year, I started working on machine learning for creatives of all sorts. So we gathered a team of world-class machine learning researchers, and we are coming up with a bunch of machine learning solutions for CG. We started with deep learning for animation.
Gleb: My name is Gleb, and I’m responsible for the tech side of things on our team. My career started in science: I was doing a Ph.D. in physics research. Later on, I switched to the industry side of things and approached machine learning from a business perspective. I had a chance to apply data science at a US ride-sharing service (an Uber competitor), did machine learning for the biggest digital bank in Europe, and eventually led the deep learning R&D team at one of the top 5 tech companies in the world. Although these companies are not from the creative industry, I was always striving to make the most out of tech for the creative side of humanity.
Alright, let’s discuss current solutions for animation. Please tell us about the available ways.
Slava: I don't wanna sound too obvious but, generally speaking, there are 3 ways of getting something animated:
- do manual keyframing (time-consuming, skill-dependent, doesn't scale)
- try to use program/procedural methods (hard to edit and maintain)
- use mocap solutions (flexible, scalable, some are cost-effective at scale)
For mocap solutions, there are:
- mocap studios (expensive, not flexible)
- suits (gotta buy them before you act, deal with issues around metal surroundings, deal with suit sizes, still pricey for some)
- markerless solutions (low entry, cheaper than others, scale from small to big projects gracefully)
Gleb: Technically speaking, markerless mocap solves the task of reconstructing a 3D human pose by looking at the pixels of the human body. At the end of the day, it comes down to the quality of the dataset and the quality of the algorithms, which are mostly based on various deep learning models. The latter require extremely scarce talent and skills. That's why we've gathered extensive data and are spending a large portion of our time on researching and perfecting our algorithms and bringing them to the industry.
Let’s talk about machine learning. What is it and how can it be blended with animation? Could you give some examples?
Gleb: Machine learning is essentially a class of algorithms that build a math model for mapping something into something. This machine learning model looks at the data and learns to take A as an input and to output B (predictions). So for CG, there are multiple tasks that can be solved with a machine learning approach. For example, in animation, it’s the task of predicting the next frame (B) from the current one (A) (motion matching), predicting retargeting, predicting 3D keypoints from 2D video, and many, many more.
Slava: Yeah, in plain English, we look at machine learning as concentrated knowledge of human motion data. For example, it takes information from 2D video and predicts 3D keypoints. Essentially, it’s the task of reconstructing a 3D human pose from a video. That is how we get markerless mocap solutions.
Slava: What’s interesting is that our technology lowers the tech barrier for the end user. Game studio teams or individual creatives don't need to spend thousands of dollars on trying it out. They can just put their regular camera wherever the actor is and see the mocap data thousands of kilometers away. What's also important is that our great team managed to get the software working in real time, meaning our users can see their mocap data straight away.
Gleb: I have to add that machine learning alone is not enough to get good 3D keypoints from video, but combined with other advanced programming techniques, it produces quite impressive results.
Slava: Now it’s a matter of bringing it to the industry, learning industry requirements, and perfecting it on a request basis. That's why we invite all sorts of creatives to join us for free-of-charge creative collaborations. So if you have a project in mind where our software can benefit your work, drop us a line.
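To make the "learns to take A and output B" idea concrete, here is a minimal toy sketch in Python. It is not the model discussed in this interview: the data, the one-number linear model, and the names `fit_linear` and `predict` are all assumptions made up for illustration. A real pose model would map video frames to 3D keypoints with a deep network, but the training loop follows the same principle.

```python
# Toy illustration only: learn a mapping from input A to output B.
# The model here is a single line, b = w * a + offset (an assumption for the demo).

def fit_linear(inputs, targets, lr=0.05, epochs=2000):
    """Fit w and offset by gradient descent on the mean squared error."""
    w, offset = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [w * a + offset - t for a, t in zip(inputs, targets)]
        # Gradients of the mean squared error with respect to w and offset.
        grad_w = 2.0 / n * sum(e * a for e, a in zip(errors, inputs))
        grad_offset = 2.0 / n * sum(errors)
        w -= lr * grad_w
        offset -= lr * grad_offset
    return w, offset


def predict(model, a):
    """Apply the learned mapping to a new input."""
    w, offset = model
    return w * a + offset


# Hypothetical training pairs that follow B = 2 * A + 1.
train_a = [0.0, 1.0, 2.0, 3.0, 4.0]
train_b = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_linear(train_a, train_b)
print(round(predict(model, 5.0), 2))  # an input the model never saw during training
```

The model is never told the rule behind the data. It only sees example pairs and adjusts its parameters until its predictions match them, which is the same thing a pose-estimation network does at a much larger scale.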
How do you see machine learning optimizing production? It’d be great if you could share some numbers. We all hear it's pretty efficient, but we don’t usually hear about time and money.
Slava: One of the main purposes of machine learning is democratizing quality and speeding up processes for mid-size teams and indies. I'll share an example from one of our pilot projects. The team is making a horror game with over a dozen unique characters and a number of cutscenes. Their choice was either to go with keyframing, whose scalability they doubted, or to find something that would let them get a lot of animation data and experiment on the go. For character animation, they had to get the characters' basic states (idle, attack, walk, etc.). They shoot videos of those with their regular camera, put them into our software, and get mocap data right away (in a matter of minutes). So, speaking of numbers, they had 17 characters with 14 states each. With keyframing, they would spend roughly 4-5 hours on each state. That is about 70 hours for one character's states and about 1,200 hours for all 17 characters' states. At $20/hour for keyframing, that would cost them roughly $24K and half a year of work for all characters. With our solution, the cost is several times lower and the time needed is roughly a month or so.
Gleb: Not to mention cutscenes. Basically, our technology enabled the team to go for their vision with their most ambitious goals. This is what makes us most happy about what we do. After the initial tests, we asked them: would you guys prefer it on the engine side of things? And their reaction just blew us away. So we are bringing it to UE4 at the moment.
Slava: In general, our software works for them in the following way: they have real-time mocap, later on they do some polishing, and eventually they get it all up and running for the scene significantly quicker than with other approaches. This saves them tons of money to invest in gameplay and marketing, and ultimately lets them iterate toward a better product faster. The problem we are facing on our side is that the number of requests is above our capacity to handle them manually. So these days we are turning the software into a service where anyone can plug in their camera and get their mocap data right within the engine in real time.
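The keyframing estimate above can be re-derived in a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in the answer. The function name `keyframing_estimate` is made up, and the 160 working hours per month for a single animator is an assumption that is not stated in the interview.

```python
def keyframing_estimate(characters, states_per_character, hours_per_state, hourly_rate):
    """Return (total hours, total cost in dollars) for keyframing every state."""
    total_hours = characters * states_per_character * hours_per_state
    return total_hours, total_hours * hourly_rate


# Figures quoted in the interview: 17 characters, 14 states each, $20/hour.
# The quoted 4-5 hours per state gives a lower and an upper bound.
low_hours, low_cost = keyframing_estimate(17, 14, 4, 20)
high_hours, high_cost = keyframing_estimate(17, 14, 5, 20)

# Assumption (not from the interview): one animator working 160 hours a month.
HOURS_PER_MONTH = 160

print(f"hours:  {low_hours} to {high_hours}")
print(f"cost:   ${low_cost} to ${high_cost}")
print(f"months: {low_hours / HOURS_PER_MONTH:.1f} to {high_hours / HOURS_PER_MONTH:.1f}")
```

At 5 hours per state this gives 70 hours per character, 1,190 hours in total, and $23,800, which is where the rounded figures of 1,200 hours and $24K come from.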
Impressive. What if I wanna get started with machine learning? What skills do I need?
Gleb: Machine learning is a field at the intersection of computer science and math, so you should get yourself familiar with both. You want to know Python, some algorithms and data structures, plus linear algebra, calculus, and optimization. After that, you would need to dive deeper into machine learning and/or neural network algorithms. Some areas of machine learning are more "user-friendly", like working with tabular data or basic classification tasks, but others, like generative models, working with 3D and video, and text and audio synthesis, are still in the active research phase. So it takes effort to keep up with the current state of the art in machine learning.
Slava: If you ask me whether I would wanna go and learn machine learning again, I would say hell, no. The amount of pain from feeling stupid was never that high in my life, and it took me years to feel comfortable with the field. Unless you were always dreaming of building AI systems, my advice would be: don't rush after another fancy word and spend years of your life trying to figure it out. We all would be better off perfecting our fundamental field skills, and I deeply believe that's the way to prosper in career and life.
Let’s say I own a small or mid-tier studio with a limited budget. I need to animate a lot of characters and some of them should be top-notch. What solutions would you propose in this case?
Slava: If you have like 1-2 characters, you can go for keyframed states. If you have like 3+ characters and wanna save some money for perfecting game mechanics, I would go for markerless solutions like ours. When you feel comfortable with an in-engine preview of the mocap data, add a bit of keyframing labor to perfect it, and I guess you are good to go. One should keep in mind that game development and animation for games are iterative processes, there's no limit to perfection, and tools like ours are meant to help you with that.
And what about companies with huge budgets? What are some of the most advanced solutions out there?
Slava: We gotta keep in mind that corporations are not about providing democratized solutions for other studios. They are into optimizing their own costs with the help of machine learning. If you look at, say, Ubisoft's case, they obviously have the ability to shoot mocap in almost unlimited amounts at the early stages. The average number of short clips (states) they have for a game is 15K. Yet, when it comes to the later stages of development, tweaking those 15K states gets messy. Thus, the inefficiency for them happens on both the mocap reshooting and the iteration side of things. So they try to optimize that by redefining their mocap approach with machine learning.
Gleb: Yup, so instead of hardcoded graph-based states, they go for flexible tag systems. Machine learning that combines 2 animation states essentially allows them to mix stuff fluidly at later iteration stages. So, imagine the director comes in and says: let's aim for more, let's make every character's walk depend on the amount of damage dealt. You can picture what a nightmare it is for the team to add and check another 5K states. This is where their machine learning comes in: when they need to tweak a walk with damage behavior, they don't have to shoot another 5K animations. They shoot a couple and mix them with the existing 15K using machine learning. It's kind of a production version of motion matching.
Slava: That's promising. It gives them a bit of flexibility and saves time and money at their scale, but the approach is not a holy grail. It has limitations depending on the mocap, requires years of research, and there are creative control issues. In general, that's an example of some advanced machine learning stuff only huge companies can afford for their own optimization needs. Another thing corporations are historically good at is hand animation, but we'll see about that soon. Stay connected with 80lv to know more.
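To give a feel for what mixing two animation states by some amount means, here is a deliberately simple Python sketch. It uses plain linear interpolation of per-joint angles, not the learned model described above, and the joint names, angle values, and the `blend_poses` function are all hypothetical. Production systems typically work with full rotations and many frames rather than a few angles.

```python
def blend_poses(pose_a, pose_b, weight):
    """Linearly interpolate two poses given as {joint name: angle in degrees}.

    weight = 0.0 returns pose_a, weight = 1.0 returns pose_b.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if pose_a.keys() != pose_b.keys():
        raise ValueError("both poses must describe the same joints")
    return {
        joint: (1.0 - weight) * pose_a[joint] + weight * pose_b[joint]
        for joint in pose_a
    }


# Hypothetical single frames from a healthy walk and an injured walk.
healthy_walk = {"hip": 10.0, "knee": 40.0, "spine": 0.0}
injured_walk = {"hip": 4.0, "knee": 20.0, "spine": 15.0}

# A character that has taken half of the maximum damage.
half_damaged = blend_poses(healthy_walk, injured_walk, 0.5)
print(half_damaged)
```

The point of the sketch is the control signal: one recorded pair of states plus a damage weight yields a whole range of in-between walks, instead of one recorded clip per damage level.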
Haha, we'll see! Ok, what do you think about the future of animation/CG? What impact do you see ML bringing to the industry?
Gleb: A notable effort is being made by the deep learning research community in the CG world. There is work on using neural networks in the rendering computation pipeline. It ranges from rendering point clouds without constructing a mesh or an explicit texture all the way to using no geometry at all. There is also a lot of research in the area of geometry-free image editing. Moving closer to the animation workflow, deep learning research has been done on character control, efficient keyframe interpolation, and reducing the memory footprint of motion matching. There is a lot of research on various topics happening, but the gap between cutting-edge deep learning research and an artist is normally measured in years. It is one of our goals to bridge that gap and bring tools that the CG world can easily use in their day-to-day work.
Slava: We believe machine learning-based products are making a lot of the current solutions obsolete, freeing up space for more creative work, and democratizing quality for mid-size and indie teams. At the current pace, it’s not a question of IF but a question of WHEN the transition ends. In terms of animation, good keyframing is not going away, and mocap studios will continue to experience a decline. As for suits, I don't see a reason why you would pay thousands of dollars to give them a try, yet they will have their niche. Because deep learning is encapsulated knowledge of human data, it will let quality content be produced a bit more easily. Good news: you can go for some solutions now.