Evgeny Dyabin from the Cascadeur team discussed the logic behind their autoposing feature implemented using standard deep learning methods.
Introduction
Deep learning tools are becoming increasingly important in software development. Game development also benefits from this area of computer science, as shown by Nekki’s character animation tool Cascadeur. In a widely read 2019 blog post, Cascadeur’s team leader Evgeny Dyabin described the development progress of Cascadeur’s AutoPosing tool. That post, however, never touched on how several neural networks work together to form a complete instrument. In the following article, written for 80.lv, Evgeny describes the approach the team uses to implement relatively advanced functionality using only standard deep learning methods.
Defining the Problem
Evgeny: Our intention is to give users a way to set up character poses quickly. The user controls only a handful of main points, while the tool positions every other point automatically, keeping the pose realistic.
Using a fully connected neural network implies a fixed input and output size, so we’ve created several networks, each with a different number of input points: 6, 15, 20, and 28 of the character’s 43 points. Points used as input for every network level are colored green in the images below.
The issue with using levels of detail is that if we want to move a point on the 4th level, we’ll have to input all 28 points. And yet we do not want to force users to edit all these points. Our goal is to let them move only some of them. How can this be done?
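To illustrate why the input size is fixed, here is a minimal sketch of a fully connected network in NumPy. The hidden layer size, the random weights, and the name make_mlp are assumptions made for illustration only; the article does not describe the real architecture.

```python
import numpy as np

TOTAL_POINTS = 43  # points in a character (from the article)


def make_mlp(n_input_points, hidden=64, seed=0):
    """Build an untrained two-layer network with hypothetical sizes."""
    rng = np.random.default_rng(seed)
    # The first weight matrix fixes the input size: n_input_points * 3 values.
    w1 = rng.normal(scale=0.1, size=(n_input_points * 3, hidden))
    # The second one fixes the output size: all 43 points, 3 coordinates each.
    w2 = rng.normal(scale=0.1, size=(hidden, TOTAL_POINTS * 3))

    def forward(points):
        x = np.asarray(points, dtype=float).reshape(1, -1)
        h = np.maximum(x @ w1, 0.0)  # ReLU activation
        return (h @ w2).reshape(TOTAL_POINTS, 3)

    return forward
```

A network built with make_mlp(6) accepts exactly 6 points. Passing 15 points fails with a shape error, which is why each level of detail needs its own network.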
Our solution is to take into account the nesting of the input data, use the physical model, and combine the outcomes.
Input Data Nesting
We chose the levels of detail so that they are hierarchically nested.
A set of input points for a neural network of each level includes every point from the previous level and adds several new ones. This allows us to use the output from one network as an input for the next one.
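The nesting property can be sketched as follows. Only the number of points per level comes from the article; the concrete point indices are hypothetical and chosen so that each level simply extends the previous one.

```python
TOTAL_POINTS = 43
LEVEL_SIZES = [6, 15, 20, 28]  # input points per level (from the article)

# Hypothetical indexing: level k uses the first LEVEL_SIZES[k] point indices,
# so every level contains all points of the level before it.
LEVELS = [frozenset(range(n)) for n in LEVEL_SIZES]


def is_nested(levels):
    """True if each level is a strict superset of the previous one."""
    return all(a < b for a, b in zip(levels, levels[1:]))


def new_points(levels, k):
    """Indices that level k (0-based) adds on top of the previous level."""
    previous = levels[k - 1] if k > 0 else frozenset()
    return sorted(levels[k] - previous)
```

With this layout, new_points(LEVELS, 1) lists the 9 points that the second level adds to the 6 main ones.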
Combining the Outcomes
Let’s take a look at how the tool works. In our example, the user has positioned all 6 main points and is adjusting the spatial orientation of the left hand using points from the second level of detail.
When you change the position of any point other than the main 6, the tool memorizes it and starts using it in the calculations of the positions of the other points.
The tool works in several stages, depending on the points edited. This process is illustrated in the image below.
In the beginning, the first-level network positions all 43 points of the character using 6 main points as the foundation. Then, higher-level networks are called one after the other. Each successive network uses more detailed input data, either updated by the user or output by the previous-level network. This way, we are able to simultaneously use several networks with varying levels of detail.
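The cascade described above could be sketched like this. The networks here are stubs, the point indexing is hypothetical, and stopping at the most detailed level the user has edited is an assumption; the article only says that the stages depend on the points edited.

```python
import numpy as np

TOTAL_POINTS = 43
LEVEL_SIZES = [6, 15, 20, 28]  # input points per network level (from the article)


def make_stub_network(n_inputs):
    """Hypothetical stand-in for the trained network of one level.

    It copies its input points to the output and places every other point
    at their mean, just so the cascade has something to call.
    """
    def network(inputs):
        inputs = np.asarray(inputs, dtype=float)
        assert inputs.shape == (n_inputs, 3)
        pose = np.tile(inputs.mean(axis=0), (TOTAL_POINTS, 1))
        pose[:n_inputs] = inputs
        return pose
    return network


def level_of(index):
    """Return the first level whose input set contains this point index."""
    for level, size in enumerate(LEVEL_SIZES):
        if index < size:
            return level
    raise ValueError(f"point {index} is not an input of any level")


def run_cascade(networks, main_points, user_edits):
    """Run the networks level by level and return a (43, 3) pose.

    main_points: (6, 3) array of the main points set by the user.
    user_edits:  dict {point index: (3,) position} for other edited points.
    """
    main_points = np.asarray(main_points, dtype=float)
    pinned = {i: main_points[i] for i in range(LEVEL_SIZES[0])}
    pinned.update(user_edits)

    # The first level always runs: it builds the whole pose from 6 points.
    pose = networks[0](main_points)

    # Assumption: stop at the most detailed level the user has touched.
    last_level = max((level_of(i) for i in user_edits), default=0)
    for level in range(1, last_level + 1):
        n = LEVEL_SIZES[level]
        inputs = pose[:n].copy()  # start from the previous level's output
        for index, position in pinned.items():
            if index < n:
                inputs[index] = position  # user-set points override predictions
        pose = networks[level](inputs)
    return pose
```

For example, with networks = [make_stub_network(n) for n in LEVEL_SIZES], calling run_cascade(networks, main_points, {16: position}) runs the first three levels, because point 16 first appears as an input on the third one.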
Physical Accuracy
Sometimes the imperfections of machine learning models, together with the fact that our network predicts the global positions of the points, result in incorrect edge lengths in the output pose. This issue is resolved by running an iterative physics simulation that restores the edges to their initial lengths. If we reduce the number of iterations in the software settings, we’ll be able to see how it affects the outcome.
To prevent incorrect poses from being used as input data, this process is called after each network level finishes working.
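One common way to restore edge lengths iteratively is distance-constraint relaxation, in which each edge is repeatedly pulled back toward its rest length. The sketch below uses that technique as an assumption; the article does not specify which simulation Cascadeur actually runs, and the function name and parameters are illustrative.

```python
import numpy as np


def restore_edge_lengths(points, edges, rest_lengths, iterations=20):
    """Move points so that each edge approaches its rest length.

    points:       (N, 3) array-like of predicted point positions.
    edges:        list of (a, b) index pairs.
    rest_lengths: initial length of each edge, in the same order as edges.
    """
    points = np.array(points, dtype=float)  # work on a copy
    for _ in range(iterations):
        for (a, b), rest in zip(edges, rest_lengths):
            delta = points[b] - points[a]
            dist = np.linalg.norm(delta)
            if dist < 1e-12:
                continue  # coincident points: direction is undefined, skip
            # Split the length error equally between the two endpoints.
            correction = 0.5 * (dist - rest) / dist * delta
            points[a] += correction
            points[b] -= correction
    return points
```

Fixing one edge can disturb its neighbours, so a single pass is usually not enough. More iterations bring the lengths closer to their rest values, which matches the effect of the iteration setting mentioned above.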
Conclusion and Future Plans
The instrument we created has proved its usefulness, especially at the early stages of the animation pipeline, where rough, approximate poses are needed. In time, we plan to add support for custom-made humanoid skeletons, as well as to make the tool more precise and stable.
We are also exploring general deep learning solutions that are already used in other fields for tasks like restoring missing parts of an image or transferring visual style between images. These solutions could be used to add the desired characteristics to poses and animations.
Follow the news at cascadeur.com!
Evgeny Dyabin, Cascadeur’s Team Leader
The article was originally posted on DTF, where it is available in Russian.