While working on Shadow Fight 3, the team produced about 1,100 combat animations with an average duration of roughly 4 seconds each. They realized this corpus would be a good starting dataset for training a neural network one day.
"During our work on various projects, we noticed that animators can imagine the character’s pose by drawing a simple stick figure when making their first sketches," noted the team. "We thought that since an experienced animator can set a pose well by using a simple sketch, it would be possible for a neural network to handle it too."
They decided to take only 6 key points from each pose (wrists, ankles, pelvis, and base of the neck) and see whether a neural network could predict the positions of the remaining 37 points.
First, the network receives the positions of the 6 points from a specific pose and then predicts the positions of the remaining 37 points. "We would then compare them with the positions in the original pose. In the loss function, we used least squares on the distances between the predicted point positions and the source."
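The least-squares objective described above can be sketched as a mean squared distance over the predicted points. This is a minimal illustration, not the team's actual code; the function name and array shapes are assumptions.

```python
import numpy as np

def pose_loss(predicted, target):
    """Mean squared distance between predicted and ground-truth points.

    predicted, target: arrays of shape (37, 3), one 3D coordinate per point.
    (The shapes and function name are illustrative, not from the article.)
    """
    # Squared Euclidean distance for each of the 37 predicted points,
    # averaged over the pose -- a least-squares objective on point distances.
    sq_dist = np.sum((predicted - target) ** 2, axis=-1)
    return float(np.mean(sq_dist))
```

Minimizing this value pulls every predicted point toward its position in the source pose.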
For the training dataset, the team used movements of the characters from Shadow Fight 3, taking poses from each frame and getting about 115,000 poses.
The neural network architecture was a fully-connected five-layer network with the activation function and initialization method from Self-Normalizing Neural Networks. "Having 3 coordinates for each node, we got an input layer of 6x3 elements and an output layer of 37x3 elements," states the team. "We searched for the optimal architecture for the hidden layers and settled on a five-layer architecture with 300, 400, 300, and 200 neurons in the hidden layers, but networks with fewer hidden layers also produced good results."
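A forward pass through such a network can be sketched as follows. The SELU activation and LeCun-normal initialization come from the Self-Normalizing Neural Networks paper the team cites; the helper names and the choice of a linear output layer are assumptions for illustration.

```python
import numpy as np

# SELU activation constants from "Self-Normalizing Neural Networks".
ALPHA, SCALE = 1.6732632423543772, 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

# Layer widths from the article: 6x3 input, hidden 300/400/300/200, 37x3 output.
SIZES = [18, 300, 400, 300, 200, 111]

def init_params(rng):
    # LeCun-normal initialization, as recommended for SELU networks.
    return [(rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(SIZES[:-1], SIZES[1:])]

def predict(params, six_points):
    # six_points: flattened (6, 3) -> (18,) vector of key-point coordinates.
    h = six_points
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:   # keep the output layer linear
            h = selu(h)
    return h.reshape(37, 3)       # predicted positions of the 37 points
```

A small input vector of 18 numbers fans out through the hidden layers and comes back as 111 coordinates, i.e. the full 37-point pose.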
They noted that L2 regularization of the network parameters proved very useful, since it made predictions smoother and more continuous. "A neural network with these parameters predicts the position of points with an average error of 3.5 cm. This is quite a high average error, but it's important to take the specifics of the job into account."
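L2 regularization simply adds a penalty proportional to the sum of squared weights to the training loss. A minimal sketch, assuming a coefficient `lam` (the article does not state the value the team used):

```python
import numpy as np

def l2_penalty(weights, lam=1e-4):
    """L2 regularization term: lam * sum of squared weights.

    `weights` is a list of weight matrices; `lam` is an assumed
    coefficient, not a value from the article.
    """
    return lam * sum(float(np.sum(W ** 2)) for W in weights)

# Total training objective = data loss + l2_penalty(weights).
# Penalizing large weights discourages sharp changes in the output,
# which is consistent with the smoother predictions the team observed.
```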
Later, the team came up with the idea of training a few more networks with an expanded set of points that specify the orientation of the hands, feet, and head, plus the position of the knees and elbows. They added 16-point and 28-point schemes, and they found that the results of these networks can be combined "so that the user can set positions for an arbitrary set of points. For example, the user decided to move the left elbow but did not touch the right one. In this case, the positions of the right elbow and right shoulder are predicted in a 6-point pattern, while the position of the left shoulder is predicted in a 16-point pattern."
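One plausible way to combine the schemes, per the example above, is to route each point to the richest network whose controlling inputs the user actually set. This is a hypothetical sketch; the function, the data layout, and the routing rule are assumptions, not the team's published method.

```python
def combine_predictions(user_set, preds_by_scheme, point_deps):
    """Pick, for each point, the prediction from the largest scheme
    whose required input points the user actually positioned.

    user_set: set of point names the user positioned explicitly.
    preds_by_scheme: {scheme_size: {point_name: position}} per-network output.
    point_deps: {scheme_size: {point_name: set of required input points}}.
    (All names and structures here are illustrative.)
    """
    result = {}
    for scheme in sorted(preds_by_scheme, reverse=True):  # richest scheme first
        for point, pos in preds_by_scheme[scheme].items():
            if point not in result and point_deps[scheme][point] <= user_set:
                result[point] = pos
    return result
```

With this rule, moving only the left elbow makes the left shoulder fall through to the 16-point network while the untouched right side stays with the 6-point one, matching the behavior the team describes.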
The first version of this tool is already available in Cascadeur, and it might one day become a standard in the animation industry. Make sure to discuss the tool in the comments.