Using the dataset, the team built a network that requires just one image to determine which parts of the human body interact with the surroundings.
Computer scientists Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, and Michael J. Black have presented RICH, a new dataset that contains multiview indoor and outdoor video sequences at 4K resolution, ground-truth 3D human bodies captured with markerless motion capture, 3D body scans, high-resolution 3D scene scans, and accurate vertex-level contact labels on the body.
Using RICH, the team trained a network that predicts dense body-scene contact from a single RGB image. According to the team, this is the first method to directly estimate 3D body-scene contact from a single image.
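To make "dense, vertex-level contact" a little more concrete, here is a rough illustration. It is not the team's code or RICH's actual data format. The sketch assumes that contact is stored as one binary flag per body-mesh vertex, that a predictor outputs one contact probability per vertex, and that predictions are scored against ground truth with precision, recall, and F1. The function names, the 0.5 threshold, and the toy 8-vertex "mesh" are all hypothetical.

```python
from typing import List, Tuple


def contact_mask(probs: List[float], threshold: float = 0.5) -> List[bool]:
    """Hypothetical helper: turn per-vertex contact probabilities into a binary per-vertex mask."""
    return [p >= threshold for p in probs]


def contact_prf(pred: List[bool], gt: List[bool]) -> Tuple[float, float, float]:
    """Hypothetical helper: vertex-level precision, recall and F1 of a predicted contact mask."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must cover the same vertices")
    tp = sum(p and g for p, g in zip(pred, gt))          # predicted contact, truly in contact
    fp = sum(p and not g for p, g in zip(pred, gt))      # predicted contact, actually free
    fn = sum((not p) and g for p, g in zip(pred, gt))    # missed contact
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 8 "vertices" with made-up probabilities and labels.
probs = [0.9, 0.8, 0.1, 0.2, 0.7, 0.05, 0.6, 0.3]
gt = [True, True, False, False, False, False, True, True]
pred = contact_mask(probs)
print(contact_prf(pred, gt))  # (0.75, 0.75, 0.75)
```

A real body mesh has thousands of vertices rather than eight, but the idea is the same: "dense" means every vertex gets its own contact decision instead of a handful of predefined body parts.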
"Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging," commented the team. "Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives."