A team of researchers from the Universities of Bristol, Michigan, and Toronto has introduced EPIC-KITCHENS VISOR, a new dataset of pixel-level annotations that segment hands and a wide variety of active objects in first-person (egocentric) videos.
Built with an AI-assisted annotation pipeline, VISOR labels the objects that appear in each video, including human hands, ingredients, cutlery, and other kitchen items. In total, the dataset contains 271K manual semantic masks covering 257 object classes and more than 10M interpolated dense masks, spanning 36 hours across 179 untrimmed videos.
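The jump from 271K manual masks to 10M dense masks comes from interpolating between sparsely annotated keyframes. As a loose illustration only (not the authors' actual pipeline), assuming masks are stored as polygons with matching vertex counts across keyframes, a minimal sketch of linear interpolation between two keyframe polygons might look like this; the function name and data layout are hypothetical:

```python
def interpolate_polygons(poly_a, poly_b, num_steps):
    """Linearly interpolate between two polygon masks with matching
    vertex counts, producing one polygon per in-between frame.

    poly_a, poly_b: lists of (x, y) vertex tuples of equal length.
    Returns num_steps polygons, excluding the two keyframe endpoints.
    """
    if len(poly_a) != len(poly_b):
        raise ValueError("polygons must have the same number of vertices")
    frames = []
    for step in range(1, num_steps + 1):
        t = step / (num_steps + 1)  # fraction of the way from poly_a to poly_b
        frames.append([
            (ax + t * (bx - ax), ay + t * (by - ay))
            for (ax, ay), (bx, by) in zip(poly_a, poly_b)
        ])
    return frames

# Example: a 10x10 square mask drifting 10 px to the right,
# with two interpolated frames between the annotated keyframes.
square_t0 = [(0, 0), (10, 0), (10, 10), (0, 10)]
square_t1 = [(10, 0), (20, 0), (20, 10), (10, 10)]
mid_frames = interpolate_polygons(square_t0, square_t1, num_steps=2)
```

In practice, dense-mask propagation in video typically relies on learned tracking or optical-flow models rather than pure vertex interpolation, but the sketch conveys how a few manual annotations can be expanded into orders of magnitude more dense masks.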