
VISOR: A New Dataset For Segmenting Hands & Objects in Videos

The dataset includes hundreds of thousands of manual semantic masks of 257 object classes and more than 10M interpolated dense masks.

A team of researchers from the Universities of Bristol, Michigan, and Toronto has introduced EPIC-KITCHENS VISOR, a new dataset of pixel-level annotations for segmenting hands and a large variety of active objects in first-person (egocentric) videos.

Using an AI-powered annotation pipeline, VISOR labels the objects that appear in each video, such as human hands, ingredients, cutlery, and other kitchen items. In total, the dataset includes 271K manual semantic masks spanning 257 object classes and more than 10M interpolated dense masks, covering 36 hours of footage across 179 untrimmed videos.
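To give a sense of how polygon-style segmentation annotations like these are typically consumed, here is a minimal sketch that rasterizes per-frame polygon masks into a class-ID image. Note that the file name and the JSON keys ("annotations", "class_id", "segments") are hypothetical stand-ins, not VISOR's documented schema; adapt them to the actual release.

```python
# Minimal sketch: rasterize polygon-based segmentation annotations into a
# per-pixel class-ID mask. The JSON layout below is an assumed example
# structure, NOT VISOR's official schema.
import json
import numpy as np
from PIL import Image, ImageDraw

def masks_from_json(path, height, width):
    """Build an (H, W) integer mask where each pixel holds a class ID."""
    with open(path) as f:
        frame = json.load(f)

    canvas = Image.new("I", (width, height), 0)  # 32-bit integer image
    draw = ImageDraw.Draw(canvas)
    for ann in frame["annotations"]:             # hypothetical key
        for polygon in ann["segments"]:          # list of [x, y] vertices
            points = [tuple(p) for p in polygon]
            draw.polygon(points, fill=ann["class_id"])
    return np.asarray(canvas, dtype=np.int32)

# Usage: inspect which of the object classes appear in one frame.
mask = masks_from_json("frame_0001.json", height=1080, width=1920)
print("classes present:", np.unique(mask))
```

Later polygons overwrite earlier ones in this sketch, so annotation order acts as a simple depth ordering; a real pipeline would resolve overlaps according to the dataset's own conventions.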

"VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets," comments the team. "Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced, and cooked – where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands."

Click here to learn more about EPIC-KITCHENS VISOR. Also, don't forget to join our Reddit page and our Telegram channel, follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artworks, and more.
