ORBIT is a Microsoft dataset for training models to recognize objects from only a few examples. It includes between 1 and 10 videos for each of 486 everyday objects.
Object recognition models are usually trained on datasets with thousands of examples for each object category. However, learning new objects from just a few examples is useful for many new applications. Robotics, for example, needs systems that can quickly learn to recognize new items.
In partnership with City, University of London, Microsoft developed the ORBIT dataset for teaching models to recognize new objects from just a few examples, along with a benchmark for evaluating how well models learn in this setting. The dataset contains 3,822 videos recorded by 77 blind or low-vision people using their mobile phones. In total, ORBIT contains 2,687,934 frames.
The distinguishing feature of the dataset is its high variability: in total, the participants filmed 486 everyday items, such as a face mask, keys and clothes.
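
To put these figures in perspective, the averages they imply can be worked out in a few lines of Python. This is only a rough sketch based on the totals quoted in this article; the function and key names are illustrative and are not part of any ORBIT tooling.

```python
# Totals as quoted in the article.
TOTAL_FRAMES = 2_687_934
TOTAL_VIDEOS = 3_822
TOTAL_OBJECTS = 486
TOTAL_COLLECTORS = 77


def orbit_averages():
    """Return simple per-unit averages derived from the quoted totals."""
    return {
        "frames_per_video": TOTAL_FRAMES / TOTAL_VIDEOS,
        "videos_per_object": TOTAL_VIDEOS / TOTAL_OBJECTS,
        "objects_per_collector": TOTAL_OBJECTS / TOTAL_COLLECTORS,
    }


if __name__ == "__main__":
    for name, value in orbit_averages().items():
        print(f"{name}: {value:.1f}")
```

These are averages only; they say nothing about how unevenly videos and frames are spread across individual objects or participants.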