Facebook AI Research (FAIR) has presented CO3D, a dataset containing accurate three-dimensional reconstructions of almost 19,000 real objects. The dataset is intended for use in augmented reality tasks and in game development.
Common Objects in 3D (CO3D) contains 1.5 million frames from almost 19,000 videos depicting objects from 50 categories of the MS-COCO dataset. CO3D surpasses similar datasets both in terms of the number of categories and the number of objects.
To build the dataset, FAIR used COLMAP, a photogrammetry framework that requires images of each object taken from different angles. COLMAP creates a three-dimensional reconstruction of an object by estimating the camera position for each image and forming a dense cloud of points that define the object’s surface. FAIR then applied a semi-automatic active learning algorithm to filter out videos whose 3D reconstruction was not accurate enough.
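The geometric idea behind recovering a surface point from several views can be illustrated with a minimal sketch. This is not COLMAP's implementation. It only shows midpoint triangulation of a single point from two viewing rays, assuming the camera centers and ray directions are already known. The function name `triangulate_midpoint` and all numeric values are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point from two viewing rays.

    Each ray starts at a camera center (c1, c2) and points along a
    direction (d1, d2). The estimate is the midpoint of the shortest
    segment connecting the two rays.
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


# Hypothetical example: two cameras one unit to each side of the origin,
# both observing the same point five units in front of them.
point = triangulate_midpoint(
    (-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
    (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0),
)
print(point)
```

In this example the two rays intersect exactly, so the estimate coincides with the observed point. With real images the rays rarely meet, which is one reason a photogrammetry pipeline needs many views from different angles.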
The source videos were collected through the crowdsourcing platform Amazon Mechanical Turk. Workers were asked to select an object of a given category, place it on a hard surface, and record a video with a smartphone, keeping the entire object in view while moving the smartphone in a circle around it.
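The capture instructions describe a roughly circular camera path that keeps the object in view throughout. As an idealized sketch, the snippet below generates camera positions on a circle around an object placed at the origin, each looking toward it. The function `circular_camera_path` and its parameters (`radius`, `height`, `num_frames`) are hypothetical and not part of CO3D, and real handheld footage will deviate from a perfect circle.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def circular_camera_path(
    radius: float, height: float, num_frames: int
) -> List[Tuple[Vec3, Vec3]]:
    """Return (position, view_direction) pairs on a circle around the origin.

    The object is assumed to sit at the origin. Every view direction is a
    unit vector pointing from the camera toward the object.
    """
    if radius <= 0 or num_frames < 1:
        raise ValueError("radius must be positive and num_frames at least 1")
    norm = math.sqrt(radius ** 2 + height ** 2)
    poses = []
    for i in range(num_frames):
        angle = 2 * math.pi * i / num_frames
        position = (radius * math.cos(angle), radius * math.sin(angle), height)
        direction = tuple(-p / norm for p in position)
        poses.append((position, direction))
    return poses


# Hypothetical values: 8 viewpoints, 1 unit from the object, 0.5 units above it.
for position, direction in circular_camera_path(1.0, 0.5, 8):
    print(position, direction)
```

Evenly spaced viewpoints like these give a reconstruction method views of the object from all sides, which is what the circular recording is meant to achieve.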
The dataset is available here.