As part of a larger project aimed at bringing accurate 3D object detection to mobile devices, researchers from Google have announced the release of a large-scale video dataset with 3D bounding box annotations.
The new dataset, called Objectron, contains more than 15 thousand object-centric short video clips, each annotated with the 3D bounding box of the object of interest. The motivation for creating Objectron is that it is difficult to learn and understand 3D objects from 2D images (or videos) alone. To better tackle this problem, the researchers decided to build a more informative, specialized dataset that captures the 3D structure of objects and can support the training of more accurate machine learning models.
Objectron aims to address this problem through rich labeling of the collected short videos. Each video clip contains a salient object, usually centered, and shows it from different angles as the camera moves and rotates. Along with the video clips, the researchers collected metadata such as camera positions and sparse point clouds, and they manually annotated a 3D bounding box of the object for every frame. This information is intended to help researchers develop more robust 3D object detection models.
Today, together with the dataset, the researchers released a 3D object detection model that can detect four object categories: shoes, chairs, mugs, and cameras. The architecture of this model is depicted in the image below. In addition to the model, the researchers developed a new evaluation metric for 3D object detection based on a generalized form of the 3D IoU (intersection-over-union) metric.
The researchers open-sourced the dataset along with several supporting resources: a shuffled (processed) version of the dataset, scripts for data loading, and scripts for evaluation using the new metric. The 3D object detection model was also open-sourced as part of MediaPipe, Google’s open-source framework.
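To give a feel for what a 3D IoU score measures, here is a minimal Python sketch for the simplified case of axis-aligned boxes. It is not the evaluation code released with Objectron: the generalized metric described above is meant for arbitrarily oriented boxes, which requires computing the intersection of two rotated volumes. The names `AxisAlignedBox` and `iou_3d` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class AxisAlignedBox:
    """Hypothetical helper: a box whose faces are parallel to the x, y, z axes."""
    min_corner: Point3D
    max_corner: Point3D

    def volume(self) -> float:
        # Product of the side lengths along each axis.
        vol = 1.0
        for lo, hi in zip(self.min_corner, self.max_corner):
            vol *= max(hi - lo, 0.0)
        return vol


def iou_3d(a: AxisAlignedBox, b: AxisAlignedBox) -> float:
    """Intersection volume divided by union volume (axis-aligned case only)."""
    overlap = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(
        a.min_corner, a.max_corner, b.min_corner, b.max_corner
    ):
        # Length of the overlapping interval along this axis.
        extent = min(hi_a, hi_b) - max(lo_a, lo_b)
        if extent <= 0:
            return 0.0  # The boxes do not overlap along this axis.
        overlap *= extent
    union = a.volume() + b.volume() - overlap
    return overlap / union


# Two unit cubes, the second shifted by half a unit along x.
predicted = AxisAlignedBox((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
annotated = AxisAlignedBox((0.5, 0.0, 0.0), (1.5, 1.0, 1.0))
score = iou_3d(predicted, annotated)  # overlap 0.5, union 1.5, so about 0.333
```

A score of 1 means the predicted and annotated boxes coincide, and 0 means they do not overlap at all. Handling rotated boxes, as a dataset with freely oriented objects requires, needs a more involved intersection computation than this sketch shows.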