Mesh R-CNN Detects Objects and Estimates 3D Shape From Images

Researchers from Facebook AI Research have proposed Mesh R-CNN, a neural network model that detects objects in an image and outputs a 3D shape, in the form of a triangle mesh, for each detected object.

The proposed system combines advances from several computer vision tasks. Mesh R-CNN builds on Mask R-CNN, a neural network model for object detection and instance segmentation, and augments it with the ability to produce a 3D shape for each detected object.

Architecture

As in Mask R-CNN, the input to the proposed method is a single RGB image. The object detection part of the architecture is also the same, consisting of a backbone network and a region proposal network. The novelty is the voxel branch, which takes the aligned features of each detected object and estimates a coarse 3D voxelization of that object.
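To make the idea of a coarse voxelization more concrete, the sketch below shows one simple way to turn a grid of voxel occupancy probabilities into a blocky triangle mesh: every voxel above a threshold becomes a unit cube, corners shared by neighbouring cubes are merged, and faces hidden between two occupied voxels are dropped. This is an illustration only, not the authors' implementation. The function name `cubify`, the nested-list grid layout `probs[x][y][z]` and the default threshold of 0.5 are assumptions made for this example.

```python
# Illustrative sketch only: names, grid layout and threshold are assumptions.

# For each of the six neighbour directions, the four corners of the cube face
# that points in that direction, ordered so the face normal points outwards.
FACE_CORNERS = {
    (-1, 0, 0): [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)],
    (1, 0, 0): [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],
    (0, -1, 0): [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],
    (0, 1, 0): [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)],
    (0, 0, -1): [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)],
    (0, 0, 1): [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],
}


def cubify(probs, threshold=0.5):
    """Convert occupancy probabilities probs[x][y][z] into (vertices, triangles)."""
    occupied = {
        (x, y, z)
        for x, plane in enumerate(probs)
        for y, row in enumerate(plane)
        for z, p in enumerate(row)
        if p > threshold
    }
    vertex_index = {}  # corner coordinate -> vertex id, so shared corners merge
    vertices, triangles = [], []

    def vertex_id(corner):
        if corner not in vertex_index:
            vertex_index[corner] = len(vertices)
            vertices.append(corner)
        return vertex_index[corner]

    for (x, y, z) in sorted(occupied):
        for (dx, dy, dz), corners in FACE_CORNERS.items():
            # A face between two occupied voxels is interior, so skip it.
            if (x + dx, y + dy, z + dz) in occupied:
                continue
            a, b, c, d = (vertex_id((x + cx, y + cy, z + cz)) for cx, cy, cz in corners)
            # Split the square face into two triangles.
            triangles.append((a, b, c))
            triangles.append((a, c, d))
    return vertices, triangles


# Usage: a 1x1x2 grid in which both voxels are predicted as occupied.
verts, tris = cubify([[[0.9, 0.8]]])
print(len(verts), "vertices,", len(tris), "triangles")
```

The result is a closed but blocky surface, which is why a further refinement stage is needed to obtain a precise shape.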

The coarse voxelization is converted into a cubified mesh, which is then passed through a graph convolution network acting as a mesh refinement branch. This branch outputs the final, more precise triangle mesh.
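The refinement idea can be sketched as a single graph convolution over the mesh vertices followed by a bounded position update. The example below is a minimal illustration under several assumptions: it uses a common form of graph convolution (a vertex's own features plus the sum of its neighbours' features, each with its own weight matrix, followed by ReLU), it takes the vertex positions as the only input features, and the weights are supplied by the caller rather than learned. The function names `mesh_edges`, `graph_conv` and `refine_step` are hypothetical, and the image-aligned features and training losses of the actual model are left out.

```python
import math

# Illustrative sketch only: function names, feature choice and weights are assumptions.


def mesh_edges(faces):
    """Collect the unique undirected edges of a triangle mesh."""
    edges = set()
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def graph_conv(feats, edges, w_self, w_neigh):
    """One graph convolution: ReLU(w_self * f_i + w_neigh * sum of neighbour f_j)."""
    dim = len(feats[0])
    neighbour_sum = [[0.0] * dim for _ in feats]
    for i, j in edges:
        for k in range(dim):
            neighbour_sum[i][k] += feats[j][k]
            neighbour_sum[j][k] += feats[i][k]
    out = []
    for f, n in zip(feats, neighbour_sum):
        own, neigh = matvec(w_self, f), matvec(w_neigh, n)
        out.append([max(0.0, o + m) for o, m in zip(own, neigh)])
    return out


def refine_step(verts, faces, w_self, w_neigh, w_offset):
    """Move each vertex by a tanh-bounded offset computed from its graph features."""
    hidden = graph_conv(verts, mesh_edges(faces), w_self, w_neigh)
    return [
        [v + math.tanh(o) for v, o in zip(vert, matvec(w_offset, h))]
        for vert, h in zip(verts, hidden)
    ]


# Usage: nudge the vertices of a tetrahedron with small hand-picked weights.
tetra_verts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tetra_faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
small = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
print(refine_step(tetra_verts, tetra_faces, identity, small, small))
```

Because the update only moves vertices and never changes the faces, the topology of the input mesh is preserved while its geometry is adjusted.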

Evaluation

The researchers evaluated the mesh predictor separately on ShapeNet, and the full Mesh R-CNN on the Pix3D dataset for 3D shape prediction from natural images. They report that Mesh R-CNN shows promising results in 3D shape estimation from images.

Mesh R-CNN represents the first model that jointly performs object detection and 3D shape estimation from in-the-wild images. More details about the method can be found in the paper published on arXiv.