Researchers from the National University of Defense Technology in Changsha, China, the Chinese Academy of Sciences, and the Inception Institute of Artificial Intelligence have proposed a novel method for object detection based on semantic feature detection.
Object detection has been one of the most popular tasks in computer vision over the past decade. With the rise of deep learning, and especially Convolutional Neural Networks, a number of object detection methods have been proposed that solve this task to a satisfactory level.
Generally, these methods rely on region proposals (as in Fast R-CNN) or on anchor boxes (as in the Region Proposal Network of Faster R-CNN). The novel method offers a new perspective, in which the problem of object detection is mapped to the problem of high-level semantic feature detection.
Specifically, the researchers propose scanning the image for high-level semantic feature points, much as low-level feature detectors look for edges, corners, and blobs. The proposed method predicts the center point and the scale of each object in an anchor-free setting.
The researchers propose a simple convolutional neural network as the feature extractor, which performs object detection by predicting center and scale heatmaps in its last layers.
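To make the idea of center and scale heatmaps more concrete, here is a minimal sketch of how such maps could be turned back into bounding boxes. It is an illustration under stated assumptions, not the authors' implementation: the function name `decode_detections`, the stride of 4, the score threshold, the use of log-height as the scale value, and the fixed pedestrian aspect ratio of 0.41 are all assumptions made for this example.

```python
import math


def decode_detections(center_map, scale_map, stride=4, threshold=0.5,
                      aspect_ratio=0.41):
    """Turn a center heatmap and a scale map into bounding boxes.

    center_map[y][x]: score in [0, 1] that an object center lies in that cell.
    scale_map[y][x]:  assumed to hold the log of the object height in pixels.
    stride:           assumed downsampling factor between image and heatmap.
    aspect_ratio:     assumed fixed width/height ratio (pedestrian-like boxes).

    Returns a list of (x1, y1, x2, y2, score), highest score first.
    """
    boxes = []
    for y, row in enumerate(center_map):
        for x, score in enumerate(row):
            if score < threshold:
                continue  # not confident enough to count as a center point
            height = math.exp(scale_map[y][x])
            width = aspect_ratio * height
            # Map the heatmap cell back to image coordinates (cell center).
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            boxes.append((cx - width / 2, cy - height / 2,
                          cx + width / 2, cy + height / 2, score))
    boxes.sort(key=lambda b: b[4], reverse=True)
    return boxes


# Toy 2x2 heatmaps with a single confident center at column 1, row 0.
toy_center = [[0.1, 0.9], [0.0, 0.2]]
toy_scale = [[0.0, math.log(100.0)], [0.0, 0.0]]
toy_boxes = decode_detections(toy_center, toy_scale)
```

No anchor boxes appear anywhere in this sketch: each box is recovered from one point and one scale value. A real pipeline would typically follow this step with non-maximum suppression to remove overlapping boxes.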
The network is trained in a supervised manner, with the supervision signal derived from bounding-box annotations. Put simply, the researchers convert each bounding box into center and scale ground truth and train the network on those targets.
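The conversion from bounding boxes to center and scale ground truth can be sketched as follows. This is a simplified illustration rather than the published training code: the helper name `make_targets`, the stride of 4, the hard 0/1 center labels, and the choice of log-height as the scale target are assumptions made for this example.

```python
import math


def make_targets(boxes, map_height, map_width, stride=4):
    """Build center and scale target maps from bounding boxes.

    boxes: iterable of (x1, y1, x2, y2) in image pixel coordinates.
    Returns (center, scale), two map_height x map_width lists of lists:
      center[y][x] is 1.0 where a box center falls, 0.0 elsewhere;
      scale[y][x]  is log(box height) at that cell, 0.0 elsewhere.
    """
    center = [[0.0] * map_width for _ in range(map_height)]
    scale = [[0.0] * map_width for _ in range(map_height)]
    for x1, y1, x2, y2 in boxes:
        if y2 <= y1 or x2 <= x1:
            continue  # skip degenerate boxes
        # Locate the box center on the downsampled grid.
        cx = math.floor((x1 + x2) / 2 / stride)
        cy = math.floor((y1 + y2) / 2 / stride)
        if 0 <= cx < map_width and 0 <= cy < map_height:
            center[cy][cx] = 1.0
            scale[cy][cx] = math.log(y2 - y1)
    return center, scale


# One box, 20 px wide and 50 px tall, on a 32x32 target grid.
center_gt, scale_gt = make_targets([(10, 20, 30, 70)], 32, 32)
```

With a stride of 4, the box center at pixel (20, 45) lands in grid cell (column 5, row 11). Practical implementations may soften the center label in a small neighborhood around each point so that near misses are penalized less; that refinement is left out here for brevity.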
The architecture of the network is shown in the diagram. In evaluations on pedestrian detection benchmark datasets, the method achieved competitive accuracy and good speed. The code has been open sourced and is available on GitHub. More details about the method can be found in the pre-print paper published on arXiv.