In the paper “Unifying Deep Local and Global Features for Image Search”, researchers from Google proposed a new method for image search that unifies local and global features, enabling effective image representations to be learned and used to solve instance-level recognition problems.
Instance-level recognition is a common task in computer vision: the goal is to recognize a specific instance of an object, not just its class as in classification. This makes it a considerably harder problem than classification. One way to solve it is image search: a database is first filtered with global features, and the resulting candidates are then re-ranked with more fine-grained local features. This, in turn, requires a framework in which global and local features are unified and can be extracted efficiently.
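The two-stage idea can be sketched numerically. The function names, the cosine-similarity scoring, and the crude "count strong local matches" re-ranking below are illustrative assumptions, not the paper's actual matching procedure (which uses geometric verification):

```python
import numpy as np

def global_shortlist(query_g, db_g, k=3):
    """Stage 1: rank database images by cosine similarity of global descriptors."""
    db_norm = db_g / np.linalg.norm(db_g, axis=1, keepdims=True)
    q_norm = query_g / np.linalg.norm(query_g)
    sims = db_norm @ q_norm
    return np.argsort(-sims)[:k]  # indices of the top-k candidates

def rerank(query_l, db_l, shortlist):
    """Stage 2 (toy version): score each candidate by how many pairs of
    local descriptors are nearly identical (cosine > 0.99)."""
    scores = []
    for idx in shortlist:
        sim = query_l @ db_l[idx].T          # all pairwise similarities
        scores.append(int((sim > 0.99).sum()))
    order = np.argsort(-np.array(scores))
    return [shortlist[i] for i in order]

rng = np.random.default_rng(0)
db_g = rng.normal(size=(10, 8))              # 10 images, 8-D global descriptors
db_l = rng.normal(size=(10, 5, 4))           # 5 local descriptors of 4-D per image
db_l /= np.linalg.norm(db_l, axis=2, keepdims=True)  # unit-normalize locals

query_g = db_g[7] + 0.01                     # query is a near-copy of image 7
query_l = db_l[7]

shortlist = global_shortlist(query_g, db_g)
ranked = rerank(query_l, db_l, shortlist)
print(shortlist, ranked)
```

Image 7 dominates the global similarity, so it survives the shortlist, and its local descriptors match themselves exactly, so it also tops the re-ranked list; this mirrors how the coarse global stage keeps the search cheap while the local stage sharpens the final ordering.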
The researchers propose a new method named DELG that can be trained end to end. This single deep model consists of two branches: a global-feature branch, whose features are used for image retrieval, and a local-feature branch, whose features re-rank the results of that initial search. An additional autoencoder reduces the local features' dimensionality, which improves the stability and performance of training.
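A minimal numeric sketch of this layout, with one backbone feature map feeding both branches: the tensor shapes, the GeM power p, and the random autoencoder weights are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
feat = np.abs(rng.normal(size=(14, 14, 64)))  # backbone activations (H, W, C)

# Global branch: generalized-mean (GeM) pooling over spatial positions,
# followed by L2 normalization, yields one compact global descriptor.
p = 3.0
gem = (feat.reshape(-1, 64) ** p).mean(axis=0) ** (1.0 / p)
global_desc = gem / np.linalg.norm(gem)

# Local branch: treat each spatial position as a local feature and pass it
# through a linear autoencoder (untrained random weights here) that cuts
# the dimensionality 64 -> 16; the reconstruction error would serve as the
# autoencoder loss during joint training.
W_enc = rng.normal(size=(64, 16)) / 8.0
W_dec = rng.normal(size=(16, 64)) / 4.0
locals_hd = feat.reshape(-1, 64)
locals_ld = locals_hd @ W_enc        # compressed local descriptors
recon = locals_ld @ W_dec            # reconstruction used in the AE loss
ae_loss = ((recon - locals_hd) ** 2).mean()

print(global_desc.shape, locals_ld.shape)
```

The point of the sketch is the data flow: one shared feature map, a pooled unit-norm global descriptor for retrieval, and a set of dimensionality-reduced local descriptors for re-ranking, all of which the paper trains jointly.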
For training, the researchers used the Google Landmarks Dataset (GLD) with more than 1.2 million images; for evaluation, they used the Oxford, Paris, and GLD v2 datasets. They performed several ablation studies, analyzing the components of the proposed framework in isolation. The experimental results show that the method successfully extracts both local and global features in a unified way.