Convolutional Neural Network (CNN) features have been successfully employed in recent works as an image descriptor for various vision tasks. However, the inability of deep CNN features to exhibit invariance to geometric transformations and object compositions poses a great challenge for image search. In this work, we demonstrate the effectiveness of an objectness prior over the deep CNN features of image regions for obtaining an invariant image representation. The proposed approach represents the image as a vector of pooled CNN features describing the underlying objects. This representation provides robustness to the spatial layout of the objects in the scene and achieves invariance to general geometric transformations, such as translation, rotation and scaling. Our approach also leads to a compact representation of the scene, so that each image occupies a smaller memory footprint.
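The pooling step described above can be sketched as follows. This is a minimal illustration, assuming the per-region CNN features (e.g. activations of a fully connected layer, one row per object proposal) have already been extracted; the function name, the NumPy implementation, and the L2 normalization are illustrative choices, not details from the paper.

```python
import numpy as np

def pool_object_features(region_features, pool="max"):
    """Pool per-region CNN descriptors into one image-level vector.

    region_features: (n_regions, d) array, one row per object proposal.
    pool: "max" or "avg" pooling across the region dimension.
    """
    region_features = np.asarray(region_features, dtype=np.float64)
    if pool == "max":
        pooled = region_features.max(axis=0)
    elif pool == "avg":
        pooled = region_features.mean(axis=0)
    else:
        raise ValueError("unknown pooling: %s" % pool)
    # L2-normalize so descriptors are comparable across images,
    # regardless of how many proposals each image produced.
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled
```

Because the pooling is order-independent and taken over object proposals rather than a fixed spatial grid, the resulting vector is unchanged when objects are rearranged, translated, or rescaled in the scene, which is the source of the invariance claimed above.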
Figure 1. Retrieval performance on various databases with the binarized representations using ITQ.
You can download the proposed representations for the four datasets here.
Konda Reddy Mopuri, R. Venkatesh Babu; Object Level Deep Feature Pooling for Compact Image Representation; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015, pp. 62-70 [pdf] [poster] [bibtex]