Convolutional Neural Network (CNN) features have been successfully employed in recent works as image descriptors for various vision tasks. However, the inability of deep CNN features to exhibit invariance to geometric transformations and object compositions poses a great challenge for image search. In this work, we demonstrate the effectiveness of an objectness prior over the deep CNN features of image regions for obtaining an invariant image representation. The proposed approach represents an image as a vector of pooled CNN features describing the underlying objects. This representation is robust to the spatial layout of the objects in the scene and invariant to general geometric transformations such as translation, rotation, and scaling. It also yields a compact description of the scene, giving each image a smaller memory footprint.
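The pooling step described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it assumes CNN descriptors have already been extracted for each detected object region (one row per region) and combines them by element-wise max-pooling, which makes the final vector independent of the order and position of the regions.

```python
import numpy as np

def pool_object_features(region_features):
    """Max-pool CNN descriptors of object proposals into one image vector.

    region_features: (N, D) array, one row per detected object region.
    Max-pooling across regions discards spatial layout, so the result is
    unchanged if the objects are reordered or moved within the image.
    """
    pooled = region_features.max(axis=0)
    # L2-normalize so images are comparable by cosine/Euclidean distance
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Toy example: 5 object regions with 8-D descriptors
rng = np.random.default_rng(0)
feats = rng.random((5, 8))
v1 = pool_object_features(feats)
v2 = pool_object_features(feats[::-1])  # same regions, reordered
assert np.allclose(v1, v2)  # layout-invariant
```

Because the pooled vector has the same dimensionality as a single region descriptor regardless of how many objects the image contains, the representation stays compact.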


Experimental Results

We have experimented with the Holidays [ICCV 2008], Oxford5K [CVPR 2007], Paris6K [CVPR 2008], and UKB [CVPR 2006] datasets. The following figures and tables present our results and comparisons with existing works (please note that the references in the tables are with respect to the CVPRW 2015 paper).

Figure 1. Retrieval performance on various databases with the binarized representations using ITQ.
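For readers unfamiliar with the binarization step, Iterative Quantization (ITQ) learns an orthogonal rotation of the PCA-projected descriptors that minimizes the quantization loss before taking signs. The sketch below is a compact numpy rendition of that procedure under simplified assumptions (random initialization, fixed iteration count); it is not the exact code used for the reported numbers.

```python
import numpy as np

def itq_binarize(X, n_bits=32, n_iters=50, seed=0):
    """Binarize real-valued descriptors with Iterative Quantization (ITQ).

    X: (n_samples, dim) pooled image descriptors, with dim >= n_bits.
    Returns binary codes in {0, 1} of shape (n_samples, n_bits).
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                       # zero-center the data
    # PCA projection down to n_bits dimensions
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = X @ Vt[:n_bits].T
    # Random orthogonal initialization of the rotation R
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iters):
        B = np.sign(V @ R)                       # fix R, update codes
        U, _, Wt = np.linalg.svd(V.T @ B)        # fix B: orthogonal Procrustes
        R = U @ Wt
    return (np.sign(V @ R) > 0).astype(np.uint8)

# Usage: 100 descriptors of dimension 64 -> 32-bit codes
codes = itq_binarize(np.random.default_rng(1).standard_normal((100, 64)))
```

The binary codes can then be compared by Hamming distance, which is far cheaper in both memory and compute than comparing the original real-valued vectors.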


You can download the proposed representations for the four datasets here.
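Once downloaded, the representations can be used for retrieval directly. A minimal sketch (assuming the descriptors are L2-normalized, as in the pooling step, so a dot product equals cosine similarity):

```python
import numpy as np

def rank_database(query_vec, db_vecs):
    """Rank database images by cosine similarity to a query descriptor.

    query_vec: (D,) image descriptor; db_vecs: (N, D) database matrix.
    Assumes all descriptors are L2-normalized, so the dot product is
    the cosine similarity.
    """
    scores = db_vecs @ query_vec
    return np.argsort(-scores)  # database indices, most similar first

# Toy example: 3 database images with 2-D descriptors
db = np.array([[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]])
order = rank_database(np.array([1.0, 0.0]), db)  # -> [1, 2, 0]
```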

Related Publications

Konda Reddy Mopuri and R. Venkatesh Babu, "Object Level Deep Feature Pooling for Compact Image Representation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2015, pp. 62-70. [pdf] [poster] [bibtex]