Manuscript received March 15, 2023; revised April 10, 2023; accepted May 8, 2023.
Abstract—Landmark retrieval, which aims to search for landmark images similar to a query photo within a massive image database, has received considerable attention for many years. Despite this attention, retrieving landmarks quickly and accurately still poses unique challenges. To tackle these challenges, we present a deep learning model called the Spatial-Pyramid Attention network (SPA). SPA is an end-to-end convolutional network that incorporates a spatial-pyramid attention layer; this layer encodes the input image using a spatial pyramid structure to highlight regional features according to their relative spatial distinctiveness. An image descriptor is then generated by aggregating these regional features. In experiments on the Oxford5k, Paris6k, and Landmark-100 benchmark datasets, SPA achieves mean Average Precision (mAP) scores of 85.3%, 89.6%, and 80.4%, respectively, outperforming existing state-of-the-art deep image retrieval models.

Keywords—deep image retrieval, convolutional neural network, feature embedding

Cite: Luepol Pipanmekaporn, Suwatchai Kamonsantiroj, Chiabwoot Ratanavilisagul, and Sathit Prasomphan, "Spatial Pyramid Attention Enhanced Visual Descriptors for Landmark Retrieval," Journal of Image and Graphics, Vol. 11, No. 4, pp. 359-366, December 2023.

Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.
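To make the pipeline described in the abstract concrete, the following is a minimal sketch of a spatial-pyramid attention layer, assuming PyTorch. The pyramid levels (1, 2, 4), the 1x1-convolution attention scoring, and all names and sizes are illustrative assumptions for exposition, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidAttention(nn.Module):
    # Pools a backbone feature map at several pyramid scales, scores each
    # regional feature with a learned attention weight, and aggregates the
    # weighted regions into one global image descriptor.
    def __init__(self, channels, levels=(1, 2, 4)):  # levels are an assumption
        super().__init__()
        self.levels = levels
        # 1x1 convolution producing one scalar attention score per region
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):
        # feats: (B, C, H, W) feature map from a CNN backbone
        regions, scores = [], []
        for n in self.levels:
            pooled = F.adaptive_avg_pool2d(feats, n)      # (B, C, n, n) regions
            scores.append(self.score(pooled).flatten(2))  # (B, 1, n*n)
            regions.append(pooled.flatten(2))             # (B, C, n*n)
        regions = torch.cat(regions, dim=2)               # all pyramid regions
        attn = torch.softmax(torch.cat(scores, dim=2), dim=2)
        desc = (regions * attn).sum(dim=2)                # weighted aggregation
        return F.normalize(desc, dim=1)                   # L2-norm for retrieval

# Example: a 2048-channel ResNet-style feature map mapped to one descriptor.
spa = SpatialPyramidAttention(channels=2048)
fmap = torch.randn(2, 2048, 16, 16)  # stand-in for backbone output
print(spa(fmap).shape)               # torch.Size([2, 2048])

Taking the softmax over the pooled regions of every pyramid level jointly lets the most distinctive regions dominate the aggregated descriptor, which is one plausible reading of the abstract's "relative spatial distinctiveness" weighting.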