Fashion Image Retrieval with Occlusion

  • Conference paper
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15321)

Abstract

With the growth of online fashion platforms and independent content creators, there is growing interest in visually searching for clothing items similar to those shown online. In real-world settings, clothes are often partially covered by other objects, which makes retrieval challenging. To make fashion image retrieval more robust, we study fashion image retrieval under occlusion. We conduct experiments on the In-shop Clothes Retrieval dataset, a subset of the DeepFashion benchmark. We construct variants of the dataset with different occlusion types, using MSCOCO objects and object masks at various sizes and locations to simulate realistic occlusion. We evaluate the zero-shot and fine-tuned performance of state-of-the-art models on these datasets and observe a performance drop under occlusion. We also observe that fine-tuning a model on one occluded dataset makes it more robust to other occlusion types and reduces the performance drop. The dataset used in this paper is available at https://bit.ly/4749Mbo.
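The occlusion construction described in the abstract (pasting a masked object onto a clothing image at a chosen size and location) can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not detail. The function name `paste_occluder`, its parameters, the nearest-neighbour resizing, and the returned occluded-area fraction are all assumptions made for illustration.

```python
import numpy as np


def paste_occluder(image, occluder, mask, top_left, scale=1.0):
    """Composite a masked occluder patch onto an image (illustrative sketch).

    image:    (H, W, 3) uint8 array, e.g. a clothing photo.
    occluder: (h, w, 3) uint8 array, a crop around the occluding object.
    mask:     (h, w) bool array, True where the object is present.
    top_left: (row, col) position of the patch inside the image.
    scale:    resize factor applied to the patch before pasting.

    Returns the occluded copy and the fraction of image pixels covered.
    """
    h, w = mask.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))

    # Nearest-neighbour resize by index lookup (assumption: keeps the
    # sketch dependency-free; a real pipeline may interpolate instead).
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    occluder = occluder[rows][:, cols]
    mask = mask[rows][:, cols]

    H, W = image.shape[:2]
    r0, c0 = top_left
    if not (0 <= r0 < H and 0 <= c0 < W):
        raise ValueError("top_left must lie inside the image")

    # Clip the patch so it does not run past the image border.
    r1, c1 = min(r0 + new_h, H), min(c0 + new_w, W)
    m = mask[: r1 - r0, : c1 - c0]

    out = image.copy()              # leave the input image untouched
    region = out[r0:r1, c0:c1]      # view into the output array
    region[m] = occluder[: r1 - r0, : c1 - c0][m]

    occluded_fraction = float(m.sum()) / (H * W)
    return out, occluded_fraction
```

Varying `top_left` and `scale` over a set of object crops and masks would give occlusion variants of differing location and size, in the spirit of the dataset variations the abstract mentions.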

J. Sohn, H. Jung, Z. Yan, and V. Masti—These authors contributed equally to this work.

Acknowledgement

This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2022-00143911, AI Excellence Global Innovative Leader Education Program).

Author information

Corresponding author

Correspondence to Jimin Sohn.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3611 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sohn, J., Jung, H., Yan, Z., Masti, V., Li, X., Raj, B. (2025). Fashion Image Retrieval with Occlusion. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15321. Springer, Cham. https://doi.org/10.1007/978-3-031-78305-0_3

  • DOI: https://doi.org/10.1007/978-3-031-78305-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78304-3

  • Online ISBN: 978-3-031-78305-0

  • eBook Packages: Computer Science, Computer Science (R0)
