Abstract
Computer vision is driven by the many datasets available for training and evaluating novel methods. However, each dataset has its own set of class labels, its own visual definitions of those classes, its own image distribution, its own annotation protocol, and so on. In this paper we explore the automatic discovery of visual-semantic relations between labels across datasets. We aim to understand how the instances of a class in one dataset relate to the instances of a class in another dataset: are they in an identity, parent/child, or overlap relation, or is there no link between them at all? To find relations between labels across datasets, we propose methods based on language, on vision, and on their combination. We show that we can effectively discover label relations across datasets, as well as their type. We apply our method to four applications: understanding label relations, identifying missing aspects, increasing label specificity, and predicting transfer learning gains. We conclude that label relations cannot be established by looking at the names of classes alone, as they depend strongly on how each dataset was constructed.
J. Uijlings and T. Mensink—Equal contribution.
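The paper discovers label relations using language-based, vision-based, and combined signals. As a purely illustrative sketch of the general idea (not the authors' method), the snippet below combines a word-embedding similarity between class names with directional overlap statistics of matched instances to guess one of the relation types named in the abstract; all function names, thresholds, and inputs are hypothetical.

```python
# Illustrative sketch only (not the authors' method): combine a language
# signal (embedding similarity of class names) with a vision signal
# (directional overlap of instances) to guess a coarse relation type.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def relation_from_overlap(p_a_given_b, p_b_given_a, thresh=0.5):
    """Map two directional overlap scores in [0, 1] to a relation type."""
    if p_a_given_b >= thresh and p_b_given_a >= thresh:
        return "identity"
    if p_a_given_b >= thresh:
        return "parent"     # A is broader: most instances of B also count as A
    if p_b_given_a >= thresh:
        return "child"      # A is narrower: most instances of A also count as B
    if max(p_a_given_b, p_b_given_a) > 0.1:
        return "overlap"
    return "unrelated"

def label_relation(emb_a, emb_b, n_a, n_a_covered_by_b, n_b, n_b_covered_by_a):
    """Hypothetical combination of a language prior with vision statistics."""
    if cosine(emb_a, emb_b) < 0.2:                 # hypothetical name-similarity cut-off
        return "unrelated"
    p_b_given_a = n_a_covered_by_b / max(n_a, 1)   # fraction of A instances matched to B
    p_a_given_b = n_b_covered_by_a / max(n_b, 1)   # fraction of B instances matched to A
    return relation_from_overlap(p_a_given_b, p_b_given_a)
```

For example, with A = car and B = vehicle, most car instances would be covered by vehicle but not the reverse, so the sketch would return "child" (car is the narrower label).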
Notes
1. An instance is either a single object (for thing classes, e.g. cat, car), or the union of all regions of a stuff class (e.g. grass, water), following the panoptic definition [13].
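As a minimal, hypothetical illustration of this convention (the data layout below is assumed, not taken from the paper), the following groups the segments of one image into instances: each thing segment stays a separate instance, while all segments of a stuff class are merged into a single instance.

```python
# Hypothetical illustration of the panoptic instance convention in note 1:
# thing segments (e.g. cat, car) are individual instances, while all segments
# of a stuff class (e.g. grass, water) are unioned into one instance.
import numpy as np

def group_into_instances(segments, stuff_classes):
    """segments: list of (class_name, boolean_mask) pairs for one image."""
    instances = []
    stuff_union = {}                       # class name -> union mask of its regions
    for class_name, mask in segments:
        if class_name in stuff_classes:
            if class_name not in stuff_union:
                stuff_union[class_name] = np.zeros_like(mask, dtype=bool)
            stuff_union[class_name] |= mask
        else:
            instances.append((class_name, mask))   # thing: one instance per segment
    instances.extend(stuff_union.items())          # stuff: one instance per class
    return instances
```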
References
Robust vision challenge. http://www.robustvision.net/
Bevandić, P., Oršić, M., Grubišić, I., Šarić, J., Šegvić, S.: Multi-domain semantic segmentation with overlapping labels. In: Proceedings of the WACV (2022)
Bucher, M., Vu, T., Cord, M., Pérez, P.: Zero-shot semantic segmentation. In: NeurIPS (2019)
Caesar, H., Uijlings, J., Ferrari, V.: COCO-stuff dataset (2018). http://calvin.inf.ed.ac.uk/datasets/coco-stuff
Caesar, H., Uijlings, J., Ferrari, V.: COCO-stuff: thing and stuff classes in context. In: CVPR (2018)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
Everingham, M., Eslami, S., van Gool, L., Williams, C., Winn, J., Zisserman, A.: The Pascal visual object classes challenge: a retrospective. IJCV 111, 98–136 (2015). https://doi.org/10.1007/s11263-014-0733-5
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11), 1231–1237 (2013)
Ghiasi, G., Gu, X., Cui, Y., Lin, T.: Open-vocabulary image segmentation. Technical report, arXiv (2021)
Google: Wiki words 500 with normalization: a 500-dimensional word2vec skip-gram model trained on English Wikipedia. https://tfhub.dev/google/Wiki-words-500-with-normalization/2
Jia, C., et al.: Scaling up visual and vision-language representation learning with noisy text supervision. In: ICML (2021)
Kirillov, A.: Panoptic challenge intro. COCO+Mapillary Joint Recognition Challenge Workshop. http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Overview.pdf
Kirillov, A., He, K., Girshick, R., Rother, C., Dollár, P.: Panoptic segmentation. In: CVPR (2019)
Kokkinos, I.: UberNet: training a ‘universal’ CNN for low-, mid-, and high-level vision using diverse datasets and limited memory. In: CVPR (2017)
Kuznetsova, A., et al.: The open images dataset V4: unified image classification, object detection, and visual relationship detection at scale. IJCV 128, 1956–1981 (2020). https://doi.org/10.1007/s11263-020-01316-z
Lambert, J., Liu, Z., Sener, O., Hays, J., Koltun, V.: MSeg: a composite dataset for multi-domain semantic segmentation. In: CVPR (2020)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
McInnes, L., Healy, J., Saul, N., Grossberger, L.: UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3(29), 861 (2018)
Mensink, T., Uijlings, J., Kuznetsova, A., Gygli, M., Ferrari, V.: Factors of influence for transfer learning across diverse appearance domains and task types. IEEE Trans. PAMI (2021)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR Workshop (2013)
Miller, G.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
Ponce, J., et al.: Dataset issues in object recognition. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds.) Toward Category-Level Object Recognition. LNCS, vol. 4170, pp. 29–48. Springer, Heidelberg (2006). https://doi.org/10.1007/11957959_2
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)
Rebuffi, S.A., Bilen, H., Vedaldi, A.: Learning multiple visual domains with residual adapters. In: NeurIPS (2017)
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Torralba, A., Efros, A.: Unbiased look at dataset bias. In: CVPR (2011)
Triantafillou, E., et al.: Meta-dataset: a dataset of datasets for learning to learn from few examples. In: ICLR (2020)
Wang, J., et al.: Deep high-resolution representation learning for visual recognition. IEEE Trans. PAMI 43(10), 3349–3364 (2020)
Xiao, J., Hays, J., Ehinger, K., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from Abbey to Zoo. In: CVPR (2010)
Xiao, J., Owens, A., Torralba, A.: SUN3D: a database of big spaces reconstructed using SfM and object labels. In: ICCV (2013)
Yu, F., et al.: BDD100K: a diverse driving dataset for heterogeneous multitask learning. In: CVPR (2020)
Zendel, O., Honauer, K., Murschitz, M., Humenberger, M., Fernandez Dominguez, G.: Analyzing computer vision data - the good, the bad and the ugly. In: CVPR (2017)
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: CVPR (2017)
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. In: NeurIPS (2014)
Zhou, X., Koltun, V., Krähenbühl, P.: Simple multi-dataset detection. In: CVPR (2022)
Cite this paper
Uijlings, J., Mensink, T., Ferrari, V. (2022). The Missing Link: Finding Label Relations Across Datasets. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_31