A Data-Centric Approach for Improving Ambiguous Labels with Combined Semi-supervised Classification and Clustering | SpringerLink

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Consistently high data quality is essential for the development of novel loss functions and architectures in deep learning. The existence of such data and labels is usually presumed, yet acquiring high-quality datasets remains a major issue in many cases. Subjective annotations often lead to ambiguous labels in real-world datasets. Instead of handling this issue inside a neural network, we propose a data-centric approach that relabels such ambiguous labels. A hard classification is, by definition, not enough to capture the real-world ambiguity of the data. Therefore, we propose our method “Data-Centric Classification & Clustering (DC3)”, which combines semi-supervised classification and clustering. It automatically estimates the ambiguity of an image and performs either a classification or a clustering depending on that ambiguity. DC3 is general in nature, so it can be used in addition to many Semi-Supervised Learning (SSL) algorithms. On average across multiple evaluated SSL algorithms and datasets, our approach yields a 7.6% better F1-Score for classifications and a 7.9% lower inner distance of clusters. Most importantly, we give a proof of concept that the classifications and clusterings from DC3 are beneficial as proposals for the manual refinement of such ambiguous labels. Overall, combining SSL with DC3 can lead to better handling of ambiguous labels during the annotation process. (Source code is available at https://github.com/Emprime/dc3.)
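The abstract does not state how DC3 scores ambiguity or defines the inner distance of clusters, so the following is only a minimal sketch of the general idea of routing each image to classification or clustering based on an ambiguity estimate, and of measuring cluster compactness. It rests on our own assumptions, not on the paper: ambiguity is taken as the normalized entropy of a predicted class distribution, the cut-off `threshold=0.5` is hypothetical, and "inner distance" is assumed to be the mean Euclidean distance of each member to its cluster centroid. The actual DC3 implementation is in the linked repository.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of a class distribution, scaled to [0, 1] by log(K).

    Assumption: used here as a stand-in ambiguity score (not DC3's own estimator).
    """
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


def route(probs: Sequence[float], threshold: float = 0.5) -> Tuple[str, Optional[int]]:
    """Hypothetical routing rule: low ambiguity -> hard class, high ambiguity -> clustering.

    Returns ("classify", argmax class) or ("cluster", None).
    The threshold of 0.5 is an illustrative assumption.
    """
    if normalized_entropy(probs) <= threshold:
        best = max(range(len(probs)), key=lambda i: probs[i])
        return "classify", best
    return "cluster", None


def mean_inner_distance(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean Euclidean distance of each point to the centroid of its cluster.

    Assumption: one plausible definition of "inner distance"; the paper's may differ.
    """
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    total, count = 0.0, 0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for m in members:
            total += math.dist(m, centroid)  # requires Python 3.8+
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    print(route([0.9, 0.05, 0.05]))   # confident prediction
    print(route([0.4, 0.35, 0.25]))   # near-uniform, i.e. ambiguous
    print(mean_inner_distance([(0, 0), (2, 0), (10, 10), (10, 12)], [0, 0, 1, 1]))
```

A confident distribution such as `[0.9, 0.05, 0.05]` has a normalized entropy of about 0.36 and is classified as class 0, while the near-uniform `[0.4, 0.35, 0.25]` scores about 0.98 and is sent to clustering. A lower mean inner distance indicates more compact clusters, which is the direction the abstract reports as an improvement.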

Acknowledgements

We acknowledge funding of LS by the ARTEMIS project (grant no. 01EC1908E) funded by the Federal Ministry of Education and Research (BMBF), Germany. SMS was funded by the BMBF projects CUSCO (grant no. 03F0813D) and MOSAiC (grant no. 03F0917B). RKi was supported via a “Make Our Planet Great Again” grant of the French National Research Agency within the “Programme d’Investissements d’Avenir”; reference “ANR-19-MPGA-0012”. Funding for the PlanktonID project was granted to RKi and RKo (CP1733) by the Cluster of Excellence 80 “Future Ocean” within the Excellence Initiative by the Deutsche Forschungsgemeinschaft on behalf of the German federal and state governments. The turkey data set was collected in the project “RedAlert - detection of pecking injuries in turkeys using neural networks”, which was supported by the “Animal Welfare Innovation Award” of the “Initiative Tierwohl”.

Author information

Correspondence to Lars Schmarje.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1020 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schmarje, L. et al. (2022). A Data-Centric Approach for Improving Ambiguous Labels with Combined Semi-supervised Classification and Clustering. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_21

  • DOI: https://doi.org/10.1007/978-3-031-20074-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20073-1

  • Online ISBN: 978-3-031-20074-8

  • eBook Packages: Computer Science, Computer Science (R0)
