Abstract
Consistently high data quality is essential for the development of novel loss functions and architectures in the field of deep learning. The existence of such data and labels is usually presumed, yet acquiring high-quality datasets remains a major issue in many cases. Subjective annotations by annotators often lead to ambiguous labels in real-world datasets. We propose a data-centric approach that relabels such ambiguous labels instead of handling the issue inside a neural network. A hard classification is by definition not enough to capture the real-world ambiguity of the data. Therefore, we propose our method “Data-Centric Classification & Clustering (DC3)”, which combines semi-supervised classification and clustering. It automatically estimates the ambiguity of an image and performs a classification or a clustering depending on that ambiguity. DC3 is general in nature so that it can be used in addition to many Semi-Supervised Learning (SSL) algorithms. On average, our approach yields a 7.6% better F1-Score for classifications and a 7.9% lower inner distance of clusters across multiple evaluated SSL algorithms and datasets. Most importantly, we give a proof-of-concept that the classifications and clusterings from DC3 are beneficial as proposals for the manual refinement of such ambiguous labels. Overall, a combination of SSL with our method DC3 can lead to better handling of ambiguous labels during the annotation process. Source code is available at https://github.com/Emprime/dc3.
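The routing idea described in the abstract (estimate the ambiguity of an image, then either classify or cluster it) can be illustrated with a minimal sketch. This is not the DC3 implementation: the function name `route_by_ambiguity`, the fixed `threshold` of 0.5, and the assumption that per-image ambiguity scores and class/cluster probabilities are already available as plain lists are all hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def route_by_ambiguity(
    class_probs: Sequence[Sequence[float]],
    cluster_probs: Sequence[Sequence[float]],
    ambiguity: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, int]]:
    """Assign each image either a class or a cluster.

    Images whose (assumed, precomputed) ambiguity score is below the
    threshold receive a hard class label; all others are assigned to
    their most likely cluster instead.
    """
    results = []
    for p_class, p_cluster, score in zip(class_probs, cluster_probs, ambiguity):
        if score < threshold:
            results.append(("class", argmax(p_class)))
        else:
            results.append(("cluster", argmax(p_cluster)))
    return results


# Example: the first image is treated as certain, the second as ambiguous.
proposals = route_by_ambiguity(
    class_probs=[[0.9, 0.1], [0.5, 0.5]],
    cluster_probs=[[0.2, 0.8, 0.0], [0.1, 0.1, 0.8]],
    ambiguity=[0.1, 0.9],
)
```

In this sketch the output is a list of proposals that an annotator could review, in the spirit of using classifications and clusterings as starting points for manual refinement.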
Acknowledgements
We acknowledge funding of LS by the ARTEMIS project (grant no. 01EC1908E) funded by the Federal Ministry of Education and Research (BMBF), Germany. SMS was funded by BMBF projects CUSCO (grant no. 03F0813D) and MOSAiC (grant no. 03F0917B). RKi was supported via a “Make Our Planet Great Again” grant of the French National Research Agency within the “Programme d’Investissements d’Avenir”; reference “ANR-19-MPGA-0012”. Funding for the PlanktonID project was granted to RKi and RKo (CP1733) by the Cluster of Excellence 80 “Future Ocean” within the Excellence Initiative by the Deutsche Forschungsgemeinschaft on behalf of the German federal and state governments. The turkey data set was collected in the project “RedAlert - detection of pecking injuries in turkeys using neural networks”, which was supported by the “Animal Welfare Innovation Award” of the “Initiative Tierwohl”.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schmarje, L. et al. (2022). A Data-Centric Approach for Improving Ambiguous Labels with Combined Semi-supervised Classification and Clustering. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_21
DOI: https://doi.org/10.1007/978-3-031-20074-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20073-1
Online ISBN: 978-3-031-20074-8
eBook Packages: Computer Science, Computer Science (R0)