Abstract
Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. They are typically trained in mini-batches with an optimizer, where each sample is selected with equal probability when a mini-batch is constructed. However, the intrinsic class imbalance in multi-label data may bias the model towards majority labels, since samples relevant to minority labels may be underrepresented in each mini-batch. Meanwhile, we observe during training that instances associated with minority labels tend to incur greater losses. Existing heuristic batch selection methods, which prioritize the samples that contribute most to the objective function, i.e., those with high loss, have been shown to accelerate convergence while reducing loss and test error on single-label data. However, batch selection methods have not yet been applied and validated on multi-label data. In this study, we introduce a simple yet effective adaptive batch selection algorithm tailored to multi-label deep learning models. It adaptively selects each batch by prioritizing hard samples related to minority labels. A variant of our method also takes informative label correlations into account. Comprehensive experiments combining five multi-label deep learning models on thirteen benchmark datasets show that our method converges faster and performs better than random batch selection.
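To make the idea concrete, here is a minimal sketch of rank-based batch selection in the spirit of Loshchilov and Hutter's online batch selection, not the authors' exact algorithm: sampling probabilities decay geometrically with a sample's loss rank, and losses are up-weighted for samples carrying rare labels. The selection pressure `s_e` and the inverse-label-frequency weighting are illustrative assumptions.

```python
import numpy as np

# Sketch of rank-based adaptive batch selection (NOT the paper's exact
# method). Hard samples (high loss) and samples with rare labels are
# drawn more often than under uniform random batch construction.

def selection_probabilities(losses, s_e=100.0):
    """Map per-sample losses to sampling probabilities that decay
    geometrically with the sample's loss rank (rank 1 = hardest)."""
    n = len(losses)
    order = np.argsort(-losses)            # indices sorted by descending loss
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)     # rank 1 for the hardest sample
    p = np.exp(-np.log(s_e) * ranks / n)   # p_i proportional to s_e^(-rank/n)
    return p / p.sum()

def minority_weighted_losses(losses, Y):
    """Up-weight samples carrying rare labels: scale each loss by the mean
    inverse frequency of the sample's positive labels (an assumption)."""
    freq = Y.mean(axis=0).clip(min=1e-6)   # per-label frequency in (0, 1]
    inv = (Y / freq).sum(axis=1) / Y.sum(axis=1).clip(min=1)
    return losses * inv

def sample_batch(losses, Y, batch_size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    p = selection_probabilities(minority_weighted_losses(losses, Y))
    return rng.choice(len(losses), size=batch_size, replace=False, p=p)

# Usage: losses from the previous epoch and a binary label matrix Y.
rng = np.random.default_rng(0)
losses = rng.exponential(size=1000)
Y = (rng.random((1000, 10)) < 0.2).astype(float)
batch = sample_batch(losses, Y, batch_size=32, rng=rng)
```

In such a scheme the per-sample losses would be refreshed from the forward passes of the previous epoch, so the sampling distribution adapts as training progresses.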
Notes
- 1.
\(\left\lceil \pi \right\rceil \) is the upward rounding (ceiling) function applied to \(\pi \).
- 2.
\(\textbf{A} \in \mathbb {R}^{q\times q}\) is the symmetric conditional probability matrix measuring the co-occurrence relationships between labels; its formal definition can be found in the supplementary materials. A plausible construction is sketched below.
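Since the exact definition is deferred to the supplementary material, the following is only a minimal sketch of one common construction, assuming a binary label matrix \(\textbf{Y}\) with \(n\) rows (samples) and \(q\) columns (labels): estimate the conditional co-occurrence probabilities \(P(j \mid i)\) from label counts, then symmetrize by averaging with the transpose.

```python
import numpy as np

# Sketch of one common symmetric co-occurrence construction -- the
# paper's exact definition of A is in its supplementary material, so the
# symmetrization below (averaging P(j|i) with P(i|j)) is an assumption.

def cooccurrence_matrix(Y):
    Y = np.asarray(Y, dtype=float)
    C = Y.T @ Y                              # C[i, j] = #samples with labels i and j
    counts = np.clip(np.diag(C), 1.0, None)  # per-label counts, guarded against 0
    P = C / counts[:, None]                  # P[i, j] ~ P(label j | label i)
    return (P + P.T) / 2.0                   # symmetric q x q matrix

# Usage with a toy 3-label binary label matrix.
Y = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
A = cooccurrence_matrix(Y)                   # A[i, j] == A[j, i]
```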
Acknowledgements
This work was supported by the National Natural Science Foundation of China (62302074) and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202300631).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, A., Liu, B., Peng, Z., Wang, J., Tsoumakas, G. (2024). Multi-label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14945. Springer, Cham. https://doi.org/10.1007/978-3-031-70362-1_16
DOI: https://doi.org/10.1007/978-3-031-70362-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70361-4
Online ISBN: 978-3-031-70362-1
eBook Packages: Computer Science, Computer Science (R0)