Handling imbalance in hierarchical classification problems using local classifiers approaches

Data Mining and Knowledge Discovery

Abstract

The task of learning from imbalanced datasets has been widely investigated in the binary, multi-class and multi-label classification scenarios. Although this problem also affects hierarchical datasets, few works in the literature deal with it. Meanwhile, local classifier approaches are the most widely used techniques for Hierarchical Classification problems. In this paper, we present new ways to handle data imbalance in hierarchical classification problems when using local classifier approaches. We propose three resampling schemas, one for each local classification approach: (1) Local Classifiers per Node; (2) Local Classifiers per Parent Node; and (3) Local Classifiers per Level. To quantify how imbalanced a given hierarchical dataset is, we also propose three novel metrics that measure imbalance in hierarchical datasets under the different local classification approaches. The experimental evaluation on eight well-known datasets showed that the imbalance metrics can indeed measure dataset imbalance and that the proposed resampling schemas improve the classification results when compared to baselines, state-of-the-art and related work approaches.
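
As a rough illustration of what measuring imbalance "per local classification approach" could look like, the sketch below takes the Local Classifiers per Parent Node view: each parent node gets its own multi-class problem over its children, and an imbalance ratio is computed for each of those local problems. The toy paths, the function names, and the ratio itself (a mean of majority-to-class count ratios, in the spirit of the MeanIR measure from the multi-label literature) are assumptions made for illustration. They are not the metrics or resampling schemas defined in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each instance is labelled with a root-to-leaf path.
paths = [
    ("rock", "metal"), ("rock", "metal"), ("rock", "metal"), ("rock", "punk"),
    ("jazz", "bebop"), ("jazz", "swing"),
]


def lcpn_label_counts(paths):
    """Count, for every parent node, how many instances fall under each child.

    In a per-parent-node setup, each parent node owns a multi-class problem
    whose classes are that node's children.
    """
    counts = defaultdict(Counter)
    for path in paths:
        parent = "<root>"  # assumed name for the artificial root node
        for node in path:
            counts[parent][node] += 1
            parent = node
    return counts


def mean_imbalance_ratio(class_counts):
    """Mean over classes of (majority count / class count); 1.0 means balanced."""
    majority = max(class_counts.values())
    return sum(majority / c for c in class_counts.values()) / len(class_counts)


def per_parent_imbalance(paths):
    """Imbalance of each local problem; parents with one child are skipped."""
    return {
        parent: mean_imbalance_ratio(children)
        for parent, children in lcpn_label_counts(paths).items()
        if len(children) > 1
    }


if __name__ == "__main__":
    for parent, ratio in per_parent_imbalance(paths).items():
        print(parent, ratio)
```

Under these assumptions, a local problem with a high ratio (here the children of "rock") would be a candidate for resampling, for example with a tool such as imbalanced-learn, while a balanced one (the children of "jazz") would be left alone.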

Notes

  1. Available at http://sites.labic.icmc.usp.br/jeanmetz/datasets.html.

  2. Available at https://github.com/mdeff/fma.

  3. Available at https://www.imageclef.org/2009/medanno.

  4. Available at http://lshtc.iit.demokritos.gr/.

  5. Available at https://dtai.cs.kuleuven.be/clus/.

  6. Available at https://cs.gmu.edu/~mlbio/HierCost/.

  7. Available at http://scikit-learn.org/.

  8. Available at https://github.com/tsoumakas/mulan/.

  9. Available at https://github.com/scikit-learn-contrib/imbalanced-learn.

  10. Available at https://github.com/rodolfomp123/imb-mulan.

Acknowledgements

We thank the Brazilian Research Support Agencies: Coordination for the Improvement of Higher Education Personnel (CAPES), National Council for Scientific and Technological Development (CNPq) and Araucaria Foundation (FA) for their financial support. We also thank the anonymous reviewers and the Action Editor Grigorios Tsoumakas for their valuable feedback on the earlier versions of this manuscript.

Author information

Corresponding author

Correspondence to Rodolfo M. Pereira.

Additional information

Responsible editor: Grigorios Tsoumakas.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

In this appendix we present all the tables of classification and metric results generated in the experiments of this work, which were summarized as charts in the main part of the paper. In Tables 24–27, the rows in italics give the average ranking of the approaches. Besides the raw results, we also present the tables of the statistical tests that were applied to the results to provide statistical support for the answers given in the Analysis and Discussion section (Tables 7–32).

Table 7 F-Score results for the proposed approaches in the Cell-cycle dataset
Table 8 F-Score results for the proposed approaches in the Eisen dataset
Table 9 F-Score results for the proposed approaches in the Exp dataset
Table 10 F-Score results for the proposed approaches in the FMA MFCC dataset
Table 11 F-Score results for the proposed approaches in the Gasch-1 dataset
Table 12 F-Score results for the proposed approaches in the CLEF dataset
Table 13 F-Score results for the proposed approaches in the DMOZ-2010 dataset
Table 14 F-Score results for the proposed approaches in the LSHTC-small dataset
Table 15 F-Score results for the Top-Down (TD) approaches in all datasets
Table 16 F-Score results for the Flat-ML approach in all datasets
Table 17 F-Score results for the Flat-MLRS approach with all datasets
Table 18 F-Score results for the Clus-HMC approach with all datasets
Table 19 F-Score results for the HierCost approach with all datasets
Table 20 Wilcoxon statistical tests for F-score results in the Flat Multi-Label scenarios
Table 21 Wilcoxon statistical tests for F-score results in the resampling for the Local Classifiers per Node approach
Table 22 Wilcoxon statistical tests for F-score results in the resampling for the Local Classifiers per Parent Node approach
Table 23 Wilcoxon statistical tests for F-score results in the resampling for the Local Classifiers per Level approach
Table 24 Average ranking of the classification results in the resampling for the Local Classifiers per Node approach
Table 25 Average ranking of the classification results in the resampling for the Local Classifiers per Parent Node approach
Table 26 Average ranking of the classification results in the resampling for the Local Classifiers per Level approach
Table 27 Average ranking of the classification results in the resampling for all the Local Classifiers approaches
Table 28 Post-hoc Mann–Whitney test comparing the flat with the local classifier approaches
Table 29 Pearson correlation statistical test for the \({\textit{MeanIR}}_{{\textit{LCN}}}\) measure and the classification results
Table 30 Pearson correlation statistical test for the \({\textit{MeanIR}}_{{\textit{LCPN}}}\) measure and the classification results
Table 31 Pearson correlation statistical test for the \({\textit{MeanIR}}_{{\textit{LCL}}}\) measure and the classification results
Table 32 Wilcoxon statistical tests comparing the best F-score results from the proposed approaches (LCN) versus the best results for each global classification approach considering all datasets

Cite this article

Pereira, R.M., Costa, Y.M.G. & Silla, C.N. Handling imbalance in hierarchical classification problems using local classifiers approaches. Data Min Knowl Disc 35, 1564–1621 (2021). https://doi.org/10.1007/s10618-021-00762-8

  • DOI: https://doi.org/10.1007/s10618-021-00762-8
