Abstract
Ensemble methods are supervised learning algorithms that provide highly accurate solutions by training many models. Random forest is probably the most widely used ensemble method for regression and classification problems. It builds decision trees on different bootstrap samples and aggregates their predictions by majority vote for classification or by averaging for regression. However, such an algorithm suffers from a lack of explainability and thus does not allow users to understand how particular decisions are made. To improve on this, we propose a new way of interpreting an ensemble tree structure. Starting from a random forest model, our approach graphically explains the relationship structure between the response variable and the predictors. The proposed method is useful in all real-world cases where model interpretation for predictive purposes is crucial. The proposal is evaluated on real data sets.
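The aggregation rules mentioned in the abstract (majority vote for classification, averaging for regression) can be sketched as follows. This is a minimal illustration of the aggregation step only, not the authors' proposed method; the per-tree predictions are hypothetical values standing in for the outputs of individual trees in a fitted forest.

```python
from collections import Counter
from statistics import mean

# Hypothetical predictions from five trees for a single observation
tree_class_preds = ["A", "B", "A", "A", "B"]   # classification forest
tree_reg_preds = [3.1, 2.8, 3.4, 3.0, 2.9]     # regression forest

def forest_classify(preds):
    """Aggregate tree votes by majority rule."""
    return Counter(preds).most_common(1)[0][0]

def forest_regress(preds):
    """Aggregate tree outputs by averaging."""
    return mean(preds)

print(forest_classify(tree_class_preds))  # majority class: "A"
print(forest_regress(tree_reg_preds))     # mean prediction: 3.04
```

The opacity the abstract refers to arises precisely here: the final vote or average discards the tree-level decision paths, which is what interpretation methods try to recover.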



Funding
Funding was provided by Ministero dell’Istruzione, dell’Università e della Ricerca (Grant No. PRIN 2017, ID: 2017KZZLYP). Giuseppe Pandolfo acknowledges the support of the National Operative Program (PON) Ricerca e Innovazione 2014-2020 (PON R&I) - Azione IV.4 - “Dottorati e contratti di ricerca su tematiche dell’innovazione”.
Aria, M., Gnasso, A., Iorio, C. et al. Explainable Ensemble Trees. Comput Stat 39, 3–19 (2024). https://doi.org/10.1007/s00180-022-01312-6