Abstract
Ensemble methods are supervised learning algorithms that provide highly accurate solutions by training many models. Random forest is probably the most widely used ensemble method for regression and classification problems. It builds decision trees on different bootstrap samples and aggregates their predictions by majority vote for classification or by averaging for regression. However, such an algorithm suffers from a lack of explainability and thus does not allow users to understand how particular decisions are made. To improve on this, we propose a new way of interpreting an ensemble tree structure. Starting from a random forest model, our approach graphically explains the relationship structure between the response variable and the predictors. The proposed method is useful in all real-world cases where model interpretation for predictive purposes is crucial. The proposal is evaluated on real data sets.
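The aggregation rules mentioned in the abstract (majority vote for classification, averaging for regression) can be sketched as follows. This is a minimal illustration of the aggregation step only, not the authors' proposed method; the per-tree predictions are hypothetical values standing in for the outputs of individual trees in a fitted forest.

```python
from collections import Counter
from statistics import mean

# Hypothetical predictions from five trees for a single observation
tree_class_preds = ["A", "B", "A", "A", "B"]   # classification forest
tree_reg_preds = [3.1, 2.8, 3.4, 3.0, 2.9]     # regression forest

def forest_classify(preds):
    """Aggregate tree votes by majority rule."""
    return Counter(preds).most_common(1)[0][0]

def forest_regress(preds):
    """Aggregate tree outputs by averaging."""
    return mean(preds)

print(forest_classify(tree_class_preds))  # majority class: "A"
print(forest_regress(tree_reg_preds))     # mean prediction: 3.04
```

The opacity the abstract refers to arises precisely here: the final vote or average discards the tree-level decision paths, which is what interpretation methods try to recover.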



Funding
Funding was provided by Ministero dell’Istruzione, dell’Università e della Ricerca (Grant No. PRIN 2017, ID: 2017KZZLYP). Giuseppe Pandolfo acknowledges the support of the National Operative Program (PON) Ricerca e Innovazione 2014-2020 (PON R&I) - Azione IV.4 - “Dottorati e contratti di ricerca su tematiche dell’innovazione”.
Aria, M., Gnasso, A., Iorio, C. et al. Explainable Ensemble Trees. Comput Stat 39, 3–19 (2024). https://doi.org/10.1007/s00180-022-01312-6