Abstract
We propose a random forest based estimation procedure for Quantile-Oriented Sensitivity Analysis (QOSA). To be efficient, the procedure requires a cross-validation step on the leaf size of the trees. The full estimation procedure is tested on both simulated data and a real dataset. Our estimators use either the bootstrap samples or the original sample, and they rely either on a quantile plug-in procedure (the R-estimators) or on a direct minimization (the Q-estimators). This leads to 8 different estimators, which are compared on simulations. These simulations suggest that the estimation method based on a direct minimization performs better than the one plugging in the quantile. This is a significant result because the direct-minimization method requires only one sample and could therefore be preferred.
Notes
Large-Scale Atmospheric Chemistry Model.
References
Antoniadis A, Lambert-Lacroix S, Poggi J-M (2021) Random forests for global sensitivity analysis: a selective review. Reliab Eng Syst Saf 206:107312
Besse P, Milhem H, Mestre O, Dufour A, Peuch V-H (2007) Comparaison de techniques de "Data Mining" pour l'adaptation statistique des prévisions d'ozone du modèle de chimie-transport MOCAGE. Pollution atmosphérique 195:285–292
Breiman L (1996) Out-of-bag estimation. Technical report, Statistics Department, University of California, Berkeley
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Mining Knowl Discov 1(1):14–23
Broto B, Bachoc F, Depecker M (2020) Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA J Uncertain Quantif 8(2):693–716
Browne T, Fort J-C, Iooss B, Le Gratiet L (2017) Estimate of quantile-oriented sensitivity indices. Technical Report, hal-01450891
Da Veiga S, Gamboa F, Iooss B, Prieur C (2021) Basics and trends in sensitivity analysis: theory and practice in R. SIAM, Philadelphia
Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinf 7(1):3
Duroux R, Scornet E (2018) Impact of subsampling and tree depth on random forests. ESAIM Probab Stat 22:96–128
Elie-Dit-Cosaque K (2020) qosa-indices, a Python package available at: https://gitlab.com/qosa_index/qosa
Elie-Dit-Cosaque K, Maume-Deschamps V (2022) Goal-oriented shapley effects with special attention to the quantile-oriented case. SIAM/ASA J Uncertain Quantif 10(3):1037–1069
Elie-Dit-Cosaque K, Maume-Deschamps V (2022) Random forest estimation of conditional distribution functions and conditional quantiles. Electron J Stat 16(2):6553–6583
Fort J-C, Klein T, Rachdi N (2016) New sensitivity analysis subordinated to a contrast. Commun Stat Theory Methods 45(15):4349–4364
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19(3):293–325
Homma T, Saltelli A (1996) Importance measures in global sensitivity analysis of nonlinear models. Reliab Eng Syst Saf 52(1):1–17
Jansen MJ, Rossing WA, Daamen RA (1994) Monte Carlo estimation of uncertainty contributions from several independent multivariate sources. In: Predictability and nonlinear modelling in natural sciences and economics, pp 334–343. Springer
Kala Z (2019) Quantile-oriented global sensitivity analysis of design resistance. J Civ Eng Manag 25(4):297–305
Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15(4):143–156
Kucherenko S, Song S, Wang L (2019) Quantile based global sensitivity measures. Reliab Eng Syst Saf 185:35–48
Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101(474):578–590
Marceau E (2013) Modélisation et évaluation quantitative des risques en actuariat. Springer, Berlin
Maume-Deschamps V, Niang I (2018) Estimation of quantile oriented sensitivity indices. Stat Probab Lett 134:122–127
Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7(Jun):983–999
Saltelli A, Tarantola S, Campolongo F, Ratto M (2004) Sensitivity analysis in practice: a guide to assessing scientific models. Wiley, New York
Scornet E (2016) Random forests and kernel methods. IEEE Trans Inf Theory 62(3):1485–1500
Scornet E (2017) Tuning parameters in random forests. ESAIM Proc Surv 60:144–162
Sobol IM (1993) Sensitivity estimates for nonlinear mathematical models. Math Modell Comput Exp 1(4):407–414
Appendix
1.1 Bootstrap version of the estimators of the O term
1.1.1 Quantile estimation with a weighted approach
Another estimator of the conditional CDF can be obtained by replacing the weights \(w_{n,j}^o \left( x_i \right)\), based on the original dataset of the forest, with the weights \(w_{n,j}^b \left( x_i \right)\) computed from the bootstrap samples and given in Eq. (4). This yields the estimator \(F_{k,n}^b \left( \left. y \right| X_i = x_i \right)\) proposed in Elie-Dit-Cosaque and Maume-Deschamps (2022b).
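A plausible form of this estimator, assuming the Meinshausen-type construction of a weighted empirical CDF (the exact expression is given in the reference above), is
\[
F_{k,n}^b \left( \left. y \right| X_i = x_i \right) \;=\; \sum _{j=1}^{n} w_{n,j}^{b} \left( x_i \right) \, \mathbb {1}_{\left\{ Y_j \le y \right\} }.
\]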
The conditional quantiles are then estimated by plugging in \(F_{k,n}^b \left( \left. y \right| X_i = x_i \right)\) in place of \(F \left( \left. y \right| X_i = x_i \right)\). Accordingly, the estimator of \(\mathbb {E}\left[ \psi _\alpha \left( Y, q^\alpha \left( \left. Y \right| X_i \right) \right) \right]\) associated with these weights is denoted \(\widehat{R}_i^{1, b}\).
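For reference, under the usual generalized-inverse convention this plug-in step may be sketched as
\[
\widehat{q}^{\,b,\alpha } \left( \left. Y \right| X_i = x_i \right) \;=\; \inf \left\{ y \in \mathbb {R} \, : \, F_{k,n}^b \left( \left. y \right| X_i = x_i \right) \ge \alpha \right\},
\]
and \(\widehat{R}_i^{1, b}\) is then the empirical average of the pinball losses \(\psi _\alpha \left( Y_j, \widehat{q}^{\,b,\alpha } \left( \left. Y \right| X_i = x_{i,j} \right) \right)\); the sample over which this average is taken follows the main text.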
1.1.2 Quantile estimation within a leaf
For the \(\ell\)-th tree, the estimator \(\widehat{q}_\ell ^{b, \alpha } \left( \left. Y \right| X_i=x_i \right)\) of \(q^\alpha \left( \left. Y \right| X_i = x_i \right)\) is obtained with the bootstrap observations falling into \(A_n(x_i;\Theta _\ell , \mathcal {D}_n^i)\) as follows
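One plausible form, sketched here under the assumption that it is the empirical \(\alpha\)-quantile of the \(Y\)-values of the bootstrap observations used to grow the \(\ell\)-th tree that fall into this leaf, is
\[
\widehat{q}_\ell ^{\,b, \alpha } \left( \left. Y \right| X_i=x_i \right) \;=\; \inf \left\{ y \, : \, \frac{1}{N} \sum _{j \,:\, X_{i,j}^\star \in A_n(x_i;\Theta _\ell , \mathcal {D}_n^i)} \mathbb {1}_{\left\{ Y_j^\star \le y \right\} } \;\ge \; \alpha \right\},
\]
where \(N\) denotes the number of bootstrap observations in this leaf and \(\left( X_{i,j}^\star , Y_j^\star \right)\) the observations of the bootstrap sample.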
That gives us the following random forest estimate of the conditional quantile
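A sketch, assuming the \(k\) tree-level quantiles are simply averaged:
\[
\widehat{q}^{\,b, \alpha } \left( \left. Y \right| X_i=x_i \right) \;=\; \frac{1}{k} \sum _{\ell =1}^{k} \widehat{q}_\ell ^{\,b, \alpha } \left( \left. Y \right| X_i=x_i \right).
\]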
Hence, we propose the estimator \(\widehat{R}_i^{2, b}\) of \(\mathbb {E}\left[ \psi _\alpha \left( Y, q^\alpha \left( \left. Y \right| X_i \right) \right) \right]\) using the bootstrap samples.
1.1.3 Minimum estimation with a weighted approach
Another estimator is obtained by replacing the weights \(w_{n,j}^o \left( x_i \right)\) with the bootstrap-based weights \(w_{n,j}^b \left( x_i \right)\) presented in Eq. (4). The resulting estimator of the O term is denoted by \(\widehat{Q}_i^{1, b}\).
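As an illustration only (the precise sample over which the outer average is taken follows the main text), such a weighted direct-minimization estimator has the generic form
\[
\widehat{Q}_i^{1, b} \;=\; \frac{1}{n} \sum _{l=1}^{n} \, \min _{\theta \in \mathbb {R}} \, \sum _{j=1}^{n} w_{n,j}^{b} \left( x_{i,l} \right) \psi _\alpha \left( Y_j, \theta \right).
\]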
1.1.4 Minimum estimation within a leaf
For the \(\ell\)-th tree, let \(N_n^b(m;\Theta _\ell ,\mathcal {D}_n^i)\) be the number of observations of the bootstrap sample \(\mathcal {D}_n^{i \star } \left( \Theta _\ell \right)\) falling into the m-th leaf node and \(N_{leaves}^\ell\) be the number of leaves in the \(\ell\)-th tree. We define the following tree estimator for the O term
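One natural form of this tree estimator, sketched here as a within-leaf minimization of the pinball loss weighted by the leaf frequencies, is
\[
\widehat{Q}_{i,\ell }^{\,2, b} \;=\; \sum _{m=1}^{N_{leaves}^\ell } \frac{N_n^b(m;\Theta _\ell ,\mathcal {D}_n^i)}{n} \, \min _{\theta \in \mathbb {R}} \, \frac{1}{N_n^b(m;\Theta _\ell ,\mathcal {D}_n^i)} \sum _{j \in \text {leaf } m} \psi _\alpha \left( Y_j^\star , \theta \right),
\]
where the inner sum runs over the bootstrap observations of \(\mathcal {D}_n^{i \star } \left( \Theta _\ell \right)\) falling into the m-th leaf.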
The approximations of the k randomized trees are then averaged to obtain the following random forest estimate
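A sketch, assuming a plain average of the \(k\) tree approximations:
\[
\widehat{Q}_i^{\,2, b} \;=\; \frac{1}{k} \sum _{\ell =1}^{k} \widehat{Q}_{i,\ell }^{\,2, b}.
\]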
1.1.5 Minimum estimation with a weighted approach and complete trees
By using the weights \(w_{n,j}^b \left( \textbf{x}\right)\) instead of \(w_{n,j}^o \left( \textbf{x}\right)\), we may define the estimator \(\widehat{Q}_i^{3, b}\).
1.2 Algorithms for estimating the first-order QOSA index
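The reference implementation of these algorithms is the qosa-indices package (Elie-Dit-Cosaque 2020). As a self-contained illustration of the underlying idea, here is a minimal sketch, not the package API nor the paper's exact algorithms, of a within-leaf Q-estimator of the first-order QOSA index \(S_i^\alpha = 1 - \mathbb {E}\left[ \psi _\alpha \left( Y, q^\alpha \left( \left. Y \right| X_i \right) \right) \right] / \mathbb {E}\left[ \psi _\alpha \left( Y, q^\alpha (Y) \right) \right]\), the contrast-based index of Fort et al. (2016). The function names, the leaf size and the toy model are ours and purely illustrative.

```python
# Illustrative sketch only: a simplified within-leaf Q-estimator of the
# first-order QOSA index. Not the qosa package API.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def pinball_loss(y, theta, alpha):
    """Pinball (check) loss psi_alpha(y, theta) = (y - theta) * (alpha - 1{y <= theta})."""
    y = np.asarray(y, dtype=float)
    return (y - theta) * (alpha - (y <= theta))


def q_estimator_within_leaf(x_i, y, alpha, n_trees=100, min_samples_leaf=20,
                            random_state=0):
    """Estimate E[psi_alpha(Y, q_alpha(Y | X_i))]: grow trees on bootstrap
    samples of (X_i, Y) and minimize the pinball loss within each leaf
    (the minimizer is the empirical alpha-quantile of the leaf)."""
    rng = np.random.default_rng(random_state)
    x_i = np.asarray(x_i, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    n = len(y)
    estimates = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)          # bootstrap indices
        x_b, y_b = x_i[boot], y[boot]
        tree = DecisionTreeRegressor(min_samples_leaf=min_samples_leaf)
        tree.fit(x_b, y_b)
        leaves = tree.apply(x_b)                   # leaf id of each bootstrap obs.
        tree_val = 0.0
        for leaf in np.unique(leaves):
            y_leaf = y_b[leaves == leaf]
            q_leaf = np.quantile(y_leaf, alpha)    # within-leaf minimizer of the loss
            tree_val += pinball_loss(y_leaf, q_leaf, alpha).sum() / n
        estimates.append(tree_val)
    return float(np.mean(estimates))


def qosa_index(x_i, y, alpha, **forest_kwargs):
    """First-order QOSA index: 1 - conditional term / unconditional term."""
    y = np.asarray(y, dtype=float)
    denom = pinball_loss(y, np.quantile(y, alpha), alpha).mean()
    numer = q_estimator_within_leaf(x_i, y, alpha, **forest_kwargs)
    return 1.0 - numer / denom


if __name__ == "__main__":
    # Toy model: Y = X1 + 2 * X2 with independent standard Gaussian inputs.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(10_000, 2))
    Y = X[:, 0] + 2.0 * X[:, 1]
    for i in range(2):
        print(f"S_{i + 1}^0.7 ~ {qosa_index(X[:, i], Y, alpha=0.7):.3f}")
```

Cross-validating the leaf size (min_samples_leaf above), as emphasized in the paper, matters: too small a leaf makes the within-leaf quantile noisy, while too large a leaf biases it toward the unconditional quantile.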