
Random forest based quantile-oriented sensitivity analysis indices estimation

  • Original Paper
  • Published in Computational Statistics

Abstract

We propose a random forest based estimation procedure for Quantile-Oriented Sensitivity Analysis (QOSA) indices. For the procedure to be efficient, a cross-validation step on the leaf size of the trees is required. The full estimation procedure is tested on both simulated data and a real dataset. Our estimators use either the bootstrap samples or the original sample, and they rely either on a quantile plug-in procedure (the R-estimators) or on a direct minimization (the Q-estimators). This leads to eight different estimators, which are compared on simulations. These simulations suggest that the estimators based on direct minimization outperform those plugging in the quantile. This is a significant result because the direct-minimization method requires only one sample and could therefore be preferred.


Notes

  1. Large Scale Atmospheric Chemistry Model.

References

  • Antoniadis A, Lambert-Lacroix S, Poggi J-M (2021) Random forests for global sensitivity analysis: a selective review. Reliab Eng Syst Saf 206:107312

  • Besse P, Milhem H, Mestre O, Dufour A, Peuch V-H (2007) Comparaison de techniques de “Data Mining’’ pour l’adaptation statistique des prévisions d’ozone du modèle de chimie-transport MOCAGE. Pollution atmosphérique 195:285–292

  • Breiman L (1996) Out-of-bag estimation. Technical report, Statistics Department, University of California, Berkeley

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  • Broto B, Bachoc F, Depecker M (2020) Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution. SIAM/ASA J Uncertain Quantif 8(2):693–716

  • Browne T, Fort J-C, Iooss B, Le Gratiet L (2017) Estimate of quantile-oriented sensitivity indices. Technical report, hal-01450891

  • Da Veiga S, Gamboa F, Iooss B, Prieur C (2021) Basics and trends in sensitivity analysis: theory and practice in R. SIAM, Philadelphia

  • Díaz-Uriarte R, De Andres SA (2006) Gene selection and classification of microarray data using random forest. BMC Bioinf 7(1):3

  • Duroux R, Scornet E (2018) Impact of subsampling and tree depth on random forests. ESAIM Probab Stat 22:96–128

  • Elie-Dit-Cosaque K (2020) qosa-indices, a Python package available at https://gitlab.com/qosa_index/qosa

  • Elie-Dit-Cosaque K, Maume-Deschamps V (2022) Goal-oriented Shapley effects with special attention to the quantile-oriented case. SIAM/ASA J Uncertain Quantif 10(3):1037–1069

  • Elie-Dit-Cosaque K, Maume-Deschamps V (2022) Random forest estimation of conditional distribution functions and conditional quantiles. Electron J Stat 16(2):6553–6583

  • Fort J-C, Klein T, Rachdi N (2016) New sensitivity analysis subordinated to a contrast. Commun Stat Theory Methods 45(15):4349–4364

  • Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19(3):293–325

  • Homma T, Saltelli A (1996) Importance measures in global sensitivity analysis of nonlinear models. Reliab Eng Syst Saf 52(1):1–17

  • Jansen MJ, Rossing WA, Daamen RA (1994) Monte Carlo estimation of uncertainty contributions from several independent multivariate sources. In: Predictability and nonlinear modelling in natural sciences and economics. Springer, pp 334–343

  • Kala Z (2019) Quantile-oriented global sensitivity analysis of design resistance. J Civ Eng Manag 25(4):297–305

  • Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15(4):143–156

  • Kucherenko S, Song S, Wang L (2019) Quantile based global sensitivity measures. Reliab Eng Syst Saf 185:35–48

  • Lin Y, Jeon Y (2006) Random forests and adaptive nearest neighbors. J Am Stat Assoc 101(474):578–590

  • Loh WY (2011) Classification and regression trees. Wiley Interdiscip Rev Data Min Knowl Discov 1(1):14–23

  • Marceau E (2013) Modélisation et évaluation quantitative des risques en actuariat. Springer, Berlin

  • Maume-Deschamps V, Niang I (2018) Estimation of quantile oriented sensitivity indices. Stat Probab Lett 134:122–127

  • Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999

  • Saltelli A, Tarantola S, Campolongo F, Ratto M (2004) Sensitivity analysis in practice: a guide to assessing scientific models. Wiley, New York

  • Scornet E (2016) Random forests and kernel methods. IEEE Trans Inf Theory 62(3):1485–1500

  • Scornet E (2017) Tuning parameters in random forests. ESAIM Proc Surv 60:144–162

  • Sobol IM (1993) Sensitivity estimates for nonlinear mathematical models. Math Modell Comput Exp 1(4):407–414

Author information

Corresponding author

Correspondence to Véronique Maume-Deschamps.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Bootstrap version of the estimators of the O term

1.1.1 Quantile estimation with a weighted approach

Another estimator of the conditional CDF can be obtained by replacing the weights \(w_{n,j}^o \left( x_i \right)\), based on the original dataset of the forest, with the weights \(w_{n,j}^b \left( x_i \right)\) computed from the bootstrap samples and given in Eq. (4). This yields the following estimator, proposed in Elie-Dit-Cosaque and Maume-Deschamps (2022b),

$$\begin{aligned} F_{k,n}^b \left( \left. y \right| X_i = x_i \right) = \sum _{j=1}^{n} w_{n,j}^b \left( x_i \right) \mathbbm {1}_{ \{Y^j \leqslant y\}} \ . \end{aligned}$$

The conditional quantiles are then estimated by plugging \(F_{k,n}^b \left( \left. y \right| X_i = x_i \right)\) instead of \(F \left( \left. y \right| X_i = x_i \right)\). Accordingly, the associated estimator of \(\mathbb {E}\left[ \psi _\alpha \left( Y, q^\alpha \left( \left. Y \right| X_i \right) \right) \right]\) based on these weights is denoted \(\widehat{R}_i^{1, b}\).
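For intuition, the plug-in step can be sketched with scikit-learn rather than the paper's qosa-indices package. The sketch below builds the original-sample weights \(w_{n,j}^o(x_i)\) from leaf co-membership and inverts the weighted CDF; the bootstrap weights \(w_{n,j}^b(x_i)\) would additionally require each tree's in-bag counts, which scikit-learn's public API does not expose. Function names and the toy model are ours, not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_weights(forest, X_train, x):
    """Original-sample weights w^o_{n,j}(x): for each tree, the indicator that
    training point j shares the leaf of x, normalized by the leaf population,
    then averaged over trees."""
    leaves_train = forest.apply(X_train)          # (n, k) leaf memberships
    leaves_x = forest.apply(x.reshape(1, -1))[0]  # (k,) leaf ids of x
    same_leaf = leaves_train == leaves_x          # (n, k) boolean
    leaf_sizes = same_leaf.sum(axis=0)            # (k,) points in x's leaf
    return (same_leaf / leaf_sizes).mean(axis=1)  # (n,), sums to 1

def weighted_cdf_quantile(weights, y, alpha):
    """Plug-in quantile: generalized inverse of the weighted empirical CDF."""
    order = np.argsort(y)
    cdf = np.cumsum(weights[order])
    return y[order][np.searchsorted(cdf, alpha)]

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
Y = X[:, 0] + 0.5 * rng.normal(size=n)

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=40,
                               random_state=0).fit(X, Y)
w = forest_weights(forest, X, np.array([1.0, 0.0]))
q_hat = weighted_cdf_quantile(w, Y, 0.5)  # estimates q^0.5(Y | X_1 = 1)
```

With the model above the true conditional median at \(x_1 = 1\) is 1, so `q_hat` should land close to it.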

1.1.2 Quantile estimation within a leaf

For the \(\ell\)-th tree, the estimator \(\widehat{q}_\ell ^{b, \alpha } \left( \left. Y \right| X_i=x_i \right)\) of \(q^\alpha \left( \left. Y \right| X_i = x_i \right)\) is obtained with the bootstrap observations falling into \(A_n(x_i;\Theta _\ell , \mathcal {D}_n^i)\) as follows

$$\begin{aligned} \begin{aligned} \widehat{q}_\ell ^{b, \alpha } \left( \left. Y \right| X_i=x_i \right)&= \inf _{p=1,\ldots ,n} \left\{ \right. Y^{p},\ \left( X_i^p, Y^p \right) \in \mathcal {D}_n^{i \star }(\Theta _\ell ) \text { and } X_i^p \in A_n(x_i; \Theta _\ell , \mathcal {D}_n^i): \\&\left. \sum _{j=1}^n \dfrac{B_{j} \left( \Theta _\ell ^1,\mathcal {D}_n^i \right) \cdot \mathbbm {1}_{\left\{ X_i^j\in A_n(x_i;\Theta _\ell ,\mathcal {D}_n^i) \right\} } \cdot \mathbbm {1}_{\left\{ Y^j \leqslant Y^p \right\} }}{N_n^b(x_i;\Theta _\ell ,\mathcal {D}_n^i)} \geqslant \alpha \right\} \ . \end{aligned} \end{aligned}$$

That gives us the following random forest estimate of the conditional quantile

$$\begin{aligned} \widehat{q}^{b, \alpha } \left( \left. Y \right| X_i=x_i \right) = \dfrac{1}{k} \sum _{\ell =1}^{k} \widehat{q}_\ell ^{b, \alpha } \left( \left. Y \right| X_i=x_i \right) \ . \end{aligned}$$

Hence, we propose the estimator \(\widehat{R}_i^{2, b}\) of \(\mathbb {E}\left[ \psi _\alpha \left( Y, q^\alpha \left( \left. Y \right| X_i \right) \right) \right]\) using the bootstrap samples.
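An original-sample analogue of this leaf-based construction (the "o" variant; reproducing the bootstrap variant exactly would need each tree's in-bag sample, which scikit-learn keeps private) can be sketched as follows. The helper `leaf_quantiles_all` and the toy model are illustrative names of our choosing.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pinball(y, theta, alpha):
    """Pinball (check) loss psi_alpha(y, theta)."""
    return (y - theta) * (alpha - (y < theta))

def leaf_quantiles_all(forest, X, Y, alpha):
    """For each sample point: the alpha-quantile of Y among the observations
    sharing its leaf, computed per tree and then averaged over trees."""
    leaves = forest.apply(X)                      # (n, k) leaf memberships
    n, k = leaves.shape
    q_hat = np.zeros(n)
    for t in range(k):
        # alpha-quantile of Y within each leaf of tree t
        leaf_q = {leaf: np.quantile(Y[leaves[:, t] == leaf], alpha)
                  for leaf in np.unique(leaves[:, t])}
        q_hat += np.array([leaf_q[leaf] for leaf in leaves[:, t]])
    return q_hat / k

rng = np.random.default_rng(1)
n = 1500
X1 = rng.normal(size=(n, 1))
Y = X1[:, 0] + 0.5 * rng.normal(size=n)
alpha = 0.7

forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=40,
                               random_state=0).fit(X1, Y)
# R-type estimate of E[psi_alpha(Y, q^alpha(Y | X_1))]: plug the leaf-based
# conditional quantiles into the pinball loss and average over the sample
R_hat = pinball(Y, leaf_quantiles_all(forest, X1, Y, alpha), alpha).mean()
```

For this model the target value is \(\sigma \varphi(z_\alpha) \approx 0.17\) with \(\sigma = 0.5\) and \(\alpha = 0.7\), so the estimate should sit in that neighborhood.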

1.1.3 Minimum estimation with a weighted approach

Another estimator is obtained by replacing the weights \(w_{n,j}^o \left( x_i \right)\) with the version \(w_{n,j}^b \left( x_i \right)\) presented in Eq. (4), which uses the bootstrap samples. The resulting estimator of the O term is denoted by \(\widehat{Q}_i^{1, b}\).
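The direct-minimization (Q-type) step can be illustrated on its own: given a set of weights, minimize the weighted pinball risk over the candidate values \(Y^1, \ldots, Y^n\). The naive \(O(n^2)\) sketch below uses uniform weights, for which the minimum estimates the unconditional term \(\mathbb{E}\left[ \psi_\alpha \left( Y, q^\alpha(Y) \right) \right]\); it is an illustration with our own names, not the paper's implementation.

```python
import numpy as np

def pinball(y, theta, alpha):
    """Pinball (check) loss psi_alpha(y, theta)."""
    return (y - theta) * (alpha - (y < theta))

def weighted_min_pinball(weights, y, alpha):
    """Minimize theta -> sum_j w_j * psi_alpha(y_j, theta) over the
    candidate values theta in {y_1, ..., y_n} (naive O(n^2) scan)."""
    risks = [(weights * pinball(y, theta, alpha)).sum() for theta in y]
    return min(risks)

rng = np.random.default_rng(2)
y = rng.normal(size=2000)
w = np.full(y.size, 1.0 / y.size)   # uniform weights: unconditional term
val = weighted_min_pinball(w, y, 0.5)
# for Y ~ N(0,1) and alpha = 0.5 the target is 0.5 * E|Y| ~ 0.399
```

Plugging forest weights \(w_{n,j}(x_i)\) in place of the uniform weights gives the conditional version used by \(\widehat{Q}_i^{1, b}\).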

1.1.4 Minimum estimation within a leaf

For the \(\ell\)-th tree, let \(N_n^b(m;\Theta _\ell ,\mathcal {D}_n^i)\) be the number of observations of the bootstrap sample \(\mathcal {D}_n^{i \star } \left( \Theta _\ell \right)\) falling into the m-th leaf node and \(N_{leaves}^\ell\) be the number of leaves in the \(\ell\)-th tree. We define the following tree estimator for the O term

$$\begin{aligned} \dfrac{1}{N_{leaves}^\ell } \sum _{m=1}^{N_{leaves}^\ell } \min _{\begin{array}{c} p=1,\ldots ,n \\ \left( X_i^p, Y^p \right) \in \mathcal {D}_n^{i \star }(\Theta _\ell ),\ X_i^p \in A_n \left( m; \Theta _\ell , \mathcal {D}_n^i \right) \end{array}} \sum _{j=1}^n \dfrac{B_{j} \left( \Theta _\ell ^1,\mathcal {D}_n^i \right) \cdot \psi _\alpha \left( Y^j, Y^{p} \right) \cdot \mathbbm {1}_{\left\{ \left( X_i^j, Y^j \right) \in \mathcal {D}_n^{i \star }(\Theta _\ell ),\ X_i^j \in A_n \left( m; \Theta _\ell , \mathcal {D}_n^i \right) \right\} }}{N_n^b(m;\Theta _\ell ,\mathcal {D}_n^i)} \ . \end{aligned}$$

The approximations of the k randomized trees are then averaged to obtain the following random forest estimate

$$\begin{aligned} \widehat{Q}_i^{2, b} = \dfrac{1}{k} \sum _{\ell =1}^k \dfrac{1}{N_{leaves}^\ell } \sum _{m=1}^{N_{leaves}^\ell } \min _{\begin{array}{c} p=1,\ldots ,n \\ \left( X_i^p, Y^p \right) \in \mathcal {D}_n^{i \star }(\Theta _\ell ),\ X_i^p \in A_n \left( m; \Theta _\ell , \mathcal {D}_n^i \right) \end{array}} \sum _{j=1}^n \dfrac{B_{j} \left( \Theta _\ell ^1,\mathcal {D}_n^i \right) \cdot \psi _\alpha \left( Y^j, Y^{p} \right) \cdot \mathbbm {1}_{\left\{ \left( X_i^j, Y^j \right) \in \mathcal {D}_n^{i \star }(\Theta _\ell ),\ X_i^j \in A_n \left( m; \Theta _\ell , \mathcal {D}_n^i \right) \right\} }}{N_n^b(m;\Theta _\ell ,\mathcal {D}_n^i)} \ . \end{aligned}$$
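The per-leaf minimization can again be sketched in its original-sample form (the "o" analogue of \(\widehat{Q}_i^{2, b}\); the exact bootstrap version would require the in-bag indicators \(B_j\), which scikit-learn does not expose publicly). Names and toy model are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pinball(y, theta, alpha):
    """Pinball (check) loss psi_alpha(y, theta)."""
    return (y - theta) * (alpha - (y < theta))

def min_pinball_in_leaves(forest, X, Y, alpha):
    """Per tree: in each leaf, minimize the within-leaf mean pinball loss over
    the candidates Y^p falling in that leaf; average over leaves, then over
    trees."""
    leaves = forest.apply(X)                      # (n, k) leaf memberships
    tree_means = []
    for t in range(leaves.shape[1]):
        leaf_mins = []
        for leaf in np.unique(leaves[:, t]):
            y_leaf = Y[leaves[:, t] == leaf]
            leaf_mins.append(min(pinball(y_leaf, th, alpha).mean()
                                 for th in y_leaf))
        tree_means.append(np.mean(leaf_mins))
    return np.mean(tree_means)

rng = np.random.default_rng(3)
n = 1000
X1 = rng.normal(size=(n, 1))
Y = X1[:, 0] + 0.5 * rng.normal(size=n)

forest = RandomForestRegressor(n_estimators=30, min_samples_leaf=40,
                               random_state=0).fit(X1, Y)
Q_hat = min_pinball_in_leaves(forest, X1, Y, 0.5)
# target: E[psi_0.5(Y, q^0.5(Y | X_1))] = 0.25 * E|Z| ~ 0.199 here
```

Note the equal weight \(1 / N_{leaves}^\ell\) given to each leaf, matching the display above.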

1.1.5 Minimum estimation with a weighted approach and complete trees

By using the weights \(w_{n,j}^b \left( \textbf{x}\right)\) instead of \(w_{n,j}^o \left( \textbf{x}\right)\), we may define the estimator \(\widehat{Q}_i^{3, b}\).

1.2 Algorithms for estimating the first-order QOSA index

Algorithm 2: QOSA index estimators plugging the quantile

Algorithm 3: QOSA index estimators with the weighted minimum approach

Algorithm 4: QOSA index estimators computing the minimum in leaves

Algorithm 5: QOSA index estimators with the weighted minimum and fully grown trees
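Putting the pieces together, a first-order QOSA index can be estimated by a simple plug-in pipeline. The sketch below uses the definition \(S_i^\alpha = 1 - \mathbb{E}\left[ \psi_\alpha \left( Y, q^\alpha(Y \mid X_i) \right) \right] / \mathbb{E}\left[ \psi_\alpha \left( Y, q^\alpha(Y) \right) \right]\) from the QOSA literature cited above (Fort et al. 2016; Maume-Deschamps and Niang 2018), with a leaf-quantile forest estimate of the numerator. It is a minimal original-sample sketch, not a transcription of Algorithms 2-5, and `qosa_index` is our own name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def pinball(y, theta, alpha):
    """Pinball (check) loss psi_alpha(y, theta)."""
    return (y - theta) * (alpha - (y < theta))

def qosa_index(X, Y, i, alpha, **forest_kw):
    """Plug-in estimate of S_i^alpha = 1 - E[psi(Y, q(Y|X_i))] / E[psi(Y, q(Y))]."""
    # denominator: pinball risk at the empirical unconditional alpha-quantile
    denom = pinball(Y, np.quantile(Y, alpha), alpha).mean()
    # numerator: leaf-based conditional quantiles from a forest on X_i alone
    forest = RandomForestRegressor(**forest_kw).fit(X[:, [i]], Y)
    leaves = forest.apply(X[:, [i]])              # (n, k) leaf memberships
    q_hat = np.zeros(len(Y))
    for t in range(leaves.shape[1]):
        leaf_q = {leaf: np.quantile(Y[leaves[:, t] == leaf], alpha)
                  for leaf in np.unique(leaves[:, t])}
        q_hat += np.array([leaf_q[leaf] for leaf in leaves[:, t]])
    q_hat /= leaves.shape[1]
    return 1.0 - pinball(Y, q_hat, alpha).mean() / denom

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 2))
Y = X[:, 0] + 0.3 * rng.normal(size=n)   # X_1 drives Y, X_2 is irrelevant

kw = dict(n_estimators=50, min_samples_leaf=100, random_state=0)
S1 = qosa_index(X, Y, 0, 0.5, **kw)  # influential input: large index
S2 = qosa_index(X, Y, 1, 0.5, **kw)  # irrelevant input: index near 0
```

As in the paper, the leaf size (`min_samples_leaf` here) is the tuning parameter that a cross-validation step would select.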

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Elie-Dit-Cosaque, K., Maume-Deschamps, V. Random forest based quantile-oriented sensitivity analysis indices estimation. Comput Stat 39, 1747–1777 (2024). https://doi.org/10.1007/s00180-023-01450-5
