Abstract
A mean regression model can be inadequate when the probability distribution of the observed responses is not symmetric. In such situations, quantile regression is a more robust alternative that accommodates outliers and misspecification of the error distribution, since it characterizes the entire conditional distribution of the outcome variable. This paper proposes a robust logistic quantile regression model that combines a logit link function with the EM-based algorithm of Galarza et al. (Stat 6, 1, 2017) for maximum likelihood estimation of the pth quantile regression parameters. The underlying quantile regression (QR) model is built on a generalized class of skewed distributions that includes skewed versions of the normal, Student’s t, Laplace, contaminated normal, and slash distributions, among other heavy-tailed distributions. We evaluate how well our proposal accommodates bounded responses using a synthetic dataset, for which we consider a full model including categorical and continuous covariates as well as several of its sub-models. For the full model, we compare our proposal with a non-parametric alternative from the quantreg R package. The algorithm is implemented in the R package lqr, which provides full estimation and inference for the parameters, automatic selection of the best model, and simulated envelope plots that are useful for assessing goodness of fit.
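As a rough illustration of why a logit link suits quantile regression for bounded outcomes, the sketch below maps a response on an interval to the real line and back, and checks that quantiles commute with the transform. The specific form log((y - lower)/(upper - y)) follows the usual logistic quantile regression setup and is an assumption here, as are the function names and the order-statistic quantile; none of this is the lqr implementation.

```python
import math

def logit_link(y, lower=0.0, upper=1.0):
    """Map a response in (lower, upper) to the real line (assumed link form)."""
    if not lower < y < upper:
        raise ValueError("response must lie strictly inside the bounds")
    return math.log((y - lower) / (upper - y))

def inverse_logit_link(eta, lower=0.0, upper=1.0):
    """Map a value on the real line back to (lower, upper)."""
    # Branch on the sign of eta so that exp() never overflows.
    if eta >= 0:
        w = 1.0 / (1.0 + math.exp(-eta))
    else:
        e = math.exp(eta)
        w = e / (1.0 + e)
    return lower + (upper - lower) * w

def sample_quantile(values, p):
    """p-th sample quantile as an order statistic (inverse empirical cdf)."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[k]

# Hypothetical bounded responses on (0, 1), for illustration only.
responses = [0.05, 0.2, 0.35, 0.5, 0.8, 0.93]
transformed = [logit_link(y) for y in responses]

# Because the link is increasing, the quantile of the transformed data
# equals the transform of the quantile of the original data.
median_on_logit_scale = sample_quantile(transformed, 0.5)
median_back_on_original_scale = inverse_logit_link(median_on_logit_scale)
```

In a regression setting the same idea applies conditionally: a linear quantile model is fitted to the transformed response, and fitted quantiles are mapped back through the inverse link, so predictions always respect the bounds.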
References
Andrews, D. F. and Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B 36, 99–102.
Barndorff-Nielsen, O. E. and Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. Journal of the Royal Statistical Society, Series B 63, 167–241.
Barrodale, I. and Roberts, F. (1977). Algorithms for restricted least absolute value estimation. Communications in Statistics-Simulation and Computation 6, 353–363.
Bayes, C. L., Bazan, J. L. and De Castro, M. (2017). A quantile parametric mixed regression model for bounded response variables. Statistics and its Interface 10, 483–493.
Benites, L., Lachos, V. H. and Vilca, F. (2013). Likelihood based inference for quantile regression using the asymmetric Laplace distribution, Technical Report 15, Universidade Estadual de Campinas.
Bottai, M., Cai, B. and McKeown, R. E. (2010). Logistic quantile regression for bounded outcomes. Statistics in Medicine 29, 2, 309–317.
Dempster, A., Laird, N. and Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.
Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31, 7, 799–815.
Galarza, C. E., Benites, L. and Lachos, V. H. (2015). lqr: Robust Linear Quantile Regression. R package version 1.5.
Galarza, C., Lachos, V., Barbosa Cabral, C. and Castro Cepero, L. (2017). Robust quantile regression using a generalized class of skewed distributions. Stat 6, 1.
Galvis, D. M., Bandyopadhyay, D. and Lachos, V. H. (2014). Augmented mixed beta regression models for periodontal proportion data. Statistics in Medicine 33, 21, 3759–3771.
Gómez-Déniz, E., Sordo, M. A. and Calderín-Ojeda, E. (2014). The log-Lindley distribution as an alternative to the beta regression model with applications in insurance. Insurance: Mathematics and Economics 54, 49–57.
Koenker, R. and Bassett, G., Jr. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society 46, 33–50.
Koenker, R. W. and d’Orey, V. (1987). Algorithm AS 229: Computing regression quantiles. Journal of the Royal Statistical Society. Series C (Applied Statistics) 36, 3, 383–393.
Koenker, R. (2005). Quantile Regression, 38. Cambridge University Press, Cambridge.
Kottas, A. and Gelfand, A. E. (2001). Bayesian semiparametric median regression modeling. Journal of the American Statistical Association 96, 1458–1468.
Kottas, A. and Krnjajić, M. (2009). Bayesian semiparametric modelling in quantile regression. Scandinavian Journal of Statistics 36, 297–319.
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology 46, 1-2, 79–88.
Liu, Y. and Wu, Y. (2009). Stepwise multiple quantile regression estimation using non-crossing constraints. Statistics and its Interface 2, 3, 299–310.
Liu, Y. and Wu, Y. (2011). Simultaneous multiple non-crossing quantile regression estimation using kernel constraints. Journal of Nonparametric Statistics 23, 2, 415–437.
McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society. Series B (Methodological) 42, 2, 109–142.
McCullagh, P. (1984). Generalized linear models. European Journal of Operational Research 16, 3, 285–292.
McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. Chapman & Hall/CRC, London.
Mu, Y. and He, X. (2007). Power transformation toward a linear regression quantile. Journal of the American Statistical Association 102, 477, 269–279.
Paz, R. F. d. et al. (2017). Alternative regression models to beta distribution under Bayesian approach.
Powell, J. L. (1986). Censored regression quantiles. Journal of Econometrics 32, 1, 143–155.
Tian, Y., Tian, M. and Zhu, Q. (2014). Linear Quantile Regression Based on EM Algorithm. Communications in Statistics - Theory and Methods 43, 16, 3464–3484.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 267–288.
Verkuilen, J. and Smithson, M. (2012). Mixed and mixture regression models for continuous bounded responses using the beta distribution. Journal of Educational and Behavioral Statistics 37, 1, 82–113.
Wichitaksorn, N., Choy, S. and Gerlach, R. (2014). A generalized class of skew distributions and associated robust quantile regression models. Canadian Journal of Statistics 42, 4, 579–596.
Yu, K. and Moyeed, R. (2001). Bayesian quantile regression. Statistics & Probability Letters 54, 437–447.
Zhou, Y.-h., Ni, Z.-x. and Li, Y. (n.d). Quantile Regression via the EM Algorithm. Communications in Statistics - Simulation and Computation (10), 2162–2172.
Appendix
1.1 Details of expectations in EM algorithm
The conditional distribution of the latent variable given the observed data, \(f(u_{i}|y_{i},\boldsymbol{\theta}^{(k)})\), depends on the functional form of \(h(u_{i}|\nu)\). Table 3 shows the conditional pdf of \(U\) given \(Y\) for specific choices of \(h(u_{i}|\nu)\).
In Table 3, \(z_{i} = (y_{i}-\mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p})/\sigma\) and \(\mathcal{F}(x|\alpha,\lambda)\) represents the cdf of a \(\text{Gamma}(\alpha,\lambda)\) distribution. Moreover, \(a\) and \(b\) are given by \(a = \nu\,\phi\left(y_{i}\,\middle|\,\mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p},\frac{\gamma^{-1}\sigma^{2}}{4\xi_{i}^{2}}\right)\) and \(b = (1-\nu)\,\phi\left(y_{i}\,\middle|\,\mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p},\frac{\sigma^{2}}{4\xi_{i}^{2}}\right)\). The notation \(\text{TG}(\alpha,\lambda,t)\) represents a random variable with a \(\text{Gamma}(\alpha,\lambda)\) distribution truncated to the right at the value \(t\). Finally, \(\text{GIG}(\nu,a,b)\) denotes the Generalized Inverse Gaussian (GIG) distribution (see Barndorff-Nielsen and Shephard (2001) for more details).
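To make the quantities a and b concrete, the following sketch evaluates them for a single observation and normalizes them into two weights. It rests on several assumptions that are not stated in this appendix: the second argument of φ is read as a variance, `mu` stands for the linear predictor \(\mathbf{x}_{i}^{\top}\boldsymbol{\beta}_{p}\), `xi` is supplied by the caller because \(\xi_{i}\) is not defined here, and the normalization a/(a+b) is the usual Bayes-rule weight for a two-point mixing distribution rather than something read from Table 3. The function names are hypothetical.

```python
import math

def normal_pdf(y, mean, variance):
    """Normal density parameterized by mean and variance (assumed reading of phi)."""
    return math.exp(-0.5 * (y - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def contaminated_weights(y, mu, sigma, xi, nu, gamma):
    """Return (a / (a + b), b / (a + b)) for one observation.

    mu    : linear predictor x_i' beta_p (hypothetical argument name)
    xi    : the quantity xi_i, taken as given
    nu    : mixing proportion in (0, 1)
    gamma : scale factor in (0, 1) of the contaminated component
    """
    base_variance = sigma ** 2 / (4.0 * xi ** 2)
    # a uses the inflated variance gamma^{-1} sigma^2 / (4 xi^2).
    a = nu * normal_pdf(y, mu, base_variance / gamma)
    # b uses the baseline variance sigma^2 / (4 xi^2).
    b = (1.0 - nu) * normal_pdf(y, mu, base_variance)
    total = a + b
    return a / total, b / total

# Illustrative values only: an observation far from the linear predictor
# versus one sitting exactly on it.
far_weight, _ = contaminated_weights(10.0, 0.0, 1.0, 0.5, 0.1, 0.1)
near_weight, _ = contaminated_weights(0.0, 0.0, 1.0, 0.5, 0.1, 0.1)
```

With these inputs the observation far from the linear predictor is attributed almost entirely to the inflated-variance component, which is the mechanism by which a contaminated normal error can down-weight outliers. For extremely distant observations both densities may underflow to zero, so a production version would work on the log scale.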
Cite this article
Galarza, C.E., Zhang, P. & Lachos, V.H. Logistic Quantile Regression for Bounded Outcomes Using a Family of Heavy-Tailed Distributions. Sankhya B 83 (Suppl 2), 325–349 (2021). https://doi.org/10.1007/s13571-020-00231-0
DOI: https://doi.org/10.1007/s13571-020-00231-0