Abstract
Standard regression techniques model only the mean of the response variable. Quantile regression (QR) is more informative in that it depicts a comprehensive picture of the relationship between the response variable and the covariates at different quantiles. It is particularly useful for non-normally distributed data exhibiting skewness or heterogeneity, which arise routinely in many scientific fields such as economics, finance, public health, and biology. Although its theory is well developed in the literature, its computation for big data still faces multiple challenges, especially for vertically partitioned big data in modern distributed environments, where communication efficiency and data security are usually the primary considerations. While the popular alternating direction method of multipliers (ADMM) provides a general computational solution, its slow convergence becomes a bottleneck when communication cost dominates local computation, as in Internet of Things (IoT) networks. Motivated by the residual projection technique, we propose an innovative iterative parallel framework, PIQR, that converges faster and offers a more secure data transmission plan, and we establish its convergence properties. The framework is further extended to composite quantile regression (CQR), a modification of QR that improves estimation efficiency at extreme quantiles. Simulation studies show that both the ADMM-based method and PIQR achieve favorable estimation accuracy in distributed environments. Although PIQR is inferior to the ADMM-based method in local computation, it requires far fewer iterations to converge, and hence significantly improves overall computational efficiency when communication cost is the dominant factor. Moreover, PIQR transmits only residual-related information between machines, and thus better protects sensitive data from leakage than the ADMM-based method.
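For readers who want the objectives referenced above in explicit form, the following are the standard (textbook) formulations of quantile regression and composite quantile regression; they describe the estimation problems the paper addresses, not the PIQR algorithm itself. Given data $(x_i, y_i)$, $i = 1, \dots, n$, and a quantile level $\tau \in (0,1)$, the QR estimator minimizes the check loss $\rho_\tau$:

\[
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}\bigl(y_i - x_i^{\top}\beta\bigr),
\qquad
\rho_{\tau}(u) = u\bigl(\tau - \mathbf{1}\{u < 0\}\bigr).
\]

CQR combines $K$ equally spaced quantile levels $\tau_k = k/(K+1)$, sharing one slope vector $\beta$ across quantiles while allowing quantile-specific intercepts $b_k$:

\[
\bigl(\hat{b}_1, \ldots, \hat{b}_K, \hat{\beta}\bigr)
= \arg\min_{b_1, \ldots, b_K,\, \beta} \sum_{k=1}^{K} \sum_{i=1}^{n}
\rho_{\tau_k}\bigl(y_i - b_k - x_i^{\top}\beta\bigr).
\]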
Funding
Nan Lin’s work is supported by the NVIDIA GPU Grant Program. Ye Fan’s work is supported by the Initial Scientific Research Fund of Young Teachers in Capital University of Economics and Business [Grant No. XRZ2022062], and partly supported by the Special Fund for Basic Scientific Research of Beijing Municipal Colleges in Capital University of Economics and Business [Grant No. QNTD202207]. Jr-Shin Li’s work is supported by the Air Force Office of Scientific Research under award FA9550-21-1-0335.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this work.
Additional information
Responsible editor: Aristides Gionis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fan, Y., Li, JS. & Lin, N. Residual projection for quantile regression in vertically partitioned big data. Data Min Knowl Disc 37, 710–735 (2023). https://doi.org/10.1007/s10618-022-00914-4