Abstract
This paper concerns a class of group-lasso learning problems where the objective function is the sum of an empirical loss and the group-lasso penalty. For a class of loss functions satisfying a quadratic majorization condition, we derive a unified algorithm called groupwise-majorization-descent (GMD) for efficiently computing the solution paths of the corresponding group-lasso penalized learning problem. GMD allows for general design matrices, without requiring the predictors to be group-wise orthonormal. As illustrative examples, we develop concrete algorithms for solving the group-lasso penalized least squares problem and several group-lasso penalized large margin classifiers. These group-lasso models have been implemented in the R package gglasso, publicly available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/web/packages/gglasso. On simulated and real data, gglasso consistently outperforms existing software for computing the group-lasso that implements either the classical groupwise descent algorithm or Nesterov’s method.
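To make the setup concrete, the following is a minimal usage sketch of the gglasso package for the least squares case; the simulated data, group layout, and the particular lambda value are our own illustration and are not taken from the paper.

library(gglasso)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
group <- rep(1:5, each = 4)                    # five groups of four predictors each
y <- drop(x[, 1:4] %*% c(1, -1, 2, 0.5)) + rnorm(n)
fit <- gglasso(x = x, y = y, group = group, loss = "ls")  # group-lasso penalized least squares path
plot(fit)                                      # coefficient paths along the lambda sequence
coef(fit, s = 0.1)                             # coefficients at an illustrative value of lambda

The loss argument also accepts the large margin classifiers discussed in the paper (e.g. loss = "logit", "sqsvm", or "hsvm").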
References
Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. 96(12), 6745 (1999)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57, 1413–1457 (2004)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–451 (2004)
Friedman, J., Hastie, T., Tibshirani, R.: Regularized paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
Fu, W.: Penalized regressions: the bridge versus the lasso. J. Comput. Gr. Stat. 7(3), 397–416 (1998)
Genkin, A., Lewis, D., Madigan, D.: Large-scale Bayesian logistic regression for text categorization. Technometrics 49(3), 291–304 (2007)
Gorman, R., Sejnowski, T.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Netw. 1(1), 75–89 (1988)
Graham, K., de Las Morenas, A., Tripathi, A., King, C., Kavanah, M., Mendez, J., Stone, M., Slama, J., Miller, M., Antoine, G., et al.: Gene expression in histologically normal epithelium from breast cancer patients and from cancer-free prophylactic mastectomy patients shares a similar profile. Br. J. Cancer 102(8), 1284–1293 (2010)
Hunter, D., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58(1), 30–37 (2004)
Lange, K., Hunter, D., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Gr. Stat. 9, 1–20 (2000)
Liu, J., Ji, S., Ye, J.: SLEP: Sparse Learning with Efficient Projections, Arizona State University. URL: http://www.public.asu.edu/jye02/Software/SLEP (2009)
Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. Ser. B 70, 53–71 (2008)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Gradient methods for minimizing composite objective function. Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL) (2007)
Osborne, M., Presnell, B., Turlach, B.: A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20(3), 389–403 (2000)
Quinlan, J.: Combining instance-based and model-based learning. In: Proceedings of the Tenth International Conference on Machine Learning, pp. 236–243 (1993)
Sæbø, S., Almøy, T., Aarøe, J., Aastveit, A.: ST-PLS: a multi-directional nearest shrunken centroid type classifier via PLS. J. Chemom. 22(1), 54–62 (2008)
Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., et al.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Nat. Acad. Sci. 103(39), 14429–14434 (2006)
Segal, M., Dahlquist, K., Conklin, B.: Regression approaches for microarray data analysis. J. Comput. Biol. 10(6), 961–980 (2003)
Singh, D., Febbo, P., Ross, K., Jackson, D., Manola, J., Ladd, C., Tamayo, P., Renshaw, A., D’Amico, A., Richie, J., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2), 203–209 (2002)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Tibshirani, R., Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J., Tibshirani, R.: Strong rules for discarding predictors in lasso-type problems. J. R. Stat. Soc. Ser. B 74, 245–266 (2012)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Vogt, J., Roth, V.: A complete analysis of the \(l_{1, p}\) group-lasso. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12), ICML 2012, Omnipress, pp 185–192 (2012)
Wang, H., Leng, C.: A note on adaptive group lasso. Comput. Stat. Data Anal. 52, 5277–5286 (2008)
Wang, L., Zhu, J., Zou, H.: Hybrid huberized support vector machines for microarray classification and gene selection. Bioinformatics 24, 412–419 (2008)
Wu, T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)
Wu, T., Lange, K.: The MM alternative to EM. Stat. Sci. 25(4), 492–505 (2010)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 68, 49–67 (2006)
Zhang, T.: Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Stat. 32, 56–85 (2004)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Acknowledgments
The authors thank the editor, an associate editor and two referees for their helpful comments and suggestions. This work is supported in part by NSF Grant DMS-08-46068.
Appendix: Proofs
Proof of Lemma 1
Part (1). For any \(\varvec{\beta }\) and \(\varvec{\beta }^*\), write \(\varvec{\beta }-\varvec{\beta }^*=V\) and define \(g(t)=L(\varvec{\beta }^*+tV \mid \mathbf{D})\), so that
\(L(\varvec{\beta }\mid \mathbf{D})=g(1)\) and \(L(\varvec{\beta }^*\mid \mathbf{D})=g(0)\).
By the mean value theorem, \(\exists \ a \in (0,1)\) such that
\(L(\varvec{\beta }\mid \mathbf{D})=g(1)=g(0)+g^{\prime }(a)=L(\varvec{\beta }^*\mid \mathbf{D})+g^{\prime }(a).\)  (25)
Write \(\Phi _f^{\prime }=\frac{\partial {\Phi (y,f)}}{\partial {f}}\). Note that
\(g^{\prime }(t)=\frac{1}{n}\sum _{i=1}^n \varvec{\Gamma }_{ii}\, \Phi _f^{\prime }\bigl (y_i, \mathbf{x}_i^{{ \mathsf T}}(\varvec{\beta }^*+tV)\bigr )\, \mathbf{x}_i^{{ \mathsf T}}V.\)  (26)
Thus \(g^{\prime }(0)=(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* \mid \mathbf{D}).\) Moreover, from (26) we have
\(g^{\prime }(a)-g^{\prime }(0)=\frac{1}{n}\sum _{i=1}^n \varvec{\Gamma }_{ii}\bigl [\Phi _f^{\prime }\bigl (y_i, \mathbf{x}_i^{{ \mathsf T}}(\varvec{\beta }^*+aV)\bigr )-\Phi _f^{\prime }\bigl (y_i, \mathbf{x}_i^{{ \mathsf T}}\varvec{\beta }^*\bigr )\bigr ]\, \mathbf{x}_i^{{ \mathsf T}}V \le \frac{C}{n}\sum _{i=1}^n \varvec{\Gamma }_{ii}\,(\mathbf{x}_i^{{ \mathsf T}}V)^2 = \frac{C}{n}\, V^{{ \mathsf T}}\mathbf{X}^{{ \mathsf T}}\varvec{\Gamma }\mathbf{X}V,\)  (27)
where in (27) we have used the inequality \(| \Phi _f^{\prime }(y,f_1)-\Phi _f^{\prime }(y,f_2)| \le C |f_1-f_2|\) together with \(a \in (0,1)\). Plugging (27) into (25) we have
\(L(\varvec{\beta }\mid \mathbf{D}) \le L(\varvec{\beta }^*\mid \mathbf{D})+(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* \mid \mathbf{D})+\frac{1}{2}(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\mathbf{H}(\varvec{\beta }-\varvec{\beta }^*),\)  (28)
with \( \mathbf{H}=\frac{2C}{n}\mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}. \)
Part (2). Write \(\Phi _f^{\prime \prime }=\frac{\partial ^2{\Phi (y,f)}}{\partial {f}^2}\). By Taylor’s expansion, \(\exists \ b \in (0,1)\) such that
\(L(\varvec{\beta }\mid \mathbf{D}) = L(\varvec{\beta }^*\mid \mathbf{D})+(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* \mid \mathbf{D})+\frac{1}{2}\, V^{{ \mathsf T}}\nabla ^2 L(\varvec{\beta }^*+bV \mid \mathbf{D})\, V.\)  (29)
Note that
\(V^{{ \mathsf T}}\nabla ^2 L(\varvec{\beta }^*+bV \mid \mathbf{D})\, V = \frac{1}{n}\sum _{i=1}^n \varvec{\Gamma }_{ii}\, \Phi _f^{\prime \prime }\bigl (y_i, \mathbf{x}_i^{{ \mathsf T}}(\varvec{\beta }^*+bV)\bigr )\,(\mathbf{x}_i^{{ \mathsf T}}V)^2 \le \frac{C_2}{n}\sum _{i=1}^n \varvec{\Gamma }_{ii}\,(\mathbf{x}_i^{{ \mathsf T}}V)^2 = V^{{ \mathsf T}}\Bigl (\frac{C_2}{n}\mathbf{X}^{{ \mathsf T}}\varvec{\Gamma }\mathbf{X}\Bigr ) V,\)  (30)
where we have used the inequality \(\Phi _f^{\prime \prime } \le C_2\). Plugging (30) into (29) we have
\(L(\varvec{\beta }\mid \mathbf{D}) \le L(\varvec{\beta }^*\mid \mathbf{D})+(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\nabla L(\varvec{\beta }^* \mid \mathbf{D})+\frac{1}{2}(\varvec{\beta }-\varvec{\beta }^*)^{{ \mathsf T}}\mathbf{H}(\varvec{\beta }-\varvec{\beta }^*),\)
with \( \mathbf{H}=\frac{C_2}{n}\mathbf{X}^{{ \mathsf T}} \varvec{\Gamma }\mathbf{X}. \) \(\square \)
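As a concrete illustration of Part (2) (our addition, not part of the original appendix), consider the logistic loss used for one of the large margin classifiers:
\(\Phi (y,f)=\log \bigl (1+e^{-yf}\bigr ), \quad y \in \{-1,+1\}.\)
Its second derivative in \(f\) is
\(\Phi _f^{\prime \prime }=\frac{y^2 e^{-yf}}{\bigl (1+e^{-yf}\bigr )^2}=\pi (1-\pi ) \le \frac{1}{4}, \quad \text{where } \pi =\frac{1}{1+e^{-yf}},\)
so Part (2) applies with \(C_2=1/4\), and the QM condition holds with the majorizing matrix \(\mathbf{H}=\frac{1}{4n}\mathbf{X}^{{ \mathsf T}}\varvec{\Gamma }\mathbf{X}\) for this loss.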
Cite this article
Yang, Y., Zou, H. A fast unified algorithm for solving group-lasso penalize learning problems. Stat Comput 25, 1129–1141 (2015). https://doi.org/10.1007/s11222-014-9498-5