Abstract
This paper addresses the sparse principal component analysis (SPCA) problem for covariance matrices in dimension \(n\), aiming to find solutions with sparsity \(k\) using mixed integer optimization. We propose a tailored branch-and-bound algorithm, Optimal-SPCA, that solves SPCA to certifiable optimality in seconds for \(n\) in the 100s and \(k\) in the 10s. The same algorithm can be applied to problems with \(n\) in the 10,000s or higher, finding high-quality feasible solutions in seconds, though proving optimality can take several hours. We apply our methods to a number of real data sets to demonstrate that our approach scales to the same problem sizes attempted by other methods while providing superior solutions, explaining a higher portion of variance and permitting complete control over the desired sparsity. The software that was reviewed as part of this submission has been given the DOI (digital object identifier) https://doi.org/10.5281/zenodo.2027898.
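For reference, SPCA can be stated as the following cardinality-constrained eigenvalue problem (a standard formulation; the paper's notation may differ in minor details), where \(\Sigma\) is the \(n \times n\) covariance matrix and \(\|x\|_0\) counts the nonzero entries of \(x\):
\[
\max_{x \in \mathbb{R}^n} \; x^\top \Sigma x \quad \text{subject to} \quad \|x\|_2 = 1, \;\; \|x\|_0 \le k.
\]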



A Overview of the Optimal-SPCA implementation in Julia
The linked repository contains an implementation of Optimal-SPCA written in Julia 0.6.0. The latest version of this software is available on GitHub at https://github.com/lauren897/Optimal-SPCA. The Algorithm directory contains the Julia files that comprise the algorithm, and the Data directory contains an example dataset.
To run this software, install Julia from http://julialang.org/downloads/. The code was last tested before publication with Julia 0.6.0, the most recent release at that time.
Two Julia packages must be installed before the code can be run: DataFrames and StatsBase. They can be added by running Pkg.add("DataFrames") and Pkg.add("StatsBase"), respectively.
At this point, the file test.jl should run successfully. To run the script, navigate to the Algorithm directory and run include("test.jl"), as in the session sketched below. The script runs Optimal-SPCA on the Pitprops dataset, then generates an additional random problem and runs the algorithm on it. Finally, it applies Optimal-SPCA sequentially to identify the first few sparse principal components, reporting the cumulative variance explained.
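A minimal session illustrating the steps above, assuming the repository has been cloned and a Julia 0.6 REPL has been started from the Algorithm directory (the package-manager syntax is that of Julia 0.6):

```julia
# One-time setup: install the two required packages (Julia 0.6 Pkg syntax).
Pkg.add("DataFrames")
Pkg.add("StatsBase")

# Run the bundled example script: it solves the Pitprops problem, a randomly
# generated problem, and a sequential multi-component example.
include("test.jl")
```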
The key method in the algorithm is branchAndBound. It takes two required arguments: prob and k. The variable prob is a custom type that holds the original data as well as the covariance matrix associated with the problem. (If raw data is not available, the Cholesky factorization of the covariance matrix will suffice.) The data is stored as an \(m \times n\) array of \(m\) data points in \(n\) dimensions; the corresponding covariance matrix is \(n \times n\). The parameter k is a positive integer less than \(n\) that specifies the desired sparsity.
By default, branchAndBound solves the problem and returns the objective function value, the solution vector, and a few performance metrics, including the time elapsed and the number of nodes explored; a usage sketch follows. There are many optional parameters, some of which are discussed in detail in our paper. Other parameters control technical aspects of the algorithm, such as convergence criteria and array resizing, and are commented in detail in the branchAndBound.jl file where the function is defined.
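A sketch of a typical call, for orientation only: the problem-constructor name (`Problem`), the data file name, and the exact return tuple are illustrative assumptions rather than the repository's verbatim interface; consult branchAndBound.jl for the authoritative signature.

```julia
# Hypothetical usage sketch (Julia 0.6). The constructor name `Problem`, the
# file name, and the return-tuple order are assumptions; see branchAndBound.jl
# in the repository for the actual interface.
X = readcsv("../Data/pitprops.csv")   # m x n array: m data points in n dimensions
prob = Problem(X)                     # custom type holding the data and its n x n covariance
k = 5                                 # desired sparsity: positive integer, k < n
objval, xsol, elapsed, nodes = branchAndBound(prob, k)
println("variance explained: ", objval)
println("support: ", find(xsol .!= 0.0))   # indices of the k nonzero loadings
```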
Cite this article
Berk, L., Bertsimas, D. Certifiably optimal sparse principal component analysis. Math. Prog. Comp. 11, 381–420 (2019). https://doi.org/10.1007/s12532-018-0153-6
Keywords
- Sparse principal component analysis
- Principal component analysis
- Mixed integer optimization
- Sparse eigenvalues