Abstract
Motivated by modern regression applications, in this paper, we study the convexification of a class of convex optimization problems with indicator variables and combinatorial constraints on the indicators. Unlike most previous work on the convexification of sparse regression problems, we simultaneously consider the nonlinear non-separable objective, indicator variables, and combinatorial constraints. Specifically, we give the convex hull description of the epigraph of the composition of a one-dimensional convex function and an affine function under arbitrary combinatorial constraints. As special cases of this result, we derive ideal convexifications for problems with hierarchy, multi-collinearity, and sparsity constraints. Moreover, we give a short proof that for a separable objective function, the perspective reformulation is ideal regardless of the constraints of the problem. Our computational experiments with sparse regression problems demonstrate the potential of the proposed approach in improving the relaxation quality without significant computational overhead.
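To make the objects in the abstract concrete, the sketch below sets up the continuous perspective relaxation of a cardinality-constrained (sparse) ridge regression, the prototypical problem this line of work strengthens. It is a minimal illustration, not the paper's formulation: the synthetic data, the regularization weight `gamma`, and the cardinality bound `k` are placeholder choices, and the binary indicators are relaxed to [0, 1].

```python
# Minimal sketch (not the paper's exact formulation): the perspective
# relaxation of sparse ridge regression, with binary indicators z
# relaxed to [0, 1].  Requires cvxpy and numpy.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p, k, gamma = 100, 20, 5, 1.0      # placeholder dimensions and parameters
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) * (rng.random(p) < 0.3)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta = cp.Variable(p)
z = cp.Variable(p)                    # continuous relaxation of the indicators
t = cp.Variable(p)                    # t_i >= beta_i^2 / z_i (perspective of beta_i^2)

constraints = [cp.sum(z) <= k, 0 <= z, z <= 1]
# Perspective constraint beta_i^2 <= t_i * z_i, in DCP form via quad_over_lin;
# cvxpy canonicalizes it to a rotated second-order cone.
constraints += [cp.quad_over_lin(beta[i], z[i]) <= t[i] for i in range(p)]

obj = cp.Minimize(cp.sum_squares(y - X @ beta) + gamma * cp.sum(t))
prob = cp.Problem(obj, constraints)
prob.solve()
print("relaxation bound:", prob.value)
```

Replacing `z` with a Boolean variable recovers the exact mixed-integer problem; the point of perspective-type strengthenings is that the continuous relaxation above is much tighter than the one obtained with big-M constraints.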



References
Aktürk, M.S., Atamtürk, A., Gürel, S.: A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Oper. Res. Lett. 37(3), 187–191 (2009)
Angulo, G., Ahmed, S., Dey, S.S., Kaibel, V.: Forbidden vertices. Math. Oper. Res. 40(2), 350–360 (2015)
Anstreicher, K.M.: On convex relaxations for quadratically constrained quadratic programming. Math. Program. 136(2), 233–251 (2012)
Atamtürk, A., Gómez, A.: Strong formulations for quadratic optimization with M-matrices and indicator variables. Math. Program. 170(1), 141–176 (2018)
Atamtürk, A., Gómez, A.: Rank-one convexification for sparse regression. Optimization Online. http://www.optimization-online.org/DB_HTML/2019/01/7050.html. (2019)
Atamtürk, A., Gómez, A., Han, S.: Sparse and smooth signal estimation: convexification of L0 formulations. J. Mach. Learn. Res. 22(52), 1–43 (2021)
Bacci, T., Frangioni, A., Gentile, C., Tavlaridis-Gyparakis, K.: New MINLP formulations for the unit commitment problems with ramping constraints. Optimization Online. http://www.optimization-online.org/DB_FILE/2019/10/7426.pdf. (2019)
Belotti, P., Góez, J.C., Pólik, I., Ralphs, T.K., Terlaky, T.: A conic representation of the convex hull of disjunctive sets and conic cuts for integer second order cone optimization. In: Numerical Analysis and Optimization, pp. 1–35. Springer (2015)
Bertsimas, D., Cory-Wright, R., Pauphilet, J.: Mixed-projection conic optimization: A new paradigm for modeling rank constraints. arXiv preprint arXiv:2009.10395 (2020a)
Bertsimas, D., King, A.: OR Forum - An algorithmic approach to linear regression. Oper. Res. 64(1), 2–16 (2016)
Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44(2), 813–852 (2016)
Bertsimas, D., Pauphilet, J., Van Parys, B.: Sparse regression: scalable algorithms and empirical performance. Stat. Sci. 35(4), 555–578 (2020b)
Bertsimas, D., Van Parys, B.: Sparse high-dimensional regression: exact scalable algorithms and phase transitions. Ann. Stat. 48(1), 300–323 (2020)
Bien, J., Taylor, J., Tibshirani, R.: A lasso for hierarchical interactions. Ann. Stat. 41(3), 1111 (2013)
Bienstock, D., Michalka, A.: Cutting-planes for optimization of convex functions over nonconvex sets. SIAM J. Opt. 24(2), 643–677 (2014)
Burer, S.: On the copositive representation of binary and continuous nonconvex quadratic programs. Math. Program. 120(2), 479–495 (2009)
Burer, S., Kılınç-Karzan, F.: How to convexify the intersection of a second order cone and a nonconvex quadratic. Math. Program. 162(1–2), 393–429 (2017)
Carrizosa, E., Mortensen, L., Morales, D.R.: On linear regression models with hierarchical categorical variables. Tech. rep. (2020)
Ceria, S., Soares, J.: Convex programming for disjunctive convex optimization. Math. Program. 86, 595–614 (1999)
Cozad, A., Sahinidis, N.V., Miller, D.C.: Learning surrogate models for simulation-based optimization. AIChE J. 60(6), 2211–2227 (2014)
Cozad, A., Sahinidis, N.V., Miller, D.C.: A combined first-principles and data-driven approach to model building. Comput. Chem. Eng. 73, 116–127 (2015)
Dedieu, A., Hazimeh, H., Mazumder, R.: Learning sparse classifiers: continuous and mixed integer optimization perspectives. J. Mach. Learn. Res. 22(135), 1–47 (2021)
Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017)
Dong, H.: On integer and MPCC representability of affine sparsity. Oper. Res. Lett. 47(3), 208–212 (2019)
Dong, H., Ahn, M., Pang, J.-S.: Structural properties of affine sparsity constraints. Math. Program. 176(1–2), 95–135 (2019)
Dong, H., Chen, K., Linderoth, J.: Regularization vs. relaxation: A conic optimization perspective of statistical variable selection. arXiv preprint arXiv:1510.06083 (2015)
Dong, H., Linderoth, J.: On valid inequalities for quadratic programming with continuous variables and binary indicators. In: Goemans, M., Correa, J. (eds.) Integer Programming and Combinatorial Optimization, pp. 169–180. Springer, Berlin (2013)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Frangioni, A., Furini, F., Gentile, C.: Approximated perspective relaxations: a project and lift approach. Comput. Opt. Appl. 63(3), 705–735 (2016)
Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0–1 mixed integer programs. Math. Program. 106, 225–236 (2006)
Frangioni, A., Gentile, C.: SDP diagonalizations and perspective cuts for a class of nonseparable MIQP. Oper. Res. Lett. 35(2), 181–185 (2007)
Frangioni, A., Gentile, C.: A computational comparison of reformulations of the perspective relaxation: SOCP vs. cutting planes. Oper. Res. Lett. 37(3), 206–210 (2009)
Frangioni, A., Gentile, C., Grande, E., Pacifici, A.: Projected perspective reformulations with applications in design problems. Oper. Res. 59(5), 1225–1232 (2011)
Frangioni, A., Gentile, C., Hungerford, J.: Decompositions of semidefinite matrices and the perspective reformulation of nonseparable quadratic programs. Math. Oper. Res. 45(1), 15–33 (2020)
Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear programs with indicator variables. Math. Program. 124, 183–205 (2010)
Han, S., Gómez, A., Atamtürk, A.: 2x2 convexifications for convex quadratic optimization with indicator variables. arXiv preprint arXiv:2004.07448 (2020)
Hardy, G.H.: A Course of Pure Mathematics. Courier Dover Publications (1908)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical learning with sparsity: the lasso and generalizations. In: Monographs on Statistics and Applied Probability, vol. 143. Chapman and Hall/CRC (2015)
Hazimeh, H., Mazumder, R.: Learning hierarchical interactions at scale: a convex optimization approach. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 1833–1843. PMLR (2020)
Hazimeh, H., Mazumder, R., Saab, A.: Sparse regression at scale: branch-and-bound rooted in first-order optimization. arXiv preprint arXiv:2004.06152 (2020)
Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring “on/off” constraints. Comput. Opt. Appl. 52(2), 537–558 (2012)
Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Stat. Sci. 27(4), 481–499 (2012)
Jeon, H., Linderoth, J., Miller, A.: Quadratic cone cutting surfaces for quadratic programs with on-off constraints. Dis. Opt. 24, 32–50 (2017)
Kılınç-Karzan, F., Yıldız, S.: Two-term disjunctions on the second-order cone. In: International Conference on Integer Programming and Combinatorial Optimization, pp. 345–356. Springer. (2014)
Küçükyavuz, S., Shojaie, A., Manzour, H., Wei, L.: Consistent second-order conic integer programming for learning Bayesian networks. arXiv preprint arXiv:2005.14346 (2020)
Manzour, H., Küçükyavuz, S., Wu, H.-H., Shojaie, A.: Integer programming for learning directed acyclic graphs from continuous data. INFORMS J. Opt. 3(1), 46–73 (2021)
Miller, A.: Subset selection in regression. Chapman and Hall/CRC (2002)
Modaresi, S., Kılınç, M.R., Vielma, J.P.: Intersection cuts for nonlinear integer programming: convexification techniques for structured sets. Math. Program. 155(1–2), 575–611 (2016)
Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Pilanci, M., Wainwright, M.J., El Ghaoui, L.: Sparse learning via Boolean relaxations. Math. Program. 151, 63–87 (2015)
Richard, J.-P.P., Tawarmalani, M.: Lifting inequalities: a framework for generating strong cuts for nonlinear programs. Math. Program. 121(1), 61–104 (2010)
Sato, T., Takano, Y., Miyashiro, R., Yoshise, A.: Feature subset selection for logistic regression via mixed integer optimization. Comput. Opt. Appl. 64(3), 865–880 (2016)
Stubbs, R.A., Mehrotra, S.: A branch-and-cut method for 0–1 mixed convex programming. Math. Program. 86(3), 515–532 (1999)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)
Vielma, J.P.: Small and strong formulations for unions of convex sets from the Cayley embedding. Math. Program. 177(1–2), 21–53 (2019)
Wang, A.L., Kılınç-Karzan, F.: The generalized trust region subproblem: solution complexity and convex hull results. Forthcoming in Math. Program. (2020a)
Wang, A.L., Kılınç-Karzan, F.: On convex hulls of epigraphs of QCQPs. In: Bienstock, D., Zambelli, G. (eds.) Integer Programming and Combinatorial Optimization, pp. 419–432. Cham. Springer International Publishing (2020b)
Wang, A.L., Kılınç-Karzan, F.: On the tightness of SDP relaxations of QCQPs. Forthcoming in Math. Program. (2021)
Wei, L., Gómez, A., Küçükyavuz, S.: On the convexification of constrained quadratic optimization problems with indicator variables. In: Bienstock, D., Zambelli, G. (eds.) Integer Programming and Combinatorial Optimization, pp. 433–447. Cham. Springer International Publishing (2020)
Wu, B., Sun, X., Li, D., Zheng, X.: Quadratic convex reformulations for semicontinuous quadratic programming. SIAM J. Opt. 27(3), 1531–1553 (2017)
Xie, W., Deng, X.: Scalable algorithms for the sparse ridge regression. SIAM J. Opt. 30(4), 3359–3386 (2020)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zheng, X., Sun, X., Li, D.: Improving the performance of MIQP solvers for quadratic programs with cardinality and minimum threshold constraints: a semidefinite program approach. INFORMS J. Comput. 26(4), 690–703 (2014)
Acknowledgements
We thank the associate editor and two referees, whose comments expanded and improved our computational study and also led to the result in Appendix A.
Additional information
This research is supported, in part, by ONR grant N00014-19-1-2321 and NSF grants 1818700, 2006762, and 2007814.
A preliminary version of this work appeared in [60].
A The special case when \({{\,\mathrm{conv}\,}}(Q)\) is compact
In this section, we give an extended formulation of \({{\,\mathrm{cl \ conv}\,}}(Z_Q)\) based on an extended formulation of \({{\,\mathrm{conv}\,}}(Q_0)\). This alternative formulation is preferable when \({{\,\mathrm{conv}\,}}(Q)\) has polynomially many facets while \({{\,\mathrm{conv}\,}}(Q_0)\) has exponentially many. We denote the facets of \({{\,\mathrm{conv}\,}}(Q)\) that do not contain zero by \(\{F_\ell \}_{1 \le \ell \le k}\), and we write each \(F_\ell \) as \(F_\ell := \{z \; | \; A_\ell z \le b_\ell \}\). Angulo et al. [2] prove that \({{\,\mathrm{conv}\,}}(Q_0) = {{\,\mathrm{conv}\,}}\left( \bigcup _{1 \le \ell \le k} F_\ell \right) \), and a natural extended formulation of \({{\,\mathrm{conv}\,}}(Q_0)\) is as follows:
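Following the standard disjunctive description of a union of polytopes [19], and consistent with the references to (32) and (32b)–(32c) in the proof below, we take this system to be

$$\begin{aligned} z = \sum _{\ell \in [k]} {{\hat{z}}}_\ell , \qquad A_\ell {{\hat{z}}}_\ell \le \lambda _\ell b_\ell \quad \forall \ell \in [k], \qquad \lambda \ge \mathbf{0 }, \end{aligned}$$ (32)

so that \({{\,\mathrm{conv}\,}}(Q_0)\) is the projection onto the \(z\)-space of the points satisfying (32) together with \(\mathbf{1 }^\top \lambda = 1\).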
Theorem 4
Proof Let
First we show that \({{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z) \subseteq {{\,\mathrm{cl \ conv}\,}}(Z_Q)\). Given any \((z,{{\hat{z}}}, \lambda , \beta ,t)\in Z\), note that the constraints \(z\in {{\,\mathrm{conv}\,}}(Q)\) and \(t \ge f(\mathbf{1 }^\top \beta )\) defining \({{\,\mathrm{cl \ conv}\,}}(Z_Q)\) are trivially satisfied. It remains to show that \(t \ge (\pi ^\top z)f\left( \frac{\mathbf{1 }^\top \beta }{\pi ^\top z}\right) \) for all \(\pi \in {\mathcal {F}}\). For each \(\pi \in {\mathcal {F}}\), we have \(\pi ^\top z \ge \mathbf{1 }^\top \lambda \), where the inequality follows from the fact that we must have either \(\lambda _\ell = 0\) and \({{\hat{z}}}_\ell = \mathbf{0 }\), or \(\lambda _\ell > 0\) and \(\frac{{{\hat{z}}}_\ell }{\lambda _\ell } \in F_\ell \), and each \(F_\ell \) is a polytope contained in the half-space defined by the inequality \(\pi ^\top z\ge 1\). Thus, from Lemma 1, we have \(t \ge (\mathbf{1 }^\top \lambda )f\left( \frac{\mathbf{1 }^\top \beta }{\mathbf{1 }^\top \lambda }\right) \ge (\pi ^\top z)f\left( \frac{\mathbf{1 }^\top \beta }{\pi ^\top z}\right) \) for all \(\pi \in {\mathcal {F}}\), hence \({{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z) \subseteq {{\,\mathrm{cl \ conv}\,}}(Z_Q)\).
Now, it remains to prove that \({{\,\mathrm{cl \ conv}\,}}(Z_Q) \subseteq {{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z)\). For any \((z , \beta , t) \in {{\,\mathrm{cl \ conv}\,}}(Z_Q)\), if \(z \in {{\,\mathrm{conv}\,}}(Q_0)\), then there exist \({{\hat{z}}}_\ell \) and \(\lambda _\ell \) that satisfy (32) and \(\mathbf{1} ^\top \lambda = 1\). Since \(t \ge f(\mathbf{1 }^\top \beta )\) for all \((z , \beta , t) \in {{\,\mathrm{cl \ conv}\,}}(Z_Q)\), it follows that \((z, \beta , t) \in {{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z)\). If \(z \in {{\,\mathrm{conv}\,}}(Q)\backslash {{\,\mathrm{conv}\,}}(Q_0)\), then, from Lemma 2, we can write \(z = \lambda _0 z_0\) with \(0 \le \lambda _0 < 1\), and we may assume \(z_0\) lies on one of the facets of \({{\,\mathrm{conv}\,}}(Q_0)\) defined by \({{\hat{\pi }}}^\top z_0 = 1\) for some \({{\hat{\pi }}} \in {\mathcal {F}}\). By definition, \(\pi ^\top z_0 \ge {{\hat{\pi }}}^\top z_0 = 1\) for all \(\pi \in {\mathcal {F}}\), which implies \(\lambda _0 = {{\hat{\pi }}}^\top z = \min _{\pi \in {\mathcal {F}}} \pi ^\top z\). Since \(z_0 \in {{\,\mathrm{conv}\,}}(Q_0)\), there exist \({{\hat{z}}}_\ell , \lambda _\ell \) such that \(z_0 = \sum _{\ell \in [k]} {{\hat{z}}}_\ell \) and (32b)–(32c) hold. Then \(z = \lambda _0 z_0 = \sum _{\ell \in [k]} \lambda _0 {{\hat{z}}}_\ell \), so the scaled pairs \((\lambda _0 {{\hat{z}}}_\ell , \lambda _0 \lambda _\ell )\) also satisfy (32b)–(32c), and we have \(\sum _{\ell \in [k]} \lambda _0 \lambda _\ell = \lambda _0 = \min _{\pi \in {\mathcal {F}}} \pi ^\top z\). Using Lemma 1, we find that \(t \ge (\pi ^\top z)f\left( \frac{\mathbf{1 }^\top \beta }{\pi ^\top z}\right) \) for all \(\pi \in {\mathcal {F}}\) implies \(t \ge \left( \sum _{\ell \in [k]} \lambda _0 \lambda _\ell \right) f\left( \frac{\mathbf{1 }^\top \beta }{\sum _{\ell \in [k]} \lambda _0 \lambda _\ell }\right) \ge \left( \sum _{\ell \in [k]} \lambda _\ell \right) f\left( \frac{\mathbf{1 }^\top \beta }{\sum _{\ell \in [k]} \lambda _\ell }\right) \). Hence, \((z, \beta , t) \in {{\,\mathrm{proj}\,}}_{(z,\beta ,t)}(Z)\). \(\square \)
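The monotonicity supplied by Lemma 1 and used in both directions of the proof, namely that \(s \mapsto s f(c/s)\) is nonincreasing in \(s > 0\), is easy to sanity-check numerically. The snippet below does so with the stand-in choice \(f(x) = x^2\); the function, the constant \(c\), and the grid are our placeholder choices, not the paper's.

```python
# Numerical sanity check (our illustration, not Lemma 1 verbatim): for a
# convex f with f(0) <= 0, the perspective g(s) = s * f(c / s) is
# nonincreasing in s > 0.  Here f(x) = x**2 is a placeholder choice.
import numpy as np

def f(x):
    return x ** 2

c = 3.0                            # stands in for 1^T beta
s = np.linspace(0.1, 5.0, 200)     # stands in for 1^T lambda vs. pi^T z
g = s * f(c / s)                   # perspective values; here g(s) = c**2 / s

assert np.all(np.diff(g) <= 1e-12), "g should be nonincreasing"
print("g(0.5) =", 0.5 * f(c / 0.5), ">= g(2.0) =", 2.0 * f(c / 2.0))
```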
Wei, L., Gómez, A. & Küçükyavuz, S. Ideal formulations for constrained convex optimization problems with indicator variables. Math. Program. 192, 57–88 (2022). https://doi.org/10.1007/s10107-021-01734-y