{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,2]],"date-time":"2024-09-02T01:40:25Z","timestamp":1725241225618},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,6,17]],"date-time":"2021-06-17T00:00:00Z","timestamp":1623888000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,6,17]],"date-time":"2021-06-17T00:00:00Z","timestamp":1623888000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004434","name":"Universit\u00e0 degli Studi di Firenze","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004434","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Optim Appl"],"published-print":{"date-parts":[[2021,9]]},"abstract":"Abstract<\/jats:title>In this paper, the problem of best subset selection in logistic regression is addressed. In particular, we take into account formulations of the problem resulting from the adoption of information criteria, such as AIC or BIC, as goodness-of-fit measures. There exist various methods to tackle this problem. Heuristic methods are computationally cheap, but are usually only able to find low quality solutions. Methods based on local optimization suffer from similar limitations as heuristic ones. On the other hand, methods based on mixed integer reformulations of the problem are much more effective, at the cost of higher computational requirements, that become unsustainable when the problem size grows. We thus propose a new approach, which combines mixed-integer programming and decomposition techniques in order to overcome the aforementioned scalability issues. We provide a theoretical characterization of the proposed algorithm properties. The results of a vast numerical experiment, performed on widely available datasets, show that the proposed method achieves the goal of outperforming state-of-the-art techniques.<\/jats:p>","DOI":"10.1007\/s10589-021-00288-1","type":"journal-article","created":{"date-parts":[[2021,6,17]],"date-time":"2021-06-17T15:02:51Z","timestamp":1623942171000},"page":"1-32","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["An effective procedure for feature subset selection in logistic regression based on information criteria"],"prefix":"10.1007","volume":"80","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-5322-4831","authenticated-orcid":false,"given":"Enrico","family":"Civitelli","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2488-5486","authenticated-orcid":false,"given":"Matteo","family":"Lapucci","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1160-7572","authenticated-orcid":false,"given":"Fabio","family":"Schoen","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-3971-3441","authenticated-orcid":false,"given":"Alessio","family":"Sortino","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,6,17]]},"reference":[{"issue":"6","key":"288_CR1","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","volume":"19","author":"H Akaike","year":"1974","unstructured":"Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716\u2013723 (1974)","journal-title":"IEEE Trans. Autom. Control"},{"key":"288_CR2","doi-asserted-by":"crossref","unstructured":"Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Selected Papers of Hirotugu Akaike, pp. 199\u2013213. Springer (1998)","DOI":"10.1007\/978-1-4612-1694-0_15"},{"issue":"1","key":"288_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2200000015","volume":"4","author":"F Bach","year":"2012","unstructured":"Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4(1), 1\u2013106 (2012). https:\/\/doi.org\/10.1561\/2200000015","journal-title":"Found. Trends Mach. Learn."},{"issue":"3","key":"288_CR4","doi-asserted-by":"publisher","first-page":"1480","DOI":"10.1137\/120869778","volume":"23","author":"A Beck","year":"2013","unstructured":"Beck, A., Eldar, Y.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23(3), 1480\u20131509 (2013)","journal-title":"SIAM J. Optim."},{"key":"288_CR5","volume-title":"Parallel and distributed computation: numerical methods","author":"DP Bertsekas","year":"1989","unstructured":"Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and distributed computation: numerical methods, vol. 23. Prentice Hall, Englewood Cliffs (1989)"},{"key":"288_CR6","unstructured":"Bertsimas, D., Digalakis\u00a0Jr, V.: The backbone method for ultra-high dimensional sparse machine learning. arXiv:2006.06592 (2020)"},{"key":"288_CR7","doi-asserted-by":"publisher","first-page":"813","DOI":"10.1214\/15-AOS1388","volume":"44","author":"D Bertsimas","year":"2016","unstructured":"Bertsimas, D., King, A., Mazumder, R.: Best subset selection via a modern optimization lens. Ann. Stat. 44, 813\u2013852 (2016)","journal-title":"Ann. Stat."},{"issue":"3","key":"288_CR8","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1214\/16-STS602","volume":"32","author":"D Bertsimas","year":"2017","unstructured":"Bertsimas, D., King, A., et al.: Logistic regression: from art to science. Stat. Sci. 32(3), 367\u2013384 (2017)","journal-title":"Stat. Sci."},{"key":"288_CR9","volume-title":"Pattern Recognition and Machine Learning","author":"CM Bishop","year":"2006","unstructured":"Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin (2006)"},{"issue":"2","key":"288_CR10","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1016\/j.disopt.2006.10.011","volume":"5","author":"P Bonami","year":"2008","unstructured":"Bonami, P., Biegler, L.T., Conn, A.R., Cornu\u00e9jols, G., Grossmann, I.E., Laird, C.D., Lee, J., Lodi, A., Margot, F., Sawaya, N., et al.: An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optim. 5(2), 186\u2013204 (2008)","journal-title":"Discrete Optim."},{"issue":"1","key":"288_CR11","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1006\/jmps.1999.1277","volume":"44","author":"H Bozdogan","year":"2000","unstructured":"Bozdogan, H.: Akaike\u2019s information criterion and recent developments in information complexity. J. Math. Psychol. 44(1), 62\u201391 (2000)","journal-title":"J. Math. Psychol."},{"key":"288_CR12","doi-asserted-by":"crossref","unstructured":"Burnham, K.P., Anderson, D.R.: Practical use of the information-theoretic approach. In: Model Selection and Inference, pp. 75\u2013117. Springer (1998)","DOI":"10.1007\/978-1-4757-2917-7_3"},{"issue":"2","key":"288_CR13","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1177\/0049124104268644","volume":"33","author":"KP Burnham","year":"2004","unstructured":"Burnham, K.P., Anderson, D.R.: Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33(2), 261\u2013304 (2004)","journal-title":"Sociol. Methods Res."},{"issue":"3","key":"288_CR14","doi-asserted-by":"publisher","first-page":"919","DOI":"10.1007\/s10589-019-00134-5","volume":"74","author":"L Di Gangi","year":"2019","unstructured":"Di Gangi, L., Lapucci, M., Schoen, F., Sortino, A.: An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series. Comput. Optim. Appl. 74(3), 919\u2013948 (2019)","journal-title":"Comput. Optim. Appl."},{"key":"288_CR15","unstructured":"Dua, D., Graff, C.: UCI machine learning repository (2017). http:\/\/archive.ics.uci.edu\/ml"},{"issue":"3","key":"288_CR16","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1007\/BF02592064","volume":"36","author":"MA Duran","year":"1986","unstructured":"Duran, M.A., Grossmann, I.E.: An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36(3), 307\u2013339 (1986)","journal-title":"Math. Program."},{"key":"288_CR17","first-page":"191","volume-title":"Mathematical Methods for Digital Computers","author":"MA Efroymson","year":"1960","unstructured":"Efroymson, M.A.: Multiple regression analysis. In: Ralston, A., Wilf, H.S. (eds.) Mathematical Methods for Digital Computers, pp. 191\u2013203. Wiley, New York (1960)"},{"key":"288_CR18","first-page":"1871","volume":"9","author":"R-E Fan","year":"2008","unstructured":"Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871\u20131874 (2008)","journal-title":"J. Mach. Learn. Res."},{"key":"288_CR19","doi-asserted-by":"crossref","unstructured":"G\u00f3mez, A., Prokopyev, O.A.: A mixed-integer fractional optimization approach to best subset selection. INFORMS J. Comput. (2021, accepted)","DOI":"10.1287\/ijoc.2020.1031"},{"key":"288_CR20","unstructured":"Gurobi\u00a0Optimization, L.: Gurobi optimizer reference manual (2020). http:\/\/www.gurobi.com"},{"issue":"2","key":"288_CR21","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1111\/j.2517-6161.1979.tb01072.x","volume":"41","author":"EJ Hannan","year":"1979","unstructured":"Hannan, E.J., Quinn, B.G.: The determination of the order of an autoregression. J. R. Stat. Soc. Ser. B (Methodol.) 41(2), 190\u2013195 (1979)","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"288_CR22","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining, Inference, and Prediction","author":"T Hastie","year":"2009","unstructured":"Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin (2009)"},{"key":"288_CR23","doi-asserted-by":"publisher","DOI":"10.1002\/9781118548387","volume-title":"Applied Logistic Regression","author":"DW Hosmer Jr","year":"2013","unstructured":"Hosmer, D.W., Jr., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, vol. 398. Wiley, Hoboken (2013)"},{"key":"288_CR24","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-7138-7","volume-title":"An Introduction to Statistical Learning","author":"G James","year":"2013","unstructured":"James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, vol. 112. Springer, Berlin (2013)"},{"key":"288_CR25","unstructured":"J\u00f6rnsten, K., N\u00e4sberg, M., Smeds, P.: Variable Splitting: A New Lagrangean Relaxation Approach to Some Mathematical Programming Models. LiTH MAT R.: Matematiska Institutionen. University of Link\u00f6ping, Department of Mathematics (1985)"},{"key":"288_CR26","unstructured":"Kamiya, S., Miyashiro, R., Takano, Y.: Feature subset selection for the multinomial logit model via mixed-integer optimization. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1254\u20131263 (2019)"},{"issue":"4","key":"288_CR27","doi-asserted-by":"publisher","first-page":"606","DOI":"10.1109\/JSTSP.2007.910971","volume":"1","author":"S-J Kim","year":"2007","unstructured":"Kim, S.-J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D.: An interior-point method for large-scale $$\\ell _1$$-regularized least squares. IEEE J. Sel. Top. Signal Process. 1(4), 606\u2013617 (2007)","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"288_CR28","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-71887-3","volume-title":"Information Criteria and Statistical Modeling","author":"S Konishi","year":"2008","unstructured":"Konishi, S., Kitagawa, G.: Information Criteria and Statistical Modeling. Springer, Berlin (2008)"},{"key":"288_CR29","first-page":"401","volume":"6","author":"S-I Lee","year":"2006","unstructured":"Lee, S.-I., Lee, H., Abbeel, P., Ng, A.Y.: Efficient $$\\ell _1$$ regularized logistic regression. AAAI 6, 401\u2013408 (2006)","journal-title":"AAAI"},{"issue":"1\u20133","key":"288_CR30","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1007\/BF01589116","volume":"45","author":"DC Liu","year":"1989","unstructured":"Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1\u20133), 503\u2013528 (1989)","journal-title":"Math. Program."},{"issue":"1","key":"288_CR31","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/s11590-014-0757-3","volume":"9","author":"G Liuzzi","year":"2015","unstructured":"Liuzzi, G., Rinaldi, F.: Solving $$\\ell _0$$-penalized problems with simple constraints via the Frank\u2013Wolfe reduced dimension method. Optim. Lett. 9(1), 57\u201374 (2015)","journal-title":"Optim. Lett."},{"issue":"4","key":"288_CR32","doi-asserted-by":"publisher","first-page":"2448","DOI":"10.1137\/100808071","volume":"23","author":"Z Lu","year":"2013","unstructured":"Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23(4), 2448\u20132478 (2013)","journal-title":"SIAM J. Optim."},{"key":"288_CR33","doi-asserted-by":"publisher","DOI":"10.1201\/9781420035933","volume-title":"Subset Selection in Regression","author":"A Miller","year":"2002","unstructured":"Miller, A.: Subset Selection in Regression. Chapman and Hall\/CRC, Boca Raton (2002)"},{"issue":"3","key":"288_CR34","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1016\/j.ejor.2015.06.081","volume":"247","author":"R Miyashiro","year":"2015","unstructured":"Miyashiro, R., Takano, Y.: Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur. J. Oper. Res. 247(3), 721\u2013731 (2015a)","journal-title":"Eur. J. Oper. Res."},{"issue":"1","key":"288_CR35","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1016\/j.eswa.2014.07.056","volume":"42","author":"R Miyashiro","year":"2015","unstructured":"Miyashiro, R., Takano, Y.: Subset selection by Mallows\u2019 Cp: a mixed integer programming approach. Expert Syst. Appl. 42(1), 325\u2013331 (2015b)","journal-title":"Expert Syst. Appl."},{"key":"288_CR36","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825\u20132830 (2011)","journal-title":"J. Mach. Learn. Res."},{"issue":"3","key":"288_CR37","doi-asserted-by":"publisher","first-page":"467","DOI":"10.1007\/s10589-008-9202-9","volume":"46","author":"F Rinaldi","year":"2010","unstructured":"Rinaldi, F., Schoen, F., Sciandrone, M.: Concave programming for minimizing the zero-norm over polyhedral sets. Comput. Optim. Appl. 46(3), 467\u2013486 (2010)","journal-title":"Comput. Optim. Appl."},{"issue":"3","key":"288_CR38","doi-asserted-by":"publisher","first-page":"865","DOI":"10.1007\/s10589-016-9832-2","volume":"64","author":"T Sato","year":"2016","unstructured":"Sato, T., Takano, Y., Miyashiro, R., Yoshise, A.: Feature subset selection for logistic regression via mixed integer optimization. Comput. Optim. Appl. 64(3), 865\u2013880 (2016)","journal-title":"Comput. Optim. Appl."},{"issue":"2","key":"288_CR39","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1214\/aos\/1176344136","volume":"6","author":"G Schwarz","year":"1978","unstructured":"Schwarz, G., et al.: Estimating the dimension of a model. Ann. Stat. 6(2), 461\u2013464 (1978)","journal-title":"Ann. Stat."},{"issue":"5","key":"288_CR40","doi-asserted-by":"publisher","first-page":"807","DOI":"10.1007\/s10463-012-0396-3","volume":"65","author":"X Shen","year":"2013","unstructured":"Shen, X., Pan, W., Zhu, Y., Zhou, H.: On constrained and regularized high-dimensional regression. Ann. Inst. Stat. Math. 65(5), 807\u2013832 (2013)","journal-title":"Ann. Inst. Stat. Math."},{"issue":"1","key":"288_CR41","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267\u2013288 (1996)","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"issue":"3","key":"288_CR42","doi-asserted-by":"publisher","first-page":"475","DOI":"10.1023\/A:1017501703105","volume":"109","author":"P Tseng","year":"2001","unstructured":"Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475\u2013494 (2001)","journal-title":"J. Optim. Theory Appl."},{"key":"288_CR43","doi-asserted-by":"publisher","unstructured":"Virtanen, P., Gommers, R., Oliphant, T.E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S.J., Brett, M., Wilson, J., Millman, K. Jarrod, Mayorov, N., Nelson, A.R.J., Jones, E., Kern, R., Larson, E., Carey, C., Polat, I., Feng, Y., Moore, E.W., Vander Plas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E.A., Harris, C.R., Archibald, A.M., Ribeiro, A.H., Pedregosa, F., van Mulbregt, P.: Fundamental algorithms for scientific computing in python and SciPy 1.0 contributors. SciPy 1.0. Nat. Methods 17, 261\u2013272 (2020). https:\/\/doi.org\/10.1038\/s41592-019-0686-2","DOI":"10.1038\/s41592-019-0686-2"},{"issue":"Mar","key":"288_CR44","first-page":"1439","volume":"3","author":"J Weston","year":"2003","unstructured":"Weston, J., Elisseeff, A., Sch\u00f6lkopf, B., Tipping, M.: Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res. 3(Mar), 1439\u20131461 (2003)","journal-title":"J. Mach. Learn. Res."},{"issue":"64","key":"288_CR45","first-page":"1999","volume":"13","author":"G-X Yuan","year":"2012","unstructured":"Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved GLMNET for L1-regularized logistic regression. J. Mach. Learn. Res. 13(64), 1999\u20132030 (2012)","journal-title":"J. Mach. Learn. Res."},{"issue":"3","key":"288_CR46","doi-asserted-by":"publisher","first-page":"627","DOI":"10.1111\/rssb.12037","volume":"76","author":"Z Zheng","year":"2014","unstructured":"Zheng, Z., Fan, Y., Lv, J.: High dimensional thresholded regression and shrinkage effect. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 76(3), 627\u2013649 (2014)","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"}],"container-title":["Computational Optimization and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10589-021-00288-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10589-021-00288-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10589-021-00288-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,2]],"date-time":"2024-09-02T01:27:36Z","timestamp":1725240456000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10589-021-00288-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,17]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,9]]}},"alternative-id":["288"],"URL":"https:\/\/doi.org\/10.1007\/s10589-021-00288-1","relation":{},"ISSN":["0926-6003","1573-2894"],"issn-type":[{"type":"print","value":"0926-6003"},{"type":"electronic","value":"1573-2894"}],"subject":[],"published":{"date-parts":[[2021,6,17]]},"assertion":[{"value":"3 October 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 June 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 June 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no confict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Confict of interest"}}]}}