Abstract
This paper is concerned with a store-choice model for investigating consumers’ store-choice behavior based on scanner panel data. Our store-choice model enables us to evaluate the effects of the consumer/product attributes not only on the consumer’s store choice but also on his/her purchase quantity. Moreover, we adopt a mixed-integer optimization (MIO) approach to selecting the best set of explanatory variables with which to construct the store-choice model. We devise two MIO models for hierarchical variable selection in which the hierarchical structure of product categories is used to enhance the reliability and computational efficiency of the variable selection. We assess the effectiveness of our MIO models through computational experiments on actual scanner panel data. These experiments are focused on the consumer’s choice among three types of stores in Japan: convenience stores, drugstores, and (grocery) supermarkets. The computational results demonstrate that our method has several advantages over the common methods for variable selection, namely, the stepwise method and \(L_1\)-regularized regression. Furthermore, our analysis reveals that convenience stores are most strongly chosen for gift cards and garbage disposal permits, drugstores are most strongly chosen for products that are specific to drugstores, and supermarkets are most strongly chosen for health food products by women with families.
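To make the idea of hierarchical variable selection concrete, the following is a minimal sketch and not the MIO formulation used in the paper. It assumes one particular hierarchy rule (a variable for a subcategory may be selected only if the variable for its parent category is selected) and replaces the optimization solver with exhaustive enumeration, which is practical only for a handful of variables. The category names, `PARENT`, `respects_hierarchy`, `feasible_subsets`, and `best_subset` are all hypothetical names introduced for illustration.

```python
from itertools import combinations

# Hypothetical two-level category tree: child -> parent
# (None marks a top-level category). Not taken from the paper's data.
PARENT = {
    "food": None,
    "health_food": "food",
    "snacks": "food",
}


def respects_hierarchy(chosen, parent):
    """True if every selected variable's parent (if any) is also selected.

    This rule is an assumption of the sketch, not the paper's constraint.
    """
    return all(parent[v] is None or parent[v] in chosen for v in chosen)


def feasible_subsets(parent):
    """Enumerate all variable subsets that satisfy the hierarchy rule."""
    names = sorted(parent)
    subsets = []
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            chosen = frozenset(combo)
            if respects_hierarchy(chosen, parent):
                subsets.append(chosen)
    return subsets


def best_subset(parent, score):
    """Return the feasible subset with the smallest score.

    `score` is any caller-supplied function of a subset, for example an
    information criterion computed from a fitted regression model.
    """
    return min(feasible_subsets(parent), key=score)
```

With three variables there are eight subsets in total, but only five satisfy the assumed rule, which illustrates how a hierarchy can shrink the search space. An MIO approach would express the same kind of restriction as linear constraints on binary selection variables rather than enumerating subsets.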
Acknowledgements
This work was partially supported by JSPS KAKENHI Grant Numbers JP15K17146, JP17K12983 and a Grant-in-Aid of Joint Research from the Institute of Information Science, Senshu University.
Cite this article
Sato, T., Takano, Y. & Nakahara, T. Investigating consumers’ store-choice behavior via hierarchical variable selection. Adv Data Anal Classif 13, 621–639 (2019). https://doi.org/10.1007/s11634-018-0327-0
DOI: https://doi.org/10.1007/s11634-018-0327-0
Keywords
- Store choice
- Variable selection
- Mixed-integer optimization
- Multiple regression analysis
- Scanner panel data