Abstract
The price–demand relation is a fundamental concept that models how price affects the sales of a product. Accurate estimates of its parameters are critical, because they directly affect the company's revenue. Because these parameters change rapidly with seasonality and market fluctuations, learning must be performed efficiently using a small window of only a few test points. However, the two objectives of revenue maximization and demand learning conflict, a tension known as the learn/earn trade-off. This is akin to the exploration/exploitation trade-off encountered in machine learning and optimization algorithms. In this paper, we consider the problem of price–demand function estimation, taking into account its exploration–exploitation characteristic. We design a new objective function that combines both aspects: it is essentially the revenue minus a term that measures the error in the parameter estimates. We derive recursive algorithms that optimize this objective function. The proposed method outperforms other existing approaches.



Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This research received no external funding.
Author information
Contributions
Dina Elreedy, Amir F. Atiya and Samir I. Shaheen were involved in the conceptualization; Dina Elreedy and Amir F. Atiya were involved in the formal analysis; Dina Elreedy and Amir F. Atiya were involved in the methodology; Amir F. Atiya and Samir I. Shaheen contributed to the project administration; Amir F. Atiya and Samir I. Shaheen contributed to resources; Dina Elreedy contributed to software; Amir F. Atiya and Samir I. Shaheen were involved in the supervision; Dina Elreedy and Amir F. Atiya were involved in the validation; Dina Elreedy, Amir F. Atiya and Samir I. Shaheen contributed to the writing.
Ethics declarations
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Conflicts of interest
The authors of this work declare no conflict of interest.
Code availability
The source code of the current work is available from the corresponding author on reasonable request.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Derivation of utility derivatives for the three proposed formulations
1.1 Formulation 1
According to Section 5, the expected utility of our first formulation is defined as:
The first derivative of the expected utility, \(\frac{\partial E[U(p^*)_n]}{\partial p^*}\), can be calculated as:
Since \(tr[A+B]=tr[A]+tr[B]\), the derivative \(\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*}\) is:
The first derivative term in Eq. (39) evaluates to zero. Evaluating the second term of Eq. (39), with \(x_n\) denoted by \(x^*\):
Let \(A= \frac{{x^*}{x^*}^T\Sigma _{\beta }(n-1)}{\sigma ^2\gamma ^2+\gamma {x^*}^T \Sigma _{\beta }(n-1){x^*}}\); Eq. (40) can then be evaluated as follows:
By the cyclic property of the trace, \(tr[BAC]=tr[ACB]\), so:
Then, from trace derivative properties:
Substituting Eq. (43) into Eq. (42) with \(B={\Sigma _{\beta }^2(n-1)}\) and \(x=p^*\), the term \(\frac{\partial B}{\partial x}\) evaluates to zero because \(B\) does not depend on \(p^*\), and Eq. (42) simplifies to:
Simplifying matrix A:
where \(\Sigma _\beta (n-1)= \begin{pmatrix} {\sigma _a}^2 & \sigma _{ab} \\ \sigma _{ab} & {\sigma _b}^2 \end{pmatrix}\).
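A small numerical sketch may help make the structure of \(A\) concrete. It assumes the regressor is \(x^*=(1,p^*)^T\), which is consistent with the quadratic \({\sigma _a}^2+2\sigma _{ab}p^*+{\sigma _b}^2{p^*}^2\) that appears in this derivation but is not stated explicitly in this appendix. All numeric values and helper names are hypothetical. The sketch builds \(A\) from its definition and compares it with the entries obtained by multiplying the matrices out by hand.

```python
# Hypothetical numeric values; none of them come from the paper.
sigma_a2, sigma_ab, sigma_b2 = 0.5, -0.1, 0.2  # entries of Sigma_beta(n-1)
sigma2, gamma = 1.0, 0.9                       # sigma^2 and gamma (assumed values)
p_star = 2.0                                   # candidate price p*

Sigma = [[sigma_a2, sigma_ab], [sigma_ab, sigma_b2]]
x = [1.0, p_star]  # ASSUMPTION: x* = (1, p*)^T


def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Quadratic form x*^T Sigma x*, computed directly and in expanded form.
quad_form = sum(x[i] * Sigma[i][j] * x[j] for i in range(2) for j in range(2))
quad_expanded = sigma_a2 + 2 * sigma_ab * p_star + sigma_b2 * p_star ** 2

# Scalar denominator of A: sigma^2 gamma^2 + gamma x*^T Sigma x*.
denominator = sigma2 * gamma ** 2 + gamma * quad_form

# A = x* x*^T Sigma / denominator, built from the definition.
outer = [[x[i] * x[j] for j in range(2)] for i in range(2)]
numerator = matmul(outer, Sigma)
A = [[numerator[i][j] / denominator for j in range(2)] for i in range(2)]

# The same entries multiplied out by hand: the second row is p* times the first.
row = [sigma_a2 + sigma_ab * p_star, sigma_ab + sigma_b2 * p_star]
A_closed = [[row[0] / denominator, row[1] / denominator],
            [p_star * row[0] / denominator, p_star * row[1] / denominator]]

print(A)
print(A_closed)
```

Under the stated assumption, the two printed matrices agree, and the second row of \(A\) is \(p^*\) times the first, so \(A\) has rank one.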
Then, evaluating \(\frac{\partial A}{\partial p^*}\):
Let \(g(p^*)=\sigma ^2\gamma ^2+\gamma [{\sigma _a}^2+2\sigma _{ab}p^*+{\sigma _b}^2{p^*}^2]\), then:
where the elements \(Z_{11}\), \(Z_{12}\), and \(Z_{22}\) of the matrix \(Z(p^*)\) are evaluated as follows:
Substituting from Eq. (44) and Eq. (47) into Eq. (38):
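A derivative of this kind can be sanity-checked numerically with a finite difference. The sketch below is not the authors' code. It assumes that the covariance update has the recursive least-squares form \(\Sigma _\beta (n)=\Sigma _\beta (n-1)/\gamma -\Sigma _\beta (n-1)x^*{x^*}^T\Sigma _\beta (n-1)/g(p^*)\) with \(x^*=(1,p^*)^T\); this form is consistent with the definitions of \(A\) and \(g(p^*)\) in this appendix, but Eq. (7) itself is not reproduced here. Function names and numeric values are hypothetical, and the analytic derivative in the code is obtained from the quotient rule under that assumption. It covers only the trace term, not the full expected utility.

```python
def _parts(p, s_a2, s_ab, s_b2, sigma2, gamma):
    """Return u, v (the two entries of Sigma x*) and g(p) for x* = (1, p)^T."""
    u = s_a2 + s_ab * p
    v = s_ab + s_b2 * p
    g = sigma2 * gamma ** 2 + gamma * (s_a2 + 2 * s_ab * p + s_b2 * p ** 2)
    return u, v, g


def trace_posterior(p, s_a2, s_ab, s_b2, sigma2, gamma):
    """tr[Sigma_beta(n)] under the ASSUMED update described in the lead-in.

    Uses tr[Sigma x x^T Sigma] = x^T Sigma^2 x = u^2 + v^2.
    """
    u, v, g = _parts(p, s_a2, s_ab, s_b2, sigma2, gamma)
    return (s_a2 + s_b2) / gamma - (u * u + v * v) / g


def trace_posterior_derivative(p, s_a2, s_ab, s_b2, sigma2, gamma):
    """Analytic d tr[Sigma_beta(n)] / dp via the quotient rule.

    The tr[Sigma_beta(n-1)] / gamma term does not depend on p, so it drops out.
    """
    u, v, g = _parts(p, s_a2, s_ab, s_b2, sigma2, gamma)
    n = u * u + v * v
    dn = 2 * s_ab * u + 2 * s_b2 * v
    dg = gamma * (2 * s_ab + 2 * s_b2 * p)
    return -(dn * g - n * dg) / g ** 2


def central_difference(f, p, h=1e-6):
    """Second-order central finite-difference approximation of f'(p)."""
    return (f(p + h) - f(p - h)) / (2 * h)


# Hypothetical prior covariance entries, noise variance, and gamma.
params = (0.5, -0.1, 0.2, 1.0, 0.9)
p = 1.5
numeric = central_difference(lambda q: trace_posterior(q, *params), p)
analytic = trace_posterior_derivative(p, *params)
print(numeric, analytic)
```

If the two printed values agree to several decimal places, the analytic expression is consistent with the assumed update. The same check could be applied to the full utility derivative once the revenue term and weighting from Section 5 are added.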
1.2 Formulation 2
The expected utility of our second formulation, as defined in Section 5, can be evaluated as follows:
Substituting Eq. (7) for \(\Sigma _\beta (n)\), letting \(A=\Sigma _\beta (n-1)x^*{x^*}^T\Sigma _\beta (n-1)\) (note that \(A\) is redefined here relative to Formulation 1), and differentiating the utility with respect to \(p^*\) yields the following expression for \(\frac{\partial E[U(p^*)_n]}{\partial p^*}\):
where \(A_{11}\) is the entry in the first row and first column of matrix \(A\), and \(A_{22}\) is the entry in the second row and second column. \(A_{11}\) and \(A_{22}\) are evaluated as follows:
Substituting Eq. (51) into Eq. (50) and evaluating \(\frac{\partial }{\partial p^*}[\frac{A_{11}}{g(p^*)}]\) and \(\frac{\partial }{\partial p^*}[\frac{A_{22}}{g(p^*)}]\) terms results in:
Simplifying Eq. (52) results in the following equation:
1.3 Formulation 3
The expected utility of the third proposed formulation is defined as:
The derivative of the expected utility \(E[U(p^*)_n]\) with respect to \(p^*\) is calculated as follows:
where \(\Sigma _\beta (n-1)=\begin{pmatrix}{\sigma _a}^2 & \sigma _{ab} \\ \sigma _{ab} & {\sigma _b}^2 \end{pmatrix}\). Thus, the derivative of the expected utility with respect to \(p^*\), \(\frac{\partial E[U(p^*)_n]}{\partial p^*}\), can be simplified to:
Simplifying Eq. (55) results in the following equation:
About this article
Cite this article
Elreedy, D., Atiya, A.F. & Shaheen, S.I. Novel pricing strategies for revenue maximization and demand learning using an exploration–exploitation framework. Soft Comput 25, 11711–11733 (2021). https://doi.org/10.1007/s00500-021-06047-y
DOI: https://doi.org/10.1007/s00500-021-06047-y