Novel pricing strategies for revenue maximization and demand learning using an exploration–exploitation framework | Soft Computing

Soft Computing, category: Soft computing in decision making and in modeling in economics

Abstract

The price-demand relation is a fundamental concept that models how price affects the sales of a product. Accurate estimates of its parameters are critical because they directly affect the company's revenue. Since price-demand parameters change rapidly with seasonality and market fluctuations, learning must be performed efficiently, within a short window of only a few test points. However, the two objectives of revenue maximization and demand learning conflict, a tension known as the learn/earn trade-off. This is akin to the exploration/exploitation trade-off encountered in machine learning and optimization algorithms. In this paper, we consider the problem of price-demand function estimation, taking its exploration–exploitation character into account. We design a new objective function that combines both aspects: essentially, the revenue minus a term that measures the error in the parameter estimates. We derive recursive algorithms that optimize this objective function. The proposed method outperforms existing approaches.
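The recursive demand learning mentioned above can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: demand is taken to be linear, \(d = a + b\,p\) plus Gaussian noise with variance \(\sigma^2\), the regressor is \(x = (1, p)^T\), and \(\beta = (a, b)\) is tracked by recursive least squares (RLS) with a forgetting factor \(\gamma\). The covariance update has the same form as the recursion for \(\Sigma_\beta(n)\) used in the appendix; the parameter (gain) update is the textbook RLS step and is assumed here, not taken from the paper. The helper `rls_update` and all numbers are hypothetical.

```python
import numpy as np


def rls_update(beta, sigma_beta, price, demand, gamma=0.95, noise_var=1.0):
    """One recursive-least-squares step for a linear demand model d = a + b*p.

    Hypothetical helper. `beta` holds the current estimates (a, b) and
    `sigma_beta` their covariance; `gamma` is the forgetting factor and
    `noise_var` the demand-noise variance sigma^2.
    """
    x = np.array([1.0, price])            # regressor x_n = (1, p)^T
    sx = sigma_beta @ x                    # Sigma_beta(n-1) x_n
    xsx = x @ sx                           # x_n^T Sigma_beta(n-1) x_n
    # Gain vector of standard RLS with forgetting (assumed form)
    gain = sx / (noise_var * gamma + xsx)
    new_beta = beta + gain * (demand - x @ beta)
    # Covariance recursion:
    # (1/gamma) Sigma - Sigma x x^T Sigma / (sigma^2 gamma^2 + gamma x^T Sigma x)
    new_sigma = sigma_beta / gamma - np.outer(sx, sx) / (noise_var * gamma**2 + gamma * xsx)
    return new_beta, new_sigma


# Usage: learn a hypothetical market with a = 100, b = -2 from noisy sales.
rng = np.random.default_rng(0)
true_a, true_b = 100.0, -2.0
beta = np.array([50.0, -1.0])             # rough initial guess
sigma_beta = 100.0 * np.eye(2)            # diffuse initial covariance
for _ in range(200):
    p = rng.uniform(10.0, 40.0)           # exploratory test prices
    d = true_a + true_b * p + rng.normal(0.0, 1.0)
    beta, sigma_beta = rls_update(beta, sigma_beta, p, d)
print(beta)  # estimates should be close to (100, -2)
```

Here the test prices are drawn at random, which is pure exploration; the formulations in the appendix instead choose each price by trading expected revenue against an uncertainty term.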

Figs. 1, 2 and 3 (images not reproduced here)

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

This research received no external funding.

Author information

Contributions

Conceptualization and writing: Dina Elreedy, Amir F. Atiya and Samir I. Shaheen. Formal analysis, methodology and validation: Dina Elreedy and Amir F. Atiya. Project administration, resources and supervision: Amir F. Atiya and Samir I. Shaheen. Software: Dina Elreedy.

Corresponding author

Correspondence to Dina Elreedy.

Ethics declarations

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Conflicts of interest

The authors of this work declare no conflict of interest.

Code availability

The source code of the current work is available from the corresponding author on reasonable request.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Derivation of utility derivatives for the three proposed formulations

1.1 Formulation 1

According to Section 5, the expected utility of our first formulation is defined as:

$$\begin{aligned}&E[U(p^*)_n]= b{p^*}^2+ap^*-\eta \sqrt{tr[\Sigma _{\beta }(n)]}, \quad \text {where}\nonumber \\&\quad \Sigma _{\beta }(n)={1\over {\gamma }}\Sigma _{\beta }(n-1)-{{\Sigma _{\beta }(n-1)x_n {x_n}^T\Sigma _{\beta }(n-1)}\over {\sigma ^2\gamma ^2+\gamma {x_n}^T\Sigma _{\beta }(n-1)x_n}} \end{aligned}$$

The first derivative of the expected utility, \(\frac{\partial E[U(p^*)_n]}{\partial p^*}\), can be calculated as:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*}=a+2bp^* -\eta \frac{1}{2 \sqrt{tr[\Sigma _{\beta }(n)]}}\nonumber \\&\quad \frac{\partial }{\partial p^*}tr\Big [{1\over {\gamma }}\Sigma _{\beta }(n-1)-{{\Sigma _{\beta }(n-1)x_n {x_n}^T\Sigma _{\beta }(n-1)}\over {\sigma ^2\gamma ^2+\gamma {x_n}^T\Sigma _{\beta }(n-1)x_n}}\Big ] \end{aligned}$$
(38)

Since the trace is linear, \(tr[A+B]=tr[A]+tr[B]\), the derivative \(\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*}\) splits into two terms:

$$\begin{aligned}&\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*}=\frac{\partial tr[{1\over {\gamma }}\Sigma _{\beta }(n-1)]}{\partial p^*}\nonumber \\&\quad -\frac{\partial }{\partial p^*}tr\Big [{{\Sigma _{\beta }(n-1)x_n {x_n}^T\Sigma _{\beta }(n-1)}\over {\sigma ^2\gamma ^2+\gamma {x_n}^T\Sigma _{\beta }(n-1)x_n}}\Big ] \end{aligned}$$
(39)

The first term in Eq. (39) is zero because \(\Sigma _{\beta }(n-1)\) does not depend on \(p^*\). Writing \(x^*=(1,p^*)^T\) for \(x_n\), the second term gives:

$$\begin{aligned}&\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*} = -\frac{\partial }{\partial p^*}tr\Big [{{\Sigma _{\beta }(n-1){x^*}{x^*}^T\Sigma _{\beta }(n-1)}\over {\sigma ^2\gamma ^2+\gamma {x^*}^T\Sigma _{\beta }(n-1){x^*}}}\Big ] \end{aligned}$$
(40)

Let \(A= \frac{{x^*}{x^*}^T}{\sigma ^2\gamma ^2+\gamma {x^*}^T \Sigma _{\beta }(n-1){x^*}}\), so that Eq. (40) becomes:

$$\begin{aligned}&\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*}= -\frac{\partial tr[\Sigma _{\beta }(n-1)A\Sigma _{\beta }(n-1)]}{\partial p^*} \end{aligned}$$
(41)

By the cyclic property of the trace, \(tr[BAC]=tr[ACB]\), so:

$$\begin{aligned}&\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*} = -\frac{\partial tr[A{\Sigma _{\beta }^2(n-1)}]}{\partial p^*} \end{aligned}$$
(42)

For matrices \(A\) and \(B\) that depend on a scalar \(x\), the product rule for the trace gives:

$$\begin{aligned} \frac{\partial tr[AB]}{\partial x}&= \frac{\partial \langle A^T,B\rangle _F}{\partial x} \nonumber \\&= \Big \langle B^T,\frac{\partial A}{\partial x}\Big \rangle _F +\Big \langle A^T,\frac{\partial B}{\partial x}\Big \rangle _F \nonumber \\&= tr\Big [B\frac{\partial A}{\partial x}+A\frac{\partial B}{\partial x}\Big ] \end{aligned}$$
(43)

Applying Eq. (43) to Eq. (42) with \(B={\Sigma _{\beta }^2(n-1)}\) and \(x=p^*\), the term \(\frac{\partial B}{\partial x}\) vanishes because \(\Sigma _{\beta }(n-1)\) does not depend on \(p^*\), so Eq. (42) simplifies to:

$$\begin{aligned}&\frac{\partial tr[{\Sigma _\beta }(n)]}{\partial p^*}= -tr\left[ {\Sigma _{\beta }^2(n-1)}\frac{\partial A}{\partial p^*}\right] \end{aligned}$$
(44)

With \(x^*=(1,p^*)^T\), matrix A simplifies to:

$$\begin{aligned} A=\frac{\begin{pmatrix} 1&{} p^* \\ p^* &{} {p^*}^2 \end{pmatrix}}{\sigma ^2\gamma ^2+\gamma [{\sigma _a^2}+2\sigma _{ab}p^*+{\sigma _b}^2{p^*}^2]} \end{aligned}$$
(45)

where \(\Sigma _\beta (n-1)= \begin{pmatrix} {\sigma _a}^2&{} \sigma _{ab} \\ \sigma _{ab} &{} {\sigma _b}^2 \end{pmatrix} \).

Applying the quotient rule to Eq. (45), \(\frac{\partial A}{\partial p^*}\) is:

$$\begin{aligned}&\frac{\partial A}{\partial p^*} =\frac{1}{{(\sigma ^2\gamma ^2+\gamma ({\sigma _a^2}+2\sigma _{ab}p^*+{\sigma _b}^2{p^*}^2))}^2}\nonumber \\&\quad \Big [{(\sigma ^2\gamma ^2+\gamma ({\sigma _a^2}+2\sigma _{ab}p^*+{\sigma _b}^2{p^*}^2))}\begin{pmatrix} 0&{} 1 \\ 1 &{} 2{p^*} \end{pmatrix}\nonumber \\&\quad -\gamma \begin{pmatrix}1&{} p^* \\ p^* &{} {p^*}^2 \end{pmatrix}(2\sigma _{ab}+2\sigma _b^2 p^*)\Big ] \end{aligned}$$
(46)

Let \(g(p^*)={(\sigma ^2\gamma ^2+\gamma [{\sigma _a^2}+2\sigma _{ab}p^*+{\sigma _b}^2{p^*}^2])}\), then:

$$\begin{aligned}&\frac{\partial A}{\partial p^*}=\frac{1}{{g^2(p^*)}}Z(p^*)=\frac{1}{{g^2(p^*)}}{\begin{pmatrix}Z_{11} &{} Z_{12} \\ Z_{12}&{} Z_{22}\end{pmatrix}} \end{aligned}$$
(47)

where the entries of the symmetric matrix \(Z(p^*)\) are:

$$\begin{aligned} Z_{11}= & {} -2\gamma (\sigma _{ab}+{\sigma _b}^2 p^*) \nonumber \\ Z_{12}= & {} \gamma (\sigma ^2\gamma +\sigma _a^2-\sigma _b^2{p^*}^2) \nonumber \\ Z_{22}= & {} \gamma (2\sigma ^2\gamma p^* +2\sigma _a^2p^*+2\sigma _{ab}{p^*}^2) \end{aligned}$$
(48)
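Eq. (48) can be spot-checked numerically: by Eq. (47), \(g^2(p^*)\,\partial A/\partial p^*\) should equal \(Z(p^*)\). The sketch below compares the closed-form entries with a central finite difference of \(A(p^*)\) from Eq. (45). The covariance entries, \(\sigma^2\), \(\gamma\) and \(p^*\) are arbitrary illustrative values, and the function names are hypothetical.

```python
import numpy as np

# Illustrative (hypothetical) values: Sigma_beta(n-1) entries, noise variance, forgetting factor
SA2, SAB, SB2 = 4.0, -0.3, 0.05
NOISE_VAR, GAMMA = 1.0, 0.9


def g(p):
    """g(p) = sigma^2 gamma^2 + gamma (sigma_a^2 + 2 sigma_ab p + sigma_b^2 p^2)."""
    return NOISE_VAR * GAMMA**2 + GAMMA * (SA2 + 2 * SAB * p + SB2 * p**2)


def A(p):
    """A(p) = x x^T / g(p) with x = (1, p)^T, as in Eq. (45)."""
    x = np.array([1.0, p])
    return np.outer(x, x) / g(p)


def Z(p):
    """Closed-form entries of Z(p) from Eq. (48)."""
    z11 = -2 * GAMMA * (SAB + SB2 * p)
    z12 = GAMMA * (NOISE_VAR * GAMMA + SA2 - SB2 * p**2)
    z22 = GAMMA * (2 * NOISE_VAR * GAMMA * p + 2 * SA2 * p + 2 * SAB * p**2)
    return np.array([[z11, z12], [z12, z22]])


p, h = 12.0, 1e-5
dA_numeric = (A(p + h) - A(p - h)) / (2 * h)   # central difference of Eq. (45)
dA_closed = Z(p) / g(p) ** 2                    # Eq. (47)
print(np.max(np.abs(dA_numeric - dA_closed)))   # only finite-difference error remains
```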

Substituting Eqs. (44) and (47) into Eq. (38) gives:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*}=a+2bp^* \nonumber \\&\quad + \eta \frac{1}{2 \sqrt{tr[\Sigma _{\beta }(n)]}}tr\Big [{\frac{1}{{g^2(p^*)}}\Sigma _{\beta }^2(n-1)Z(p^*)}\Big ] \end{aligned}$$
(49)
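As a sanity check on Eq. (49), the sketch below evaluates the Formulation 1 utility, taken as \(E[U(p^*)_n]=ap^*+b{p^*}^2-\eta \sqrt{tr[\Sigma _\beta (n)]}\) with \(\Sigma _\beta (n)\) given by the covariance recursion (the function whose derivative Eq. (38) writes out), and compares the closed form of Eq. (49) with a central finite difference. The demand estimates \(a, b\), the weight \(\eta\), \(\Sigma_\beta(n-1)\), \(\sigma^2\) and \(\gamma\) are made-up values; `sigma_next`, `utility` and `dutility` are hypothetical names.

```python
import numpy as np


def sigma_next(p, sigma_prev, noise_var, gamma):
    """Sigma_beta(n) after pricing at p, plus the denominator g(p)."""
    x = np.array([1.0, p])
    sx = sigma_prev @ x
    g = noise_var * gamma**2 + gamma * (x @ sx)
    return sigma_prev / gamma - np.outer(sx, sx) / g, g


def utility(p, a, b, eta, sigma_prev, noise_var, gamma):
    """Formulation 1: expected revenue minus eta * sqrt(tr[Sigma_beta(n)])."""
    sigma_n, _ = sigma_next(p, sigma_prev, noise_var, gamma)
    return a * p + b * p**2 - eta * np.sqrt(np.trace(sigma_n))


def dutility(p, a, b, eta, sigma_prev, noise_var, gamma):
    """Closed-form derivative, Eq. (49), with Z(p) from Eq. (48)."""
    sa2, sab, sb2 = sigma_prev[0, 0], sigma_prev[0, 1], sigma_prev[1, 1]
    sigma_n, g = sigma_next(p, sigma_prev, noise_var, gamma)
    z12 = noise_var * gamma + sa2 - sb2 * p**2
    Z = gamma * np.array([
        [-2 * (sab + sb2 * p), z12],
        [z12, 2 * noise_var * gamma * p + 2 * sa2 * p + 2 * sab * p**2],
    ])
    penalty_grad = np.trace(sigma_prev @ sigma_prev @ Z) / g**2
    return a + 2 * b * p + eta / (2 * np.sqrt(np.trace(sigma_n))) * penalty_grad


# a, b, eta, Sigma_beta(n-1), sigma^2, gamma (all hypothetical)
args = (100.0, -2.0, 5.0, np.array([[4.0, -0.3], [-0.3, 0.05]]), 1.0, 0.9)
p, h = 20.0, 1e-5
numeric = (utility(p + h, *args) - utility(p - h, *args)) / (2 * h)
print(numeric, dutility(p, *args))  # the two values should agree closely
```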

1.2 Formulation 2

As defined in Section 5, the expected utility of the second formulation is:

$$\begin{aligned}&E[U(p^*)_n]=E[R(p^*)_n] -\eta \Big (\frac{\sqrt{\Sigma _\beta (n)_{11}}}{a}+\frac{\sqrt{\Sigma _\beta (n)_{22}}}{|b|}\Big ) \end{aligned}$$

Substituting Eq. (7) for \(\Sigma _\beta (n)\) gives \(\Sigma _\beta (n)=\frac{1}{\gamma }\Sigma _\beta (n-1)-\frac{A}{g(p^*)}\), where \(A=\Sigma _\beta (n-1)x^*{x^*}^T\Sigma _\beta (n-1)\) and \(g(p^*)\) is as defined in Formulation 1. Differentiating the expected utility with respect to \(p^*\) then yields:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*}= a+2bp^*\nonumber \\&\quad +\eta \Bigg (\frac{1}{2a\sqrt{{\Sigma _\beta }(n)_{11}}} \frac{\partial }{\partial p^*}\bigg [\frac{A_{11}}{g(p^*)}\bigg ]\nonumber \\&\quad +\frac{1}{2|b|\sqrt{{\Sigma _\beta }(n)_{22}}} \frac{\partial }{\partial p^*}\bigg [\frac{A_{22}}{g(p^*)}\bigg ]\Bigg ) \end{aligned}$$
(50)

where \(A_{11}\) and \(A_{22}\) are the diagonal entries of A. Since \(\Sigma _\beta (n-1)x^*=(\sigma _a^2+\sigma _{ab}p^*,\ \sigma _{ab}+\sigma _b^2p^*)^T\), they evaluate to:

$$\begin{aligned} A_{11}= & {} {\sigma _{ab}}^2{p^*}^2+2{\sigma _a}^2\sigma _{ab}p^*+{\sigma _a}^4\nonumber \\ A_{22}= & {} {\sigma _{b}}^4{p^*}^2+2{\sigma _{ab}}{\sigma _b}^2 p^*+{\sigma _{ab}}^2 \end{aligned}$$
(51)

Substituting Eq. (51) into Eq. (50) and applying the quotient rule with \(\frac{\partial g(p^*)}{\partial p^*}=2\gamma (\sigma _{ab}+\sigma _b^2p^*)\) results in:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*} = a+2bp^*\nonumber \\&\quad + \eta \bigg ( \Big [\frac{2g(p^*)(\sigma _{ab}{\sigma _a}^2+\sigma _{ab}^2p^*)-2\gamma A_{11}(\sigma _{ab}+\sigma _b^2p^*)}{2a g^2(p^*)\sqrt{\Sigma _\beta (n)_{11}}}\Big ]\nonumber \\&\quad +\Big [\frac{2g(p^*)(\sigma _{ab}{\sigma _b}^2+\sigma _{b}^4p^*)-2\gamma A_{22}(\sigma _{ab}+\sigma _b^2p^*)}{2|b|g^2(p^*)\sqrt{\Sigma _\beta (n)_{22}}}\Big ]\bigg ) \end{aligned}$$
(52)

Both numerators factor. Since \(g(p^*)\sigma _{ab}-\gamma (\sigma _a^2+\sigma _{ab}p^*)(\sigma _{ab}+\sigma _b^2p^*)=\gamma [\sigma ^2\gamma \sigma _{ab}+p^*(\sigma _{ab}^2-\sigma _a^2\sigma _b^2)]\) and \(g(p^*)\sigma _b^2-\gamma (\sigma _{ab}+\sigma _b^2p^*)^2=\gamma (\sigma ^2\gamma \sigma _b^2+\sigma _a^2\sigma _b^2-\sigma _{ab}^2)\), the factor 2 cancels and Eq. (52) simplifies to:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*} = a+2bp^* + \eta \frac{\gamma }{a g^2(p^*)\sqrt{\Sigma _\beta (n)_{11}}} \nonumber \\&\quad \times \Big ({p^*}^2({\sigma _{ab}}^3-\sigma _{ab}\sigma _a^2\sigma _b^2)+p^*(\sigma ^2\gamma \sigma _{ab}^2+\sigma _a^2\sigma _{ab}^2-\sigma _a^4\sigma _b^2)+\sigma ^2\gamma \sigma _a^2\sigma _{ab}\Big ) \nonumber \\&\quad + \eta \frac{\gamma }{|b| g^2(p^*)\sqrt{\Sigma _\beta (n)_{22}}} \nonumber \\&\quad \times \Big (p^*(\sigma ^2\gamma \sigma _{b}^4-\sigma _{ab}^2\sigma _b^2+ \sigma _b^4\sigma _a^2)+\sigma ^2\gamma \sigma _{ab}\sigma _b^2+\sigma _b^2\sigma _a^2\sigma _{ab}-\sigma _{ab}^3\Big ) \end{aligned}$$
(53)
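A similar check applies to Formulation 2. The sketch below evaluates the quotient-rule terms of Eq. (50), with \(A_{11}\) and \(A_{22}\) from Eq. (51) and \(\partial g/\partial p^* = 2\gamma(\sigma_{ab}+\sigma_b^2 p^*)\), and compares the result with a central finite difference of the utility. It assumes \(a > 0\) and \(b < 0\), as for a downward-sloping linear demand curve; all numbers and function names are hypothetical.

```python
import numpy as np


def sigma_next(p, sigma_prev, noise_var, gamma):
    """Sigma_beta(n) after pricing at p, plus g(p)."""
    x = np.array([1.0, p])
    sx = sigma_prev @ x
    g = noise_var * gamma**2 + gamma * (x @ sx)
    return sigma_prev / gamma - np.outer(sx, sx) / g, g


def utility(p, a, b, eta, sigma_prev, noise_var, gamma):
    """Formulation 2: revenue minus normalized standard deviations of a and b."""
    s, _ = sigma_next(p, sigma_prev, noise_var, gamma)
    return a * p + b * p**2 - eta * (np.sqrt(s[0, 0]) / a + np.sqrt(s[1, 1]) / abs(b))


def dutility(p, a, b, eta, sigma_prev, noise_var, gamma):
    """Derivative via Eqs. (50)-(51) and the quotient rule."""
    sa2, sab, sb2 = sigma_prev[0, 0], sigma_prev[0, 1], sigma_prev[1, 1]
    s, g = sigma_next(p, sigma_prev, noise_var, gamma)
    u = sa2 + sab * p            # first entry of Sigma_beta(n-1) x*, so A11 = u^2
    v = sab + sb2 * p            # second entry, so A22 = v^2
    dg = 2 * gamma * v           # dg/dp
    d11 = (2 * u * sab * g - u**2 * dg) / g**2    # d/dp [A11 / g]
    d22 = (2 * v * sb2 * g - v**2 * dg) / g**2    # d/dp [A22 / g]
    return a + 2 * b * p + eta * (d11 / (2 * a * np.sqrt(s[0, 0]))
                                  + d22 / (2 * abs(b) * np.sqrt(s[1, 1])))


# a, b, eta, Sigma_beta(n-1), sigma^2, gamma (all hypothetical)
args = (100.0, -2.0, 50.0, np.array([[4.0, -0.3], [-0.3, 0.05]]), 1.0, 0.9)
p, h = 20.0, 1e-5
numeric = (utility(p + h, *args) - utility(p - h, *args)) / (2 * h)
print(numeric, dutility(p, *args))  # should agree closely
```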

1.3 Formulation 3

As defined in Section 5, the expected utility of the third formulation is:

$$\begin{aligned} E[U(p^*)_n]=E[R(p^*)_n]-\eta {p^*} \sqrt{{x^*}^T\Sigma _{\beta (n-1)} x^*+\sigma ^2} \end{aligned}$$

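Writing \(p^*\sqrt{{x^*}^T\Sigma _{\beta (n-1)} x^*+\sigma ^2}=\sqrt{{p^*}^2({x^*}^T\Sigma _{\beta (n-1)} x^*+\sigma ^2)}\), which holds for \(p^*>0\), the derivative of \(E[U(p^*)_n]\) with respect to \(p^*\) is: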

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*}= a+2bp^*\nonumber \\&\quad - \frac{\eta }{2\sqrt{{p^*}^2({x^*}^T\Sigma _{\beta (n-1)} x^*+\sigma ^2)}}\nonumber \\&\quad \frac{\partial \Big ({p^*}^2({x^*}^T\Sigma _{\beta (n-1)} x^* +\sigma ^2)\Big )}{\partial p^*} \end{aligned}$$
(54)

With \(\Sigma _\beta (n-1)=\begin{pmatrix}{\sigma _a}^2&{} \sigma _{ab} \\ \sigma _{ab} &{} {\sigma _b}^2 \end{pmatrix}\) and \(x^*=(1,p^*)^T\), we have \({x^*}^T\Sigma _{\beta }(n-1) x^*=\sigma _a^2+2\sigma _{ab}p^*+\sigma _b^2{p^*}^2\), so Eq. (54) becomes:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*}= a+ 2bp^*\nonumber \\&\quad - \frac{\eta \Big (2p^*({\sigma _a}^2+2\sigma _{ab}p^*+{p^*}^2{\sigma _b}^2+\sigma ^2)+2{p^*}^2(\sigma _{ab}+{\sigma _b}^2p^*)\Big )}{2\sqrt{{p^*}^2({x^*}^T\Sigma _{\beta (n-1)} x^*+\sigma ^2)}}\nonumber \\ \end{aligned}$$
(55)

Dividing the numerator and the denominator of Eq. (55) by \(p^*>0\) gives:

$$\begin{aligned}&\frac{\partial E[U(p^*)_n]}{\partial p^*}= a+2bp^*\nonumber \\&\quad -\eta \frac{2{\sigma _b}^2{p^*}^2 +3\sigma _{ab}p^*+\sigma _a^2+\sigma ^2}{\sqrt{\sigma ^2+\sigma _a^2+2\sigma _{ab}p^*+\sigma _b^2{p^*}^2}} \end{aligned}$$
(56)
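Because Eq. (56) is an explicit scalar function of \(p^*\), a stationary point of the Formulation 3 utility can be located by solving \(\partial E[U(p^*)_n]/\partial p^* = 0\). The sketch below checks Eq. (56) against a central finite difference and then finds the stationary point by bisection, assuming \(p^*>0\) and a bracket on which the derivative changes sign. Bisection is our choice of root finder, not necessarily the paper's, and all numbers and names are hypothetical.

```python
import math


def utility(p, a, b, eta, sa2, sab, sb2, noise_var):
    """Formulation 3: revenue minus eta * p * (predictive std of demand at p)."""
    return a * p + b * p**2 - eta * p * math.sqrt(sa2 + 2 * sab * p + sb2 * p**2 + noise_var)


def dutility(p, a, b, eta, sa2, sab, sb2, noise_var):
    """Closed-form derivative, Eq. (56); valid for p > 0."""
    num = 2 * sb2 * p**2 + 3 * sab * p + sa2 + noise_var
    den = math.sqrt(noise_var + sa2 + 2 * sab * p + sb2 * p**2)
    return a + 2 * b * p - eta * num / den


def argmax_price(args, lo, hi, tol=1e-10):
    """Bisection on dU/dp; assumes dU/dp(lo) > 0 > dU/dp(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dutility(mid, *args) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# a, b, eta, sigma_a^2, sigma_ab, sigma_b^2, sigma^2 (all hypothetical)
args = (100.0, -2.0, 5.0, 4.0, -0.3, 0.05, 1.0)
p, h = 20.0, 1e-5
numeric = (utility(p + h, *args) - utility(p - h, *args)) / (2 * h)
print(numeric, dutility(p, *args))   # should agree closely
p_star = argmax_price(args, lo=0.1, hi=25.0)
print(p_star)                        # below the revenue-only optimum -a/(2b) = 25
```

In this example the penalty's derivative is positive for every \(p^*\), so the stationary price lies below the price \(-a/(2b)\) that maximizes expected revenue alone.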

About this article

Cite this article

Elreedy, D., Atiya, A.F. & Shaheen, S.I. Novel pricing strategies for revenue maximization and demand learning using an exploration–exploitation framework. Soft Comput 25, 11711–11733 (2021). https://doi.org/10.1007/s00500-021-06047-y

  • DOI: https://doi.org/10.1007/s00500-021-06047-y

Keywords