Uplift modeling with quasi-loss-functions

Data Mining and Knowledge Discovery

Abstract

Uplift modeling, also referred to as heterogeneous treatment effect estimation, is a machine learning technique used in marketing to estimate the incremental impact of a treatment on each customer's response. Uplift models face a fundamental challenge in causal inference: the variable of interest (i.e., the uplift itself) is unobservable. As a result, popular uplift models (such as meta-learners and uplift trees) do not incorporate loss functions for uplift in their algorithms. This article addresses that gap by proposing uplift models with quasi-loss functions (UpliftQL models), which separately use four specially designed quasi-loss functions for uplift estimation. Using simulated data, our analysis reveals that, on average, 55% of the top five models from a set of 14 are UpliftQL models for binary outcomes, and 34% for continuous outcomes. Further empirical data analysis shows that over 60% of the top-performing models are consistently UpliftQL models.
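
To illustrate why the uplift is unobservable and how a loss can still be approximated from observed data, the following is a minimal sketch under stated assumptions. It does not reproduce the four quasi-loss functions proposed in the article, which are not specified in the abstract. Instead, it uses the well-known transformed-outcome proxy, whose conditional expectation equals the uplift when treatment is randomized with known probability p. The simulated data, the function names (simulate, transformed_outcome, proxy_loss), and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n, seed=0):
    """Simulate a randomized experiment with a binary outcome.

    Both potential outcomes are generated, but only one is observed per
    customer, which is why the individual uplift cannot be observed.
    All numbers here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)      # single customer feature
    w = rng.integers(0, 2, size=n)         # random treatment, P(W=1) = 0.5
    tau = 0.05 + 0.20 * x                  # true uplift (unknown in practice)
    u = rng.uniform(size=n)
    y0 = (u < 0.30).astype(float)          # response without treatment
    y1 = (u < 0.30 + tau).astype(float)    # response with treatment
    y = np.where(w == 1, y1, y0)           # only one outcome is observed
    return x, w, y, tau


def transformed_outcome(y, w, p=0.5):
    """Proxy target Z with E[Z | X] equal to the uplift under randomization."""
    return y * (w - p) / (p * (1.0 - p))


def proxy_loss(z, predicted_uplift):
    """Squared error against the proxy target, computable from observed data.

    Its expectation differs from the true squared error on the uplift only
    by a term that does not depend on the prediction, so it ranks candidate
    predictions in the same order on average.
    """
    return float(np.mean((z - predicted_uplift) ** 2))


def main():
    x, w, y, tau = simulate(200_000)
    z = transformed_outcome(y, w)
    print("mean of proxy target:", z.mean())
    print("true average uplift: ", tau.mean())
    # Compare the true uplift with a deliberately wrong (reversed) prediction.
    reversed_tau = 0.05 + 0.20 * (1.0 - x)
    print("proxy loss, true uplift:    ", proxy_loss(z, tau))
    print("proxy loss, reversed uplift:", proxy_loss(z, reversed_tau))


if __name__ == "__main__":
    main()
```

The proxy target is very noisy for any single customer, so in this sketch it is only useful in aggregate: its mean approximates the average uplift, and the proxy loss tends to be lower for predictions closer to the true uplift.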

Author information

Corresponding author

Correspondence to Jinping Hu.

Additional information

Responsible editor Johannes Fürnkranz.

About this article

Cite this article

Hu, J., de Haan, E. & Skiera, B. Uplift modeling with quasi-loss-functions. Data Min Knowl Disc 38, 2495–2519 (2024). https://doi.org/10.1007/s10618-024-01042-x

  • DOI: https://doi.org/10.1007/s10618-024-01042-x
