A Comparison of Covariate Shift Detection Methods on Medical Datasets

  • Conference paper
Computer Aided Systems Theory – EUROCAST 2022 (EUROCAST 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13789)

Abstract

The performance of machine learning models is known to deteriorate on datasets drawn from a distribution different from the one used in model building. In a supervised setting, this deterioration shows up as a decrease in the models' evaluation metrics. When no gold standard information is available, and evaluation metrics therefore cannot be computed, one may instead directly address the problem of detecting whether two datasets differ in distribution. Methods that assess, from samples, whether the underlying distributions differ are known as covariate shift detection algorithms.

We investigate the ability of the maximum mean discrepancy method, of univariate tests, and of a domain classifier trained to distinguish two datasets, to detect covariate shift in two datasets: one collected for predicting stroke, and one collected for predicting acute myocardial infarction. For this, we artificially perturb parts of the datasets, and check how well these modified datasets can be distinguished from the remaining portions of the original datasets. We observe that univariate tests compare favorably with the other two methods, that changes can be detected more easily in large datasets, that smaller changes are more difficult to detect than larger changes, and that dimensionality reduction is detrimental to detecting covariate shift.
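
The three detection approaches named above can be illustrated with a minimal sketch. This is not the paper's implementation: synthetic Gaussian data stands in for the medical datasets, the perturbation (a mean shift of one standard deviation in a single feature) is an assumed example, and the function names `univariate_ks`, `mmd_permutation_test`, and `domain_classifier_auc` are hypothetical. The Bonferroni correction, the median-heuristic kernel bandwidth, and the logistic-regression domain classifier are likewise assumptions chosen for brevity.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def univariate_ks(x_ref, x_new, alpha=0.05):
    """One two-sample KS test per feature, Bonferroni-corrected (assumed choice)."""
    n_features = x_ref.shape[1]
    p_values = np.array(
        [ks_2samp(x_ref[:, j], x_new[:, j]).pvalue for j in range(n_features)]
    )
    return bool(p_values.min() < alpha / n_features), p_values


def mmd_permutation_test(x_ref, x_new, n_perm=200, seed=0):
    """Biased MMD^2 estimate with an RBF kernel and a permutation p-value."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x_ref, x_new])
    n_ref, n_all = len(x_ref), len(z)
    norms = np.sum(z ** 2, axis=1)
    sq_dist = np.maximum(norms[:, None] + norms[None, :] - 2.0 * z @ z.T, 0.0)
    # Median heuristic (assumed): bandwidth^2 = median squared pairwise distance.
    sigma2 = np.median(sq_dist[np.triu_indices(n_all, k=1)])
    kernel = np.exp(-sq_dist / (2.0 * sigma2))

    def mmd2(idx):
        a, b = idx[:n_ref], idx[n_ref:]
        return (kernel[np.ix_(a, a)].mean() + kernel[np.ix_(b, b)].mean()
                - 2.0 * kernel[np.ix_(a, b)].mean())

    observed = mmd2(np.arange(n_all))
    # Null distribution: recompute the statistic after shuffling group labels.
    permuted = np.array([mmd2(rng.permutation(n_all)) for _ in range(n_perm)])
    p_value = (1 + np.sum(permuted >= observed)) / (1 + n_perm)
    return float(observed), float(p_value)


def domain_classifier_auc(x_ref, x_new, seed=0):
    """Cross-validated AUC of a classifier trained to tell the two samples apart."""
    x = np.vstack([x_ref, x_new])
    y = np.r_[np.zeros(len(x_ref)), np.ones(len(x_new))]
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_predict(
        LogisticRegression(max_iter=1000), x, y, cv=cv, method="predict_proba"
    )[:, 1]
    return float(roc_auc_score(y, scores))


# Synthetic stand-in for a dataset: split it, then perturb one part.
rng = np.random.default_rng(42)
data = rng.normal(size=(400, 5))
x_ref, x_new = data[:200], data[200:].copy()
x_new[:, 0] += 1.0  # assumed perturbation: mean shift in the first feature

ks_detected, ks_p = univariate_ks(x_ref, x_new)
mmd_stat, mmd_p = mmd_permutation_test(x_ref, x_new)
auc = domain_classifier_auc(x_ref, x_new)

print("KS shift detected:", ks_detected)
print("MMD^2 estimate and p-value:", mmd_stat, mmd_p)
print("Domain classifier AUC:", auc)
```

In this sketch, a shift is flagged when the smallest KS p-value falls below the corrected threshold, when the MMD permutation p-value is small, or when the domain classifier's AUC is clearly above 0.5, the value expected when the two samples are indistinguishable.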

Author information

Corresponding author

Correspondence to Stephan Dreiseitl.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dreiseitl, S. (2022). A Comparison of Covariate Shift Detection Methods on Medical Datasets. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2022. EUROCAST 2022. Lecture Notes in Computer Science, vol 13789. Springer, Cham. https://doi.org/10.1007/978-3-031-25312-6_57

  • DOI: https://doi.org/10.1007/978-3-031-25312-6_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25311-9

  • Online ISBN: 978-3-031-25312-6

  • eBook Packages: Computer Science, Computer Science (R0)
