A Comparison of Covariate Shift Detection Methods on Medical Datasets

Dreiseitl, Stephan

doi:10.1007/978-3-031-25312-6_57

Stephan Dreiseitl (ORCID: orcid.org/0000-0001-5647-6153)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13789)

Included in the following conference series:

International Conference on Computer Aided Systems Theory


Abstract

The performance of machine learning models is known to deteriorate on datasets drawn from a distribution different from the one used in model building. In a supervised setting, this deterioration shows up as a decrease in the models' evaluation metrics. When no gold standard information is available, and evaluation metrics therefore cannot be determined, one may instead directly address the problem of detecting whether two datasets differ in distribution. Methods that assess, from samples alone, whether two distributions differ are known as covariate shift detection algorithms.

We investigate the ability of the maximum mean discrepancy method, of univariate tests, and of a domain classifier trained to distinguish two datasets, to detect covariate shift in two datasets: one collected for predicting stroke, and one collected for predicting acute myocardial infarction. For this, we artificially perturb parts of the datasets, and check how well these modified datasets can be distinguished from the remaining portions of the original datasets. We observe that univariate tests compare favorably with the other two methods, that changes can be detected more easily in large datasets, that smaller changes are more difficult to detect than larger changes, and that dimensionality reduction is detrimental to detecting covariate shift.
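
The univariate approach and the perturb-then-compare setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the choice of the two-sample Kolmogorov-Smirnov statistic, the permutation p-value, the Bonferroni correction across features, the synthetic Gaussian data, the additive perturbation of a single feature, and all function names are hypothetical choices made for this example.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_p_value(a, b, n_perm, rng):
    """Estimate a p-value by reshuffling the pooled sample (assumed test procedure)."""
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def detect_shift(reference, candidate, alpha=0.05, n_perm=500, seed=0):
    """Test each feature separately; flag shift if any p-value is below alpha / n_features."""
    rng = random.Random(seed)
    n_features = len(reference[0])
    p_values = []
    for k in range(n_features):
        col_ref = [row[k] for row in reference]
        col_new = [row[k] for row in candidate]
        p_values.append(permutation_p_value(col_ref, col_new, n_perm, rng))
    return min(p_values) < alpha / n_features, p_values


# Synthetic stand-in for a dataset: 400 rows, 3 standard-normal features (assumption).
data_rng = random.Random(42)
data = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(400)]
reference, held_out = data[:200], data[200:]

# Artificial perturbation (assumption): shift the first feature by one standard deviation.
perturbed = [[row[0] + 1.0, row[1], row[2]] for row in held_out]

shift_found, p_values = detect_shift(reference, perturbed)
print("shift detected:", shift_found)
print("per-feature p-values:", [round(p, 4) for p in p_values])
```

Shrinking the perturbation or the sample size in this sketch is a simple way to explore the abstract's observations that smaller changes, and changes in smaller datasets, are harder to detect.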



Author information

Authors and Affiliations

Department of Software Engineering, University of Applied Sciences Upper Austria, 4232, Hagenberg, Austria
Stephan Dreiseitl


Corresponding author

Correspondence to Stephan Dreiseitl.

Editor information

Editors and Affiliations

University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Roberto Moreno-Díaz
Johannes Kepler University, Linz, Oberösterreich, Austria
Franz Pichler
Department of Computer Science and Institute of Cybernetics, University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Alexis Quesada-Arencibia

About this paper

Cite this paper

Dreiseitl, S. (2022). A Comparison of Covariate Shift Detection Methods on Medical Datasets. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2022. EUROCAST 2022. Lecture Notes in Computer Science, vol 13789. Springer, Cham. https://doi.org/10.1007/978-3-031-25312-6_57

DOI: https://doi.org/10.1007/978-3-031-25312-6_57
Published: 10 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25311-9
Online ISBN: 978-3-031-25312-6
eBook Packages: Computer Science (R0)
