Abstract
The performance of machine learning models is known to deteriorate on datasets drawn from a distribution different from the one used to build the model. In a supervised setting, this deterioration shows up as a decrease in the models' evaluation metrics. When no gold standard information is available, and evaluation metrics therefore cannot be determined, one may instead directly address the problem of detecting whether two datasets differ in distribution. Methods that assess, from samples alone, whether two distributions differ are known as covariate shift detection algorithms.
We investigate how well three methods detect covariate shift: the maximum mean discrepancy method, univariate tests, and a domain classifier trained to distinguish two datasets. We evaluate them on two medical datasets, one collected for predicting stroke and one collected for predicting acute myocardial infarction. To this end, we artificially perturb parts of the datasets and check how well these modified portions can be distinguished from the remaining portions of the original datasets. We observe that univariate tests compare favorably with the other two methods, that changes can be detected more easily in large datasets, that smaller changes are more difficult to detect than larger changes, and that dimensionality reduction is detrimental to detecting covariate shift.
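The perturb-and-detect procedure described above can be illustrated with a minimal sketch of the univariate approach. This is not the chapter's implementation. It assumes a per-feature two-sample Kolmogorov-Smirnov test with a Bonferroni correction, synthetic Gaussian data in place of the stroke and myocardial infarction datasets, and an additive offset as the perturbation. The function names `univariate_shift_test` and `perturb` are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp


def univariate_shift_test(reference, candidate, alpha=0.05):
    """Run a two-sample KS test on each feature, with Bonferroni correction.

    Returns (shift_detected, p_values). Shift is flagged when the smallest
    per-feature p-value falls below alpha divided by the number of features.
    """
    n_features = reference.shape[1]
    p_values = np.array([
        ks_2samp(reference[:, j], candidate[:, j]).pvalue
        for j in range(n_features)
    ])
    return bool(p_values.min() < alpha / n_features), p_values


def perturb(data, feature, shift, rng, fraction=1.0):
    """Assumed perturbation: add a constant offset to one feature
    in a randomly chosen fraction of the rows."""
    perturbed = data.copy()
    n_rows = data.shape[0]
    rows = rng.choice(n_rows, size=int(fraction * n_rows), replace=False)
    perturbed[rows, feature] += shift
    return perturbed


# Synthetic stand-in for a medical dataset: 2000 rows, 5 numeric features.
rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 5))

# Split into a reference portion and a portion that will be perturbed.
reference, held_out = data[:1000], data[1000:]
shifted = perturb(held_out, feature=2, shift=1.0, rng=rng)

# Compare the reference portion against the perturbed portion.
detected_shift, p_shift = univariate_shift_test(reference, shifted)
print("shift detected:", detected_shift)
print("feature with smallest p-value:", int(p_shift.argmin()))
```

Lowering `shift` or `fraction`, or reducing the number of rows, is one way to probe the effects of change size and dataset size mentioned in the abstract. The thresholds at which detection fails in this toy setting say nothing about the chapter's reported results.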
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Dreiseitl, S. (2022). A Comparison of Covariate Shift Detection Methods on Medical Datasets. In: Moreno-Díaz, R., Pichler, F., Quesada-Arencibia, A. (eds) Computer Aided Systems Theory – EUROCAST 2022. EUROCAST 2022. Lecture Notes in Computer Science, vol 13789. Springer, Cham. https://doi.org/10.1007/978-3-031-25312-6_57
DOI: https://doi.org/10.1007/978-3-031-25312-6_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25311-9
Online ISBN: 978-3-031-25312-6
eBook Packages: Computer Science, Computer Science (R0)