{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T23:31:39Z","timestamp":1726356699293},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2021,2,4]],"date-time":"2021-02-04T00:00:00Z","timestamp":1612396800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,4]],"date-time":"2021-02-04T00:00:00Z","timestamp":1612396800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"crossref","award":["326280","326339"],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2021,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Regression analysis is a standard supervised machine learning method used to model an outcome variable in terms of a set of predictor variables. In most real-world applications the true value of the outcome variable we want to predict is unknown outside the training data, i.e., the ground truth is unknown. Phenomena such as overfitting and concept drift make it difficult to directly observe when the estimate from a model potentially is wrong. In this paper we present an efficient framework for estimating the generalization error of regression functions, applicable to any family of regression functions when the ground truth is unknown. We present a theoretical derivation of the framework and empirically evaluate its strengths and limitations. 
We find that it performs robustly and is useful for detecting concept drift in datasets in several real-world domains.<\/jats:p>","DOI":"10.1007\/s10618-021-00739-7","type":"journal-article","created":{"date-parts":[[2021,2,4]],"date-time":"2021-02-04T17:06:12Z","timestamp":1612458372000},"page":"726-747","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Detecting virtual concept drift of regressors without ground truth values"],"prefix":"10.1007","volume":"35","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-9623-6282","authenticated-orcid":false,"given":"Emilia","family":"Oikarinen","sequence":"first","affiliation":[]},{"given":"Henri","family":"Tiittanen","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-4040-6967","authenticated-orcid":false,"given":"Andreas","family":"Henelius","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1819-1047","authenticated-orcid":false,"given":"Kai","family":"Puolam\u00e4ki","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,2,4]]},"reference":[{"key":"739_CR1","doi-asserted-by":"crossref","unstructured":"Bifet A, Frank E (2010) Sentiment knowledge discovery in twitter streaming data. In: Proceedings of 13th international conference on discovery science DS 2010. Springer, LNAI, vol 6332, pp 1\u201315","DOI":"10.1007\/978-3-642-16184-1_1"},{"key":"739_CR2","doi-asserted-by":"crossref","unstructured":"Bingham E, Gionis A, Haiminen N, Hiisil\u00e4 H, Mannila H, Terzi E (2006) Segmentation and dimensionality reduction. In: Proceedings of the 2006 SIAM international conference on data mining, SIAM, pp 372\u2013383","DOI":"10.1137\/1.9781611972764.33"},{"key":"739_CR3","doi-asserted-by":"crossref","unstructured":"Chandola V, Vatsavai RR (2011) A Gaussian process based online change detection algorithm for monitoring periodic time series. 
In: Proceedings of the 11th SIAM international conference on data mining, SDM, SIAM, pp 95\u2013106","DOI":"10.1137\/1.9781611972818.9"},{"key":"739_CR4","unstructured":"Dasu T, Krishnan S, Venkatasubramanian S, Yi K (2006) An information-theoretic approach to detecting changes in multi-dimensional data streams. In: Proceedings of symposium on the interface of statistics, computing science, and applications INTERFACE"},{"issue":"2","key":"739_CR5","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1007\/s13748-013-0040-3","volume":"2","author":"H Fanaee-T","year":"2014","unstructured":"Fanaee-T H, Gama J (2014) Event labeling combining ensemble detectors and background knowledge. Prog Artif Intell 2(2):113\u2013127","journal-title":"Prog Artif Intell"},{"issue":"8","key":"739_CR6","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","volume":"27","author":"T Fawcett","year":"2006","unstructured":"Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861\u2013874","journal-title":"Pattern Recognit Lett"},{"key":"739_CR7","unstructured":"FCGI (2019) Finnish Grid and Cloud Infrastructure. Urn:nbn:fi:research-infras-2016072533"},{"issue":"4","key":"739_CR8","doi-asserted-by":"publisher","first-page":"44:1","DOI":"10.1145\/2523813","volume":"46","author":"J Gama","year":"2014","unstructured":"Gama J, \u017dliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):44:1\u201344:37","journal-title":"ACM Comput Surv"},{"key":"739_CR9","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: data mining, inference, and prediction","author":"T Hastie","year":"2009","unstructured":"Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 
Springer, Berlin"},{"key":"739_CR10","doi-asserted-by":"crossref","unstructured":"Huggard H, Koh YS, Riddle P, Olivares G (2018) Predicting air quality from low-cost sensor measurements. In: Proceedings of Australasian conference on data mining AusDM 2018, Springer, CCIS, vol 996, pp 94\u2013106","DOI":"10.1007\/978-981-13-6661-1_8"},{"key":"739_CR11","unstructured":"Hyndman RJ, Athanasopoulos G (2018) Forecasting: principles and practice, 2nd edn. OTexts. https:\/\/otexts.com\/fpp2\/. Accessed 15 May 2020"},{"key":"739_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.compchemeng.2010.07.034","volume":"35","author":"P Kadlec","year":"2011","unstructured":"Kadlec P, Grbi\u0107 R, Gabrys B (2011) Review of adaptation mechanisms for data-driven soft sensors. Comput Chem Eng 35:1\u201324","journal-title":"Comput Chem Eng"},{"key":"739_CR13","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s10994-016-5588-2","volume":"106","author":"V Kuznetsov","year":"2017","unstructured":"Kuznetsov V, Mohri M (2017) Generalization bounds for non-stationary mixing processes. Mach Learn 106:93\u2013117","journal-title":"Mach Learn"},{"key":"739_CR14","unstructured":"Lindstrom P, Delany SJ, Mac\u00a0Namee B (2010) Handling concept drift in a text data stream constrained by high labelling cost. In: Proceedings to the 23rd international FLAIRS conference, pp 32\u201337"},{"issue":"1","key":"739_CR15","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/s12530-012-9061-6","volume":"4","author":"P Lindstrom","year":"2013","unstructured":"Lindstrom P, Namee BM, Delany SJ (2013) Drift detection using uncertainty distribution divergence. 
Evol Syst 4(1):13\u201325","journal-title":"Evol Syst"},{"issue":"12","key":"739_CR16","doi-asserted-by":"publisher","first-page":"2346","DOI":"10.1109\/TKDE.2019.2894131","volume":"31","author":"J Lu","year":"2019","unstructured":"Lu J, Liu A, Dong F, Gu F, Gama J, Zhang G (2019) Learning under concept drift: a review. IEEE Trans Knowl Data Eng 31(12):2346\u20132363","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"739_CR17","doi-asserted-by":"publisher","first-page":"4857","DOI":"10.1109\/JIOT.2018.2853660","volume":"5","author":"B Maag","year":"2018","unstructured":"Maag B, Zhou Z, Thiele L (2018) A survey on sensor calibration in air pollution monitoring deployments. IEEE Internet Things J 5:4857\u20134870","journal-title":"IEEE Internet Things J"},{"key":"739_CR18","doi-asserted-by":"crossref","unstructured":"Mohri M, Medina AM (2012) New analysis and algorithm for learning with drifting distributions. In: Algorithmic learning theory. ALT 2012. Springer, LNCS, vol 7568","DOI":"10.1007\/978-3-642-34106-9_13"},{"key":"739_CR19","doi-asserted-by":"crossref","unstructured":"Qahtan AA, Alharbi B, Wang S, Zhang X (2015) A PCA-based change detection framework for multidimensional data streams. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 935\u2013944","DOI":"10.1145\/2783258.2783359"},{"key":"739_CR20","doi-asserted-by":"publisher","first-page":"433","DOI":"10.3389\/fchem.2018.00433","volume":"6","author":"A Rudnitskaya","year":"2018","unstructured":"Rudnitskaya A (2018) Calibration update and drift correction for electronic noses and tongues. Front Chem 6:433","journal-title":"Front Chem"},{"issue":"3","key":"739_CR21","first-page":"317","volume":"1","author":"JC Schlimmer","year":"1986","unstructured":"Schlimmer JC, Granger RH (1986) Incremental learning from noisy data. 
Mach Learn 1(3):317\u2013354","journal-title":"Mach Learn"},{"key":"739_CR22","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1016\/j.eswa.2017.04.008","volume":"82","author":"TS Sethi","year":"2017","unstructured":"Sethi TS, Kantardzic M (2017) On the reliable detection of concept drift from streaming unlabeled data. Expert Syst Appl 82:77\u201399","journal-title":"Expert Syst Appl"},{"key":"739_CR23","doi-asserted-by":"crossref","unstructured":"Shao J, Ahmadi Z, Kramer S (2014) Prototype-based learning on concept-drifting data streams. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 412\u2013421","DOI":"10.1145\/2623330.2623609"},{"issue":"4","key":"739_CR24","first-page":"462","volume":"19","author":"P Sobolewski","year":"2013","unstructured":"Sobolewski P, Wozniak M (2013) Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors. J Univ Comput Sci 19(4):462\u2013483","journal-title":"J Univ Comput Sci"},{"key":"739_CR25","unstructured":"Tiittanen H, Oikarinen E, Henelius A, Puolam\u00e4ki K (2019) Drifter. https:\/\/github.com\/edahelsinki\/drifter. Accessed 15 May 2020"},{"key":"739_CR26","unstructured":"US Department of Transportation (2017) 2015 Flight Delays and Cancellations. https:\/\/www.kaggle.com\/usdot\/flight-delays. Accessed 15 May 2020"},{"key":"739_CR27","doi-asserted-by":"publisher","first-page":"320","DOI":"10.1016\/j.snb.2012.01.074","volume":"166\u2013167","author":"A Vergara","year":"2012","unstructured":"Vergara A, Vembu S, Ayhan T, Ryan MA, LHomer M, Huerta R (2012) Chemical gas sensor drift compensation using classifier ensembles. 
Sens Actuators B Chem 166\u2013167:320\u2013329","journal-title":"Sens Actuators B Chem"},{"issue":"2","key":"739_CR28","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1016\/j.snb.2007.09.060","volume":"129","author":"SD Vito","year":"2008","unstructured":"Vito SD, Massera E, Piga M, Martinotto L, Francia GD (2008) On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sens Actuators B Chem 129(2):750\u2013757","journal-title":"Sens Actuators B Chem"},{"key":"739_CR29","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1016\/j.csda.2016.11.002","volume":"108","author":"LY Wang","year":"2017","unstructured":"Wang LY, Park C, Yeon K, Choi H (2017) Tracking concept drift using a constrained penalized regression combiner. Comput Stat Data Anal 108:52\u201369","journal-title":"Comput Stat Data Anal"},{"key":"739_CR30","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1007\/978-3-319-26989-4_4","volume-title":"Big data analysis: new algorithms for a new society","author":"I \u017dliobaite","year":"2016","unstructured":"\u017dliobaite I, Pechenizkiy M, Gama J (2016) An overview of concept drift applications. In: Japkowicz N, Stefanowski J (eds) Big data analysis: new algorithms for a new society. 
Springer, Cham, pp 91\u2013114"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00739-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-021-00739-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00739-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,2]],"date-time":"2021-07-02T09:23:47Z","timestamp":1625217827000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-021-00739-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,4]]},"references-count":30,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,5]]}},"alternative-id":["739"],"URL":"https:\/\/doi.org\/10.1007\/s10618-021-00739-7","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,4]]},"assertion":[{"value":"15 May 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}