{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T13:46:31Z","timestamp":1742391991614},"reference-count":66,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,4,28]],"date-time":"2023-04-28T00:00:00Z","timestamp":1682640000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,28]],"date-time":"2023-04-28T00:00:00Z","timestamp":1682640000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004871","name":"Technische Universit\u00e4t Braunschweig","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004871","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"It is insightful to report an estimator that describes how certain a model is in a prediction, additionally to the prediction alone. For regression tasks, most approaches implement a variation of the ensemble method, apart from few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties, they also boost the predictive performance by reducing errors arising from variance. Despite the development of novel methods, they are still considered the \u201cgolden-standard\u201d to quantify the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless whether they are related to deep learning or traditional machine learning. 
However, little attention has been given to the question whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a widespread and diversified attempt, ensembles are evaluated for 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.","DOI":"10.1186\/s13321-023-00709-9","type":"journal-article","created":{"date-parts":[[2023,4,28]],"date-time":"2023-04-28T04:21:32Z","timestamp":1682655692000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":29,"title":["Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation"],"prefix":"10.1186","volume":"15","author":[{"given":"Thomas-Martin","family":"Dutschmann","sequence":"first","affiliation":[]},{"given":"Lennart","family":"Kinzel","sequence":"additional","affiliation":[]},{"given":"Antonius","family":"ter Laak","sequence":"additional","affiliation":[]},{"given":"Knut","family":"Baumann","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,28]]},"reference":[{"key":"709_CR1","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","volume":"18","author":"J Vamathevan","year":"2019","unstructured":"Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M et al (2019) Applications of machine learning in drug discovery and 
development. Nat Rev Drug Discov 18:463\u2013477. https:\/\/doi.org\/10.1038\/s41573-019-0024-5","journal-title":"Nat Rev Drug Discov"},{"key":"709_CR2","doi-asserted-by":"publisher","first-page":"1241","DOI":"10.1016\/j.drudis.2018.01.039","volume":"23","author":"H Chen","year":"2018","unstructured":"Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018) The rise of deep learning in drug discovery. Drug Discov Today 23:1241\u20131250. https:\/\/doi.org\/10.1016\/j.drudis.2018.01.039","journal-title":"Drug Discov Today"},{"key":"709_CR3","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1002\/qsar.200390007","volume":"22","author":"A Tropsha","year":"2003","unstructured":"Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of qspr models. SAR Comb Sci 22:69\u201377. https:\/\/doi.org\/10.1002\/qsar.200390007","journal-title":"SAR Comb Sci"},{"issue":"2","key":"709_CR4","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1177\/026119290503300209","volume":"33","author":"TI Netzeva","year":"2005","unstructured":"Netzeva TI, Worth AP, Aldenberg T, Benigni R, Cronin MT, Gramatica P, Jaworska JS, Kahn S, Klopman G, Marchant CA et al (2005) Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships: the report and recommendations of ECVAM Workshop 52. Altern Lab Anim 33(2):155\u2013173. https:\/\/doi.org\/10.1177\/026119290503300209","journal-title":"Altern Lab Anim"},{"key":"709_CR5","doi-asserted-by":"publisher","first-page":"474","DOI":"10.1016\/j.drudis.2020.11.027","volume":"26","author":"LH Mervin","year":"2021","unstructured":"Mervin LH, Johansson S, Semenova E, Giblin KA, Engkvist O (2021) Uncertainty quantification in drug design. Drug Discov Today 26:474\u2013489. 
https:\/\/doi.org\/10.1016\/j.drudis.2020.11.027","journal-title":"Drug Discov Today"},{"key":"709_CR6","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.strusafe.2008.06.020","volume":"31","author":"AD Kiureghian","year":"2009","unstructured":"Kiureghian AD, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Struct Saf 31:105\u2013112. https:\/\/doi.org\/10.1016\/j.strusafe.2008.06.020","journal-title":"Struct Saf"},{"key":"709_CR7","doi-asserted-by":"publisher","unstructured":"Tagasovska N, Lopez-Paz D, Single-model uncertainties for deep learning. https:\/\/doi.org\/10.48550\/arXiv.1811.00908","DOI":"10.48550\/arXiv.1811.00908"},{"key":"709_CR8","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1002\/minf.201501019","volume":"35","author":"M Mathea","year":"2016","unstructured":"Mathea M, Klingspohn W, Baumann K (2016) Chemoinformatic classification methods and their applicability domain. Mol Inf 35:160\u2013180. https:\/\/doi.org\/10.1002\/minf.201501019","journal-title":"Mol Inf"},{"key":"709_CR9","unstructured":"Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola AJ, Bartlett P, Sch\u00f6lkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, Cambridge, MA, pp 61\u201372"},{"key":"709_CR10","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273\u2013297. https:\/\/doi.org\/10.1007\/BF00994018","journal-title":"Mach Learn"},{"key":"709_CR11","unstructured":"Drucker H, Burges CJC, Kaufman L, Smola A, Vapnik V (1996) Support vector regression machines. In: Mozer M, Jordan M, Petsche T (eds) Advances in neural information processing systems, MIT Press, Cambridge, MA, vol\u00a09, pp 155\u2013161. 
https:\/\/proceedings.neurips.cc\/paper\/1996\/file\/d38901788c533e8286cb6400b40b386d-Paper.pdf"},{"key":"709_CR12","doi-asserted-by":"publisher","unstructured":"Dietterich T (2000) Ensemble methods in machine learning. In: Lecture Notes in Computer Science 1857, International Workshop on Multiple Classifier Systems, Cagliari, Italy, 21\u201323 June 2000, pp 1\u201315, https:\/\/doi.org\/10.1007\/3-540-45014-9_1","DOI":"10.1007\/3-540-45014-9_1"},{"key":"709_CR13","unstructured":"Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, NIPS\u201917, pp 6405\u20136416"},{"key":"709_CR14","doi-asserted-by":"publisher","first-page":"3770","DOI":"10.1021\/acs.jcim.0c00502","volume":"60","author":"L Hirschfeld","year":"2020","unstructured":"Hirschfeld L, Swanson K, Yang K, Barzilay R, Coley CW (2020) Uncertainty quantification using neural networks for molecular property prediction. J Chem Inf Model 60:3770\u20133780. https:\/\/doi.org\/10.1021\/acs.jcim.0c00502","journal-title":"J Chem Inf Model"},{"key":"709_CR15","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1038\/s41524-022-00794-8","volume":"8","author":"G Palmer","year":"2022","unstructured":"Palmer G, Du S, Politowicz A, Emory JP, Yang X, Gautam A, Gupta G, Li Z, Jacobs R, Morgan D (2022) Calibration after bootstrap for accurate uncertainty quantification in regression models. NPJ Comput Mater 8:115. https:\/\/doi.org\/10.1038\/s41524-022-00794-8","journal-title":"NPJ Comput Mater"},{"key":"709_CR16","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1007\/s10994-021-05946-3","volume":"110","author":"E H\u00fcllermeier","year":"2021","unstructured":"H\u00fcllermeier E, Waegeman W (2021) Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. 
Mach Learn 110:457\u2013506. https:\/\/doi.org\/10.1007\/s10994-021-05946-3","journal-title":"Mach Learn"},{"key":"709_CR17","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332. https:\/\/doi.org\/10.1023\/A:1010933404324","journal-title":"Mach Learn"},{"key":"709_CR18","doi-asserted-by":"publisher","first-page":"6514","DOI":"10.3390\/molecules26216514","volume":"26","author":"TM Dutschmann","year":"2021","unstructured":"Dutschmann TM, Baumann K (2021) Evaluating high-variance leaves as uncertainty measure for random forest regression. Molecules 26:6514. https:\/\/doi.org\/10.3390\/molecules26216514","journal-title":"Molecules"},{"key":"709_CR19","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929\u20131958. https:\/\/jmlr.org\/papers\/v15\/srivastava14a.html"},{"key":"709_CR20","unstructured":"Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning, PMLR, New York, New York, USA, Proceedings of Machine Learning Research, vol\u00a048, pp 1050\u20131059. http:\/\/proceedings.mlr.press\/v48\/gal16.pdf"},{"key":"709_CR21","doi-asserted-by":"publisher","unstructured":"Hara K, Saitoh D, Shouno H, Analysis of dropout learning regarded as ensemble learning. https:\/\/doi.org\/10.48550\/arXiv.1706.06859","DOI":"10.48550\/arXiv.1706.06859"},{"key":"709_CR22","doi-asserted-by":"publisher","first-page":"3330","DOI":"10.1021\/acs.jcim.9b00297","volume":"59","author":"I Cortes-Ciriano","year":"2019","unstructured":"Cortes-Ciriano I, Bender A (2019) Reliable prediction errors for deep neural networks using test-time dropout. J Chem Inf Model 59:3330\u20133339. 
https:\/\/doi.org\/10.1021\/acs.jcim.9b00297","journal-title":"J Chem Inf Model"},{"issue":"100","key":"709_CR23","doi-asserted-by":"publisher","first-page":"014","DOI":"10.1016\/j.ailsci.2021.100014","volume":"1","author":"TB Kimber","year":"2021","unstructured":"Kimber TB, Gagnebin M, Volkamer A (2021) Maxsmi: maximizing molecular property prediction performance with confidence estimation using smiles augmentation and deep learning. Artif Intell Life Sci 1(100):014. https:\/\/doi.org\/10.1016\/j.ailsci.2021.100014","journal-title":"Artif Intell Life Sci"},{"key":"709_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00551-x","volume":"13","author":"D Wang","year":"2021","unstructured":"Wang D, Yu J, Chen L, Li X, Jiang H, Chen K, Zheng M, Luo X (2021) A hybrid framework for improving uncertainty quantification in deep learning-based qsar regression modeling. J Cheminform 13:1\u201317. https:\/\/doi.org\/10.1186\/s13321-021-00551-x","journal-title":"J Cheminform"},{"key":"709_CR25","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.inffus.2021.05.008","volume":"76","author":"M Abdar","year":"2021","unstructured":"Abdar M, Pourpanah F, Hussain S, Rezazadegan D, Liu L, Ghavamzadeh M, Fieguth P, Cao X, Khosravi A, Acharya UR, Makarenkov V, Nahavandi S (2021) A review of uncertainty quantification in deep learning: techniques, applications and challenges. Inform Fusion 76:243\u2013297. https:\/\/doi.org\/10.1016\/j.inffus.2021.05.008","journal-title":"Inform Fusion"},{"key":"709_CR26","doi-asserted-by":"publisher","first-page":"1356","DOI":"10.1021\/acscentsci.1c00546","volume":"7","author":"AP Soleimany","year":"2021","unstructured":"Soleimany AP, Amini A, Goldman S, Rus D, Bhatia SN, Coley CW (2021) Evidential deep learning for guided molecular property prediction and discovery. ACS Cent Sci 7:1356\u20131367. 
https:\/\/doi.org\/10.1021\/acscentsci.1c00546","journal-title":"ACS Cent Sci"},{"key":"709_CR27","unstructured":"Pearce T, Leibfried F, Brintrup A (2020) Uncertainty in neural networks: approximately bayesian ensembling. In: International conference on artificial intelligence and statistics, PMLR, pp 234\u2013244. http:\/\/proceedings.mlr.press\/v108\/pearce20a\/pearce20a.pdf"},{"key":"709_CR28","doi-asserted-by":"publisher","unstructured":"Grisoni F, Consonni V, Todeschini R (2018) Impact of molecular descriptors on computational models. In: Computational chemogenomics, Springer, Humana Press, New York, NY, pp 171\u2013209. https:\/\/doi.org\/10.1007\/978-1-4939-8639-2_5","DOI":"10.1007\/978-1-4939-8639-2_5"},{"key":"709_CR29","doi-asserted-by":"publisher","unstructured":"Raghunathan S, Priyakumar UD (2021) Molecular representations for machine learning applications in chemistry. Int J Quantum Chem e26870. https:\/\/doi.org\/10.1002\/qua.26870","DOI":"10.1002\/qua.26870"},{"key":"709_CR30","doi-asserted-by":"publisher","unstructured":"Consonni V, Todeschini R (2010) Molecular Descriptors. In: Recent advances in QSAR studies, Springer, Dordrecht, pp 29\u2013102. https:\/\/doi.org\/10.1007\/978-1-4020-9783-6_3","DOI":"10.1007\/978-1-4020-9783-6_3"},{"key":"709_CR31","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4:268\u2013276. 
https:\/\/doi.org\/10.1021\/acscentsci.7b00572","journal-title":"ACS Cent Sci"},{"key":"709_CR32","doi-asserted-by":"publisher","first-page":"5936","DOI":"10.1021\/acs.jcim.0c00416","volume":"60","author":"D Hwang","year":"2020","unstructured":"Hwang D, Yang S, Kwon Y, Lee KH, Lee G, Jo H, Yoon S, Ryu S (2020) Comprehensive study on molecular supervised learning with graph neural networks. J Chem Inf Model 60:5936\u20135945. https:\/\/doi.org\/10.1021\/acs.jcim.0c00416","journal-title":"J Chem Inf Model"},{"key":"709_CR33","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"K Yang","year":"2019","unstructured":"Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59:3370\u20133388. https:\/\/doi.org\/10.1021\/acs.jcim.9b00237","journal-title":"J Chem Inf Model"},{"key":"709_CR34","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1039\/C8SC04175J","volume":"10","author":"R Winter","year":"2019","unstructured":"Winter R, Montanari F, No\u00e9 F, Clevert DA (2019) Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem Sci 10:1692\u20131701. https:\/\/doi.org\/10.1039\/C8SC04175J","journal-title":"Chem Sci"},{"key":"709_CR35","doi-asserted-by":"publisher","first-page":"1132","DOI":"10.1021\/acs.jcim.8b00054","volume":"58","author":"F Svensson","year":"2018","unstructured":"Svensson F, Aniceto N, Norinder U, Cortes-Ciriano I, Spjuth O, Carlsson L, Bender A (2018) Conformal regression for quantitative structure-activity relationship modeling-quantifying prediction uncertainty. J Chem Inf Model 58:1132\u20131140. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00054","journal-title":"J Chem Inf Model"},{"key":"709_CR36","doi-asserted-by":"publisher","first-page":"8154","DOI":"10.1039\/C9SC00616H","volume":"10","author":"Y Zhang","year":"2019","unstructured":"Zhang Y et al (2019) Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem Sci 10:8154\u20138163. https:\/\/doi.org\/10.1039\/C9SC00616H","journal-title":"Chem Sci"},{"issue":"015","key":"709_CR37","doi-asserted-by":"publisher","first-page":"012","DOI":"10.1088\/2632-2153\/ac3eb3","volume":"3","author":"J Busk","year":"2021","unstructured":"Busk J, J\u00f8rgensen PB, Bhowmik A, Schmidt MN, Winther O, Vegge T (2021) Calibrated uncertainty for molecular property prediction using ensembles of message passing neural networks. Mach Sci Technol 3(015):012. https:\/\/doi.org\/10.1088\/2632-2153\/ac3eb3","journal-title":"Mach Sci Technol"},{"key":"709_CR38","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1021\/ci010132r","volume":"42","author":"JL Durant","year":"2002","unstructured":"Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comp Sci 42:1273\u20131280. https:\/\/doi.org\/10.1021\/ci010132r","journal-title":"J Chem Inf Comp Sci"},{"key":"709_CR39","unstructured":"Landrum G, RDKit: open-source cheminformatics software. https:\/\/www.rdkit.org. Accessed 16 Mar 2022"},{"key":"709_CR40","doi-asserted-by":"publisher","unstructured":"Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp 785\u2013794. 
https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"709_CR41","doi-asserted-by":"publisher","first-page":"1576","DOI":"10.1021\/acs.jcim.6b00136","volume":"56","author":"I Cortes-Ciriano","year":"2016","unstructured":"Cortes-Ciriano I (2016) Benchmarking the predictive power of ligand efficiency indices in qsar. J Chem Inf Model 56:1576\u20131587. https:\/\/doi.org\/10.1021\/acs.jcim.6b00136","journal-title":"J Chem Inf Model"},{"key":"709_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-017-0226-y","volume":"9","author":"A Koutsoukas","year":"2017","unstructured":"Koutsoukas A, Monaghan KJ, Li X, Huan J (2017) Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminf 9:1\u201313. https:\/\/doi.org\/10.1186\/s13321-017-0226-y","journal-title":"J Cheminf"},{"key":"709_CR43","volume-title":"Machine learning: a probabilistic perspective","author":"KP Murphy","year":"2012","unstructured":"Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge, MA"},{"key":"709_CR44","unstructured":"Dutschmann TM, Kinzel L, Cumulative curves for growing ensembles. https:\/\/git.rz.tu-bs.de\/impc\/baumannlab\/supporting-repository-for-ensemble-publication\/-\/tree\/main\/data\/generated_by_notebooks\/plots\/permutated_cumulative_members_curve_plots. Accessed 25 Feb 2023"},{"issue":"e0119","key":"709_CR45","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1371\/journal.pone.0119301","volume":"10","author":"J Balfer","year":"2015","unstructured":"Balfer J, Bajorath J (2015) Systematic artifacts in support vector regression-based compound potency prediction revealed by statistical and activity landscape analysis. PLOS ONE 10(e0119):301. 
https:\/\/doi.org\/10.1371\/journal.pone.0119301","journal-title":"PLOS ONE"},{"key":"709_CR46","doi-asserted-by":"publisher","first-page":"6371","DOI":"10.1021\/acsomega.7b01079","volume":"2","author":"R Rodriguez-Perez","year":"2017","unstructured":"Rodriguez-Perez R, Vogt M, Bajorath J (2017) Support vector machine classification and regression prioritize different structural features for binary compound activity and potency value prediction. ACS Omega 2:6371\u20136379. https:\/\/doi.org\/10.1021\/acsomega.7b01079","journal-title":"ACS Omega"},{"key":"709_CR47","doi-asserted-by":"publisher","first-page":"1636","DOI":"10.1016\/j.chemosphere.2010.11.043","volume":"82","author":"F Cheng","year":"2011","unstructured":"Cheng F, Shen J, Yu Y, Li W, Liu G, Lee PW, Tang Y (2011) In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods. Chemosphere 82:1636\u20131643. https:\/\/doi.org\/10.1016\/j.chemosphere.2010.11.043","journal-title":"Chemosphere"},{"key":"709_CR48","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1007\/s10822-014-9747-x","volume":"28","author":"DL Mobley","year":"2014","unstructured":"Mobley DL, Guthrie JP (2014) FreeSolv: a database of experimental and calculated hydration free energies, with input files. J Comput-Aided Mol Des 28:711\u2013720. https:\/\/doi.org\/10.1007\/s10822-014-9747-x","journal-title":"J Comput-Aided Mol Des"},{"key":"709_CR49","doi-asserted-by":"publisher","first-page":"1535","DOI":"10.1021\/ci060117s","volume":"46","author":"GM Maggiora","year":"2006","unstructured":"Maggiora GM (2006) On outliers and activity cliffs\u2014why QSAR often disappoints. J Chem Inf Model 46:1535\u20131535. 
https:\/\/doi.org\/10.1021\/ci060117s","journal-title":"J Chem Inf Model"},{"key":"709_CR50","doi-asserted-by":"publisher","first-page":"2697","DOI":"10.1021\/acs.jcim.9b00975","volume":"60","author":"G Scalia","year":"2020","unstructured":"Scalia G, Grambow CA, Pernici B, Li YP, Green WH (2020) Evaluating scalable uncertainty estimation methods for deep learning-based molecular property prediction. J Chem Inf Model 60:2697\u20132717. https:\/\/doi.org\/10.1021\/acs.jcim.9b00975","journal-title":"J Chem Inf Model"},{"key":"709_CR51","doi-asserted-by":"publisher","unstructured":"Fort S, Hu H, Lakshminarayanan B (2019) Deep ensembles: A loss landscape perspective. arXiv preprint arXiv:1912.02757. https:\/\/doi.org\/10.48550\/arXiv.1912.02757","DOI":"10.48550\/arXiv.1912.02757"},{"key":"709_CR52","doi-asserted-by":"publisher","first-page":"1000","DOI":"10.1021\/ci034243x","volume":"44","author":"JS Delaney","year":"2004","unstructured":"Delaney JS (2004) ESOL: estimating aqueous solubility directly from molecular structure. J Chem Inf Comp Sci 44:1000\u20131005. https:\/\/doi.org\/10.1021\/ci034243x","journal-title":"J Chem Inf Comp Sci"},{"key":"709_CR53","unstructured":"Ramsundar B, Eastman P, Walters P, Pande V (2019) Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O\u2019Reilly Media, Sebastopol, CA"},{"key":"709_CR54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00456-1","volume":"12","author":"AP Bento","year":"2020","unstructured":"Bento AP, Hersey A, F\u00e9lix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, De Veij M, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. J Cheminf 12:1\u201316. 
https:\/\/doi.org\/10.1186\/s13321-020-00456-1","journal-title":"J Cheminf"},{"key":"709_CR55","doi-asserted-by":"publisher","first-page":"D930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2019","unstructured":"Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, F\u00e9lix E, Magari\u00f1os MP, Mosquera JF, Mutowo P, Nowotka M et al (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47:D930\u2013D940. https:\/\/doi.org\/10.1093\/nar\/gky1075","journal-title":"Nucleic Acids Res"},{"key":"709_CR56","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures\u2014a technique developed at chemical abstracts service. J Chem Doc 5:107\u2013113. https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J Chem Doc"},{"key":"709_CR57","unstructured":"Winter RL (2022) Continuous and data-driven descriptors (cddd). https:\/\/github.com\/jrwnter\/cddd. Accessed 16 Mar"},{"key":"709_CR58","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825\u20132830, https:\/\/www.jmlr.org\/papers\/volume12\/pedregosa11a\/pedregosa11a.pdf"},{"key":"709_CR59","unstructured":"XGBoost Developers (2022) Xgboost python package. https:\/\/xgboost.readthedocs.io\/en\/stable\/python\/. 
Accessed 17 Mar"},{"key":"709_CR60","unstructured":"Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Man\u00e9 D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Vi\u00e9gas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X, TensorFlow: Large-scale machine learning on heterogeneous systems. https:\/\/www.tensorflow.org\/, software available from tensorflow.org"},{"key":"709_CR61","unstructured":"Dutschmann TM, Kinzel L (2022) ensemble_uncertainties: Framework to evaluate predictive uncertainties by generating k-fold cross-validation ensembles. https:\/\/git.rz.tu-bs.de\/impc\/baumannlab\/ensemble_uncertainties. Accessed 2 Aug"},{"key":"709_CR62","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-014-0047-1","volume":"6","author":"D Baumann","year":"2014","unstructured":"Baumann D, Baumann K (2014) Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation. J Cheminf 6:1\u201319. https:\/\/doi.org\/10.1186\/s13321-014-0047-1","journal-title":"J Cheminf"},{"key":"709_CR63","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1080\/00031305.1985.10479448","volume":"39","author":"TO Kv\u00e5lseth","year":"1985","unstructured":"Kv\u00e5lseth TO (1985) Cautionary note about $$\\rm r ^{2}$$. Am Stat 39:279\u2013285. https:\/\/doi.org\/10.1080\/00031305.1985.10479448","journal-title":"Am Stat"},{"key":"709_CR64","first-page":"333","volume":"49","author":"L Michaelis","year":"1913","unstructured":"Michaelis L, Menten M (1913) Die Kinetik der Invertinwirkung. 
Biochem Z 49:333\u2013369","journal-title":"Biochem Z"},{"key":"709_CR65","doi-asserted-by":"publisher","first-page":"8264","DOI":"10.1021\/bi201284u","volume":"50","author":"KA Johnson","year":"2011","unstructured":"Johnson KA, Goody RS (2011) The original Michaelis Constant: translation of the 1913 Michaelis-Menten Paper. Biochemistry 50:8264\u20138269. https:\/\/doi.org\/10.1021\/bi201284u","journal-title":"Biochemistry"},{"key":"709_CR66","doi-asserted-by":"crossref","unstructured":"Dutschmann TM, Kinzel L (2023) Supporting Repository for \"Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation\". https:\/\/git.rz.tu-bs.de\/impc\/baumannlab\/supporting-repository-for-ensemble-publication\/. Accessed 25 Feb.","DOI":"10.1186\/s13321-023-00709-9"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00709-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00709-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00709-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T17:04:16Z","timestamp":1702314256000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00709-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,28]]},"references-count":66,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["709"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00709-9","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subje
ct":[],"published":{"date-parts":[[2023,4,28]]},"assertion":[{"value":"3 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"49"}}