{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T16:34:15Z","timestamp":1726763655910},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,11,8]],"date-time":"2023-11-08T00:00:00Z","timestamp":1699401600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,11,8]],"date-time":"2023-11-08T00:00:00Z","timestamp":1699401600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Energy Storage Materials Initiative"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"Deep learning models have proven to be a powerful tool for the prediction of molecular properties for applications including drug design and the development of energy storage materials. However, in order to learn accurate and robust structure\u2013property mappings, these models require large amounts of data which can be a challenge to collect given the time and resource-intensive nature of experimental material characterization efforts. Additionally, such models fail to generalize to new types of molecular structures that were not included in the model training data. The acceleration of material development through uncertainty-guided experimental design has the promise to significantly reduce the data requirements and enable faster generalization to new types of materials. To evaluate the potential of such approaches for electrolyte design applications, we perform comprehensive evaluation of existing uncertainty quantification methods on the prediction of two relevant molecular properties - aqueous solubility and redox potential. 
We develop novel evaluation methods to probe the utility of the uncertainty estimates for both in-domain and out-of-domain data sets. Finally, we leverage selected uncertainty estimation methods for active learning to evaluate their capacity to support experimental design.","DOI":"10.1186\/s13321-023-00753-5","type":"journal-article","created":{"date-parts":[[2023,11,8]],"date-time":"2023-11-08T17:02:03Z","timestamp":1699462923000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Evaluating uncertainty-based active learning for accelerating the generalization of molecular property prediction"],"prefix":"10.1186","volume":"15","author":[{"given":"Tianzhixi","family":"Yin","sequence":"first","affiliation":[]},{"given":"Gihan","family":"Panapitiya","sequence":"additional","affiliation":[]},{"given":"Elizabeth D.","family":"Coda","sequence":"additional","affiliation":[]},{"given":"Emily G.","family":"Saldanha","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,11,8]]},"reference":[{"key":"753_CR1","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2018.01275","author":"BJ Neves","year":"2018","unstructured":"Neves BJ, Braga RC, Melo-Filho CC, Moreira-Filho JT, Muratov EN, Andrade CH (2018) Qsar-based virtual screening: advances and applications in drug discovery. Front Pharmacol. https:\/\/doi.org\/10.3389\/fphar.2018.01275","journal-title":"Front Pharmacol"},{"issue":"12","key":"753_CR2","doi-asserted-by":"publisher","first-page":"4977","DOI":"10.1021\/jm4004285","volume":"57","author":"A Cherkasov","year":"2014","unstructured":"Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuz\u2019min VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) Qsar modeling: where have you been? where are you going to? 
J Med Chem 57(12):4977\u20135010. https:\/\/doi.org\/10.1021\/jm4004285. (PMID: 24351051)","journal-title":"J Med Chem"},{"key":"753_CR3","doi-asserted-by":"publisher","DOI":"10.1063\/1.5023802","author":"JS Smith","year":"2018","unstructured":"Smith JS, Nebgen BT, Lubbers NE, Isayev O, Roitberg AE (2018) Less is more: sampling chemical space with active learning. J Chem Phys. https:\/\/doi.org\/10.1063\/1.5023802","journal-title":"J Chem Phys"},{"issue":"3","key":"753_CR4","doi-asserted-by":"publisher","first-page":"738","DOI":"10.1016\/j.chempr.2020.12.009","volume":"7","author":"SJ Ang","year":"2021","unstructured":"Ang SJ, Wang W, Schwalbe-Koda D, Axelrod S, G\u00f3mez-Bombarelli R (2021) Active learning accelerates ab initio molecular dynamics on reactive energy surfaces. Chem 7(3):738\u2013751. https:\/\/doi.org\/10.1016\/j.chempr.2020.12.009","journal-title":"Chem"},{"issue":"15","key":"753_CR5","doi-asserted-by":"publisher","first-page":"6338","DOI":"10.1021\/acs.chemmater.0c00768","volume":"32","author":"HA Doan","year":"2020","unstructured":"Doan HA, Agarwal G, Qian H, Counihan MJ, Rodr\u00edguez-L\u00f3pez J, Moore JS, Assary RS (2020) Quantum chemistry-informed active learning to accelerate the design and discovery of sustainable energy storage materials. Chem Mater 32(15):6338\u20136346. https:\/\/doi.org\/10.1021\/acs.chemmater.0c00768","journal-title":"Chem Mater"},{"key":"753_CR6","doi-asserted-by":"publisher","first-page":"5441","DOI":"10.1039\/C8SC00148K","volume":"9","author":"A Mayr","year":"2018","unstructured":"Mayr A, Klambauer G, Unterthiner T, Steijaert M, Wegner JK, Ceulemans H, Clevert D-A, Hochreiter S (2018) Large-scale comparison of machine learning methods for drug target prediction on chembl. Chem Sci 9:5441\u20135451. 
https:\/\/doi.org\/10.1039\/C8SC00148K","journal-title":"Chem Sci"},{"issue":"8","key":"753_CR7","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"K Yang","year":"2019","unstructured":"Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M, Palmer A, Settels V, Jaakkola T, Jensen K, Barzilay R (2019) Analyzing learned molecular representations for property prediction. J Chem Inform Model 59(8):3370\u20133388. https:\/\/doi.org\/10.1021\/acs.jcim.9b00237","journal-title":"J Chem Inform Model"},{"issue":"18","key":"753_CR8","doi-asserted-by":"publisher","first-page":"15695","DOI":"10.1021\/acsomega.2c00642","volume":"7","author":"G Panapitiya","year":"2022","unstructured":"Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E (2022) Evaluation of deep learning architectures for aqueous solubility prediction. ACS Omega 7(18):15695\u201315710. https:\/\/doi.org\/10.1021\/acsomega.2c00642","journal-title":"ACS Omega"},{"key":"753_CR9","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/1168\/2\/022022","volume":"1168","author":"X Ying","year":"2019","unstructured":"Ying X (2019) An overview of overfitting and its solutions. J Phys Conf Series 1168:022022. https:\/\/doi.org\/10.1088\/1742-6596\/1168\/2\/022022","journal-title":"J Phys Conf Series"},{"key":"753_CR10","unstructured":"Gawlikowski J, Tassi CRN, Ali M, Lee J, Humt M, Feng J, Kruspe A, Triebel R, Jung P, Roscher R, Shahzad M, Yang W, Bamler R, Zhu XX (2021) A Survey of Uncertainty in Deep Neural Networks. ArXiv. 
https:\/\/doi.org\/10.48550\/ARXIV.2107.03342."},{"key":"753_CR11","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.inffus.2021.05.008","volume":"76","author":"M Abdar","year":"2021","unstructured":"Abdar M, Pourpanah F, Hussain S, Rezazadegan D, Liu L, Ghavamzadeh M, Fieguth P, Cao X, Khosravi A, Acharya UR, Makarenkov V, Nahavandi S (2021) A review of uncertainty quantification in deep learning: techniques, applications and challenges. Inform Fusion 76:243\u2013297. https:\/\/doi.org\/10.1016\/j.inffus.2021.05.008","journal-title":"Inform Fusion"},{"key":"753_CR12","volume-title":"Uncertainty in deep learning","author":"Y Gal","year":"2016","unstructured":"Gal Y (2016) Uncertainty in deep learning. University of Cambridge, Cambridge"},{"issue":"8","key":"753_CR13","doi-asserted-by":"publisher","first-page":"3770","DOI":"10.1021\/acs.jcim.0c00502","volume":"60","author":"L Hirschfeld","year":"2020","unstructured":"Hirschfeld L, Swanson K, Yang K, Barzilay R, Coley CW (2020) Uncertainty quantification using neural networks for molecular property prediction. J Chem Inform Model 60(8):3770\u20133780","journal-title":"J Chem Inform Model"},{"issue":"9","key":"753_CR14","doi-asserted-by":"publisher","first-page":"1988","DOI":"10.1007\/s12274-019-2355-2","volume":"12","author":"V Singh","year":"2019","unstructured":"Singh V, Kim S, Kang J, Byon HR (2019) Aqueous organic redox flow batteries. Nano Res 12(9):1988\u20132001. https:\/\/doi.org\/10.1007\/s12274-019-2355-2","journal-title":"Nano Res"},{"key":"753_CR15","unstructured":"Gao P, Andersen A, Jonathan S, Panapitiya GU, Hollas AM, Saldanha EG, Murugesan V, Wang W. Organic molecular database for molecular design in redox flow battery. 
Publication Pending"},{"key":"753_CR16","doi-asserted-by":"publisher","first-page":"121","DOI":"10.3389\/fonc.2020.00121","volume":"10","author":"Q Cui","year":"2020","unstructured":"Cui Q, Lu S, Ni B, Zeng X, Tan Y, Chen YD, Zhao H (2020) Improved prediction of aqueous solubility of novel compounds by going deeper with deep learning. Front Oncol 10:121. https:\/\/doi.org\/10.3389\/fonc.2020.00121","journal-title":"Front Oncol"},{"key":"753_CR17","unstructured":"Reaxys. https:\/\/www.reaxys.com\/#\/search\/quick. Accessed: 12 Oct 2020"},{"key":"753_CR18","doi-asserted-by":"publisher","unstructured":"Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, et al. 2015. Pubchem substance and compound databases. Nucl Acids Res 44(D1). https:\/\/doi.org\/10.1093\/nar\/gkv951","DOI":"10.1093\/nar\/gkv951"},{"key":"753_CR19","doi-asserted-by":"crossref","unstructured":"Tagade PM, Adiga SP, Pandian S, Park MS, Hariharan KS, Kolake SM (2019) Attribute driven inverse materials design using deep learning bayesian framework. npj Comput Mater. https:\/\/doi.org\/10.1038\/s41524-019-0263-3","DOI":"10.1038\/s41524-019-0263-3"},{"key":"753_CR20","unstructured":"Ustimenko A, Prokhorenkova L, Malinin A (2020) Uncertainty in gradient boosting via ensembles. CoRR abs\/2006.10562. arXiv:2006.10562"},{"key":"753_CR21","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. 
J Machine Learn Res 12:2825\u20132830","journal-title":"J Machine Learn Res"},{"key":"753_CR22","volume-title":"Simple and scalable predictive uncertainty estimation using deep ensembles of the 31st neural information processing systems","author":"B Lakshminarayanan","year":"2017","unstructured":"Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles of the 31st neural information processing systems. Curran Associates Inc., Red Hook"},{"key":"753_CR23","unstructured":"Gal Y, Ghahramani Z (2016) Dropout as a bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning 48. 1050\u20131059"},{"key":"753_CR24","unstructured":"Zellers R, Holtzman A, Rashkin H, Bisk Y, Farhadi A, Roesner F, Choi Y (2019) Defending against neural fake news. In: Wallach, H., Larochelle, H., Beygelzimer, A, d\u2019 Alch\u00e9-Buc, F, Fox, E, Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 9054\u20139065. Curran Associates, Inc., ???. http:\/\/papers.nips.cc\/paper\/9106-defending-against-neural-fake-news.pdf"},{"key":"753_CR25","doi-asserted-by":"crossref","unstructured":"Nix DA, Weigend AS (1994) Estimating the mean and variance of the target probability distribution. In: Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN\u201994), vol. 1, pp. 55\u2013601. https:\/\/doi.org\/10.1109\/ICNN.1994.374138","DOI":"10.1109\/ICNN.1994.374138"},{"key":"753_CR26","unstructured":"Amini A, Schwarting W, Soleimany A, Rus D (2020) Deep evidential regression. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 14927\u201314937. Curran Associates, Inc., ???. 
https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/aab085461de182608ee9f607f3f7d18f-Paper.pdf"},{"key":"753_CR27","unstructured":"Huang W, Zhao D, Sun F, Liu H, Chang EY (2015) Scalable gaussian process regression using deep neural networks. In: IJCAI, pp. 3576\u20133582. http:\/\/ijcai.org\/Abstract\/15\/503"},{"key":"753_CR28","unstructured":"Levi D, Gispan L, Giladi N, Fetaya E (2019) Evaluating and calibrating uncertainty prediction in regression tasks. CoRR abs\/1905.11659. arXiv:1905.11659"},{"issue":"29","key":"753_CR29","doi-asserted-by":"publisher","first-page":"861","DOI":"10.21105\/joss.00861","volume":"3","author":"L McInnes","year":"2018","unstructured":"McInnes L, Healy J, Saul N, Gro\u00dfberger L (2018) Umap: Uniform manifold approximation and projection. J Open Source Software 3(29):861. https:\/\/doi.org\/10.21105\/joss.00861","journal-title":"J Open Source Software"},{"key":"753_CR30","volume-title":"Active learning literature survey computer sciences technical report 1648","author":"B Settles","year":"2009","unstructured":"Settles B (2009) Active learning literature survey computer sciences technical report 1648. University of Wisconsin, Madison"},{"key":"753_CR31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/978-1-4471-2099-5_1","volume-title":"SIGIR \u201994","author":"DD Lewis","year":"1994","unstructured":"Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. In: Croft BW, van Rijsbergen CJ (eds) SIGIR \u201994. Springer, London, pp 3\u201312"},{"key":"753_CR32","doi-asserted-by":"crossref","unstructured":"Zhu J, Wang H, Yao T, Tsou BK. Active learning with sampling by uncertainty and density for word sense disambiguation and text classification. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008). 2008. pp. 1137\u20131144. Coling 2008 Organizing Committee, Manchester, UK. 
https:\/\/aclanthology.org\/C08-1143","DOI":"10.3115\/1599081.1599224"},{"key":"753_CR33","doi-asserted-by":"publisher","unstructured":"Seung HS, Opper M, Sompolinsky H. Query by committee. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. COLT \u201992. Association for Computing Machinery. 1992. pp. 287\u2013294New York, NY, USA. https:\/\/doi.org\/10.1145\/130385.130417","DOI":"10.1145\/130385.130417"},{"key":"753_CR34","doi-asserted-by":"publisher","unstructured":"Melville P, Mooney RJ. Diverse ensembles for active learning. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML \u201904. Association for Computing Machinery. 2004. p. 74, New York, NY, USA. https:\/\/doi.org\/10.1145\/1015330.1015385","DOI":"10.1145\/1015330.1015385"},{"key":"753_CR35","unstructured":"Settles B, Craven M, Ray S. Multiple-instance active learning. In: Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems, vol. 20. Curran Associates, Inc., ???.2007. https:\/\/proceedings.neurips.cc\/paper\/2007\/file\/a1519de5b5d44b31a01de013b9b51a80-Paper.pdf"},{"key":"753_CR36","doi-asserted-by":"crossref","unstructured":"Donmez P, Carbonell JG, Bennett PN (2007) Dual strategy active learning. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladeni\u010d, D., Skowron, A. (eds.) Machine Learning: ECML 2007, pp. 116\u2013127. Springer, Berlin, Heidelberg","DOI":"10.1007\/978-3-540-74958-5_14"},{"issue":"4","key":"753_CR37","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1021\/ci9803381","volume":"39","author":"D Butina","year":"1999","unstructured":"Butina D (1999) Unsupervised data base clustering based on daylight\u2019s fingerprint and tanimoto similarity: a fast and automated way to cluster small and large data sets. J Chem Inform Computer Sci 39(4):747\u2013750. 
https:\/\/doi.org\/10.1021\/ci9803381","journal-title":"J Chem Inform Computer Sci"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00753-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00753-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00753-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,8]],"date-time":"2023-11-08T17:07:26Z","timestamp":1699463246000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00753-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,8]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["753"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00753-5","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,8]]},"assertion":[{"value":"1 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 August 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 November 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
interests"}}],"article-number":"105"}}