{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T08:30:55Z","timestamp":1726389055457},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T00:00:00Z","timestamp":1698364800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T00:00:00Z","timestamp":1698364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Defense Science & Engineering Graduate (NDSEG) Fellowship Program"},{"name":"Duke Science & Technology Initiative"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"Established molecular machine learning models process individual molecules as inputs to predict their biological, chemical, or physical properties. However, such algorithms require large datasets and have not been optimized to predict property differences between molecules, limiting their ability to learn from smaller datasets and to directly compare the anticipated properties of two molecules. Many drug and material development tasks would benefit from an algorithm that can directly compare two molecules to guide molecular optimization and prioritization, especially for tasks with limited available data. Here, we develop DeepDelta, a pairwise deep learning approach that processes two molecules simultaneously and learns to predict property differences between two molecules from small datasets. 
On 10 ADMET benchmark tasks, our DeepDelta approach significantly outperforms two established molecular machine learning algorithms, the directed message passing neural network (D-MPNN) ChemProp and Random Forest using radial fingerprints, for 70% of benchmarks in terms of Pearson\u2019s r, 60% of benchmarks in terms of mean absolute error (MAE), and all external test sets for both Pearson\u2019s r and MAE. We further analyze our performance and find that DeepDelta is particularly outperforming established approaches at predicting large differences in molecular properties and can perform scaffold hopping. Furthermore, we derive mathematically fundamental computational tests of our models based on mathematical invariants and show that compliance to these tests correlates with overall model performance\u00a0\u2014\u00a0providing an innovative, unsupervised, and easily computable measure of expected model performance and applicability. Taken together, DeepDelta provides an accurate approach to predict molecular property differences by directly training on molecular pairs and their property differences to further support fidelity and transparency in molecular optimization for drug development and the chemical sciences.","DOI":"10.1186\/s13321-023-00769-x","type":"journal-article","created":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T02:02:09Z","timestamp":1698372129000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["DeepDelta: predicting ADMET improvements of molecular derivatives with deep 
learning"],"prefix":"10.1186","volume":"15","author":[{"given":"Zachary","family":"Fralish","sequence":"first","affiliation":[]},{"given":"Ashley","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Paul","family":"Skaluba","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Reker","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,27]]},"reference":[{"key":"769_CR1","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1038\/nrd1032","volume":"2","author":"H van de Waterbeemd","year":"2003","unstructured":"van de Waterbeemd H, Gifford E (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov 2:192\u2013204. https:\/\/doi.org\/10.1038\/nrd1032","journal-title":"Nat Rev Drug Discov"},{"key":"769_CR2","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1038\/nrd3681","volume":"11","author":"JW Scannell","year":"2012","unstructured":"Scannell JW, Blanckley A, Boldon H, Warrington B (2012) Diagnosing the decline in pharmaceutical R&D efficiency. Nat Rev Drug Discov 11:191\u2013200","journal-title":"Nat Rev Drug Discov"},{"key":"769_CR3","doi-asserted-by":"publisher","first-page":"1702","DOI":"10.1016\/j.drudis.2020.07.001","volume":"25","author":"AH G\u00f6ller","year":"2020","unstructured":"G\u00f6ller AH, Kuhnke L, Montanari F et al (2020) Bayer\u2019s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discov Today 25:1702\u20131709","journal-title":"Drug Discov Today"},{"key":"769_CR4","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-023-02361-0","author":"C Arnold","year":"2023","unstructured":"Arnold C (2023) Inside the nascent industry of AI-designed drugs. Nat Med. 
https:\/\/doi.org\/10.1038\/s41591-023-02361-0","journal-title":"Nat Med"},{"key":"769_CR5","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1007\/s10822-020-00317-x","volume":"34","author":"N Brown","year":"2020","unstructured":"Brown N, Ertl P, Lewis R et al (2020) Artificial intelligence in chemistry and drug design. J Comput Aided Mol Des 34:709\u2013715","journal-title":"J Comput Aided Mol Des"},{"key":"769_CR6","doi-asserted-by":"publisher","first-page":"10520","DOI":"10.1021\/acs.chemrev.8b00728","volume":"119","author":"X Yang","year":"2019","unstructured":"Yang X, Wang Y, Byrne R et al (2019) Concepts of artificial intelligence for computer-assisted drug discovery. Chem Rev 119:10520\u201310594","journal-title":"Chem Rev"},{"key":"769_CR7","doi-asserted-by":"publisher","first-page":"869","DOI":"10.1021\/CT800011M\/ASSET\/CT800011M.FP.PNG_V03","volume":"4","author":"WL Jorgensen","year":"2008","unstructured":"Jorgensen WL, Thomas LL (2008) Perspective on free-energy perturbation calculations for chemical equilibria. J Chem Theory Comput 4:869\u2013876. https:\/\/doi.org\/10.1021\/CT800011M\/ASSET\/CT800011M.FP.PNG_V03","journal-title":"J Chem Theory Comput"},{"key":"769_CR8","doi-asserted-by":"publisher","first-page":"10911","DOI":"10.1039\/C9SC04606B","volume":"10","author":"J Jim\u00e9nez-Luna","year":"2019","unstructured":"Jim\u00e9nez-Luna J, P\u00e9rez-Benito L, Martinez-Rosell G et al (2019) DeltaDelta neural networks for lead optimization of small molecule potency. Chem Sci 10:10911\u201310918","journal-title":"Chem Sci"},{"key":"769_CR9","doi-asserted-by":"publisher","first-page":"1819","DOI":"10.1021\/acs.jcim.1c01497","volume":"62","author":"AT McNutt","year":"2022","unstructured":"McNutt AT, Koes DR (2022) Improving \u0394\u0394g predictions with a multitask convolutional Siamese network. 
J Chem Inf Model 62:1819\u20131829","journal-title":"J Chem Inf Model"},{"key":"769_CR10","doi-asserted-by":"publisher","first-page":"724","DOI":"10.1016\/J.DRUDIS.2013.03.003","volume":"18","author":"AG Dossetter","year":"2013","unstructured":"Dossetter AG, Griffen EJ, Leach AG (2013) Matched molecular pair analysis in drug discovery. Drug Discov Today 18:724\u2013731. https:\/\/doi.org\/10.1016\/J.DRUDIS.2013.03.003","journal-title":"Drug Discov Today"},{"key":"769_CR11","doi-asserted-by":"publisher","first-page":"1947","DOI":"10.1021\/ci034160g","volume":"43","author":"V Svetnik","year":"2003","unstructured":"Svetnik V, Liaw A, Tong C et al (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43:1947\u20131958. https:\/\/doi.org\/10.1021\/ci034160g","journal-title":"J Chem Inf Comput Sci"},{"key":"769_CR12","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"K Yang","year":"2019","unstructured":"Yang K, Swanson K, Jin W et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59:3370\u20133388","journal-title":"J Chem Inf Model"},{"key":"769_CR13","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1021\/acs.jcim.8b00363","volume":"59","author":"S Liu","year":"2018","unstructured":"Liu S, Alnammi M, Ericksen SS et al (2018) Practical model selection for prospective virtual screening. J Chem Inf Model 59:282\u2013293","journal-title":"J Chem Inf Model"},{"key":"769_CR14","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1016\/j.bioorg.2019.03.052","volume":"87","author":"A-M Alaa","year":"2019","unstructured":"Alaa A-M, El-Azab AS, Bua S et al (2019) Design, synthesis, and carbonic anhydrase inhibition activity of benzenesulfonamide-linked novel pyrazoline derivatives. 
Bioorg Chem 87:425\u2013431","journal-title":"Bioorg Chem"},{"key":"769_CR15","doi-asserted-by":"publisher","first-page":"14245","DOI":"10.1038\/s41598-021-93771-y","volume":"11","author":"R Rodr\u00edguez-P\u00e9rez","year":"2021","unstructured":"Rodr\u00edguez-P\u00e9rez R, Bajorath J (2021) Feature importance correlation from machine learning indicates functional relationships between proteins and similar compound binding characteristics. Sci Rep 11:14245","journal-title":"Sci Rep"},{"key":"769_CR16","doi-asserted-by":"publisher","first-page":"3919","DOI":"10.1039\/C5SC04272K","volume":"7","author":"D Reker","year":"2016","unstructured":"Reker D, Schneider P, Schneider G (2016) Multi-objective active machine learning rapidly improves structure\u2013activity models and reveals new protein\u2013protein interaction inhibitors. Chem Sci 7:3919\u20133927","journal-title":"Chem Sci"},{"key":"769_CR17","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1021\/ci900450m","volume":"50","author":"J Hussain","year":"2010","unstructured":"Hussain J, Rea C (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J Chem Inf Model 50:339\u2013348","journal-title":"J Chem Inf Model"},{"key":"769_CR18","doi-asserted-by":"publisher","first-page":"3231","DOI":"10.1021\/acs.jcim.0c00102","volume":"60","author":"S Zheng","year":"2020","unstructured":"Zheng S, Xiong J, Wang Y et al (2020) Quantitative prediction of hemolytic toxicity for small molecules and their potential hemolytic fragments by machine learning and recursive fragmentation methods. 
J Chem Inf Model 60:3231\u20133245","journal-title":"J Chem Inf Model"},{"key":"769_CR19","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1021\/acs.jcim.5b00642","volume":"56","author":"N-N Wang","year":"2016","unstructured":"Wang N-N, Dong J, Deng Y-H et al (2016) ADME properties evaluation in drug discovery: prediction of Caco-2 cell permeability using a combination of NSGA-II and boosting. J Chem Inf Model 56:763\u2013773","journal-title":"J Chem Inf Model"},{"key":"769_CR20","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1007\/s10822-014-9747-x","volume":"28","author":"DL Mobley","year":"2014","unstructured":"Mobley DL, Guthrie JP (2014) FreeSolv: a database of experimental and calculated hydration free energies, with input files. J Comput Aided Mol Des 28:711\u2013720","journal-title":"J Comput Aided Mol Des"},{"key":"769_CR21","doi-asserted-by":"publisher","first-page":"2042","DOI":"10.1021\/acs.jcim.6b00044","volume":"56","author":"F Lombardo","year":"2016","unstructured":"Lombardo F, Jing Y (2016) In silico prediction of volume of distribution in humans. Extensive data set and the exploration of linear and nonlinear methods coupled with molecular interaction fields descriptors. J Chem Inf Model 56:2042\u20132052","journal-title":"J Chem Inf Model"},{"key":"769_CR22","doi-asserted-by":"publisher","first-page":"1466","DOI":"10.1124\/dmd.118.082966","volume":"46","author":"F Lombardo","year":"2018","unstructured":"Lombardo F, Berellini G, Obach RS (2018) Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 1352 drug compounds. Drug Metab Dispos 46:1466\u20131477","journal-title":"Drug Metab Dispos"},{"key":"769_CR23","doi-asserted-by":"publisher","first-page":"3251","DOI":"10.1021\/acs.jcim.9b00180","volume":"59","author":"T Esaki","year":"2019","unstructured":"Esaki T, Ohashi R, Watanabe R et al (2019) Computational model to predict the fraction of unbound drug in the brain. 
J Chem Inf Model 59:3251\u20133261","journal-title":"J Chem Inf Model"},{"key":"769_CR24","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1016\/j.ejmech.2012.06.043","volume":"57","author":"L Di","year":"2012","unstructured":"Di L, Keefer C, Scott DO et al (2012) Mechanistic insights from comparing intrinsic clearance values between human liver microsomes and hepatocytes to guide drug design. Eur J Med Chem 57:441\u2013448","journal-title":"Eur J Med Chem"},{"key":"769_CR25","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1021\/acs.chemrestox.9b00447","volume":"33","author":"J Chen","year":"2020","unstructured":"Chen J, Yang H, Zhu L et al (2020) In silico prediction of human renal clearance of compounds using quantitative structure-pharmacokinetic relationship models. Chem Res Toxicol 33:640\u2013650","journal-title":"Chem Res Toxicol"},{"key":"769_CR26","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1016\/S1359-6446(04)03365-3","volume":"10","author":"JS Delaney","year":"2005","unstructured":"Delaney JS (2005) Predicting aqueous solubility from structure. Drug Discov Today 10:289\u2013295","journal-title":"Drug Discov Today"},{"key":"769_CR27","unstructured":"Huang K, Fu T, Gao W et al (2021) Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. Preprint arXiv:2102.09548"},{"key":"769_CR28","doi-asserted-by":"publisher","first-page":"1098","DOI":"10.1021\/jm901371v","volume":"53","author":"MVS Varma","year":"2010","unstructured":"Varma MVS, Obach RS, Rotter C et al (2010) Physicochemical space for optimum oral bioavailability: contribution of human intestinal absorption and first-pass elimination. 
J Med Chem 53:1098\u20131108","journal-title":"J Med Chem"},{"key":"769_CR29","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1016\/S0960-894X(00)00172-4","volume":"10","author":"WL Jorgensen","year":"2000","unstructured":"Jorgensen WL, Duffy EM (2000) Prediction of drug solubility from Monte Carlo simulations. Bioorg Med Chem Lett 10:1155\u20131158","journal-title":"Bioorg Med Chem Lett"},{"key":"769_CR30","doi-asserted-by":"publisher","first-page":"D1100","DOI":"10.1093\/nar\/gkr777","volume":"40","author":"A Gaulton","year":"2012","unstructured":"Gaulton A, Bellis LJ, Bento AP et al (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucl Acids Res 40:D1100\u2013D1107","journal-title":"Nucl Acids Res"},{"key":"769_CR31","first-page":"31","volume":"8","author":"G Landrum","year":"2013","unstructured":"Landrum G (2013) RDKit: a software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 8:31","journal-title":"Greg Landrum"},{"key":"769_CR32","doi-asserted-by":"publisher","first-page":"129307","DOI":"10.1016\/j.cej.2021.129307","volume":"418","author":"FH Vermeire","year":"2021","unstructured":"Vermeire FH, Green WH (2021) Transfer learning for solvation free energies: from quantum chemistry to experiments. Chem Eng J 418:129307","journal-title":"Chem Eng J"},{"key":"769_CR33","first-page":"1","volume":"30","author":"G Ke","year":"2017","unstructured":"Ke G, Meng Q, Finley T et al (2017) Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:1","journal-title":"Adv Neural Inf Process Syst"},{"key":"769_CR34","doi-asserted-by":"publisher","first-page":"9097","DOI":"10.1021\/acs.jmedchem.7b00487","volume":"60","author":"F Lombardo","year":"2017","unstructured":"Lombardo F, Desai PV, Arimoto R et al (2017) In silico absorption, distribution, metabolism, excretion, and pharmacokinetics (ADME-PK): utility and best practices. 
An industry perspective from the international consortium for innovation through quality in pharmaceutical development: miniperspective. J Med Chem 60:9097\u20139113","journal-title":"J Med Chem"},{"key":"769_CR35","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.2174\/15680266113139990033","volume":"13","author":"F Cheng","year":"2013","unstructured":"Cheng F, Li W, Liu G, Tang Y (2013) In silico ADMET prediction: recent advances, current challenges and future trends. Curr Top Med Chem 13:1273\u20131289","journal-title":"Curr Top Med Chem"},{"key":"769_CR36","doi-asserted-by":"publisher","first-page":"3846","DOI":"10.1021\/acs.jcim.1c00670","volume":"61","author":"M Tynes","year":"2021","unstructured":"Tynes M, Gao W, Burrill DJ et al (2021) Pairwise difference regression: a machine learning meta-algorithm for improved prediction and uncertainty quantification in chemical search. J Chem Inf Model 61:3846\u20133857","journal-title":"J Chem Inf Model"},{"key":"769_CR37","doi-asserted-by":"publisher","first-page":"e2214168120","DOI":"10.1073\/pnas.2214168120","volume":"120","author":"KL Saar","year":"2023","unstructured":"Saar KL, McCorkindale W, Fearon D et al (2023) Turning high-throughput structural biology into predictive inhibitor design. Proc Natl Acad Sci 120:e2214168120","journal-title":"Proc Natl Acad Sci"},{"key":"769_CR38","doi-asserted-by":"publisher","first-page":"11086","DOI":"10.1021\/acsomega.1c01266","volume":"6","author":"D Fern\u00e1ndez-Llaneza","year":"2021","unstructured":"Fern\u00e1ndez-Llaneza D, Ulander S, Gogishvili D et al (2021) Siamese Recurrent neural network with a self-attention mechanism for bioactivity prediction. 
ACS Omega 6:11086\u201311094","journal-title":"ACS Omega"},{"key":"769_CR39","doi-asserted-by":"publisher","first-page":"2087","DOI":"10.1021\/acs.jctc.5b00099","volume":"11","author":"R Ramakrishnan","year":"2015","unstructured":"Ramakrishnan R, Dral PO, Rupp M, Von Lilienfeld OA (2015) Big data meets quantum chemistry approximations: the \u0394-machine learning approach. J Chem Theory Comput 11:2087\u20132096","journal-title":"J Chem Theory Comput"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00769-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00769-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00769-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T02:06:30Z","timestamp":1698372390000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00769-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,27]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["769"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00769-x","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2023-gbchq","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,27]]},"assertion":[{"value":"23 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"D.R. acts as a consultant to the pharmaceutical and biotechnology industry.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}],"article-number":"101"}}