{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,9]],"date-time":"2024-09-09T12:57:52Z","timestamp":1725886672933},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,1,7]],"date-time":"2023-01-07T00:00:00Z","timestamp":1673049600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,7]],"date-time":"2023-01-07T00:00:00Z","timestamp":1673049600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["22J12846"],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"Activity cliffs (ACs) are formed by pairs of structural analogues that are active against the same target but have a large difference in potency. While much of our knowledge about ACs has originated from the analysis and comparison of compounds and activity data, several studies have reported AC predictions over the past decade. Unlike typical compound classification tasks, AC predictions must be carried out at the level of compound pairs representing ACs or nonACs. Most AC predictions reported so far have focused on individual methods or comparisons of two or three approaches and only investigated a few compound activity classes (from 2 to 10). Although promising prediction accuracy has been reported in most cases, different system set-ups, AC definitions, methods, and calculation conditions were used, precluding direct comparisons of these studies. 
Therefore, we have carried out a large-scale AC prediction campaign across 100 activity classes comparing machine learning methods of greatly varying complexity, ranging from pair-based nearest neighbor classifiers and decision tree or kernel methods to deep neural networks. The results of our systematic predictions revealed the level of accuracy that can be expected for AC predictions across many different compound classes. In addition, prediction accuracy did not scale with methodological complexity but was significantly influenced by memorization of compounds shared by different ACs or nonACs. In many instances, limited training data were sufficient for building accurate models using different methods and there was no detectable advantage of deep learning over simpler approaches for AC prediction. On a global scale, support vector machine models performed best, but only by small margins compared to others, including simple nearest neighbor classifiers.","DOI":"10.1186\/s13321-022-00676-7","type":"journal-article","created":{"date-parts":[[2023,1,7]],"date-time":"2023-01-07T11:03:48Z","timestamp":1673089428000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Large-scale prediction of activity cliffs using machine and deep learning methods of increasing complexity"],"prefix":"10.1186","volume":"15","author":[{"given":"Shunsuke","family":"Tamura","sequence":"first","affiliation":[]},{"given":"Tomoyuki","family":"Miyao","sequence":"additional","affiliation":[]},{"given":"J\u00fcrgen","family":"Bajorath","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,1,7]]},"reference":[{"key":"676_CR1","doi-asserted-by":"publisher","first-page":"1535","DOI":"10.1021\/ci060117s","volume":"46","author":"GM Maggiora","year":"2006","unstructured":"Maggiora GM (2006) On outliers and activity cliffs: why 
QSAR often disappoints. J Chem Inf Model 46:1535\u20131535. https:\/\/doi.org\/10.1021\/ci060117s","journal-title":"J Chem Inf Model"},{"key":"676_CR2","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1021\/jm401120g","volume":"57","author":"D Stumpfe","year":"2014","unstructured":"Stumpfe D, Hu Y, Dimova D, Bajorath J (2014) Recent progress in understanding activity cliffs and their utility in medicinal chemistry. J Med Chem 57:18\u201328. https:\/\/doi.org\/10.1021\/jm401120g","journal-title":"J Med Chem"},{"key":"676_CR3","doi-asserted-by":"publisher","first-page":"14360","DOI":"10.1021\/acsomega.9b02221","volume":"4","author":"D Stumpfe","year":"2019","unstructured":"Stumpfe D, Hu H, Bajorath J (2019) Evolving concept of activity cliffs. ACS Omega 4:14360\u201314368. https:\/\/doi.org\/10.1021\/acsomega.9b02221","journal-title":"ACS Omega"},{"key":"676_CR4","doi-asserted-by":"publisher","first-page":"2181","DOI":"10.1021\/ci300047k","volume":"52","author":"R Guha","year":"2012","unstructured":"Guha R (2012) Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. J Chem Inf Model 52:2181\u20132191. https:\/\/doi.org\/10.1021\/ci300047k","journal-title":"J Chem Inf Model"},{"key":"676_CR5","doi-asserted-by":"publisher","first-page":"2354","DOI":"10.1021\/ci300306a","volume":"52","author":"K Heikamp","year":"2012","unstructured":"Heikamp K, Hu X, Yan A, Bajorath J (2012) Prediction of activity cliffs using support vector machines. J Chem Inf Model 52:2354\u20132365. https:\/\/doi.org\/10.1021\/ci300306a","journal-title":"J Chem Inf Model"},{"key":"676_CR6","doi-asserted-by":"publisher","first-page":"2000103","DOI":"10.1002\/minf.202000103","volume":"39","author":"S Tamura","year":"2020","unstructured":"Tamura S, Miyao T, Funatsu K (2020) Ligand-based activity cliff prediction models with applicability domain. Mol Inform 39:2000103. 
https:\/\/doi.org\/10.1002\/minf.202000103","journal-title":"Mol Inform"},{"key":"676_CR7","doi-asserted-by":"publisher","first-page":"4916","DOI":"10.3390\/molecules26164916","volume":"26","author":"S Tamura","year":"2021","unstructured":"Tamura S, Jasial S, Miyao T, Funatsu K (2021) Interpretation of ligand-based activity cliff prediction models using the matched molecular pair kernel. Molecules 26:4916. https:\/\/doi.org\/10.3390\/molecules26164916","journal-title":"Molecules"},{"key":"676_CR8","doi-asserted-by":"publisher","first-page":"1631","DOI":"10.1021\/acs.jcim.6b00359","volume":"56","author":"D Horvath","year":"2016","unstructured":"Horvath D, Marcou G, Varnek A et al (2016) Prediction of activity cliffs using condensed graphs of reaction representations, descriptor recombination, support vector machine classification, and support vector regression. J Chem Inf Model 56:1631\u20131640. https:\/\/doi.org\/10.1021\/acs.jcim.6b00359","journal-title":"J Chem Inf Model"},{"key":"676_CR9","doi-asserted-by":"publisher","first-page":"1062","DOI":"10.1021\/ci500742b","volume":"55","author":"J Husby","year":"2015","unstructured":"Husby J, Bottegoni G, Kufareva I et al (2015) Structure-based predictions of activity cliffs. J Chem Inf Model 55:1062\u20131076. https:\/\/doi.org\/10.1021\/ci500742b","journal-title":"J Chem Inf Model"},{"key":"676_CR10","doi-asserted-by":"publisher","first-page":"100022","DOI":"10.1016\/j.ailsci.2021.100022","volume":"1","author":"J Iqbal","year":"2021","unstructured":"Iqbal J, Vogt M, Bajorath J (2021) Learning functional group chemistry from molecular images leads to accurate prediction of activity cliffs. Artif Intell Life Sci 1:100022. 
https:\/\/doi.org\/10.1016\/j.ailsci.2021.100022","journal-title":"Artif Intell Life Sci"},{"key":"676_CR11","doi-asserted-by":"publisher","DOI":"10.1007\/s10822-021-00380-y","author":"J Iqbal","year":"2021","unstructured":"Iqbal J, Vogt M, Bajorath J (2021) Prediction of activity cliffs on the basis of images using convolutional neural networks. J Comput Aid Mol Des. https:\/\/doi.org\/10.1007\/s10822-021-00380-y","journal-title":"J Comput Aid Mol Des"},{"key":"676_CR12","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.2c00327","author":"J Park","year":"2022","unstructured":"Park J, Sung G, Lee S et al (2022) ACGCN: graph convolutional networks for activity cliff prediction between matched molecular pairs. J Chem Inf Model. https:\/\/doi.org\/10.1021\/acs.jcim.2c00327","journal-title":"J Chem Inf Model"},{"key":"676_CR13","doi-asserted-by":"publisher","DOI":"10.1039\/d2dd00077f","author":"H Chen","year":"2022","unstructured":"Chen H, Vogt M, Bajorath J (2022) DeepAC-conditional transformer-based chemical language model for the prediction of activity cliffs formed by bioactive compounds. Digital Discov. 1:898\u2013909. https:\/\/doi.org\/10.1039\/d2dd00077f","journal-title":"Digital Discov"},{"key":"676_CR14","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1021\/acs.jcim.1c01163","volume":"62","author":"J Jim\u00e9nez-Luna","year":"2022","unstructured":"Jim\u00e9nez-Luna J, Skalic M, Weskamp N (2022) Benchmarking molecular feature attribution methods with activity cliffs. J Chem Inf Model 62:274\u2013283. https:\/\/doi.org\/10.1021\/acs.jcim.1c01163","journal-title":"J Chem Inf Model"},{"key":"676_CR15","doi-asserted-by":"publisher","unstructured":"Tilborg D van, Alenicheva A, Grisoni F (2022) Exposing the limitations of molecular machine learning with activity cliffs. 
https:\/\/doi.org\/10.26434\/chemrxiv-2022-mfq52-v3","DOI":"10.26434\/chemrxiv-2022-mfq52-v3"},{"key":"676_CR16","doi-asserted-by":"publisher","first-page":"D930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2019","unstructured":"Mendez D, Gaulton A, Bento AP et al (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47:D930\u2013D940. https:\/\/doi.org\/10.1093\/nar\/gky1075","journal-title":"Nucleic Acids Res"},{"key":"676_CR17","doi-asserted-by":"publisher","first-page":"1138","DOI":"10.1021\/ci3001138","volume":"52","author":"X Hu","year":"2012","unstructured":"Hu X, Hu Y, Vogt M et al (2012) MMP-cliffs: systematic identification of activity cliffs on the basis of matched molecular pairs. J Chem Inf Model 52:1138\u20131145. https:\/\/doi.org\/10.1021\/ci3001138","journal-title":"J Chem Inf Model"},{"key":"676_CR18","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1021\/ci900450m","volume":"50","author":"J Hussain","year":"2010","unstructured":"Hussain J, Rea C (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J Chem Inf Model 50:339\u2013348. https:\/\/doi.org\/10.1021\/ci900450m","journal-title":"J Chem Inf Model"},{"key":"676_CR19","doi-asserted-by":"publisher","first-page":"2944","DOI":"10.1021\/jm200026b","volume":"54","author":"M Wawer","year":"2011","unstructured":"Wawer M, Bajorath J (2011) Local structural changes, global data views: graphical substructure\u2212activity relationship trailing. J Med Chem 54:2944\u20132951. https:\/\/doi.org\/10.1021\/jm200026b","journal-title":"J Med Chem"},{"key":"676_CR20","doi-asserted-by":"publisher","first-page":"2932","DOI":"10.1021\/jm201706b","volume":"55","author":"D Stumpfe","year":"2012","unstructured":"Stumpfe D, Bajorath J (2012) Exploring activity cliffs in medicinal chemistry. J Med Chem 55:2932\u20132942. 
https:\/\/doi.org\/10.1021\/jm201706b","journal-title":"J Med Chem"},{"key":"676_CR21","doi-asserted-by":"publisher","first-page":"379","DOI":"10.4155\/fmc-2018-0299","volume":"11","author":"H Hu","year":"2019","unstructured":"Hu H, Stumpfe D, Bajorath J (2019) Second-generation activity cliffs identified on the basis of target set-dependent potency difference criteria. Future Med Chem 11:379\u2013394. https:\/\/doi.org\/10.4155\/fmc-2018-0299","journal-title":"Future Med Chem"},{"key":"676_CR22","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742\u2013754. https:\/\/doi.org\/10.1021\/ci100050t","journal-title":"J Chem Inf Model"},{"key":"676_CR23","unstructured":"OEChem Toolkit, OpenEye Scientific Software: Santa Fe, NM."},{"key":"676_CR24","doi-asserted-by":"publisher","first-page":"2325","DOI":"10.1021\/ci300149n","volume":"52","author":"A de Luca","year":"2012","unstructured":"de Luca A, Horvath D, Marcou G et al (2012) Mining chemical reactions using neighborhood behavior and condensed graphs of reactions approaches. J Chem Inf Model 52:2325\u20132338. https:\/\/doi.org\/10.1021\/ci300149n","journal-title":"J Chem Inf Model"},{"key":"676_CR25","unstructured":"RDKit: Cheminformatics and Machine Learning Software. (2013) http:\/\/www.rdkit.org\/. Accessed Nov 8 2022"},{"key":"676_CR26","unstructured":"Paszke A, Gross S, Massa F, et al (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inform Proc Syst. Vancouver, Canada"},{"key":"676_CR27","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in python. 
J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"676_CR28","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, et al (2019) Optuna: a next-generation hyperparameter optimization framework. Anchorage, AK, USA, pp 2623\u20132631","DOI":"10.1145\/3292500.3330701"},{"key":"676_CR29","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3264-1","volume-title":"The nature of statistical learning theory","author":"VN Vapnik","year":"2000","unstructured":"Vapnik VN (2000) The nature of statistical learning theory. Springer, New York. https:\/\/doi.org\/10.1007\/978-1-4757-3264-1"},{"key":"676_CR30","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1016\/j.neunet.2005.07.009","volume":"18","author":"L Ralaivola","year":"2005","unstructured":"Ralaivola L, Swamidass SJ, Saigo H, Baldi P (2005) Graph kernels for chemical informatics. Neural Netw 18:1093\u20131110. https:\/\/doi.org\/10.1016\/j.neunet.2005.07.009","journal-title":"Neural Netw"},{"key":"676_CR31","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/a:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332. https:\/\/doi.org\/10.1023\/a:1010933404324","journal-title":"Mach Learn"},{"key":"676_CR32","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1013203451","author":"JH Friedman","year":"2001","unstructured":"Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat. https:\/\/doi.org\/10.1214\/aos\/1013203451","journal-title":"Ann Stat"},{"key":"676_CR33","unstructured":"Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. ICML. pp 807\u2013814. https:\/\/dblp.org\/db\/conf\/icml"},{"key":"676_CR34","doi-asserted-by":"publisher","DOI":"10.48550\/arxiv.1412.6980","author":"DP Kingma","year":"2014","unstructured":"Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. Arxiv. 
https:\/\/doi.org\/10.48550\/arxiv.1412.6980","journal-title":"Arxiv"},{"key":"676_CR35","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1007\/s10822-022-00449-2","volume":"36","author":"I Maeda","year":"2022","unstructured":"Maeda I, Sato A, Tamura S, Miyao T (2022) Ligand-based approaches to activity prediction for the early stage of structure\u2013activity\u2013relationship progression. J Comput Aid Mol Des 36:237\u2013252. https:\/\/doi.org\/10.1007\/s10822-022-00449-2","journal-title":"J Comput Aid Mol Des"},{"key":"676_CR36","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s13321-020-0414-z","volume":"12","author":"B Tang","year":"2020","unstructured":"Tang B, Kramer ST, Fang M et al (2020) A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. J Cheminformatics 12:15. https:\/\/doi.org\/10.1186\/s13321-020-0414-z","journal-title":"J Cheminformatics"},{"key":"676_CR37","doi-asserted-by":"crossref","unstructured":"Brodersen KH, Ong CS, Stephan KE, Buhmann JM (2010) The balanced accuracy and its posterior distribution. 3121\u20133124","DOI":"10.1109\/ICPR.2010.764"},{"key":"676_CR38","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","volume":"405","author":"BW Matthews","year":"1975","unstructured":"Matthews BW (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405:442\u2013451. 
https:\/\/doi.org\/10.1016\/0005-2795(75)90109-9","journal-title":"Biochim Biophys Acta"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00676-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-022-00676-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00676-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,10]],"date-time":"2023-01-10T15:58:49Z","timestamp":1673366329000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-022-00676-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,7]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["676"],"URL":"https:\/\/doi.org\/10.1186\/s13321-022-00676-7","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,7]]},"assertion":[{"value":"8 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 December 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing financial interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
interests"}}],"article-number":"4"}}