{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T16:37:19Z","timestamp":1726850239780},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T00:00:00Z","timestamp":1590710400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T00:00:00Z","timestamp":1590710400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010665","name":"H2020 Marie Sk\u0142odowska-Curie Actions","doi-asserted-by":"publisher","award":["676434"],"id":[{"id":"10.13039\/100010665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2020,12]]},"abstract":"Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. 
Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.","DOI":"10.1186\/s13321-020-00441-8","type":"journal-article","created":{"date-parts":[[2020,5,29]],"date-time":"2020-05-29T12:03:08Z","timestamp":1590753788000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":109,"title":["SMILES-based deep generative scaffold decorator for de-novo drug design"],"prefix":"10.1186","volume":"12","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-9860-2944","authenticated-orcid":false,"given":"Josep","family":"Ar\u00fas-Pous","sequence":"first","affiliation":[]},{"given":"Atanas","family":"Patronov","sequence":"additional","affiliation":[]},{"given":"Esben Jannik","family":"Bjerrum","sequence":"additional","affiliation":[]},{"given":"Christian","family":"Tyrchan","sequence":"additional","affiliation":[]},{"given":"Jean-Louis","family":"Reymond","sequence":"additional","affiliation":[]},{"given":"Hongming","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Ola","family":"Engkvist","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,5,29]]},"reference":[{"key":"441_CR1","unstructured":"Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. https:\/\/cdn.openai.com\/better-language-models\/language_models_are_unsupervised_multitask_learners.pdf"},{"key":"441_CR2","unstructured":"Karras T, Aila T, Laine S, Lehtinen J (2018) Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196 [cs, stat]. http:\/\/arxiv.org\/abs\/1710.10196. Accessed 19 Feb 2020"},{"key":"441_CR3","unstructured":"Pan Y, Qiu Z, Yao T, Li H, Mei T (2018) To create what you tell: generating videos from captions. arXiv:1804.08264 [cs]. http:\/\/arxiv.org\/abs\/1804.08264. 
Accessed 19 Feb 2020"},{"key":"441_CR4","unstructured":"Huang CZA, Cooijmans T, Roberts A, Courville A, Eck D (2019) Counterpoint by convolution. arXiv:1903.07227 [cs, eess, stat]. http:\/\/arxiv.org\/abs\/1903.07227. Accessed 19 Feb 2020"},{"issue":"6","key":"441_CR5","doi-asserted-by":"publisher","first-page":"1241","DOI":"10.1016\/j.drudis.2018.01.039","volume":"23","author":"H Chen","year":"2018","unstructured":"Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018) The rise of deep learning in drug discovery. Drug Discov Today 23(6):1241\u20131250. https:\/\/doi.org\/10.1016\/j.drudis.2018.01.039","journal-title":"Drug Discov Today"},{"issue":"9\u201310","key":"441_CR6","doi-asserted-by":"publisher","first-page":"1800041","DOI":"10.1002\/minf.201800041","volume":"37","author":"H Chen","year":"2018","unstructured":"Chen H, Kogej T, Engkvist O (2018) Cheminformatics in drug discovery, an industrial perspective. Mol Inform 37(9\u201310):1800041. https:\/\/doi.org\/10.1002\/minf.201800041","journal-title":"Mol Inform"},{"key":"441_CR7","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2019.01303","author":"L David","year":"2019","unstructured":"David L et al (2019) Applications of deep-learning in exploiting large-scale and heterogeneous compound data in industrial pharmaceutical research. Front Pharmacol. https:\/\/doi.org\/10.3389\/fphar.2019.01303","journal-title":"Front Pharmacol"},{"issue":"8","key":"441_CR8","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u20131780. 
https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","journal-title":"Neural Comput"},{"issue":"Database issue","key":"441_CR9","doi-asserted-by":"publisher","first-page":"D945","DOI":"10.1093\/nar\/gkw1074","volume":"45","author":"A Gaulton","year":"2017","unstructured":"Gaulton A et al (2017) The ChEMBL database in 2017. Nucleic Acids Res 45(Database issue):D945\u2013D954. https:\/\/doi.org\/10.1093\/nar\/gkw1074","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"441_CR10","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1186\/s13321-019-0393-0","volume":"11","author":"J Ar\u00fas-Pous","year":"2019","unstructured":"Ar\u00fas-Pous J et al (2019) Randomized SMILES strings improve the quality of molecular generative models. J Cheminform 11(1):71. https:\/\/doi.org\/10.1186\/s13321-019-0393-0","journal-title":"J Cheminform"},{"issue":"1","key":"441_CR11","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","volume":"4","author":"MHS Segler","year":"2018","unstructured":"Segler MHS, Kogej T, Tyrchan C, Waller MP (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci 4(1):120\u2013131. https:\/\/doi.org\/10.1021\/acscentsci.7b00512","journal-title":"ACS Cent Sci"},{"issue":"4","key":"441_CR12","doi-asserted-by":"publisher","first-page":"1347","DOI":"10.1021\/acs.jcim.8b00902","volume":"59","author":"M Awale","year":"2019","unstructured":"Awale M, Sirockin F, Stiefl N, Reymond J-L (2019) Drug analogs from fragment-based long short-term memory generative neural networks. J Chem Inf Model 59(4):1347\u20131356. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00902","journal-title":"J Chem Inf Model"},{"issue":"1","key":"441_CR13","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-017-0235-x","volume":"9","author":"M Olivecrona","year":"2017","unstructured":"Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9(1):48. https:\/\/doi.org\/10.1186\/s13321-017-0235-x","journal-title":"J Cheminform"},{"issue":"1\u20132","key":"441_CR14","doi-asserted-by":"publisher","first-page":"1700123","DOI":"10.1002\/minf.201700123","volume":"37","author":"T Blaschke","year":"2018","unstructured":"Blaschke T, Olivecrona M, Engkvist O, Bajorath J, Chen H (2018) Application of generative autoencoder in de novo molecular design. Mol Inform 37(1\u20132):1700123. https:\/\/doi.org\/10.1002\/minf.201700123","journal-title":"Mol Inform"},{"issue":"2","key":"441_CR15","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R et al (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4(2):268\u2013276. https:\/\/doi.org\/10.1021\/acscentsci.7b00572","journal-title":"ACS Cent Sci"},{"issue":"5","key":"441_CR16","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1038\/s42256-020-0174-5","volume":"2","author":"P-C Kotsias","year":"2020","unstructured":"Kotsias P-C, Ar\u00fas-Pous J, Chen H, Engkvist O, Tyrchan C, Bjerrum EJ (2020) Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nat Mach Intell 2(5):254\u2013265. 
https:\/\/doi.org\/10.1038\/s42256-020-0174-5","journal-title":"Nat Mach Intell"},{"key":"441_CR17","doi-asserted-by":"publisher","DOI":"10.26434\/chemrxiv.5309668.v3","author":"B Sanchez-Lengeling","year":"2017","unstructured":"Sanchez-Lengeling B, Outeiral C, Guimaraes GL, Aspuru-Guzik A (2017) Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (ORGANIC). ChemRxiv. https:\/\/doi.org\/10.26434\/chemrxiv.5309668.v3","journal-title":"ChemRxiv"},{"issue":"1","key":"441_CR18","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1186\/s13321-019-0397-9","volume":"11","author":"O Prykhodko","year":"2019","unstructured":"Prykhodko O et al (2019) A de novo molecular generation method using latent vector based generative adversarial network. J Cheminform 11(1):74. https:\/\/doi.org\/10.1186\/s13321-019-0397-9","journal-title":"J Cheminform"},{"issue":"1","key":"441_CR19","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31\u201336. https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comput Sci"},{"key":"441_CR20","unstructured":"Li Y, Vinyals O, Dyer C, Pascanu R, Battaglia P (2018) Learning deep generative models of graphs. arXiv:1803.03324 [cs, stat]. http:\/\/arxiv.org\/abs\/1803.03324. Accessed 18 Feb 2020"},{"issue":"1","key":"441_CR21","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s13321-018-0287-6","volume":"10","author":"Y Li","year":"2018","unstructured":"Li Y, Zhang L, Liu Z (2018) Multi-objective de novo drug design with conditional graph generative model. J Cheminform 10(1):33. 
https:\/\/doi.org\/10.1186\/s13321-018-0287-6","journal-title":"J Cheminform"},{"key":"441_CR22","unstructured":"Bjerrum EJ (2017) SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv:1703.07076 [cs]. http:\/\/arxiv.org\/abs\/1703.07076. Accessed 19 Feb 2020"},{"issue":"2","key":"441_CR23","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1021\/ci00062a008","volume":"29","author":"D Weininger","year":"1989","unstructured":"Weininger D, Weininger A, Weininger JL (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci 29(2):97\u2013101. https:\/\/doi.org\/10.1021\/ci00062a008","journal-title":"J Chem Inf Comput Sci"},{"issue":"3","key":"441_CR24","doi-asserted-by":"publisher","first-page":"1175","DOI":"10.1021\/acs.jcim.9b00943","volume":"60","author":"F Grisoni","year":"2020","unstructured":"Grisoni F, Moret M, Lingwood R, Schneider G (2020) Bidirectional molecule generation with recurrent neural networks. J Chem Inf Model 60(3):1175\u20131183. https:\/\/doi.org\/10.1021\/acs.jcim.9b00943","journal-title":"J Chem Inf Model"},{"issue":"4","key":"441_CR25","doi-asserted-by":"publisher","first-page":"1153","DOI":"10.1039\/C9SC04503A","volume":"11","author":"J Lim","year":"2020","unstructured":"Lim J, Hwang S-Y, Moon S, Kim S, Kim WY (2020) Scaffold-based molecular design with a graph generative model. Chem Sci 11(4):1153\u20131164. https:\/\/doi.org\/10.1039\/C9SC04503A","journal-title":"Chem Sci"},{"issue":"1","key":"441_CR26","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1021\/acs.jcim.9b00727","volume":"60","author":"Y Li","year":"2020","unstructured":"Li Y, Hu J, Wang Y, Zhou J, Zhang L, Liu Z (2020) DeepScaffold: a comprehensive tool for Scaffold-based de novo drug discovery using deep learning. J Chem Inf Model 60(1):77\u201391. 
https:\/\/doi.org\/10.1021\/acs.jcim.9b00727","journal-title":"J Chem Inf Model"},{"issue":"6","key":"441_CR27","doi-asserted-by":"publisher","first-page":"1239","DOI":"10.1111\/j.1476-5381.2010.01127.x","volume":"162","author":"JP Hughes","year":"2011","unstructured":"Hughes JP, Rees S, Kalindjian SB, Philpott KL (2011) Principles of early drug discovery. Br J Pharmacol 162(6):1239\u20131249. https:\/\/doi.org\/10.1111\/j.1476-5381.2010.01127.x","journal-title":"Br J Pharmacol"},{"issue":"7332","key":"441_CR28","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1038\/470042a","volume":"470","author":"PJ Hajduk","year":"2011","unstructured":"Hajduk PJ, Galloway WRJD, Spring DR (2011) A question of library design. Nature 470(7332):42\u201343. https:\/\/doi.org\/10.1038\/470042a","journal-title":"Nature"},{"key":"441_CR29","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1016\/j.csbj.2016.12.003","volume":"15","author":"C Tyrchan","year":"2017","unstructured":"Tyrchan C, Evertsson E (2017) Matched molecular pair analysis in short: algorithms, applications and limitations. Comput Struct Biotechnol J 15:86\u201390. https:\/\/doi.org\/10.1016\/j.csbj.2016.12.003","journal-title":"Comput Struct Biotechnol J"},{"issue":"3","key":"441_CR30","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1021\/ci900450m","volume":"50","author":"J Hussain","year":"2010","unstructured":"Hussain J, Rea C (2010) Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large data sets. J Chem Inf Model 50(3):339\u2013348. 
https:\/\/doi.org\/10.1021\/ci900450m","journal-title":"J Chem Inf Model"},{"issue":"2","key":"441_CR31","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1021\/ci0255782","volume":"43","author":"P Ertl","year":"2003","unstructured":"Ertl P (2003) Cheminformatics analysis of organic substituents: identification of the most common substituents, calculation of substituent properties, and automatic identification of drug-like bioisosteric groups. J Chem Inf Comput Sci 43(2):374\u2013380. https:\/\/doi.org\/10.1021\/ci0255782","journal-title":"J Chem Inf Comput Sci"},{"issue":"1","key":"441_CR32","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/s13321-020-0412-1","volume":"12","author":"P Ertl","year":"2020","unstructured":"Ertl P (2020) Craig plot 2.0: an interactive navigation in the substituent bioisosteric space. J Cheminform 12(1):8. https:\/\/doi.org\/10.1186\/s13321-020-0412-1","journal-title":"J Cheminform"},{"issue":"1","key":"441_CR33","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1186\/s13321-017-0203-5","volume":"9","author":"J Sun","year":"2017","unstructured":"Sun J et al (2017) ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. J Cheminform 9(1):17. https:\/\/doi.org\/10.1186\/s13321-017-0203-5","journal-title":"J Cheminform"},{"issue":"1","key":"441_CR34","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s13321-020-0416-x","volume":"12","author":"D Probst","year":"2020","unstructured":"Probst D, Reymond J-L (2020) Visualization of very large high-dimensional data sets as minimum spanning trees. J Cheminform 12(1):12. 
https:\/\/doi.org\/10.1186\/s13321-020-0416-x","journal-title":"J Cheminform"},{"issue":"3","key":"441_CR35","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1021\/ci970429i","volume":"38","author":"XQ Lewell","year":"1998","unstructured":"Lewell XQ, Judd DB, Watson SP, Hann MM (1998) RECAP retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J Chem Inf Comput Sci 38(3):511\u2013522. https:\/\/doi.org\/10.1021\/ci970429i","journal-title":"J Chem Inf Comput Sci"},{"issue":"19","key":"441_CR36","doi-asserted-by":"publisher","first-page":"876","DOI":"10.1016\/S1359-6446(03)02831-9","volume":"8","author":"M Congreve","year":"2003","unstructured":"Congreve M, Carr R, Murray C, Jhoti H (2003) A \u2018Rule of Three\u2019 for fragment-based lead discovery? Drug Discov Today 8(19):876\u2013877. https:\/\/doi.org\/10.1016\/S1359-6446(03)02831-9","journal-title":"Drug Discov Today"},{"issue":"6\u20137","key":"441_CR37","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1002\/minf.201000061","volume":"29","author":"A Tropsha","year":"2010","unstructured":"Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29(6\u20137):476\u2013488. https:\/\/doi.org\/10.1002\/minf.201000061","journal-title":"Mol Inform"},{"issue":"7743","key":"441_CR38","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1038\/s41586-019-0917-9","volume":"566","author":"J Lyu","year":"2019","unstructured":"Lyu J et al (2019) Ultra-large library docking for discovering new chemotypes. Nature 566(7743):224\u2013229. 
https:\/\/doi.org\/10.1038\/s41586-019-0917-9","journal-title":"Nature"},{"issue":"8","key":"441_CR39","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1038\/nrd1799","volume":"4","author":"G Schneider","year":"2005","unstructured":"Schneider G, Fechner U (2005) Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov 4(8):649\u2013663. https:\/\/doi.org\/10.1038\/nrd1799","journal-title":"Nat Rev Drug Discov"},{"issue":"1","key":"441_CR40","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-019-0341-z","volume":"11","author":"J Ar\u00fas-Pous","year":"2019","unstructured":"Ar\u00fas-Pous J, Blaschke T, Ulander S, Reymond J-L, Chen H, Engkvist O (2019) Exploring the GDB-13 chemical space using deep generative models. J Cheminform 11(1):20. https:\/\/doi.org\/10.1186\/s13321-019-0341-z","journal-title":"J Cheminform"},{"key":"441_CR41","doi-asserted-by":"crossref","unstructured":"Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. arXiv:1508.04025 [cs]. http:\/\/arxiv.org\/abs\/1508.04025. Accessed 19 Feb 2020","DOI":"10.18653\/v1\/D15-1166"},{"issue":"15","key":"441_CR42","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis GW, Murcko MA (1996) The properties of known drugs. 1. Molecular frameworks. J Med Chem 39(15):2887\u20132893. https:\/\/doi.org\/10.1021\/jm9602928","journal-title":"J Med Chem"},{"issue":"9","key":"441_CR43","doi-asserted-by":"publisher","first-page":"3182","DOI":"10.1021\/jm049032d","volume":"48","author":"SJ Wilkens","year":"2005","unstructured":"Wilkens SJ, Janes J, Su AI (2005) HierS: hierarchical Scaffold clustering using topological chemical graphs. J Med Chem 48(9):3182\u20133193. 
https:\/\/doi.org\/10.1021\/jm049032d","journal-title":"J Med Chem"},{"issue":"1\u20132","key":"441_CR44","doi-asserted-by":"publisher","first-page":"1700111","DOI":"10.1002\/minf.201700111","volume":"37","author":"A Gupta","year":"2018","unstructured":"Gupta A, M\u00fcller AT, Huisman BJH, Fuchs JA, Schneider P, Schneider G (2018) Generative recurrent networks for de novo drug design. Mol Inform 37(1\u20132):1700111. https:\/\/doi.org\/10.1002\/minf.201700111","journal-title":"Mol Inform"},{"key":"441_CR45","unstructured":"Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580 [cs]. http:\/\/arxiv.org\/abs\/1207.0580. Accessed 19 Feb 2020"},{"key":"441_CR46","unstructured":"Bahdanau D, Cho K, Bengio Y (2016) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473 [cs, stat]. http:\/\/arxiv.org\/abs\/1409.0473. Accessed 19 Feb 2020"},{"key":"441_CR47","unstructured":"Vaswani A et al (2017) Attention is all you need. arXiv:1706.03762 [cs]. http:\/\/arxiv.org\/abs\/1706.03762. Accessed 19 Feb 2020"},{"issue":"4","key":"441_CR48","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1021\/ci9803381","volume":"39","author":"D Butina","year":"1999","unstructured":"Butina D (1999) Unsupervised data base clustering based on daylight\u2019s fingerprint and tanimoto similarity: a fast and automated way to cluster small and large data sets. J Chem Inf Comput Sci 39(4):747\u2013750. https:\/\/doi.org\/10.1021\/ci9803381","journal-title":"J Chem Inf Comput Sci"},{"issue":"1","key":"441_CR49","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1145\/2786984.2786995","volume":"19","author":"G Varoquaux","year":"2015","unstructured":"Varoquaux G, Buitinck L, Louppe G, Grisel O, Pedregosa F, Mueller A (2015) Scikit-learn: machine learning without learning the machinery. GetMobile Mobile Comp Comm 19(1):29\u201333. 
https:\/\/doi.org\/10.1145\/2786984.2786995","journal-title":"GetMobile Mobile Comp Comm"},{"key":"441_CR50","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: a next-generation hyperparameter optimization framework. arXiv:1907.10902 [cs, stat]. http:\/\/arxiv.org\/abs\/1907.10902. Accessed 19 Feb 2020","DOI":"10.1145\/3292500.3330701"},{"key":"441_CR51","unstructured":"Paszke A et al (2017) Automatic differentiation in PyTorch. https:\/\/openreview.net\/forum?id=BJJsrmfCZ. Accessed 18 Feb 2020"},{"key":"441_CR52","unstructured":"Landrum G (2020) rdkit\/rdkit: 2019_09_3 (Q3 2019) Release. Zenodo"},{"issue":"11","key":"441_CR53","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/2934664","volume":"59","author":"M Zaharia","year":"2016","unstructured":"Zaharia M et al (2016) Apache Spark: a unified engine for big data processing. Commun ACM 59(11):56\u201365. https:\/\/doi.org\/10.1145\/2934664","journal-title":"Commun ACM"}],"container-title":["Journal of 
Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-00441-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-020-00441-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-00441-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,28]],"date-time":"2021-05-28T23:33:29Z","timestamp":1622244809000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-020-00441-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,29]]},"references-count":53,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["441"],"URL":"https:\/\/doi.org\/10.1186\/s13321-020-00441-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.11638383.v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,29]]},"assertion":[{"value":"4 February 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 May 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 May 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"38"}}