{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T08:31:19Z","timestamp":1723451479424},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T00:00:00Z","timestamp":1702425600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T00:00:00Z","timestamp":1702425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","award":["JP22nk0101111"],"id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"Developing compounds with novel structures is important for the production of new drugs. From an intellectual perspective, confirming the patent status of newly developed compounds is essential, particularly for pharmaceutical companies. The generation of a large number of compounds has been made possible because of the recent advances in artificial intelligence (AI). However, confirming the patent status of these generated molecules has been a challenge because there are no free and easy-to-use tools that can be used to determine the novelty of the generated compounds in terms of patents in a timely manner; additionally, there are no appropriate reference databases for pharmaceutical patents in the world. In this study, two public databases, SureChEMBL and Google Patents Public Datasets, were used to create a reference database of drug-related patented compounds using international patent classification. 
An exact structure search system was constructed using InChIKey and a relational database system to rapidly search for compounds in the reference database. Because drug-related patented compounds are a good source for generative AI to learn useful chemical structures, they were used as the training data. Furthermore, molecule generation was successfully directed by increasing and decreasing the number of generated patented compounds through incorporation of patent status (i.e., patented or not) into learning. The use of patent status enabled generation of novel molecules with high drug-likeness. The generation using generative AI with patent information would help efficiently propose novel compounds in terms of pharmaceutical patents. Scientific contribution: In this study, a new molecule-generation method that takes into account the patent status of molecules, which has rarely been considered but is an important feature in drug discovery, was developed. The method enables the generation of novel molecules based on pharmaceutical patents with high drug-likeness and will help in the efficient development of effective drug compounds.","DOI":"10.1186\/s13321-023-00791-z","type":"journal-article","created":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T09:02:36Z","timestamp":1702458156000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["AI-driven molecular generation of not-patented pharmaceutical compounds using world open patent 
data"],"prefix":"10.1186","volume":"15","author":[{"given":"Yugo","family":"Shimizu","sequence":"first","affiliation":[]},{"given":"Masateru","family":"Ohta","sequence":"additional","affiliation":[]},{"given":"Shoichi","family":"Ishida","sequence":"additional","affiliation":[]},{"given":"Kei","family":"Terayama","sequence":"additional","affiliation":[]},{"given":"Masanori","family":"Osawa","sequence":"additional","affiliation":[]},{"given":"Teruki","family":"Honma","sequence":"additional","affiliation":[]},{"given":"Kazuyoshi","family":"Ikeda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,12,13]]},"reference":[{"key":"791_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/wcms.1608","volume":"12","author":"C Bilodeau","year":"2022","unstructured":"Bilodeau C, Jin W, Jaakkola T, Barzilay R, Jensen KF (2022) Generative models for molecular discovery: recent advances and challenges. WIREs Comput Mol Sci 12:1\u201317. https:\/\/doi.org\/10.1002\/wcms.1608","journal-title":"WIREs Comput Mol Sci"},{"key":"791_CR2","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1038\/s41587-020-0418-2","volume":"38","author":"WP Walters","year":"2020","unstructured":"Walters WP, Murcko M (2020) Assessing the impact of generative AI on medicinal chemistry. Nat Biotechnol 38:143\u2013145. https:\/\/doi.org\/10.1038\/s41587-020-0418-2","journal-title":"Nat Biotechnol"},{"key":"791_CR3","first-page":"2672","volume-title":"Advances in neural information processing systems","author":"I Goodfellow","year":"2014","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ (eds) Advances in neural information processing systems. 
MIT Press, Cambridge, pp 2672\u20132680"},{"key":"791_CR4","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1186\/s13321-019-0397-9","volume":"11","author":"O Prykhodko","year":"2019","unstructured":"Prykhodko O, Johansson SV, Kotsias P-C, Ar\u00fas-Pous J, Bjerrum EJ, Engkvist O, Chen H (2019) A de novo molecular generation method using latent vector based generative adversarial network. J Cheminform 11:74. https:\/\/doi.org\/10.1186\/s13321-019-0397-9","journal-title":"J Cheminform"},{"key":"791_CR5","volume-title":"Parallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations","author":"DE Rumelhart","year":"1987","unstructured":"Rumelhart DE, McClelland JL, Group PR (1987) Parallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations. MIT Press, Cambridge"},{"key":"791_CR6","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-017-0235-x","volume":"9","author":"M Olivecrona","year":"2017","unstructured":"Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9:48. https:\/\/doi.org\/10.1186\/s13321-017-0235-x","journal-title":"J Cheminform"},{"key":"791_CR7","first-page":"5998","volume-title":"Advances in neural information processing systems","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems. MIT Press, Cambridge, pp 5998\u20136008"},{"key":"791_CR8","doi-asserted-by":"publisher","first-page":"2064","DOI":"10.1021\/acs.jcim.1c00600","volume":"62","author":"V Bagal","year":"2022","unstructured":"Bagal V, Aggarwal R, Vinod PK, Priyakumar UD (2022) MolGPT: molecular generation using a transformer-decoder model. 
J Chem Inf Model 62:2064\u20132076. https:\/\/doi.org\/10.1021\/acs.jcim.1c00600","journal-title":"J Chem Inf Model"},{"key":"791_CR9","unstructured":"Nigam A, Friederich P, Krenn M, Aspuru-Guzik A (2020) Augmenting genetic algorithms with deep neural networks for exploring the chemical space. arXiv:1909.11655 [cs.NE]"},{"key":"791_CR10","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4:268\u2013276. https:\/\/doi.org\/10.1021\/acscentsci.7b00572","journal-title":"ACS Cent Sci"},{"key":"791_CR11","doi-asserted-by":"publisher","first-page":"10752","DOI":"10.1038\/s41598-019-47148-x","volume":"9","author":"Z Zhou","year":"2019","unstructured":"Zhou Z, Kearnes S, Li L, Zare RN, Riley P (2019) Optimization of molecules via deep reinforcement learning. Sci Rep 9:10752. https:\/\/doi.org\/10.1038\/s41598-019-47148-x","journal-title":"Sci Rep"},{"key":"791_CR12","doi-asserted-by":"publisher","first-page":"3304","DOI":"10.1021\/acs.jcim.1c00679","volume":"61","author":"B Ma","year":"2021","unstructured":"Ma B, Terayama K, Matsumoto S, Isaka Y, Sasakura Y, Iwata H, Araki M, Okuno Y (2021) Structure-based de novo molecular generator combined with artificial intelligence and docking simulations. J Chem Inf Model 61:3304\u20133313. 
https:\/\/doi.org\/10.1021\/acs.jcim.1c00679","journal-title":"J Chem Inf Model"},{"key":"791_CR13","doi-asserted-by":"publisher","first-page":"5351","DOI":"10.1021\/acs.jcim.2c00787","volume":"62","author":"T Yoshizawa","year":"2022","unstructured":"Yoshizawa T, Ishida S, Sato T, Ohta M, Honma T, Terayama K (2022) Selective inhibitor design for kinase homologs using multiobjective Monte Carlo tree search. J Chem Inf Model 62:5351\u20135360. https:\/\/doi.org\/10.1021\/acs.jcim.2c00787","journal-title":"J Chem Inf Model"},{"key":"791_CR14","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1186\/s13321-021-00561-9","volume":"13","author":"X Liu","year":"2021","unstructured":"Liu X, Ye K, van Vlijmen HWT, Emmerich MTM, IJzerman AP, van Westen GJP (2021) DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology. J Cheminform 13:85. https:\/\/doi.org\/10.1186\/s13321-021-00561-9","journal-title":"J Cheminform"},{"key":"791_CR15","doi-asserted-by":"publisher","first-page":"1006","DOI":"10.1039\/D3DD00041A","volume":"2","author":"A Subramanian","year":"2023","unstructured":"Subramanian A, Greenman P, Gervaix K, Yang A, G\u00f3mez-Bombarelli T (2023) Automated patent extraction powers generative modeling in focused chemical spaces. Digit Discov 2:1006\u20131015. https:\/\/doi.org\/10.1039\/D3DD00041A","journal-title":"Digit Discov"},{"key":"791_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.wpi.2021.102055","volume":"66","author":"J Ohms","year":"2021","unstructured":"Ohms J (2021) Current methodologies for chemical compound searching in patents: a case study. World Pat Inf 66:102055. https:\/\/doi.org\/10.1016\/j.wpi.2021.102055","journal-title":"World Pat Inf"},{"key":"791_CR17","unstructured":"Google patents. https:\/\/patents.google.com. 
Accessed 01 Aug 2023"},{"key":"791_CR18","doi-asserted-by":"publisher","first-page":"D1220","DOI":"10.1093\/nar\/gkv1253","volume":"44","author":"G Papadatos","year":"2016","unstructured":"Papadatos G, Davies M, Dedman N, Chambers J, Gaulton A, Siddle J, Koks R, Irvine SA, Pettersson J, Goncharoff N, Hersey A, Overington JP (2016) SureChEMBL: a large-scale, chemically annotated patent document database. Nucleic Acids Res 44:D1220\u2013D1228. https:\/\/doi.org\/10.1093\/nar\/gkv1253","journal-title":"Nucleic Acids Res"},{"key":"791_CR19","doi-asserted-by":"publisher","first-page":"2241","DOI":"10.1021\/acs.jcim.1c00151","volume":"61","author":"MJ Falaguera","year":"2021","unstructured":"Falaguera MJ, Mestres J (2021) Identification of the core chemical structure in SureChEMBL patents. J Chem Inf Model 61:2241\u20132247. https:\/\/doi.org\/10.1021\/acs.jcim.1c00151","journal-title":"J Chem Inf Model"},{"key":"791_CR20","unstructured":"Google patents public datasets on BigQuery. https:\/\/console.cloud.google.com\/bigquery?p=patents-public-data. Accessed 01 Aug 2023"},{"key":"791_CR21","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1186\/s13321-020-00456-1","volume":"12","author":"AP Bento","year":"2020","unstructured":"Bento AP, Hersey A, F\u00e9lix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, De Veij M, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. J Cheminform 12:51. https:\/\/doi.org\/10.1186\/s13321-020-00456-1","journal-title":"J Cheminform"},{"key":"791_CR22","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1186\/s13321-015-0068-4","volume":"7","author":"SR Heller","year":"2015","unstructured":"Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC international chemical identifier. J Cheminform 7:23. https:\/\/doi.org\/10.1186\/s13321-015-0068-4","journal-title":"J Cheminform"},{"key":"791_CR23","unstructured":"RDKit open-source cheminformatics software. 
https:\/\/www.rdkit.org. Accessed 01 Aug 2023"},{"key":"791_CR24","doi-asserted-by":"publisher","DOI":"10.1002\/wcms.1680","author":"S Ishida","year":"2023","unstructured":"Ishida S, Aasawat T, Sumita M, Katouda M, Yoshizawa T, Yoshizoe K, Tsuda K, Terayama K (2023) ChemTSv2: functional molecular design using de novo molecule generator. WIREs Comput Mol Sci. https:\/\/doi.org\/10.1002\/wcms.1680","journal-title":"WIREs Comput Mol Sci"},{"key":"791_CR25","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31\u201336. https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comput Sci"},{"key":"791_CR26","doi-asserted-by":"crossref","unstructured":"Coulom R (2007) Efficient selectivity and backup operators in Monte-Carlo tree search. In: Computers and games: 5th international conference, CG 2006, Turin, Italy, May 29\u201331, 2006, pp 72\u201383","DOI":"10.1007\/978-3-540-75538-8_7"},{"key":"791_CR27","doi-asserted-by":"publisher","first-page":"D930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2019","unstructured":"Mendez D, Gaulton A, Bento AP, Chambers J, De Veij M, F\u00e9lix E, Magari\u00f1os MP, Mosquera JF, Mutowo P, Nowotka M, Gordillo-Mara\u00f1\u00f3n M, Hunter F, Junco L, Mugumbate G, Rodriguez-Lopez M, Atkinson F, Bosc N, Radoux CJ, Segura-Cabrera A, Hersey A, Leach AR (2019) ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 47:D930\u2013D940. https:\/\/doi.org\/10.1093\/nar\/gky1075","journal-title":"Nucleic Acids Res"},{"key":"791_CR28","doi-asserted-by":"publisher","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","volume":"55","author":"T Sterling","year":"2015","unstructured":"Sterling T, Irwin JJ (2015) ZINC 15\u2014ligand discovery for everyone. 
J Chem Inf Model 55:2324\u20132337. https:\/\/doi.org\/10.1021\/acs.jcim.5b00559","journal-title":"J Chem Inf Model"},{"key":"791_CR29","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, Ohta T, Koyama M (2019) Optuna: a next-generation hyperparameter optimization framework. In: KDD \u201919: proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, Anchorage, AK, USA, 4\u20138 Aug 2019, pp 2623\u20132631","DOI":"10.1145\/3292500.3330701"},{"key":"791_CR30","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1186\/s13321-018-0321-8","volume":"10","author":"D Probst","year":"2018","unstructured":"Probst D, Reymond J-L (2018) A probabilistic molecular fingerprint for big data settings. J Cheminform 10:66. https:\/\/doi.org\/10.1186\/s13321-018-0321-8","journal-title":"J Cheminform"},{"key":"791_CR31","unstructured":"Molecular MHFP fingerprints for cheminformatics applications. https:\/\/github.com\/reymond-group\/mhfp. Accessed 01 Aug 2023"},{"key":"791_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3389\/fphar.2020.565644","volume":"11","author":"D Polykovskiy","year":"2020","unstructured":"Polykovskiy D, Zhebrak A, Sanchez-Lengeling B, Golovanov S, Tatanov O, Belyaev S, Kurbanov R, Artamonov A, Aladinskiy V, Veselov M, Kadurin A, Johansson S, Chen H, Nikolenko S, Aspuru-Guzik A, Zhavoronkov A (2020) Molecular sets (MOSES): a benchmarking platform for molecular generation models. Front Pharmacol 11:1\u201310. https:\/\/doi.org\/10.3389\/fphar.2020.565644","journal-title":"Front Pharmacol"},{"key":"791_CR33","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jcim.8b00839","volume":"59","author":"N Brown","year":"2019","unstructured":"Brown N, Fiscato M, Segler MHS, Vaucher AC (2019) GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Model 59:1096\u20131108. 
https:\/\/doi.org\/10.1021\/acs.jcim.8b00839","journal-title":"J Chem Inf Model"},{"key":"791_CR34","doi-asserted-by":"publisher","first-page":"D1373","DOI":"10.1093\/nar\/gkac956","volume":"51","author":"S Kim","year":"2023","unstructured":"Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE (2023) PubChem 2023 update. Nucleic Acids Res 51:D1373\u2013D1380. https:\/\/doi.org\/10.1093\/nar\/gkac956","journal-title":"Nucleic Acids Res"},{"key":"791_CR35","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1016\/j.ddtec.2004.11.007","volume":"1","author":"CA Lipinski","year":"2004","unstructured":"Lipinski CA (2004) Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov Today Technol 1:337\u2013341. https:\/\/doi.org\/10.1016\/j.ddtec.2004.11.007","journal-title":"Drug Discov Today Technol"},{"key":"791_CR36","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-1-8","volume":"1","author":"P Ertl","year":"2009","unstructured":"Ertl P, Schuffenhauer A (2009) Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminform 1:8. https:\/\/doi.org\/10.1186\/1758-2946-1-8","journal-title":"J Cheminform"},{"key":"791_CR37","doi-asserted-by":"crossref","unstructured":"McInnes L, Healy J, Melville J (2018) UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426 [stat.ML]","DOI":"10.21105\/joss.00861"},{"key":"791_CR38","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00861","volume":"3","author":"L McInnes","year":"2018","unstructured":"McInnes L, Healy J, Saul N, Gro\u00dfberger L (2018) UMAP: uniform manifold approximation and projection. J Open Source Softw 3:861. 
https:\/\/doi.org\/10.21105\/joss.00861","journal-title":"J Open Source Softw"},{"key":"791_CR39","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1038\/nchem.1243","volume":"4","author":"GR Bickerton","year":"2012","unstructured":"Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL (2012) Quantifying the chemical beauty of drugs. Nat Chem 4:90\u201398. https:\/\/doi.org\/10.1038\/nchem.1243","journal-title":"Nat Chem"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00791-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00791-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00791-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T09:09:32Z","timestamp":1702458572000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00791-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,13]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["791"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00791-z","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,13]]},"assertion":[{"value":"25 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"120"}}