{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T13:28:36Z","timestamp":1742390916007},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2014,5,14]],"date-time":"2014-05-14T00:00:00Z","timestamp":1400025600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2014,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n <jats:sec>\n <jats:title>Background<\/jats:title>\n <jats:p>Research efforts in the field of descriptive and predictive Quantitative Structure-Activity Relationships or Quantitative Structure\u2013Property Relationships produce around one thousand scientific publications annually. All the materials and results are mainly communicated using printed media. The printed media in its present form have obvious limitations when they come to effectively representing mathematical models, including complex and non-linear, and large bodies of associated numerical chemical data. It is not supportive of secondary information extraction or reuse efforts while <jats:italic>in silico<\/jats:italic> studies poses additional requirements for accessibility, transparency and reproducibility of the research. This gap can and should be bridged by introducing domain-specific digital data exchange standards and tools. 
The current publication presents a formal specification of the quantitative structure-activity relationship data organization and archival format called the QSAR DataBank (QsarDB for shorter, or QDB for shortest).<\/jats:p>\n <\/jats:sec>\n <jats:sec>\n <jats:title>Results<\/jats:title>\n <jats:p>The article describes QsarDB data schema, which formalizes QSAR concepts (objects and relationships between them) and QsarDB data format, which formalizes their presentation for computer systems. The utility and benefits of QsarDB have been thoroughly tested by solving everyday QSAR and predictive modeling problems, with examples in the field of predictive toxicology, and can be applied for a wide variety of other endpoints. The work is accompanied with open source reference implementation and tools.<\/jats:p>\n <\/jats:sec>\n <jats:sec>\n <jats:title>Conclusions<\/jats:title>\n <jats:p>The proposed open data, open source, and open standards design is open to public and proprietary extensions on many levels. Selected use cases exemplify the benefits of the proposed QsarDB data format. 
General ideas for future development are discussed.<\/jats:p>\n <\/jats:sec>","DOI":"10.1186\/1758-2946-6-25","type":"journal-article","created":{"date-parts":[[2014,5,14]],"date-time":"2014-05-14T23:12:10Z","timestamp":1400109130000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["QSAR DataBank - an approach for the digital organization and archiving of QSAR model information"],"prefix":"10.1186","volume":"6","author":[{"given":"Villu","family":"Ruusmann","sequence":"first","affiliation":[]},{"given":"Sulev","family":"Sild","sequence":"additional","affiliation":[]},{"given":"Uko","family":"Maran","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2014,5,14]]},"reference":[{"key":"602_CR1","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1002\/minf.201000061","volume":"29","author":"A Tropsha","year":"2010","unstructured":"Tropsha A: Best practices for QSAR model development, validation, and exploitation. Mol Inf. 2010, 29: 476-488. 10.1002\/minf.201000061.","journal-title":"Mol Inf"},{"key":"602_CR2","first-page":"241","volume":"20","author":"JC Dearden","year":"2009","unstructured":"Dearden JC, Cronin MT, Kaiser KL: How not to develop a quantitative structure-activity or structure\u2013property relationship (QSAR\/QSPR). SAR QSAR. Environ Res. 2009, 20: 241-266.","journal-title":"Environ Res"},{"key":"602_CR3","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1023\/A:1025358319677","volume":"17","author":"TR Stouch","year":"2003","unstructured":"Stouch TR, Kenyon JR, Johnson SR, Chen XQ, Doweyko A, Li Y: In silico ADME\/Tox: why models fail. J Comput Aided Mol Des. 2003, 17: 83-92. 
10.1023\/A:1025358319677.","journal-title":"J Comput Aided Mol Des"},{"key":"602_CR4","volume-title":"The Grid 2: Blueprint for a New Computing Infrastructure","author":"I Foster","year":"2003","unstructured":"Foster I, Kesselman C: The Grid 2: Blueprint for a New Computing Infrastructure. 2003, San Francisco, CA: Morgan Kaufmann Publishers Inc."},{"key":"602_CR5","unstructured":"Open Computing GRID for Molecular Science and Engineering (OpenMolGRID); EU 5-th FP, # IST-2001-37238, duration 2002\u20132005. [http:\/\/www.openmolgrid.org]"},{"key":"602_CR6","doi-asserted-by":"publisher","first-page":"953","DOI":"10.1021\/ci050354f","volume":"46","author":"S Sild","year":"2006","unstructured":"Sild S, Maran U, Lomaka A, Karelson M: Open computing grid for molecular science and engineering. J Chem Inf Model. 2006, 46: 953-959. 10.1021\/ci050354f.","journal-title":"J Chem Inf Model"},{"key":"602_CR7","first-page":"464","volume-title":"Advances in Grid Computing","author":"S Sild","year":"2005","unstructured":"Sild S, Maran U, Romberg M, Schuller B, Benfenati E: OpenMolGRID: Using Automated Workflows in GRID Computing Environment. Advances in Grid Computing. Edited by: Sloot PMA, Hoekstra AG, Priol T, Reinefeld A, Bubak M. 2005, Berlin Heidelberg, LNCS 3470: Springer-Verlag, 464-473."},{"key":"602_CR8","unstructured":"CODESSA PRO. [http:\/\/www.codessa-pro.com]"},{"key":"602_CR9","unstructured":"Grid services based environment to enable innovative research (CHEMOMENTUM), EU 6FP, # IST-5-033437, duration 2006\u20132009. 
[http:\/\/www.chemomentum.org]"},{"key":"602_CR10","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1007\/978-3-540-78474-6_12","volume-title":"Theoretical Computer Science and General Issues (Euro-Par 2007 Workshops: Parallel Processing)","author":"B Schuller","year":"2008","unstructured":"Schuller B, Demuth B, Mix H, Rasch K, Romberg M, Sild S, Maran U, Ba\u0142a P, del Grosso E, Casalegno M, Piclin N, Pintore M, Sudholt W, Baldridge KK: Chemomentum - UNICORE 6 based infrastructure for complex applications in science and technology. Theoretical Computer Science and General Issues (Euro-Par 2007 Workshops: Parallel Processing). Edited by: Boug\u00e9 L, Forsell M, Larsson Tr\u00e4ff J, Streit A, Ziegler W, Alexander M, Childs S. 2008, Berlin Heidelberg, LNCS 4854: Springer-Verlag, 82-93."},{"key":"602_CR11","unstructured":"QSAR Model Reporting Format (QMRF), Version 1.2. [http:\/\/ihcp.jrc.ec.europa.eu\/our_labs\/computational_toxicology\/qsar_tools\/qrf\/QMRF_version_1.2.pdf]"},{"key":"602_CR12","unstructured":"OECD Principles For The Validation, For Regulatory Purposes, Of (Quantitative) Structure-Activity Relationship Models. [http:\/\/www.oecd.org\/chemicalsafety\/assessmentofchemicals\/37849783.pdf]"},{"key":"602_CR13","unstructured":"(Q)SAR Model Reporting Format Inventory. [http:\/\/qsardb.jrc.it\/qmrf\/]"},{"key":"602_CR14","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/1758-2946-2-5","volume":"2","author":"O Spjuth","year":"2010","unstructured":"Spjuth O, Willighagen EL, Guha R, Eklund M, Wikberg JES: Towards interoperable and reproducible QSAR analyses: exchange of datasets. J Cheminf. 
2010, 2: 5-10.1186\/1758-2946-2-5.","journal-title":"J Cheminf"},{"key":"602_CR15","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1186\/1471-2105-10-397","volume":"10","author":"O Spjuth","year":"2009","unstructured":"Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, M\u00e4sak C, Torrance G, Wagener J, Willighagen EL, Steinbeck C, Wikberg JES: Bioclipse 2: a scriptable integration platform for the life sciences. BMC Bioinformatics. 2009, 10: 397-10.1186\/1471-2105-10-397.","journal-title":"BMC Bioinformatics"},{"key":"602_CR16","unstructured":"CTfile Formats. [http:\/\/download.accelrys.com\/freeware\/ctfile-formats\/ctfile-formats.zip]"},{"key":"602_CR17","unstructured":"Convention Over Configuration. [http:\/\/en.wikipedia.org\/wiki\/Convention_over_configuration]"},{"key":"602_CR18","unstructured":"Revision Control. [http:\/\/en.wikipedia.org\/wiki\/Revision_control]"},{"key":"602_CR19","unstructured":"QsarDB Java Reference Implementation (Java RI). [http:\/\/github.com\/qsardb\/qsardb]"},{"key":"602_CR20","unstructured":"QsarDB GUI and Command-line Applications. [http:\/\/github.com\/qsardb\/qsardb-toolkit]"},{"key":"602_CR21","unstructured":"TETRATOX primary publications. [http:\/\/hdl.handle.net\/10967\/7]"},{"key":"602_CR22","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1007\/s10822-013-9664-4","volume":"27","author":"V Ruusmann","year":"2013","unstructured":"Ruusmann V, Maran U: From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions. J Comput Aided Mol Des. 2013, 27: 583-603. 10.1007\/s10822-013-9664-4.","journal-title":"J Comput Aided Mol Des"},{"key":"602_CR23","unstructured":"Marvin 5.5.0, ChemAxon. [http:\/\/www.chemaxon.com]"},{"key":"602_CR24","unstructured":"Check Digit Verification. 
[https:\/\/www.cas.org\/content\/chemical-substances\/checkdig]"},{"key":"602_CR25","unstructured":"NCI\/CADD Chemical Identifier Resolver. [http:\/\/cactus.nci.nih.gov\/chemical\/structure\/documentation]"},{"key":"602_CR26","unstructured":"Apache Ant. [http:\/\/ant.apache.org\/]"},{"key":"602_CR27","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1021\/ci025584y","volume":"43","author":"C Steinbeck","year":"2003","unstructured":"Steinbeck C, Han Y, Kuhn S, Horlacher O: Luttmann\u2019 E, Willighagen E: The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics. J Chem Inf Comput Sci. 2003, 43: 493-500. 10.1021\/ci025584y.","journal-title":"J Chem Inf Comput Sci"},{"key":"602_CR28","unstructured":"R project. [http:\/\/www.r-project.org\/]"},{"key":"602_CR29","unstructured":"QsarDB R API. [http:\/\/r-qsardb.googlecode.com]"},{"key":"602_CR30","unstructured":"QsarDB Repository. [http:\/\/www.qsardb.org\/repository]"},{"key":"602_CR31","first-page":"217","volume-title":"Annual Reports in Computational Chemistry","author":"EE Bolton","year":"2008","unstructured":"Bolton EE, Wang Y, Thiessen PA, Bryant SH: PubChem: integrated platform of small molecules and biological activities. Annual Reports in Computational Chemistry. Edited by: Ralph AW, David CS. 2008, Amsterdam Oxford: Elsevier, 217-241."},{"issue":"11","key":"602_CR32","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1021\/ed100697w","volume":"87","author":"H Pence","year":"2010","unstructured":"Pence H, Williams A: ChemSpider: an online chemical information resource. J Chem Educ. 2010, 87 (11): 1123-1124. 10.1021\/ed100697w.","journal-title":"J Chem Educ"},{"key":"602_CR33","unstructured":"IUPAC project no. 2001-043-1-800. [http:\/\/www.iupac.org\/web\/ins\/2001-043-1-800]"},{"key":"602_CR34","unstructured":"Data Mining Group. [http:\/\/www.dmg.org]"},{"key":"602_CR35","unstructured":"BibTeX tools. 
[http:\/\/www.ctan.org\/tex-archive\/biblio\/bibtex\/]"},{"key":"602_CR36","first-page":"257","volume-title":"Proceedings of the IEEE 77","author":"O Patashnik","year":"1988","unstructured":"Patashnik O: BibTeXing. In Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE 77. 1988, 257-286."},{"key":"602_CR37","unstructured":"Units of Measurements. [http:\/\/en.wikipedia.org\/wiki\/Units_of_measurement]"},{"key":"602_CR38","unstructured":"Unified Code for Units of Measure. [http:\/\/unitsofmeasure.org]"},{"key":"602_CR39","unstructured":"UnitsML. [http:\/\/unitsml.nist.gov]"},{"key":"602_CR40","unstructured":"Chemical Substances - CAS REGISTRY. [http:\/\/www.cas.org\/content\/chemical-substances]"},{"key":"602_CR41","unstructured":"InChI Trust. [http:\/\/www.inchi-trust.org\/]"},{"key":"602_CR42","unstructured":"Chemical MIME. [http:\/\/www.ch.ic.ac.uk\/chemime\/]"},{"key":"602_CR43","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1186\/1758-2946-3-44","volume":"3","author":"P Murray-Rust","year":"2011","unstructured":"Murray-Rust P, Rzepa HS: CML: evolution and design. J Cheminf. 2011, 3: 44-10.1186\/1758-2946-3-44.","journal-title":"J Cheminf"},{"key":"602_CR44","unstructured":"Chemical Markup Language (CML). [http:\/\/www.xml-cml.org\/]"},{"key":"602_CR45","unstructured":"Daylight SMILES. [http:\/\/www.daylight\/dayhtml\/smiles\/index.html]"},{"key":"602_CR46","unstructured":"OpenSMILES. [http:\/\/www.opensmiles.org\/]"},{"key":"602_CR47","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/1758-2946-4-22","volume":"4","author":"NM O\u2019Boyle","year":"2012","unstructured":"O\u2019Boyle NM: Towards a universal SMILES representation - a standard method to generate canonical SMILES based on the InChI. J Cheminf. 
2012, 4: 22-10.1186\/1758-2946-4-22.","journal-title":"J Cheminf"},{"key":"602_CR48","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Model. 1988, 28: 31-36. 10.1021\/ci00057a005.","journal-title":"J Chem Inf Model"},{"key":"602_CR49","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1021\/ci00062a008","volume":"29","author":"D Weininger","year":"1989","unstructured":"Weininger D, Weininger A, Weininger JL: SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Model. 1989, 29: 97-101. 10.1021\/ci00062a008.","journal-title":"J Chem Inf Model"},{"key":"602_CR50","doi-asserted-by":"publisher","first-page":"1465","DOI":"10.1124\/mol.61.6.1465","volume":"61","author":"M Lapinsh","year":"2002","unstructured":"Lapinsh M, Prusis P, Lundstedt T, Wikberg JES: Proteochemometrics modeling of the interaction of amine G-protein coupled receptors with a diverse set of ligands. Mol Pharmacol. 2002, 61: 1465-1475. 10.1124\/mol.61.6.1465.","journal-title":"Mol Pharmacol"},{"key":"602_CR51","unstructured":"Floris F, Willighagen E, Guha R, Rojas M, Hoppe C: The Blue Obelisk Descriptor Ontology. [http:\/\/qsar.sourceforge.net\/dicts\/qsar-descriptors\/index.xhtml]"},{"key":"602_CR52","unstructured":"JOELib\/JOELib2 cheminformatics library. [https:\/\/sourceforge.net\/projects\/joelib\/]"},{"issue":"10","key":"602_CR53","doi-asserted-by":"publisher","first-page":"e25513","DOI":"10.1371\/journal.pone.0025513","volume":"6","author":"J Hastings","year":"2011","unstructured":"Hastings J, Chepelev L, Willighagen E, Adams N, Steinbeck C, Dumontier M: The chemical information ontology: provenance and disambiguation for chemical data on the biological semantic web. PLoS ONE. 
2011, 6 (10): e25513-10.1371\/journal.pone.0025513.","journal-title":"PLoS ONE"},{"key":"602_CR54","unstructured":"PMML 4.1 - General Structure of a PMML Document. [http:\/\/www.dmg.org\/v4-1\/GeneralStructure.html]"},{"key":"602_CR55","unstructured":"Java PMML API. [http:\/\/www.jpmml.org]"},{"key":"602_CR56","unstructured":"Daylight Theory: SMARTS - A Language for Describing Molecular Patterns. [http:\/\/www.daylight.com\/dayhtml\/doc\/theory\/theory.smarts.html]"},{"key":"602_CR57","first-page":"445","volume":"33","author":"J Jaworska","year":"2005","unstructured":"Jaworska J, Nikolova-Jeliazakova N, Aldenberg T: QSAR applicability domain estimation by projection of the training seti in descriptor space: A review. ATLA. 2005, 33: 445-459.","journal-title":"ATLA"},{"key":"602_CR58","unstructured":"QSAR Prediction Reporting Format (QPRF). [http:\/\/ihcp.jrc.ec.europa.eu\/our_labs\/predictive_toxicology\/qsar_tools\/qrf\/QPRF_version_1.1.pdf]"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1758-2946-6-25\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-6-25.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-6-25.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T04:05:39Z","timestamp":1630555539000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/1758-2946-6-25"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,5,14]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,12]]}},"alternative-id":["602"],"URL"
:"https:\/\/doi.org\/10.1186\/1758-2946-6-25","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,5,14]]},"assertion":[{"value":"30 December 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 March 2014","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 May 2014","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"25"}}