{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,3]],"date-time":"2024-05-03T09:18:22Z","timestamp":1714727902058},"reference-count":35,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2022,10,14]],"date-time":"2022-10-14T00:00:00Z","timestamp":1665705600000},"content-version":"vor","delay-in-days":13,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/501100008982","name":"National Science Foundation of Sri Lanka","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100008982","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["asistdl.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Proceedings of the Association for Information Science and Technology"],"published-print":{"date-parts":[[2022,10]]},"abstract":"Discovering authoritative links between publications and the datasets that they use can be a labor\u2010intensive process. We introduce a natural language processing pipeline that retrieves and reviews publications for informal references to research datasets, which complements the work of data librarians. We first describe the components of the pipeline and then apply it to expand an authoritative bibliography linking thousands of social science studies to the data\u2010related publications in which they are used. The pipeline increases recall for literature to review for inclusion in data\u2010related collections of publications and makes it possible to detect informal data references at scale. We contribute (1) a novel Named Entity Recognition (NER) model that reliably detects informal data references and (2) a dataset connecting items from social science literature with datasets they reference. 
Together, these contributions enable future work on data reference, data citation networks, and data reuse.","DOI":"10.1002\/pra2.614","type":"journal-article","created":{"date-parts":[[2022,10,14]],"date-time":"2022-10-14T13:03:06Z","timestamp":1665752586000},"page":"169-178","update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature"],"prefix":"10.1002","volume":"59","author":[{"given":"Sara","family":"Lafia","sequence":"first","affiliation":[{"name":"ICPSR University of Michigan Ann Arbor Michigan USA"}]},{"given":"Lizhou","family":"Fan","sequence":"additional","affiliation":[{"name":"School of Information University of Michigan Ann Arbor Michigan USA"}]},{"given":"Libby","family":"Hemphill","sequence":"additional","affiliation":[{"name":"School of Information University of Michigan Ann Arbor Michigan USA"}]}],"member":"311","published-online":{"date-parts":[[2022,10,14]]},"reference":[{"key":"e_1_2_7_2_1","doi-asserted-by":"publisher","DOI":"10.1086\/686631"},{"key":"e_1_2_7_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33290-6_17"},{"key":"e_1_2_7_4_1","doi-asserted-by":"publisher","DOI":"10.1162\/qss_a_00166"},{"key":"e_1_2_7_5_1","doi-asserted-by":"publisher","DOI":"10.1002\/meet.2011.14504801125"},{"key":"e_1_2_7_6_1","doi-asserted-by":"publisher","DOI":"10.5334\/dsj-2019-009"},{"key":"e_1_2_7_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.24454"},{"key":"e_1_2_7_8_1","unstructured":"Fan L. Lafia S. Bleckley D. Moss E. Thomer A. &Hemphill L.(2022).Librarian\u2010in\u2010the\u2010Loop: A Natural Language Processing Paradigm for Detecting Informal Mentions of Research Data in Academic Literature. InarXiv [cs.DL]. 
arXiv.http:\/\/arxiv.org\/abs\/2203.05112"},{"key":"e_1_2_7_9_1","doi-asserted-by":"publisher","DOI":"10.53731\/r79sf9h-97aq74v-ag4wp"},{"key":"e_1_2_7_10_1","doi-asserted-by":"publisher","DOI":"10.3390\/data6080084"},{"key":"e_1_2_7_11_1","doi-asserted-by":"publisher","DOI":"10.1353\/lib.0.0036"},{"key":"e_1_2_7_12_1","doi-asserted-by":"publisher","DOI":"10.1108\/LHT-12-2016-0158"},{"key":"e_1_2_7_13_1","volume-title":"Ground: A Data Context Service","author":"Hellerstein J. M.","year":"2017"},{"key":"e_1_2_7_14_1","doi-asserted-by":"publisher","DOI":"10.7302\/1639"},{"key":"e_1_2_7_15_1","unstructured":"Honnibal M. Montani I. Van Landeghem S. &Boyd A.(2020).spaCy: Industrial\u2010strength natural language processing in python.https:\/\/spacy.io\/"},{"key":"e_1_2_7_16_1","doi-asserted-by":"publisher","DOI":"10.3389\/frma.2018.00023"},{"key":"e_1_2_7_17_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00028"},{"key":"e_1_2_7_18_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1049096500057607"},{"key":"e_1_2_7_19_1","doi-asserted-by":"publisher","DOI":"10.7302\/1671"},{"key":"e_1_2_7_20_1","doi-asserted-by":"publisher","DOI":"10.1629\/uksg.233"},{"key":"e_1_2_7_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.447"},{"key":"e_1_2_7_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04346-8_62"},{"issue":"1","key":"e_1_2_7_23_1","first-page":"150","article-title":"The location of the citation: changing practices in how publications cite original data in the Dryad Digital Repository","volume":"11","author":"Mayo C.","year":"2016","journal-title":"The location of the citation: changing practices in how publications cite original data in the Dryad Digital Repository."},{"key":"e_1_2_7_24_1","unstructured":"Montani I. 
&Honnibal M.(2018).Prodigy: A new annotation tool for radically efficient machine teaching.https:\/\/prodi.gy\/"},{"key":"e_1_2_7_25_1","doi-asserted-by":"publisher","DOI":"10.1087\/20110204"},{"key":"e_1_2_7_26_1","first-page":"54","article-title":"Conjectures on world literature","author":"Moretti F.","year":"2000","journal-title":"New Left Review"},{"key":"e_1_2_7_27_1","first-page":"47","volume-title":"Big Data, Big Challenges in Evidence\u2010based Policy Making","author":"Moss E.","year":"2015"},{"key":"e_1_2_7_28_1","unstructured":"Moss E. &Lyle J.(2018).Opaque data citation: Actual citation practice and its implication for tracking data use.https:\/\/deepblue.lib.umich.edu\/handle\/2027.42\/142393"},{"key":"e_1_2_7_29_1","first-page":"81","article-title":"Citances: Citation sentences for semantic analysis of bioscience text","volume":"4","author":"Nakov P. I.","year":"2004","journal-title":"Proceedings of the SIGIR"},{"key":"e_1_2_7_30_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.24049"},{"key":"e_1_2_7_31_1","doi-asserted-by":"publisher","DOI":"10.5334\/dsj-2017-008"},{"key":"e_1_2_7_32_1","doi-asserted-by":"publisher","DOI":"10.1029\/2017eo082377"},{"key":"e_1_2_7_33_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0048753"},{"key":"e_1_2_7_34_1","doi-asserted-by":"crossref","unstructured":"Sadvilkar N. &Neumann M.(2020).PySBD: Pragmatic Sentence Boundary Disambiguation. InarXiv [cs.CL]. 
arXiv.http:\/\/arxiv.org\/abs\/2010.09657","DOI":"10.18653\/v1\/2020.nlposs-1.15"},{"key":"e_1_2_7_35_1","doi-asserted-by":"publisher","DOI":"10.1162\/99608f92.df2262f5"},{"key":"e_1_2_7_36_1","doi-asserted-by":"publisher","DOI":"10.1177\/0162243907306704"}],"container-title":["Proceedings of the Association for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/pra2.614","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/pra2.614","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/pra2.614","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,18]],"date-time":"2023-08-18T03:24:46Z","timestamp":1692329086000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/pra2.614"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10]]},"references-count":35,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,10]]}},"alternative-id":["10.1002\/pra2.614"],"URL":"https:\/\/doi.org\/10.1002\/pra2.614","archive":["Portico"],"relation":{},"ISSN":["2373-9231","2373-9231"],"issn-type":[{"value":"2373-9231","type":"print"},{"value":"2373-9231","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10]]},"assertion":[{"value":"2022-07-14","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}