{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T07:54:40Z","timestamp":1721634880145},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2018,1,10]],"date-time":"2018-01-10T00:00:00Z","timestamp":1515542400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-15-63785"],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000879","name":"Alfred P. Sloan Foundation","doi-asserted-by":"publisher","award":["G-2015-14017"],"id":[{"id":"10.13039\/100000879","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000030","name":"Centers for Disease Control and Prevention","doi-asserted-by":"publisher","award":["NU90TP000546"],"id":[{"id":"10.13039\/100000030","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>We developed a system for the discovery of foodborne illness mentioned in online Yelp restaurant reviews using text classification. The system is used by the New York City Department of Health and Mental Hygiene (DOHMH) to monitor Yelp for foodborne illness complaints.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We built classifiers for 2 tasks: (1) determining if a review indicated a person experiencing foodborne illness and (2) determining if a review indicated multiple people experiencing foodborne illness. We first developed a prototype classifier in 2012 for both tasks using a small labeled dataset. 
Over years of system deployment, DOHMH epidemiologists labeled 13\u2009526 reviews selected by this classifier. We used these biased data and a sample of complementary reviews in a principled bias-adjusted training scheme to develop significantly improved classifiers. Finally, we performed an error analysis of the best resulting classifiers.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We found that logistic regression trained with bias-adjusted augmented data performed best for both classification tasks, with F1-scores of 87% and 66% for tasks 1 and 2, respectively.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Our error analysis revealed that the inability of our models to account for long phrases caused the most errors. Our bias-adjusted training scheme illustrates how to improve a classification system iteratively by exploiting available biased labeled data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Our system has been instrumental in the identification of 10 outbreaks and 8523 complaints of foodborne illness associated with New York City restaurants since July 2012. 
Our evaluation has identified strong classifiers for both tasks, whose deployment will allow DOHMH epidemiologists to more effectively monitor Yelp for foodborne illness investigations.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocx093","type":"journal-article","created":{"date-parts":[[2017,9,25]],"date-time":"2017-09-25T19:13:08Z","timestamp":1506366788000},"page":"1586-1592","source":"Crossref","is-referenced-by-count":35,"title":["Discovering foodborne illness in online restaurant reviews"],"prefix":"10.1093","volume":"25","author":[{"given":"Thomas","family":"Effland","sequence":"first","affiliation":[{"name":"Computer Science Department, Data Science Institute, Columbia University, New York, NY, USA"}]},{"given":"Anna","family":"Lawson","sequence":"additional","affiliation":[{"name":"Computer Science Department, Data Science Institute, Columbia University, New York, NY, USA"}]},{"given":"Sharon","family":"Balter","sequence":"additional","affiliation":[{"name":"Bureau of Communicable Disease, New York City Department of Health and Mental Hygiene, Queens, NY, USA"}]},{"given":"Katelynn","family":"Devinney","sequence":"additional","affiliation":[{"name":"Bureau of Communicable Disease, New York City Department of Health and Mental Hygiene, Queens, NY, USA"}]},{"given":"Vasudha","family":"Reddy","sequence":"additional","affiliation":[{"name":"Bureau of Communicable Disease, New York City Department of Health and Mental Hygiene, Queens, NY, USA"}]},{"given":"HaeNa","family":"Waechter","sequence":"additional","affiliation":[{"name":"Bureau of Communicable Disease, New York City Department of Health and Mental Hygiene, Queens, NY, USA"}]},{"given":"Luis","family":"Gravano","sequence":"additional","affiliation":[{"name":"Computer Science Department, Data Science Institute, Columbia University, New York, NY, USA"}]},{"given":"Daniel","family":"Hsu","sequence":"additional","affiliation":[{"name":"Computer Science Department, Data Science Institute, Columbia 
University, New York, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,1,10]]},"reference":[{"issue":"1","key":"2020110613022650500_ocx093-B1","doi-asserted-by":"crossref","first-page":"16","DOI":"10.3201\/eid1701.P21101","article-title":"E. Foodborne illness acquired in the United States: unspecified agents","volume":"17","author":"Scallan","year":"2011","journal-title":"Emerg Infect Dis."},{"issue":"2","key":"2020110613022650500_ocx093-B2","first-page":"1","article-title":"Surveillance for foodborne disease outbreaks: United States, 1998\u20132008","volume":"62","author":"Gould","year":"2013","journal-title":"MMWR Surveill Summ."},{"issue":"10","key":"2020110613022650500_ocx093-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1004513","article-title":"Combining search, social media, and traditional data sources to improve influenza surveillance","volume":"11","author":"Santillana","year":"2015","journal-title":"PLoS Comput Biol."},{"issue":"135","key":"2020110613022650500_ocx093-B4","first-page":"1","article-title":"Comparing timeliness, content, and disease severity of formal and informal source outbreak reporting","volume":"15","author":"Bahk","year":"2015","journal-title":"BMC Infect Dis."},{"issue":"2","key":"2020110613022650500_ocx093-B5","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1197\/jamia.M2544","article-title":"HealthMap: global infectious disease monitoring through automated classification and visualization of internet media reports","volume":"15","author":"Freifeld","year":"2008","journal-title":"J Am Med Inform Assoc."},{"issue":"32","key":"2020110613022650500_ocx093-B6","first-page":"681","article-title":"Health department use of social media to identify foodborne illness: Chicago, Illinois, 2013\u20132014","volume":"63","author":"Harris","year":"2014","journal-title":"MMWR Morb Mortal Wkly Rep."},{"key":"2020110613022650500_ocx093-B7","first-page":"3982","article-title":"Deploying 
nEmesis: preventing foodborne illness by data mining social media","author":"Sadilek","journal-title":"Proc Conf AAAI Artif Intell"},{"key":"2020110613022650500_ocx093-B8","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1016\/j.ypmed.2014.08.003","article-title":"Online reports of foodborne illness capture foods implicated in official foodborne outbreak reports","volume":"67","author":"Nsoesie","year":"2014","journal-title":"Prev Med."},{"issue":"20","key":"2020110613022650500_ocx093-B9","first-page":"441","article-title":"Using online reviews by restaurant patrons to identify unreported cases of food-borne illness: New York City, 2012\u20132013","volume":"63","author":"Harrison","year":"2014","journal-title":"MMWR Morb Mortal Wkly Rep."},{"key":"2020110613022650500_ocx093-B10","volume-title":"C4.5: Programs for Machine Learning","author":"Quinlan","year":"1993"},{"key":"2020110613022650500_ocx093-B11","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139924801","volume-title":"Mining of Massive Datasets","author":"Leskovec","year":"2014"},{"key":"2020110613022650500_ocx093-B12","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","article-title":"The regression analysis of binary sequences with discussion","volume":"20","author":"Cox","year":"1958","journal-title":"J R Stat Soc Series B Stat Methodol."},{"issue":"1","key":"2020110613022650500_ocx093-B13","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"1997","journal-title":"Mach Learn."},{"issue":"3","key":"2020110613022650500_ocx093-B14","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach Learn."},{"key":"2020110613022650500_ocx093-B15","doi-asserted-by":"crossref","DOI":"10.1201\/9780429246593","volume-title":"An Introduction 
to the Bootstrap","author":"Efron","year":"1994"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/12\/1586\/34150497\/ocx093.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/25\/12\/1586\/34150497\/ocx093.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,27]],"date-time":"2024-06-27T07:04:54Z","timestamp":1719471894000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/25\/12\/1586\/4725036"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,1,10]]},"references-count":15,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2018,1,10]]},"published-print":{"date-parts":[[2018,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocx093","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,12]]},"published":{"date-parts":[[2018,1,10]]}}}