{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:56:05Z","timestamp":1740149765814,"version":"3.37.3"},"reference-count":36,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2016,4,9]],"date-time":"2016-04-09T00:00:00Z","timestamp":1460160000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"Extracting information from social media has become a major focus of companies and researchers in recent years. Aside from the study of the social aspects, it has also been found feasible to exploit the collaborative strength of crowds to help solve classical machine learning problems like object recognition. In this work, we focus on the generally underappreciated problem of building effective datasets for training classifiers by automatically assembling data from social media. We detail some of the challenges of this approach and outline a framework that uses expanded search queries to retrieve more qualified data. In particular, we concentrate on collaboratively tagged media on the social platform Flickr, and on the problem of image classification to evaluate our approach. Finally, we describe a novel entropy-based method to incorporate an information-theoretic principle to guide our framework. 
Experimental validation against well-known public datasets shows the viability of this approach and marks an improvement over the state of the art in terms of simplicity and performance.","DOI":"10.3390\/e18040130","type":"journal-article","created":{"date-parts":[[2016,4,11]],"date-time":"2016-04-11T16:07:36Z","timestamp":1460390856000},"page":"130","source":"Crossref","is-referenced-by-count":1,"title":["An Informed Framework for Training Classifiers from Social Media"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6523-3540","authenticated-orcid":false,"given":"Dong","family":"Cheng","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Hankuk University of Foreign Studies, 81 Oedae-ro, Mohyeon-myeon, Cheoin-gu, Yongin-si, Gyeonggi-do 449-791, South Korea"}]},{"given":"Sami","family":"Abdulhak","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Verona, Strada Le Grazie 15, I-37134 Verona, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2016,4,9]]},"reference":[{"key":"ref_1","unstructured":"Fei-Fei, L., Fergus, R., and Perona, P. (2003, January 13\u201316). A Bayesian approach to unsupervised one-shot learning of object categories. Proceedings of the 9th IEEE International Conference on Computer Vision, Nice, France."},{"key":"ref_2","unstructured":"Crowston, K. (2012). Shaping the Future of ICT Research. Methods and Approaches, Springer."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/219717.219748","article-title":"WordNet: A lexical database for English","volume":"38","author":"Miller","year":"1995","journal-title":"Commun. ACM"},{"key":"ref_5","unstructured":"Flickr. Available online: http:\/\/www.flickr.com."},{"key":"ref_6","unstructured":"Ames, M., and Naaman, M. (May, January 30). Why we tag: Motivations for annotation in mobile and online media. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, San Jose, CA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1016\/j.ipm.2014.07.005","article-title":"Computational approaches for mining user\u2019s opinions on the Web 2.0","volume":"50","author":"Petz","year":"2014","journal-title":"Inf. Process. Manag."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kennedy, L.S., Chang, S.F., and Kozintsev, I.V. (2006, January 23\u201327). To search or to label?: Predicting the performance of search-based automatic image classifiers. Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval (MIR \u201906), Santa Barbara, CA, USA.","DOI":"10.1145\/1178677.1178712"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mandala, R., Tokunaga, T., and Tanaka, H. (1999, January 15\u201319). Combining multiple evidence from different types of thesaurus for query expansion. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, USA.","DOI":"10.1145\/312624.312677"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_11","unstructured":"Visual Object Classes Challenge 2012. Available online: http:\/\/host.robots.ox.ac.uk\/pascal\/VOC\/voc2012\/index.html."},{"key":"ref_12","unstructured":"Imagenet Large Scale Visual Recognition Challenge 2012. Available online: http:\/\/www.image-net.org\/challenges\/LSVRC\/2012\/index."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1007\/s11263-009-0265-6","article-title":"Optimol: Automatic online picture collection via incremental model learning","volume":"88","author":"Li","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_14","unstructured":"Wang, G., Hoiem, D., and Forsyth, D. (October, January 29). Learning image similarity from Flickr groups using stochastic intersection kernel machines. Proceedings of the 12th International Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Fergus, R., Fei-Fei, L., Perona, P., and Zisserman, A. (2005, January 17\u201321). Learning Object Categories from Google\u2019s Image Search. Proceedings of the 10th International Conference on Computer Vision, Beijing, China.","DOI":"10.1109\/ICCV.2005.142"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Chen, X., Shrivastava, A., and Gupta, A. (2013, January 1\u20138). NEIL: Extracting Visual Knowledge from Web Data. Proceedings of the 14th International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.178"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Divvala, S.K., Farhadi, A., and Guestrin, C. (2014, January 23\u201328). Learning Everything about Anything: Webly-Supervised Visual Concept Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR \u201914), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.412"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, L.J., Wang, G., and Fei-Fei, L. (2007, January 17\u201322). 
OPTIMOL: Automatic Online Picture collecTion via Incremental Model Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR \u201907), Minneapolis, MN, USA.","DOI":"10.1109\/CVPR.2007.383048"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.cviu.2014.07.005","article-title":"Semantically-driven automatic creation of training sets for object recognition","volume":"131","author":"Cheng","year":"2015","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_20","unstructured":"WordNet. Available online: http:\/\/wordnet.princeton.edu."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sun, A., and Bhowmick, S.S. (2009, January 19\u201324). Image Tag Clarity: In Search of Visual-Representative Tags for Social Images. Proceedings of the 1st SIGMM Workshop on Social Media (WSM \u201909), Beijing, China.","DOI":"10.1145\/1631144.1631150"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Spain, M., and Perona, P. (2008, January 12\u201318). Some Objects Are More Equal than Others: Measuring and Predicting Importance. Proceedings of the 10th European Conference on Computer Vision: Part I (ECCV \u201908), Marseille, France.","DOI":"10.1007\/978-3-540-88682-2_40"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Weinberger, K.Q., Slaney, M., and van Zwol, R. (2008, January 26\u201331). Resolving tag ambiguity. Proceedings of the 16th ACM International Conference on Multimedia, ACM, MM \u201908, Vancouver, BC, Canada.","DOI":"10.1145\/1459359.1459375"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008, January 23\u201325). Personalized Recommendation in Social Tagging Systems Using Hierarchical Clustering. 
Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys \u201908), Lausanne, Switzerland.","DOI":"10.1145\/1454008.1454048"},{"key":"ref_25","unstructured":"Hassan-Montero, Y., and Herrero-Solana, V. (2006, January 25\u201328). Improving Tag-Clouds as Visual Information Retrieval Interfaces. Proceedings of the International Conference on Multidisciplinary Information Sciences and Technologies (InSciT2006), M\u00e9rida, Spain."},{"key":"ref_26","unstructured":"Ogden, C.K. (1932). Basic English: A General Introduction with Rules and Grammar, Kegan Paul."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Holzinger, A., and Jurisica, I. (2014). Interactive Knowledge Discovery and Data Mining in Biomedical Informatics, Springer.","DOI":"10.1007\/978-3-662-43968-5"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., and Lenc, K. (2015, January 26\u201330). MatConvNet: Convolutional Neural Networks for MATLAB. Proceedings of the 23rd ACM International Conference on Multimedia (MM \u201915), Brisbane, Australia.","DOI":"10.1145\/2733373.2807412"},{"key":"ref_29","unstructured":"Pereira, F., Burges, C., Bottou, L., and Weinberger, K. (2012). Advances in Neural Information Processing Systems 25, Curran Associates Inc."},{"key":"ref_30","unstructured":"Caltech 256. Available online: http:\/\/www.vision.caltech.edu\/Image_Datasets\/Caltech256\/."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"416","DOI":"10.1109\/TPAMI.2006.54","article-title":"Generic object recognition with boosting","volume":"28","author":"Opelt","year":"2006","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","unstructured":"Griffin, G., Holub, A., and Perona, P. Caltech-256 Object Category Dataset. Available online: http:\/\/authors.library.caltech.edu\/7694\/."},{"key":"ref_33","unstructured":"Caltech 101. 
Available online: http:\/\/www.vision.caltech.edu\/Image_Datasets\/Caltech101\/."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Vedaldi, A., and Fulkerson, B. (2010, January 25\u201329). Vlfeat: An Open and Portable Library of Computer Vision Algorithms. Proceedings of the 18th ACM International Conference on Multimedia (MM \u201910), Firenze, Italy.","DOI":"10.1145\/1873951.1874249"},{"key":"ref_35","first-page":"1871","article-title":"LIBLINEAR: A library for large linear classification","volume":"9","author":"Fan","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_36","unstructured":"Digital Trends. Available online: http:\/\/www.digitaltrends.com\/mobile\/shazam-music-app-visual-recognition\/."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/18\/4\/130\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,4]],"date-time":"2024-06-04T19:30:15Z","timestamp":1717529415000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/18\/4\/130"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,4,9]]},"references-count":36,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2016,4]]}},"alternative-id":["e18040130"],"URL":"https:\/\/doi.org\/10.3390\/e18040130","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2016,4,9]]}}}