Abstract
The analysis of traditional and social media is a non-trivial task that requires human analysts to ensure quality. However, the ready availability of electronic resources has greatly increased the amount of such data to be analysed: at tens of thousands of documents per day, the task becomes too large for human analysts to perform in reasonable time frames and with good quality control. In this project, we have explored the use of machine-learning techniques to automate elements of this analysis process in a large media-analysis company. Our classifiers perform in the range of 60%–90%, compared with an average agreement between human analysts of around 80%. In this paper, we examine the effect of using active-learning techniques to reduce the amount of data requiring manual analysis whilst preserving the overall accuracy of the system.
Copyright information
© 2011 Springer-Verlag London Limited
About this paper
Cite this paper
Clarke, D., Lane, P.C., Hender, P. (2011). Semi-Automatic Analysis of Traditional Media with Machine Learning. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_25
DOI: https://doi.org/10.1007/978-1-4471-2318-7_25
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2317-0
Online ISBN: 978-1-4471-2318-7
eBook Packages: Computer Science, Computer Science (R0)