Semi-Automatic Analysis of Traditional Media with Machine Learning

  • Conference paper
Research and Development in Intelligent Systems XXVIII (SGAI 2011)

Abstract

The analysis of traditional and social media is a non-trivial task, requiring the input of human analysts for quality. However, the ready availability of electronic resources has led to a large increase in the amount of such data to be analysed: the quantities involved (tens of thousands of documents per day) mean that the task becomes too substantial for human analysts to perform in reasonable time frames and with good quality control. In this project, we have explored the use of machine-learning techniques to automate elements of this analysis process in a large media-analysis company. Our classifiers perform in the range of 60%–90%, where the average agreement between human analysts is around 80%. In this paper, we examine the effect of using active-learning techniques to reduce the amount of data requiring manual analysis, whilst preserving the overall accuracy of the system.
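
The abstract does not say which active-learning strategy was used, so the following is only a minimal sketch of one common option, margin-based uncertainty sampling. It shows how a fixed review budget could route the least certain documents to human analysts while accepting the classifier's label for the rest. The function names, the three sentiment classes and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; a small gap means uncertain.

    Assumes at least two classes per document.
    """
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def split_for_review(
    predictions: Sequence[Sequence[float]], budget: int
) -> Tuple[List[int], List[int]]:
    """Return (manual, automatic) document indices.

    The `budget` documents with the smallest margin are sent to human
    analysts; the classifier's label is accepted for all others.
    """
    ranked = sorted(range(len(predictions)), key=lambda i: margin(predictions[i]))
    manual = sorted(ranked[:budget])
    automatic = sorted(ranked[budget:])
    return manual, automatic


# Hypothetical class probabilities (negative, neutral, positive) for five documents.
predictions = [
    [0.05, 0.05, 0.90],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
]
manual, automatic = split_for_review(predictions, budget=2)
print("manual review:", manual)
print("auto-labelled:", automatic)
```

In a full active-learning loop, the labels supplied by analysts for the manually reviewed documents would be added to the training set and the classifier retrained before the next batch is scored.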

Author information

Corresponding author

Correspondence to Daoud Clarke.

Copyright information

© 2011 Springer-Verlag London Limited

About this paper

Cite this paper

Clarke, D., Lane, P.C., Hender, P. (2011). Semi-Automatic Analysis of Traditional Media with Machine Learning. In: Bramer, M., Petridis, M., Nolle, L. (eds) Research and Development in Intelligent Systems XXVIII. SGAI 2011. Springer, London. https://doi.org/10.1007/978-1-4471-2318-7_25

  • DOI: https://doi.org/10.1007/978-1-4471-2318-7_25

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2317-0

  • Online ISBN: 978-1-4471-2318-7

  • eBook Packages: Computer Science, Computer Science (R0)
