Interval Semi-supervised LDA: Classifying Needles in a Haystack

  • Conference paper
Advances in Artificial Intelligence and Its Applications (MICAI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8265)


Abstract

An important text mining problem is to find, in a large collection of texts, documents related to specific topics and then discern further structure among the found texts. This problem is especially important for social sciences, where the purpose is to find the most representative documents for subsequent qualitative interpretation. To solve this problem, we propose an interval semi-supervised LDA approach, in which certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. We present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.
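The abstract gives only the core idea of the approach, so the following is a minimal sketch of one way it could be realized, not the authors' implementation. It assumes a collapsed Gibbs sampler for LDA in which each keyword is allowed to take topic assignments only from a fixed interval of topic indices, while all other words remain unrestricted. The function name `interval_lda`, the `intervals` mapping, and the hyperparameter values are hypothetical and chosen for illustration.

```python
import random


def interval_lda(docs, vocab_size, num_topics, intervals,
                 iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA with per-word topic intervals.

    docs:      list of documents, each a list of integer word ids.
    intervals: dict word_id -> (lo, hi); that word may only be assigned
               topics lo, ..., hi - 1 (hypothetical encoding of keyword sets).
    Words that are not keys of `intervals` may take any topic.
    """
    rng = random.Random(seed)

    def allowed(w):
        lo, hi = intervals.get(w, (0, num_topics))
        return list(range(lo, hi))

    n_dk = [[0] * num_topics for _ in docs]                # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                 # tokens per topic

    # Random initialization that already respects the intervals.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.choice(allowed(w))
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                # Standard LDA conditional, evaluated only on allowed topics.
                topics = allowed(w)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]

                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return z, n_dk, n_kw


# Toy usage: word 0 is pinned to topic 0, word 1 to topics 1 and 2.
toy_docs = [[0, 2, 3, 0], [1, 4, 1, 2], [0, 1, 3, 4]]
z, n_dk, n_kw = interval_lda(toy_docs, vocab_size=5, num_topics=4,
                             intervals={0: (0, 1), 1: (1, 3)}, iters=20)
```

In this sketch the restriction is hard: a keyword never leaves its interval, so the topics inside that interval are anchored to the researcher's subject, and the remaining words of a document are drawn toward them only through the shared document-topic counts.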

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bodrunova, S., Koltsov, S., Koltsova, O., Nikolenko, S., Shimorina, A. (2013). Interval Semi-supervised LDA: Classifying Needles in a Haystack. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_21

  • DOI: https://doi.org/10.1007/978-3-642-45114-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45113-3

  • Online ISBN: 978-3-642-45114-0

  • eBook Packages: Computer Science, Computer Science (R0)
