Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data

  • Conference paper
Advances in Information Retrieval (ECIR 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Abstract

There is considerable interest among both researchers and the mass public in understanding the topics of discussion on social media as they occur over time. Scholars have thoroughly analysed sampling-based topic modelling approaches for various text corpora including social media; however, another LDA topic modelling implementation—Variational Bayesian (VB)—has not been well studied, despite its known efficiency and its adaptability to the volume and dynamics of social media data. In this paper, we examine the performance of the VB-based topic modelling approach for producing coherent topics, and further, we extend the VB approach by proposing a novel time-sensitive Variational Bayesian implementation, denoted as TVB. Our newly proposed TVB approach incorporates time so as to increase the quality of the generated topics. Using a Twitter dataset covering 8 events, our empirical results show that the coherence of the topics in our TVB model is improved by the integration of time. In particular, through a user study, we find that our TVB approach generates less mixed topics than state-of-the-art topic modelling approaches. Moreover, our proposed TVB approach can more accurately estimate topical trends, making it particularly suitable to assist end-users in tracking emerging topics on social media.

Notes

  1. A mixed topic contains keywords pertaining to multiple different topic themes.

  2. Considering that the number of topics is 10, the top 2 and 7 most coherent topics are reasonable choices for a comprehensive coherence evaluation.

  3. Three shared words among the top 10 words is a reasonable minimum to indicate a similar topic.

  4. p-values (\(p<0.05\)) are calculated by a t-test using 10 models from each of the two approaches being compared.
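
The overlap criterion in Note 3 can be illustrated with a short sketch. This is not the authors' code: the function names, the `top_n` and `min_shared` parameters, and the example word lists are assumptions made for illustration. Only the thresholds (3 shared words among the top 10) come from the note.

```python
def shared_top_words(topic_a, topic_b, top_n=10):
    """Return the words found in the top_n words of both topics.

    Each topic is assumed to be a list of words ranked by probability,
    most probable first.
    """
    return set(topic_a[:top_n]) & set(topic_b[:top_n])


def is_similar_topic(topic_a, topic_b, top_n=10, min_shared=3):
    """Treat two topics as similar when they share at least min_shared top words."""
    return len(shared_top_words(topic_a, topic_b, top_n)) >= min_shared


# Made-up topics for illustration only; they are not taken from the paper.
topic_a = ["vote", "election", "poll", "party", "leader",
           "debate", "campaign", "policy", "speech", "seat"]
topic_b = ["election", "vote", "result", "seat", "count",
           "win", "night", "exit", "map", "turnout"]
topic_c = ["match", "goal", "league", "team", "coach",
           "player", "score", "fans", "stadium", "cup"]

print(sorted(shared_top_words(topic_a, topic_b)))
print(is_similar_topic(topic_a, topic_b))
print(is_similar_topic(topic_a, topic_c))
```

In this sketch, `topic_a` and `topic_b` share exactly three top words and so meet the threshold, while `topic_a` and `topic_c` share none. Raising `min_shared` makes the similarity test stricter.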

Corresponding author

Correspondence to Anjie Fang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Fang, A., Macdonald, C., Ounis, I., Habel, P., Yang, X. (2017). Exploring Time-Sensitive Variational Bayesian Inference LDA for Social Media Data. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_20

  • DOI: https://doi.org/10.1007/978-3-319-56608-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56607-8

  • Online ISBN: 978-3-319-56608-5

  • eBook Packages: Computer Science, Computer Science (R0)
