Abstract
Collaborative chat tools and large text corpora are ubiquitous in today’s world of real-time communication. As micro-teams and start-ups adopt such tools, there is a need to understand the meaning, even at a high level, of chat conversations within collaborative teams. In this study, we propose a technique that segments chat conversations so that more words (19% more on average) are available for text mining. Using an open source dataset, we examine whether having more words available for text mining produces more useful information for the end user. Our technique can help micro-teams and start-ups with limited resources to model their conversations efficiently, affording a higher degree of readability and comprehension.
Acknowledgements
The authors would like to personally thank the 24 individuals who took part in our topic modelling comprehension experiment.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Dunne, J., Malone, D., Penrose, A. (2018). Bundles: A Framework to Optimise Topic Analysis in Real-Time Chat Discourse. In: Rodrigues, A., Fonseca, B., Preguiça, N. (eds) Collaboration and Technology. CRIWG 2018. Lecture Notes in Computer Science, vol 11001. Springer, Cham. https://doi.org/10.1007/978-3-319-99504-5_6
DOI: https://doi.org/10.1007/978-3-319-99504-5_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99503-8
Online ISBN: 978-3-319-99504-5
eBook Packages: Computer Science, Computer Science (R0)