Abstract
Microblogging has become an important way for people to obtain information, express opinions, and make suggestions. Identifying new topics quickly and accurately from massive microblogging data plays a crucial role in recommending information and controlling public opinion. The topic representation model provides the basis for topic detection. In this paper, we propose a topic representation model for microblogging datasets based on user behavior analysis: the microblogging behavior analysis-latent Dirichlet allocation (MBA-LDA) model. The topic-word distribution is obtained from an LDA model that takes into account user behaviors (such as posting, forwarding, and commenting) and the distribution of words among the documents within one topic and among different topics. The model also re-assesses the importance of words in topic representation. The basic idea is that the distribution of a word within a topic and among different topics strongly influences whether it should be selected as a topic expression word. If a word is evenly distributed among all documents of a certain topic, it is common to all documents in that topic and is therefore more suitable to represent the topic. If a word is evenly distributed among the various topics, it is common to all topics and cannot distinguish between them, so it is less suitable to represent any topic. In experiments on a real Sina Microblogging dataset, the topic model based on the MBA-LDA algorithm gives representative words greater weight and increases the differentiation of topic words, which effectively improves the accuracy of subsequent topic detection and evolutionary analysis.
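The word-evenness idea in the abstract can be illustrated with a small sketch. The abstract does not give the MBA-LDA weighting formulas, so everything here is an assumption for illustration only: evenness is measured with normalized Shannon entropy, the two scores are combined by a simple product, and the names `normalized_entropy` and `word_weight` are hypothetical. User behaviors (posting, forwarding, commenting) are not modeled.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a count vector, scaled to [0, 1].

    1.0 means the counts are perfectly even; 0.0 means all mass sits in
    one position (or there is nothing to compare).
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return max(0.0, entropy / math.log(len(counts)))


def word_weight(word, topic_docs, all_topics):
    """Hypothetical importance score for `word` as a representative of one topic.

    topic_docs: list of tokenized documents (lists of words) in the topic.
    all_topics: list of topics, each a list of tokenized documents.
    The product form is an assumption, not the paper's formula.
    """
    # Evenness of the word over the documents of this topic (higher is better).
    within = normalized_entropy([doc.count(word) for doc in topic_docs])
    # Evenness of the word over all topics (higher means less discriminative).
    across = normalized_entropy(
        [sum(doc.count(word) for doc in docs) for docs in all_topics]
    )
    return within * (1.0 - across)


if __name__ == "__main__":
    # Toy corpus, invented for illustration.
    topic_a = [["quake", "rescue", "today"], ["quake", "aid", "today"]]
    topic_b = [["match", "goal", "today"], ["match", "score", "today"]]
    for w in ("quake", "today", "rescue"):
        print(w, word_weight(w, topic_a, [topic_a, topic_b]))
```

In this toy corpus, "quake" appears in every document of the first topic and in no other topic, so it scores highest. "today" is spread evenly over both topics and scores lowest, matching the intuition that a word common to all topics cannot distinguish between them.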




Funding
Funded by NSFC (Nos. 61972106, U1636215, and 61871140), the National Key Research and Development Plan (Grant Nos. 2019QY1406 and 2018YFB0803504), and the Guangdong Province Key Research and Development Plan (Grant Nos. 2019B010136003 and 2019B010137004). Supported by the Guangdong Province Universities and Colleges Pearl River Scholar Funded Scheme (2019).
About this article
Cite this article
Han, W., Tian, Z., Huang, Z. et al. Topic representation model based on microblogging behavior analysis. World Wide Web 23, 3083–3097 (2020). https://doi.org/10.1007/s11280-020-00822-x
DOI: https://doi.org/10.1007/s11280-020-00822-x