Abstract
To address the characteristics of online Chinese-Vietnamese topic detection, we propose a Chinese-Vietnamese bilingual topic model based on the Recurrent Chinese Restaurant Process (RCRP) and integrated with event elements. First, event elements, namely the persons, places, and times involved, are extracted from newly arriving bilingual news texts. Then word pairs from the bilingual news and comments are tagged and aligned. Both the event elements and the aligned words are integrated into the RCRP algorithm to construct the proposed bilingual topic detection model. Finally, the model determines whether each new document is assigned to an existing category or starts a new one, thereby detecting topics. In a comparative experiment, the proposed model performs well on topic detection.
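The assignment step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a one-epoch form of the RCRP prior, in which an existing topic is weighted by its document counts in the previous and current epochs and a new topic is weighted by a concentration parameter `alpha`. The Jaccard overlap of event elements stands in for the model's likelihood, and the greedy argmax replaces posterior sampling; the names `rcrp_prior`, `assign`, `NEW_TOPIC`, and `new_score` are hypothetical.

```python
NEW_TOPIC = "<new>"  # hypothetical sentinel meaning "open a new topic"


def rcrp_prior(prev_counts, curr_counts, alpha):
    """Normalized RCRP-style prior over existing topics plus a new topic.

    prev_counts: documents per topic in the previous epoch
    curr_counts: documents per topic seen so far in the current epoch
    alpha: concentration parameter controlling how readily new topics open
    """
    topics = set(prev_counts) | set(curr_counts)
    weights = {k: prev_counts.get(k, 0) + curr_counts.get(k, 0) for k in topics}
    weights[NEW_TOPIC] = alpha
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def jaccard(a, b):
    """Set overlap, used here as an assumed stand-in for the likelihood."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(doc_elements, topic_elements, prev_counts, curr_counts,
           alpha=1.0, new_score=0.1):
    """Greedily pick the topic with the highest prior-times-overlap score.

    doc_elements: event elements (persons, places, times) of the new document
    topic_elements: mapping from topic id to that topic's event elements
    new_score: assumed baseline score for a document under a brand-new topic
    """
    prior = rcrp_prior(prev_counts, curr_counts, alpha)
    scores = {NEW_TOPIC: prior[NEW_TOPIC] * new_score}
    for topic, elements in topic_elements.items():
        scores[topic] = prior.get(topic, 0.0) * jaccard(doc_elements, elements)
    return max(scores, key=scores.get)
```

With one known topic that shares a place and a date with the incoming document, `assign` returns that topic's id; a document sharing no event elements with any known topic is returned as `NEW_TOPIC`. In the full model the aligned bilingual word pairs would also contribute to the score, which this sketch omits.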
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Long, Wx., Gao, Jx., Yu, Zt., Gao, Sx., Hong, Xd. (2014). Online Chinese-Vietnamese Bilingual Topic Detection Based on RCRP Algorithm with Event Elements. In: Zong, C., Nie, JY., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2014. Communications in Computer and Information Science, vol 496. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45924-9_38
DOI: https://doi.org/10.1007/978-3-662-45924-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45923-2
Online ISBN: 978-3-662-45924-9
eBook Packages: Computer Science, Computer Science (R0)