Abstract
Online discussion forums are a valuable source of knowledge. Users share and exchange ideas by posting content in the form of questions and answers. As the volume of forum content grows, finding relevant information becomes a challenging task, and knowledge management and quality assurance of this content become critically important. Although online discussion forums offer search services, in most cases only keyword search is provided. Keyword search techniques, such as cosine similarity, consider the lexical overlap between query and document terms; however, they do not consider the context or meaning of the terms and may therefore fail to retrieve relevant documents. Earlier content-based research efforts to improve thread retrieval performance were primarily based on the cosine similarity technique. This technique assigns term weights based on term frequency and inverse document frequency, but it does not consider discussion semantics, which may lead to less effective document retrieval. To address these issues, we propose two thread ranking techniques for online discussion forums: (1) threads are ranked on the basis of a semantic similarity score between posts, and (2) threads are ranked based on their participants’ reputation and posts’ quality. The proposed work provides a performance comparison between semantic similarity techniques and cosine similarity techniques, along with reputation and post quality features, in the thread ranking process. Experimental results obtained using a real online forum dataset demonstrate that the proposed techniques significantly improve thread ranking performance.
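The limitation of lexical matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenisation, the smoothed weighting `tf * (log(N / df) + 1)`, the function names `tfidf_vectors` and `cosine`, and the toy query and threads are all assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} mapping per tokenised document.

    Assumed weighting (one common variant): tf * (log(N / df) + 1).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * (math.log(n / df[term]) + 1.0)
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query and two hypothetical thread titles.
query = "fix laptop battery".split()
threads = [
    "repair notebook power cell".split(),  # related meaning, no shared terms
    "fix laptop screen".split(),           # shared terms, different problem
]

vectors = tfidf_vectors([query] + threads)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
print(scores)
```

In this toy setting the first thread scores exactly zero because it shares no terms with the query, even though it describes a closely related problem, while the second thread receives a positive score purely from word overlap. This is the kind of gap that a semantic similarity score is intended to close.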




Acknowledgments
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2013R1A1A2061978).
Cite this article
Faisal, C.M.S., Daud, A., Imran, F. et al. A novel framework for social web forums’ thread ranking based on semantics and post quality features. J Supercomput 72, 4276–4295 (2016). https://doi.org/10.1007/s11227-016-1839-z
DOI: https://doi.org/10.1007/s11227-016-1839-z