{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,31]],"date-time":"2024-10-31T04:33:10Z","timestamp":1730349190656,"version":"3.28.0"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T00:00:00Z","timestamp":1697068800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T00:00:00Z","timestamp":1697068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"Recent technological advancements have led to a significant increase in digital documents. A document\u2019s key information is generally represented by the keyphrases that provide the abstract description contained therein. With traditional keyphrase techniques, however, it is difficult to identify relevant information based on context. Several studies in the literature have explored graph-based unsupervised keyphrase extraction techniques for automatic keyphrase extraction. However, there is only limited existing work that embeds contextual information for keyphrase extraction. To understand keyphrases, it is essential to grasp both the concept and the context of the document. Hence, a hybrid unsupervised keyphrase extraction technique is presented in this paper called ContextualRank, which embeds contextual information such as sentences and paragraphs that are relevant to keyphrases in the keyphrase extraction process. We propose a hierarchical topic modeling approach for topic discovery based on aggregating the extracted keyphrases from ContextualRank. 
Based on the evaluation on two short-text datasets and one long-text dataset, ContextualRank obtains remarkable improvements in performance over other baselines in the short-text datasets.","DOI":"10.1186\/s40537-023-00833-1","type":"journal-article","created":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T11:04:13Z","timestamp":1697108653000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Contextual topic discovery using unsupervised keyphrase extraction and hierarchical semantic graph model"],"prefix":"10.1186","volume":"10","author":[{"given":"Hung","family":"Du","sequence":"first","affiliation":[]},{"given":"Srikanth","family":"Thudumu","sequence":"additional","affiliation":[]},{"given":"Antonio","family":"Giardina","sequence":"additional","affiliation":[]},{"given":"Rajesh","family":"Vasa","sequence":"additional","affiliation":[]},{"given":"Kon","family":"Mouzakis","sequence":"additional","affiliation":[]},{"given":"Li","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"John","family":"Chisholm","sequence":"additional","affiliation":[]},{"given":"Sanat","family":"Bista","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,12]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Hasan KS, Ng V. Automatic keyphrase extraction: a survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014; pp. 1262\u20131273.","key":"833_CR1","DOI":"10.3115\/v1\/P14-1119"},{"issue":"2","key":"833_CR2","doi-asserted-by":"publisher","first-page":"1339","DOI":"10.1002\/widm.1339","volume":"10","author":"E Papagiannopoulou","year":"2020","unstructured":"Papagiannopoulou E, Tsoumakas G. A review of keyphrase extraction. Wiley Interdiscip Rev Data Min Knowl Discov. 
2020;10(2):1339.","journal-title":"Wiley Interdiscip Rev Data Min Knowl Discov"},{"issue":"2","key":"833_CR3","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1007\/s10844-019-00558-9","volume":"54","author":"Z Alami Merrouni","year":"2020","unstructured":"Alami Merrouni Z, Frikh B, Ouhbi B. Automatic keyphrase extraction: a survey and trends. J Intell Inform Syst. 2020;54(2):391\u2013424.","journal-title":"J Intell Inform Syst"},{"doi-asserted-by":"crossref","unstructured":"Hulth A. Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003; pp. 216\u2013223.","key":"833_CR4","DOI":"10.3115\/1119355.1119383"},{"doi-asserted-by":"publisher","unstructured":"Witten IH, Paynter GW, Frank E, Gutwin C, Nevill-Manning CG. Kea: Practical automatic keyphrase extraction. In: Proceedings of the Fourth ACM Conference on Digital Libraries. DL \u201999, pp. 254\u2013255. Association for Computing Machinery, New York, NY, USA 1999. https:\/\/doi.org\/10.1145\/313238.313437.","key":"833_CR5","DOI":"10.1145\/313238.313437"},{"doi-asserted-by":"crossref","unstructured":"Wu Y-fB, Li Q, Bot RS, Chen X. Domain-specific keyphrase extraction. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, 2005; pp. 283\u2013284.","key":"833_CR6","DOI":"10.1145\/1099554.1099628"},{"doi-asserted-by":"crossref","unstructured":"Shirakawa M, Hara T, Nishio S. N-gram idf: A global term weighting scheme based on information distance. In: Proceedings of the 24th International Conference on World Wide Web, 2015; pp. 960\u2013970.","key":"833_CR7","DOI":"10.1145\/2736277.2741628"},{"doi-asserted-by":"crossref","unstructured":"Ponte JM, Croft WB. A language modeling approach to information retrieval. In: ACM SIGIR Forum. ACM New York, NY, USA. 2017; vol. 51, pp. 
202\u2013208.","key":"833_CR8","DOI":"10.1145\/3130348.3130368"},{"unstructured":"Page L, Brin S, Motwani R, Winograd T. The pagerank citation ranking: Bringing order to the web. Stanford InfoLab: Technical report; 1999.","key":"833_CR9"},{"unstructured":"Mihalcea R, Tarau P. Textrank: Bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004; pp. 404\u2013411.","key":"833_CR10"},{"doi-asserted-by":"crossref","unstructured":"Litvak M, Last M. Graph-based keyword extraction for single-document summarization. In: Coling 2008: Proceedings of the Workshop Multi-source Multilingual Information Extraction and Summarization, 2008; pp. 17\u201324.","key":"833_CR11","DOI":"10.3115\/1613172.1613178"},{"unstructured":"Bougouin A, Boudin F, Daille B. Topicrank: graph-based topic ranking for keyphrase extraction. In: International Joint Conference on Natural Language Processing (IJCNLP), 2013; pp. 543\u2013551.","key":"833_CR12"},{"doi-asserted-by":"crossref","unstructured":"Sterckx L, Demeester T, Deleu J, Develder C. Topical word importance for fast keyphrase extraction. In: Proceedings of the 24th International Conference on World Wide Web. 2015; pp. 121\u2013122.","key":"833_CR13","DOI":"10.1145\/2740908.2742730"},{"doi-asserted-by":"crossref","unstructured":"Florescu C, Caragea C. Positionrank: an unsupervised approach to keyphrase extraction from scholarly documents. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2017; pp. 1105\u20131115.","key":"833_CR14","DOI":"10.18653\/v1\/P17-1102"},{"doi-asserted-by":"crossref","unstructured":"Boudin F. Unsupervised keyphrase extraction with multipartite graphs. arXiv preprint. 2018; arXiv:1803.08721.","key":"833_CR15","DOI":"10.18653\/v1\/N18-2105"},{"doi-asserted-by":"crossref","unstructured":"Bennani-Smires K, Musat C, Hossmann A, Baeriswyl M, Jaggi M. 
Simple unsupervised keyphrase extraction using sentence embeddings. arXiv preprint. 2018; arXiv:1801.04470.","key":"833_CR16","DOI":"10.18653\/v1\/K18-1022"},{"key":"833_CR17","doi-asserted-by":"publisher","first-page":"10896","DOI":"10.1109\/ACCESS.2020.2965087","volume":"8","author":"Y Sun","year":"2020","unstructured":"Sun Y, Qiu H, Zheng Y, Wang Z, Zhang C. Sifrank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model. IEEE Access. 2020;8:10896\u2013906.","journal-title":"IEEE Access"},{"doi-asserted-by":"crossref","unstructured":"Danesh S, Sumner T, Martin JH. Sgrank: combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. In: Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics. 2015; pp. 117\u2013126.","key":"833_CR18","DOI":"10.18653\/v1\/S15-1013"},{"issue":"Jan","key":"833_CR19","first-page":"993","volume":"3","author":"DM Blei","year":"2003","unstructured":"Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. J Mach Learn Res. 2003;3(Jan):993\u20131022.","journal-title":"J Mach Learn Res"},{"issue":"2","key":"833_CR20","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1109\/TPAMI.2014.2318728","volume":"37","author":"J Paisley","year":"2014","unstructured":"Paisley J, Wang C, Blei DM, Jordan MI. Nested hierarchical dirichlet processes. IEEE Trans Pattern Anal Mach Intell. 2014;37(2):256\u201370.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"doi-asserted-by":"crossref","unstructured":"Viegas F, Cunha W, Gomes C, Pereira A, Rocha L, Goncalves M. Cluhtm-semantic hierarchical topic modeling based on cluwords. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020; pp. 8138\u20138150.","key":"833_CR21","DOI":"10.18653\/v1\/2020.acl-main.724"},{"unstructured":"Zesch T, Gurevych I. Approximate matching for evaluating keyphrase extraction. 
In: Proceedings of the International Conference RANLP-2009. 2009; pp. 484\u2013489.","key":"833_CR22"},{"doi-asserted-by":"crossref","unstructured":"Hulth A, Karlgren J, Jonsson A, Bostr\u00f6m H, Asker L. Automatic keyword extraction using domain knowledge. In: International Conference on Intelligent Text Processing and Computational Linguistics. Springer; 2001. pp. 472\u2013482.","key":"833_CR23","DOI":"10.1007\/3-540-44686-9_47"},{"unstructured":"Frank E, Paynter G, Witten I, Gutwin C, Nevill-Manning C. Domain-specific keyphrase extraction 1999.","key":"833_CR24"},{"doi-asserted-by":"crossref","unstructured":"Caragea C, Bulgarov F, Godea A, Gollapalli SD. Citation-enhanced keyphrase extraction from research papers: a supervised approach. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014; pp. 1435\u20131446.","key":"833_CR25","DOI":"10.3115\/v1\/D14-1150"},{"doi-asserted-by":"crossref","unstructured":"Robertson SE, Walker S, Beaulieu M, Gatford M, Payne A. Okapi at trec-4. Nist Special Publication Sp; 1996. 73\u201396.","key":"833_CR26","DOI":"10.6028\/NIST.SP.500-236.routing-city"},{"unstructured":"El-Beltagy SR, Rafea A. Kp-miner: participation in semeval-2. In: Proceedings of the 5th International Workshop on Semantic Evaluation. 2010; pp. 190\u2013193.","key":"833_CR27"},{"unstructured":"Kang Y-B, Du H, Forkan ARM, Jayaraman PP, Aryani A, Sellis T. Expfinder: an ensemble expert finding model integrating n-gram vector space model and $$\\mu$$co-hits. arXiv preprint. 2021. arXiv:2101.06821.","key":"833_CR28"},{"doi-asserted-by":"crossref","unstructured":"Tomokiyo T, Hurst M. A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 Workshop on multiword expressions: analysis, acquisition and treatment. 2003; pp. 
33\u201340.","key":"833_CR29","DOI":"10.3115\/1119282.1119287"},{"key":"833_CR30","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1016\/j.ins.2019.09.013","volume":"509","author":"R Campos","year":"2020","unstructured":"Campos R, Mangaravite V, Pasquali A, Jorge A, Nunes C, Jatowt A. Yake! keyword extraction from single documents using multiple local features. Inform Sci. 2020;509:257\u201389.","journal-title":"Inform Sci"},{"doi-asserted-by":"crossref","unstructured":"Wan X, Xiao J. Collabrank: towards a collaborative approach to single-document keyphrase extraction. In: Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008). 2008; pp. 969\u2013976.","key":"833_CR31","DOI":"10.3115\/1599081.1599203"},{"key":"833_CR32","first-page":"855","volume":"8","author":"X Wan","year":"2008","unstructured":"Wan X, Xiao J. Single document keyphrase extraction using neighborhood knowledge. AAAI. 2008;8:855\u201360.","journal-title":"AAAI"},{"unstructured":"Wan X, Yang J, Xiao J. Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 2007; pp. 552\u2013559.","key":"833_CR33"},{"key":"833_CR34","doi-asserted-by":"publisher","DOI":"10.1016\/j.psychres.2021.114135","volume":"304","author":"J Sarzynska-Wawer","year":"2021","unstructured":"Sarzynska-Wawer J, Wawer A, Pawlak A, Szymanowska J, Stefaniak I, Jarkiewicz M, Okruszek L. Detecting formal thought disorder by deep contextualized word representations. Psychiatry Res. 2021;304: 114135.","journal-title":"Psychiatry Res"},{"doi-asserted-by":"crossref","unstructured":"Liang X, Wu S, Li M, Li Z. Unsupervised keyphrase extraction by jointly modeling local and global context. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021; pp. 
155\u2013164.","key":"833_CR35","DOI":"10.18653\/v1\/2021.emnlp-main.14"},{"unstructured":"Ding H, Luo X. Agrank: Augmented graph-based unsupervised keyphrase extraction. In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing. 2022; pp. 230\u2013239.","key":"833_CR36"},{"key":"833_CR37","first-page":"547","volume":"34","author":"Z Duan","year":"2021","unstructured":"Duan Z, Xu Y, Chen B, Wang C, Zhou M, et al. Topicnet: Semantic graph-guided topic discovery. Adv Neural Inform Process Syst. 2021;34:547.","journal-title":"Adv Neural Inform Process Syst"},{"key":"833_CR38","first-page":"231","volume-title":"Wordnet","author":"C Fellbaum","year":"2010","unstructured":"Fellbaum C. Wordnet. Dordrecht: Springer, Netherlands; 2010. p. 231\u201343."},{"doi-asserted-by":"crossref","unstructured":"Pennington J, Socher R, Manning CD. Glove: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014; pp. 1532\u20131543.","key":"833_CR39","DOI":"10.3115\/v1\/D14-1162"},{"key":"833_CR40","first-page":"427","volume":"2017","author":"A Joulin","year":"2017","unstructured":"Joulin A, Grave E, Mikolov PBT. Bag of tricks for efficient text classification. EACL. 2017;2017:427.","journal-title":"EACL"},{"unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint. 2018 arXiv:1810.04805.","key":"833_CR41"},{"doi-asserted-by":"crossref","unstructured":"Zhu Y, Kiros R, Zemel R, Salakhutdinov R, Urtasun R, Torralba A, Fidler S. Aligning books and movies: towards story-like visual explanations by watching movies and reading books. In: Proceedings of the IEEE International Conference on Computer Vision. 2015; pp. 
19\u201327.","key":"833_CR42","DOI":"10.1109\/ICCV.2015.11"},{"doi-asserted-by":"crossref","unstructured":"Augenstein I, Das M, Riedel S, Vikraman L, McCallum A. Semeval 2017 task 10: scienceie-extracting keyphrases and relations from scientific publications. arXiv preprint. 2017 arXiv:1704.02853.","key":"833_CR43","DOI":"10.18653\/v1\/S17-2091"},{"doi-asserted-by":"crossref","unstructured":"Gollapalli SD, Caragea C. Extracting keyphrases from research papers using citation networks. In Proceedings of the AAAI Conference on Artificial Intelligence. 2014; 28.","key":"833_CR44","DOI":"10.1609\/aaai.v28i1.8946"},{"unstructured":"Medelyan O, Witten IH, Milne D. Topic indexing with wikipedia. Proceedings of the AAAI WikiAI Workshop. 2008; 1:19\u201324.","key":"833_CR45"},{"doi-asserted-by":"crossref","unstructured":"Papineni K, Roukos S, Ward T, Zhu W-J. Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002; pp. 311\u2013318.","key":"833_CR46","DOI":"10.3115\/1073083.1073135"},{"unstructured":"Lin C-Y. ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain. 2004. pp. 74\u201381. https:\/\/aclanthology.org\/W04-1013.","key":"833_CR47"},{"unstructured":"Boudin F. pke: an open source python-based keyphrase extraction toolkit. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, Osaka, Japan. 2016; pp. 69\u201373. http:\/\/aclweb.org\/anthology\/C16-2015.","key":"833_CR48"},{"issue":"1","key":"833_CR49","first-page":"5485","volume":"21","author":"C Raffel","year":"2020","unstructured":"Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 
2020;21(1):5485\u2013551.","journal-title":"J Mach Learn Res"},{"key":"833_CR50","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, et al. Language models are few-shot learners. Adv Neural Inform Process Syst. 2020;33:1877\u2013901.","journal-title":"Adv Neural Inform Process Syst"},{"unstructured":"Chowdhery A, Narang S, Devlin J, Bosma M, Mishra G, Roberts A, Barham P, Chung HW, Sutton C, Gehrmann S, et al. Palm: Scaling language modeling with pathways. arXiv preprint. 2022 arXiv:2204.02311.","key":"833_CR51"},{"unstructured":"Tay Y, Dehghani M, Tran VQ, Garcia X, Bahri D, Schuster T, Zheng HS, Houlsby N, Metzler D. Unifying language learning paradigms. arXiv preprint. 2022. arXiv:2205.05131.","key":"833_CR52"},{"unstructured":"Rae JW, Borgeaud S, Cai T, Millican K, Hoffmann J, Song F, Aslanides J, Henderson S, Ring R, Young S, et al. Scaling language models: Methods, analysis & insights from training gopher. arXiv preprint. 2021. arXiv:2112.11446.","key":"833_CR53"},{"unstructured":"Borgeaud S, Mensch A, Hoffmann J, Cai T, Rutherford E, Millican K, Van Den Driessche GB, Lespiau J-B, Damoc B, Clark A, et al. Improving language models by retrieving from trillions of tokens. In: International Conference on Machine Learning. 2022; 2206\u201340PMLR.","key":"833_CR54"},{"doi-asserted-by":"crossref","unstructured":"Jawahar G, Sagot B, Seddah D. What does bert learn about the structure of language? In: ACL 2019-57th Annual Meeting of the Association for Computational Linguistics. 2019.","key":"833_CR55","DOI":"10.18653\/v1\/P19-1356"},{"unstructured":"Song M, Feng Y, Jing L. Utilizing bert intermediate layers for unsupervised keyphrase extraction. In: Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022). 2022; pp. 
277\u2013281.","key":"833_CR56"},{"doi-asserted-by":"crossref","unstructured":"Zheng H, Lapata M. Sentence centrality revisited for unsupervised summarization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019; pp. 6236\u20136247.","key":"833_CR57","DOI":"10.18653\/v1\/P19-1628"},{"issue":"4","key":"833_CR58","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2023.103356","volume":"60","author":"Z Zhang","year":"2023","unstructured":"Zhang Z, Liang X, Zuo Y, Lin C. Improving unsupervised keyphrase extraction by modeling hierarchical multi-granularity features. Inform Process Manag. 2023;60(4): 103356.","journal-title":"Inform Process Manag"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00833-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-023-00833-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00833-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T18:35:47Z","timestamp":1730313347000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-023-00833-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,12]]},"references-count":58,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["833"],"URL":"https:\/\/doi.org\/10.1186\/s40537-023-00833-1","relation":{},"ISSN":["2196-1115"],"issn-type":[{"type":"electronic","value":"2196-1115"}],"subject":[],"published":{"date-parts":[[2023,10,12]]},"assertion":[{"value":"28 May 
2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"156"}}