THC-DAT: a document analysis tool based on topic hierarchy and context information
Abstract
Purpose
The purpose of this paper is to propose a novel within-document analysis tool (DAT), the topic hierarchy and context-based document analysis tool (THC-DAT), which enables users to interactively analyze any multi-topic document based on fine-grained, hierarchical topics automatically extracted from it. THC-DAT uses the hierarchical latent Dirichlet allocation method and takes context information into account so that it can reveal the relationships between latent topics and related texts in a document.
Design/methodology/approach
The methodology is a case study. The authors reviewed the related literature first, then utilized a general “build and test” research model. After explaining the model, interface and functions of THC-DAT, a case study was presented using a scholarly paper that was analyzed with the tool.
Findings
THC-DAT can organize and serve document topics and texts in a hierarchical and context-based way, which overcomes the drawbacks of traditional DATs. The navigation, browse, search and comparison functions of THC-DAT enable users to read, search and analyze multi-topic documents efficiently and effectively.
Practical implications
THC-DAT can improve document organization and services in digital libraries or e-readers by helping users to interactively read, search and analyze documents efficiently and effectively, to explore unfamiliar topics with little cognitive burden, or to deepen their understanding of a document.
Originality/value
This paper designs a tool, THC-DAT, to analyze documents in a topic hierarchy and context-based (THC) way. It contributes to overcoming the coarse-analysis drawbacks of existing within-document analysis tools.
Keywords
Acknowledgements
The authors gratefully acknowledge the financial support for this work provided by the National Natural Science Foundation of China (Nos. 71303089, 71273195 and 71420107026) and the National Basic Research Program of China (973 Program, No. 904171200).
Citation
Chen, J., Wang, T.T. and Lu, Q. (2016), "THC-DAT: a document analysis tool based on topic hierarchy and context information", Library Hi Tech, Vol. 34 No. 1, pp. 64-86. https://doi.org/10.1108/LHT-07-2015-0074
Publisher
Emerald Group Publishing Limited
Copyright © 2016, Emerald Group Publishing Limited