Abstract
Organizing scientific papers helps researchers derive meaningful insights from published scientific resources efficiently and keep pace with rapid technological change, and hence assists new scientific discovery. In this paper, we experiment with text mining and data management of scientific publications to collect and present useful information in support of research. For efficient data management and fast information retrieval, four data stores are employed: a semantic repository, an index and search repository, a document repository, and a graph repository, each used for its particular features and strengths. The results show that the combination of these four repositories can store and index the publication data reliably and efficiently, and hence supply meaningful information to support scientific research.
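To make the division of labour among the four repositories concrete, the following is a minimal sketch in Python. It does not reproduce the system described in the paper: each repository is replaced by a plain in-memory structure, and the `Publication` record, the `Corpus` class, the predicate names (`hasTitle`, `hasAuthor`), and the sample data are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Publication:
    """Hypothetical minimal record for one paper (not the paper's schema)."""
    paper_id: str
    title: str
    authors: List[str]
    cites: List[str] = field(default_factory=list)  # ids of cited papers
    full_text: str = ""


class Corpus:
    """In-memory stand-ins for the four repositories named in the abstract."""

    def __init__(self) -> None:
        self.documents: Dict[str, Publication] = {}      # document repository
        self.triples: Set[Tuple[str, str, str]] = set()  # semantic repository
        self.index: Dict[str, Set[str]] = {}             # index and search repository
        self.graph: Dict[str, Set[str]] = {}             # graph repository

    def add(self, pub: Publication) -> None:
        # Document repository: keep the whole record, keyed by id.
        self.documents[pub.paper_id] = pub
        # Semantic repository: subject-predicate-object statements.
        self.triples.add((pub.paper_id, "hasTitle", pub.title))
        for author in pub.authors:
            self.triples.add((pub.paper_id, "hasAuthor", author))
        # Index and search repository: inverted index over lower-cased tokens.
        for token in f"{pub.title} {pub.full_text}".lower().split():
            self.index.setdefault(token, set()).add(pub.paper_id)
        # Graph repository: citation edges stored as adjacency sets.
        self.graph.setdefault(pub.paper_id, set()).update(pub.cites)

    def search(self, term: str) -> Set[str]:
        """Keyword lookup, answered by the inverted index."""
        return self.index.get(term.lower(), set())

    def papers_by(self, author: str) -> Set[str]:
        """Metadata lookup, answered by the triple set."""
        return {s for s, p, o in self.triples if p == "hasAuthor" and o == author}

    def cited_by(self, paper_id: str) -> Set[str]:
        """Relationship lookup, answered by the citation graph."""
        return {src for src, targets in self.graph.items() if paper_id in targets}


# Sample data, invented for this sketch.
corpus = Corpus()
corpus.add(Publication("p1", "Graph storage for papers", ["A. Author"]))
corpus.add(Publication("p2", "Semantic search", ["B. Author"], cites=["p1"]))
print(corpus.search("semantic"), corpus.papers_by("A. Author"), corpus.cited_by("p1"))
```

In this sketch each kind of question goes to the structure suited to it: keyword lookup to the index, metadata queries to the triples, citation queries to the graph, and full-record retrieval to the document store. This is one reading of what taking advantage of each repository's features and strengths could look like.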
Acknowledgments
This research is supported by the Dr Inventor project (European Union Seventh Framework Programme, FP7/2007-2013, under grant agreement no. 611383) and the CARRE project (Seventh Framework Programme of the European Commission, ICT, under agreement FP7-ICT-611140).
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wei, H. et al. (2016). Data Mining, Management and Visualization in Large Scientific Corpuses. In: El Rhalibi, A., Tian, F., Pan, Z., Liu, B. (eds) E-Learning and Games. Edutainment 2016. Lecture Notes in Computer Science, vol 9654. Springer, Cham. https://doi.org/10.1007/978-3-319-40259-8_32
DOI: https://doi.org/10.1007/978-3-319-40259-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40258-1
Online ISBN: 978-3-319-40259-8
eBook Packages: Computer Science (R0)