Abstract
There is an urgent need for new text mining solutions that can cope with the exponential growth of text data. Problem sizes increase daily as new text documents are added. Grid-aware text mining is one solution for extracting knowledge from such large volumes of text. Part-of-speech (POS) tagging is an important preprocessing task in text mining, but tagging algorithms applied to very large document collections take a very long time to produce results on conventional computers. In this paper we present a framework for the parallel implementation of part-of-speech tagging for text mining using grid computing. The Globus Toolkit, a middleware for scientific and data-intensive grid applications, is used to develop this framework in a grid environment. Experimental results show that this model significantly reduces part-of-speech tagging time for text mining. The model can be integrated into a grid-based text mining tool, helping to improve the overall performance of the text mining process.
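The abstract does not give implementation details, so the following is only a minimal sketch of the general data-parallel pattern that such a framework might follow: partition the document collection, tag each partition independently, and merge the results in the original order. It is not the authors' Globus-based implementation. Local worker threads stand in for grid nodes, and the names `TOY_LEXICON`, `tag_document`, `partition`, `tag_chunk`, and `parallel_tag` are all hypothetical, as is the toy lookup tagger.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy lexicon; a real POS tagger would be statistical or rule-based.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "grid": "NOUN", "text": "NOUN", "tagger": "NOUN",
    "tags": "VERB", "runs": "VERB",
    "fast": "ADV",
}


def tag_document(doc):
    """Tag one whitespace-tokenized document; unknown words default to NOUN."""
    return [(word, TOY_LEXICON.get(word.lower(), "NOUN")) for word in doc.split()]


def partition(docs, n_parts):
    """Split docs into contiguous chunks whose sizes differ by at most one."""
    n_parts = max(1, min(n_parts, len(docs)))
    size, extra = divmod(len(docs), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(docs[start:end])
        start = end
    return chunks


def tag_chunk(chunk):
    """Work unit handed to one worker (a grid node, in the paper's setting)."""
    return [tag_document(doc) for doc in chunk]


def parallel_tag(docs, n_workers=4):
    """Tag all documents, one chunk per worker, and merge in input order."""
    if not docs:
        return []
    chunks = partition(docs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = list(pool.map(tag_chunk, chunks))
    return [tagged for chunk in results for tagged in chunk]
```

Because each document is tagged independently, the merged output is the same as a sequential run. Threads are used here only to keep the sketch self-contained; for CPU-bound tagging in Python, a real speedup would require separate processes or remote jobs on grid nodes.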
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, N., Kumar, S., Kumar, P. (2011). Parallel Implementation of Part of Speech Tagging for Text Mining Using Grid Computing. In: Abraham, A., Lloret Mauri, J., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22709-7_45
DOI: https://doi.org/10.1007/978-3-642-22709-7_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22708-0
Online ISBN: 978-3-642-22709-7
eBook Packages: Computer Science (R0)