Abstract
The Web is very large and has grown enormously, and search engines are the main tools for navigating it. To provide powerful search facilities, search engines maintain comprehensive indices of the documents on the Web and their contents. They build these indices by continuously downloading Web pages for processing, a process known as web crawling. In this paper we review various web crawlers and their performance attributes. We study mobile and parallel web crawling approaches that make a web crawling system more effective and efficient. The major advantage of the mobile approach is that the analysis portion of the crawling process is done locally, where the data resides, rather than remotely inside the Web search engine. This can significantly reduce network load, which in turn can improve the performance of the crawling process. The major advantage of parallel crawling is scalability: as the Web grows, it becomes imperative to parallelize the crawling process in order to finish downloading pages in a reasonable amount of time. We identify fundamental issues related to migrating parallel crawling and propose metrics to evaluate a migrating parallel crawler. Lastly, we summarize the web crawlers and the performance attributes that affect the process of web crawling.
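The two ideas in the abstract can be illustrated together with a small sketch. It is not the authors' architecture, only a toy model under stated assumptions. The "Web" is a hypothetical in-memory dictionary, so no network access happens. URLs are partitioned across parallel workers by hashing the host name, so each site is always handled by the same worker. Each worker analyses its pages "locally" and returns only matching URLs and discovered links instead of full page contents, which stands in for the migrating (mobile) approach of shipping analysis to where the data resides. The names `WEB`, `partition`, `crawl_batch` and `parallel_crawl`, the keyword filter, and the round-based link exchange are all illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Hypothetical in-memory "Web" (an assumption for illustration): URL -> (page text, outgoing links).
WEB = {
    "http://a.example/": ("crawler survey index", ["http://a.example/p1", "http://b.example/"]),
    "http://a.example/p1": ("notes on parallel crawling", ["http://b.example/x"]),
    "http://b.example/": ("cooking recipes", ["http://b.example/x", "http://a.example/"]),
    "http://b.example/x": ("mobile crawler design", []),
}


def partition(url: str, n_workers: int) -> int:
    """Assign a URL to a worker by hashing its host, so a site always maps to one worker."""
    host = urlparse(url).netloc
    return int(hashlib.md5(host.encode()).hexdigest(), 16) % n_workers


def crawl_batch(urls, keyword):
    """One worker's job: 'fetch' its URLs, filter them locally, and return only
    the matching URLs plus discovered links (not the page bodies)."""
    matches, links = [], []
    for url in urls:
        text, out = WEB.get(url, ("", []))
        if keyword in text:  # local analysis where the data resides
            matches.append(url)
        links.extend(out)
    return matches, links


def parallel_crawl(seeds, keyword, n_workers=2):
    """Round-based parallel crawl: group the frontier by owning worker, run the
    workers concurrently, then route newly discovered links into the next round.
    Simplification: duplicate detection is centralized in this coordinator."""
    seen = set(seeds)
    frontier = list(seeds)
    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while frontier:
            batches = [[] for _ in range(n_workers)]
            for url in frontier:
                batches[partition(url, n_workers)].append(url)
            frontier = []
            for matches, links in pool.map(crawl_batch, batches, [keyword] * n_workers):
                results.extend(matches)
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return sorted(results)


if __name__ == "__main__":
    print(parallel_crawl(["http://a.example/"], "crawl"))
```

Starting from `http://a.example/` with the keyword `"crawl"`, the sketch reaches all four pages and returns the three whose text contains the keyword. Hashing on the host rather than the full URL is one common design choice: it keeps each site's links inside a single worker and limits cross-worker traffic. A real migrating crawler would instead move the filtering code to the remote server and would also need distributed duplicate detection.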
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Faizan Farooqui, M., Rizwan Beg, M., Qasim Rafiq, M. (2013). A Critical Review of Migrating Parallel Web Crawler. In: Meghanathan, N., Nagamalai, D., Chaki, N. (eds) Advances in Computing and Information Technology. Advances in Intelligent Systems and Computing, vol 177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31552-7_63
DOI: https://doi.org/10.1007/978-3-642-31552-7_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31551-0
Online ISBN: 978-3-642-31552-7
eBook Packages: Engineering, Engineering (R0)