Abstract
Automatic crawling of Rich Internet Applications (RIAs) is challenging because client-side code modifies the client dynamically and fetches server-side data asynchronously. Most existing solutions model RIAs as state machines, with DOMs as states and JavaScript event executions as transitions. This approach fails on complex, “real-life” RIAs because the resulting model is far too large to be practical. In this paper, we propose a new method to crawl AJAX-based RIAs efficiently by detecting “components”, which are areas of the DOM that are independent of each other, and by crawling each component separately. This dramatically reduces the state space required for the model, without loss of content coverage. Our method requires neither prior knowledge of the RIA nor a predefined definition of components. Instead, we infer the components by observing the behavior of the RIA during crawling. Our experimental results show that our method can quickly and completely index industrial RIAs that are out of reach for traditional methods.
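The state-space reduction described above can be illustrated with a toy calculation. The following sketch is not the crawling algorithm of the paper; it only assumes a hypothetical page made of three fully independent components (the component names and their local states are invented for illustration) and compares the number of whole-DOM states with the number of per-component states.

```python
from itertools import product

# Hypothetical page: three independent components, each with its own
# local states. All names and states here are illustrative assumptions.
components = {
    "menu": ["collapsed", "expanded"],
    "tabs": ["tab1", "tab2", "tab3"],
    "comments": ["hidden", "page1", "page2", "page3"],
}


def global_dom_states(components):
    """Enumerate whole-DOM states: every combination of component states."""
    names = list(components)
    return [dict(zip(names, combo)) for combo in product(*components.values())]


def component_states(components):
    """Enumerate (component, local state) pairs, one component at a time."""
    return [(name, state) for name, states in components.items() for state in states]


whole = global_dom_states(components)
separate = component_states(components)

# Content reachable through whole-DOM states, expressed per component.
covered_by_whole = {(name, state) for dom in whole for name, state in dom.items()}

print(len(whole))                         # product of sizes: 2 * 3 * 4 = 24
print(len(separate))                      # sum of sizes: 2 + 3 + 4 = 9
print(covered_by_whole == set(separate))  # same content either way: True
```

Under this independence assumption, the whole-DOM model grows with the product of the component sizes while the per-component model grows with their sum, and both expose the same content.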
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Moosavi, A., Hooshmand, S., Baghbanzadeh, S., Jourdan, GV., Bochmann, G.V., Onut, I.V. (2014). Indexing Rich Internet Applications Using Components-Based Crawling. In: Casteleyn, S., Rossi, G., Winckler, M. (eds) Web Engineering. ICWE 2014. Lecture Notes in Computer Science, vol 8541. Springer, Cham. https://doi.org/10.1007/978-3-319-08245-5_12
DOI: https://doi.org/10.1007/978-3-319-08245-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08244-8
Online ISBN: 978-3-319-08245-5
eBook Packages: Computer Science (R0)