Abstract
Distributed SPARQL queries enable users to retrieve information by exploiting the increasing amount of linked data being published. However, industrial-strength distributed SPARQL query processing is still at its early stage for efficiently answering queries. Previous research shows that it is possible to apply methods from graph theory to optimize the performance of distributed SPARQL. In this paper we describe a framework that can simulate arbitrary RDF data networks to evaluate different approaches of distributed SPARQL query processing. Using this framework we further explore the graph traversal algorithms for distributed SPARQL optimization. We present an implementation of a Minimum-Spanning-Tree-based (MST-based) algorithm for distributed SPARQL processing, the performance of which is compared to other approaches using this evaluation framework. The contribution of this paper is to show that a MST-based approach seems to perform much better than other non graph-traversal-based approaches, and to provide an evaluation framework for evaluating distributed SPARQL processing.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bernstein, A., Kiefer, C., Stocker, M.: OptARQ: A SPARQL Optimization Approach based on Triple Pattern Selectivity Estimation (2007)
Bizer, C., Heath, T., Berners-Lee, T.: Linked data-the story so far. International Journal on Semantic Web and Information Systems 5(3), 1–22 (2009)
Bizer, C., Schultz, A.: The Berlin SPARQL Benchmark. International Journal On Semantic Web and Information Systems-Special Issue on Scalability and Performance of Semantic Web Systems (2009)
Bouquet, P., Ghidini, C., Serafini, L.: Querying the Web of Data: A Formal Approach. In: Gómez-Pérez, A., Yu, Y., Ding, Y. (eds.) ASWC 2009. LNCS, vol. 5926, pp. 291–305. Springer, Heidelberg (2009)
Correndo, G., Salvadores, M., Millard, I., Glaser, H., Shadbolt, N.: SPARQL query rewriting for implementing data integration over linked data. In: Proceedings of the 2010 EDBT Workshops, pp. 1–11. ACM (2010)
Cyganiak, R.: A relational algebra for SPARQL. Digital Media Systems Laboratory, HP Laboratories Bristol, pp. 2005–170 (2005)
Fletcher, G., Beck, P.: Scalable indexing of RDF graphs for efficient join processing. In: Proceeding of the 18th ACM Conference on Information and Knowledge Management, pp. 1513–1516. ACM (2009)
Gray, A., Gray, N., Ounis, I.: Can RDB2RDF Tools Feasibily Expose Large Science Archives for Data Integration? In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano, P., Heath, T., Hyvönen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 491–505. Springer, Heidelberg (2009)
Haas, L., Kossmann, D., Wimmers, E., Yang, J.: Optimizing queries across diverse data sources. In: Proceedings of The International Conference on Very Large Data Bases, pp. 276–285. Citeseer (1997)
Haase, P., Mathäß, T.: An evaluation of approaches to federated query processing over linked data. In: Proceedings of the 6th International Conference on Semantic Systems, pp. 1–9. ACM (2010)
Hartig, O., Heese, R.: The SPARQL Query Graph Model for Query Optimization. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 564–578. Springer, Heidelberg (2007)
Kossmann, D.: The state of the art in distributed query processing. ACM Computing Surveys (CSUR) 32(4), 422–469 (2000)
Ladwig, G., Tran, T.: Linked Data Query Processing Strategies. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 453–469. Springer, Heidelberg (2010)
McGlothlin, J., Khan, L.: RDFJoin: A Scalable Data Model for Persistence and Efficient Querying of RDF Datasets. Database (2009)
Obermeier, L., Nixon, L.: A Cost Model for Querying Distributed RDF-Repositories with SPARQL. In: Proceedings of the Workshop on Advancing Reasoning on the Web: Scalability and Commonsense Tenerife. Citeseer (2008)
Özsu, M.: Distributed and parallel database systems. ACM Computing Surveys (CSUR), 1–21 (1996)
Özsu, M., Valduriez, P.: Principles of distributed database systems. Prentice Hall (1999)
Pérez, J., Arenas, M.: Semantics and Complexity of SPARQL. ACM Transactions on Database Systems, TODS (2009)
Prim, R.: Shortest connection networks and some generalizations. Bell System Technical Journal 36(6), 1389–1401 (1957)
Prud’Hommeaux, E., Seaborne, A.: SPARQL query language for RDF (2008)
Quilitz, B.: Querying Distributed RDF Data Sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
Schmidt, M., Hornung, T., Küchlin, N., Lausen, G., Pinkel, C.: An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 82–97. Springer, Heidelberg (2008)
Schmidt, M., Hornung, T., Lausen, G., Pinkel, C.: SP2Bench: A SPARQL Performance Benchmark. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 222–233. IEEE (2009)
Schmidt, M., Meier, M., Lausen, G.: Foundations of SPARQL query optimization. In: Proceedings of the 13th International Conference on Database Theory, pp. 4–33. ACM (2010)
Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., Reynolds, D.: SPARQL basic graph pattern optimization using selectivity estimation. In: Proceeding of the 17th International Conference on World Wide Web, pp. 595–604. ACM (2008)
Stuckenschmidt, H., Vdovjak, R., Houben, G., Broekstra, J.: Index structures and algorithms for querying distributed RDF repositories. In: Proceedings of the 13th International Conference on World Wide Web, pp. 631–639. ACM (2004)
Vandervalk, B.P., McCarthy, E.L., Wilkinson, M.D.: Optimization of Distributed SPARQL Queries Using Edmonds’ Algorithm and Prim’s Algorithm. In: International Conference on Computational Science and Engineering, CSE 2009, vol. 1, pp. 330–337. IEEE (2009)
Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D., et al.: Efficient RDF storage and retrieval in Jena2. In: Proceedings of SWDB, vol. 3, pp. 7–8 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Tiropanis, T., Davis, H.C. (2012). Evaluating Graph Traversal Algorithms for Distributed SPARQL Query Optimization. In: Pan, J.Z., et al. The Semantic Web. JIST 2011. Lecture Notes in Computer Science, vol 7185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29923-0_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-29923-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29922-3
Online ISBN: 978-3-642-29923-0
eBook Packages: Computer ScienceComputer Science (R0)