Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses | IGI Global Scientific Publishing

Swathi Kurunji, Tingjian Ge, Xinwen Fu, Benyuan Liu, Cindy X. Chen
Copyright: © 2013 | Volume: 5 | Issue: 4 | Pages: 18
ISSN: 1938-0259 | EISSN: 1938-0267 | EISBN13: 9781466635715 | DOI: 10.4018/ijghpc.2013100108

MLA

Kurunji, Swathi, et al. "Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses." IJGHPC vol.5, no.4 2013: pp.113-130. https://doi.org/10.4018/ijghpc.2013100108

APA

Kurunji, S., Ge, T., Fu, X., Liu, B., & Chen, C. X. (2013). Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses. International Journal of Grid and High Performance Computing (IJGHPC), 5(4), 113-130. https://doi.org/10.4018/ijghpc.2013100108

Chicago

Kurunji, Swathi, et al. "Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses," International Journal of Grid and High Performance Computing (IJGHPC) 5, no.4: 113-130. https://doi.org/10.4018/ijghpc.2013100108

Abstract

In this paper, the authors present two storage structures, PK-map and Tuple-index-map, to improve the performance of query execution and inter-node communication in Cloud Data Warehouses. Cloud Data Warehouses require read-optimized databases because large amounts of historical data are integrated on a regular basis to support analytical applications for report generation, future analysis, and decision-making. This frequent data integration can grow the data size rapidly, so resources need to be allocated dynamically on demand. As resources are scaled out in the cloud environment, the number of nodes involved in executing a query increases, which in turn increases the number of inter-node communications. Join operations between two different tables are the most common operations in queries. To perform the join operation of a query in the cloud environment, data must be transferred among different nodes. This becomes critical when a huge amount of data (terabytes or petabytes) is stored across a large number of nodes. As the number of nodes and the amount of data increase, the size of the communication messages also increases, resulting in higher bandwidth usage and performance degradation. In this paper, the authors show through extensive experiments using PlanetLab Cloud that their proposed storage structures, PK-map and Tuple-index-map, together with their query execution algorithms, improve the performance of join queries and decrease inter-node communication and workload in Cloud Data Warehouses.
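
The communication cost that the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' PK-map or Tuple-index-map, whose layout the abstract does not specify. It is a generic key-first (semi-join style) exchange, shown only to suggest why sending join keys before full tuples may shrink inter-node messages. The table contents, the two "nodes", and all function names are hypothetical, and message size is approximated by the length of a JSON encoding.

```python
import json

# Hypothetical two-node setup: each "node" is just a Python list of rows.
# Node A holds a partition of orders, node B holds a partition of customers.
orders_on_node_a = [
    {"order_id": 1, "cust_id": 10, "amount": 250.0, "note": "x" * 200},
    {"order_id": 2, "cust_id": 11, "amount": 75.5, "note": "y" * 200},
    {"order_id": 3, "cust_id": 10, "amount": 12.0, "note": "z" * 200},
]
customers_on_node_b = [
    {"cust_id": 10, "name": "Ada"},
    {"cust_id": 12, "name": "Grace"},
]


def message_size(payload):
    """Approximate wire size as the byte length of the JSON encoding."""
    return len(json.dumps(payload).encode("utf-8"))


def join_shipping_full_tuples(orders, customers):
    """Node A ships every order row to node B, which joins locally."""
    sent = message_size(orders)
    by_cust = {c["cust_id"]: c for c in customers}
    result = [
        {**o, "name": by_cust[o["cust_id"]]["name"]}
        for o in orders
        if o["cust_id"] in by_cust
    ]
    return result, sent


def join_shipping_keys_first(orders, customers):
    """Node B ships only its join keys; node A replies with matching rows."""
    keys = sorted({c["cust_id"] for c in customers})
    sent = message_size(keys)          # small message: keys only
    key_set = set(keys)
    matching = [o for o in orders if o["cust_id"] in key_set]
    sent += message_size(matching)     # only rows that will actually join
    by_cust = {c["cust_id"]: c for c in customers}
    result = [{**o, "name": by_cust[o["cust_id"]]["name"]} for o in matching]
    return result, sent


if __name__ == "__main__":
    full_rows, full_bytes = join_shipping_full_tuples(
        orders_on_node_a, customers_on_node_b
    )
    key_rows, key_bytes = join_shipping_keys_first(
        orders_on_node_a, customers_on_node_b
    )
    print("same join result:", full_rows == key_rows)
    print("bytes shipped (full tuples):", full_bytes)
    print("bytes shipped (keys first): ", key_bytes)
```

In this toy data set, one of the three order rows has no matching customer, so the key-first variant avoids shipping it. The saving depends entirely on join selectivity and row width; when nearly every row matches, the extra key message can make the key-first exchange slightly more expensive.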
