An Efficient, Extensible, Hardware-aware Indexing Kernel

Hamedani, Mohammad Sadoghi

An Efficient, Extensible, Hardware-aware Indexing Kernel

Files

Sadoghi_Hamedani_Mohammad_201306_PhD_thesis.pdf (5.03 MB)

Date

2013-06

Authors

Hamedani, Mohammad Sadoghi

Abstract

Modern hardware has the potential to play a central role in scalable data management systems. A realization of this potential arises in the context of indexing queries, a recurring theme in real-time data analytics, targeted advertising, algorithmic trading, and data-centric workflows, and of indexing data, a challenge in multi-version analytical query processing. To enhance query and data indexing, in this thesis, we present an efficient, extensible, and hardware-aware indexing kernel. This indexing kernel rests upon novel data structures and (parallel) algorithms that utilize the capabilities offered by modern hardware, especially abundance of main memory, multi-core architectures, hardware accelerators, and solid state drives.

This thesis focuses on presenting our query indexing techniques to cope with processing queries in data-intensive applications that are susceptible to ever increasing data volume and velocity. At the core of our query indexing kernel lies the BE-Tree family of memory-resident indexing structures that scales by overcoming the curse of dimensionality through a novel two-phase space-cutting technique, an effective Top-k processing, and adaptive parallel algorithms to operate directly on compressed data (that exploits the multi-core architecture). Furthermore, we achieve line-rate processing by harnessing the unprecedented degrees of parallelism and pipelining only available through low-level logic design using FPGAs. Finally, we present a comprehensive evaluation that establishes the superiority of BE-Tree in comparison with state-of-the-art algorithms.

In this thesis, we further expand the scope of our indexing kernel and describe how to accelerate analytical queries on (multi-version) databases by enabling indexes on the most recent data. Our goal is to reduce the overhead of index maintenance, so that indexes can be used effectively for analytical queries without being a heavy burden on transaction throughput. To achieve this end, we re-design the data structures in the storage hierarchy to employ an extra level of indirection over solid state drives. This indirection layer dramatically reduces the amount of magnetic disk I/Os that is needed for updating indexes and localizes the index maintenance. As a result, by rethinking how data is indexed, we eliminate the dilemma between update vs. query performance and reduce index maintenance and query processing cost substantially.

Keywords

Publish/Subscribe, Event Processing, Complex-Event Processing, Boolean Expression Indexing, Indexing Kernel, Data Structure, Data Indexing, Query Indexing, Storage Hierarchy, Index Maintenance, Compressed Matching, Delta Compression, Event Matching, Content-based Publish/Subscribe, FPGA, SSD, GPU, Multicore

URI

http://hdl.handle.net/1807/65515

Creative Commons

Attribution-NonCommercial 2.5 Canada

Creative Commons URI

http://creativecommons.org/licenses/by-nc/2.5/ca/

Collections

Doctoral Theses (2009 - )

Full item page

An Efficient, Extensible, Hardware-aware Indexing Kernel

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

URI

DOI

ISSN

Creative Commons

Creative Commons URI

Collections