Abstract
We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We are focusing on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), represents a drastic departure from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called statistics tree (ST) during a one-time scan of the detailed data. In order to optimize the queries involving constraints on hierarchy levels of the underlying dimensions, we select andmaterialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family, which can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.
Similar content being viewed by others
References
S. Agarwal, R. Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, and S. Sarawagi, “On the computation of multidimensional aggregates,” in 22th International Conference on Very Large Data Bases, Mumbai (Bomabi), India, 1996.
R. Agrawal, A. Gupta, and S. Sarawagi, “Modeling multidimensional databases,” in Thirteenth International Conference on Database Engineering, Birmingham, UK, 1997.
Arbor Systems, “Large-scale data warehousing using hyperion essbase OLAP technology,” Arbor Systems, White Paper 1997.
K. Beyer and R. Ramakrishnan, “Bottom-up computation of sparse and iceberg CUBEs,” in ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, 1999.
S. Chaudhuri and U. Dayal, “Data warehousing and OLAP for decision support,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 507–508, 1997.
S. Chaudhuri and U. Dayal, “An overview of data warehousing and OLAP technology,” SIGMOD Record, vol. 26, pp. 65–74, 1997.
C.Y. Chan and Y.E. Ioannidis, “Bitmap index design and evaluation,” in 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, USA, 1998.
E.F. Codd, S.B. Codd, and C.T. Salley, “Beyond decision support,” in Computer World, vol. 27, 1993, www.arborsoft.com/OLAP.html.
E.F. Codd, S.B. Codd, and C.T. Salley, “Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate,” Technical Report 1993.
D. Comer, “The ubiquitous Btree,” ACM Computing Surveys, vol. 11, pp. 121–137, 1979.
C.E. Dyreson, “Information retrieval from an incomplete data cube,” in Twenty-Second International Conference on Very Large Data Bases, Mumbai (Bombai), India, 1996.
L. Fu and J. Hammer, “CubiST: A new algorithm for improving the performance of ad-hoc OLAP queries,” 0ACM Third International Workshop on Data Warehousing and OLAP (DOLAP), Washington, DC, 2000.
S. Goil and A. Choudhary, “High performance OLAP and data mining on parallel computers,” Journal of Data Mining and Knowledge Discovery, vol. 1, pp. 391–417, 1997.
S. Goil and A. Choudhary, “PARSIMONY: An infrastructure for parallel multidimensional analysis and data mining,” Journal of Parallel and Distributed Computing, vol. 61, pp. 285–321, 2001.
J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, “Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals,” Data Mining and Knowledge Discovery, vol. 1, pp. 29–53, 1997.
H. Gupta and I. Mumick, “Selection of views to materialize under a maintenance cost constraint,” Stanford University, Technical Report, 1997.
V. Harinarayan, A. Rajaraman, and J.D. Ullman, “Implementing data cubes efficiently,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 25, pp. 205–216, 1996.
Information Advantage, “Business intelligence,” White Paper, 1998.
Informix Corp., “Informix red brick decision server,” 2001, http://www.informix.com/redbrick/.
T. Johnson and D. Shasha, “Some approaches to index design for cube forests,” Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, vol. 20, pp. 27–35, 1997.
W. Labio, D. Quass, and B. Adelberg, “Physical database design for data warehouses,” in International Conference on Database Engineering, Birmingham, England, 1997.
M. Lee and J. Hammer, “Speeding upwarehouse physical design using a randomized algorithm,” International Journal of Cooperative Information Systems (IJCIS)—Special Issue on Design and Management of Data Warehouses, vol. 10, pp. 327–354, 2001.
D. Lomet, “Bulletin of the technical committee on data engineering,” in Special Issue on Materialized Views and Data Warehousing, J. Widom (Ed.), IEEE Computer Society, 1995, vol. 18.
Microsoft Corp., “Microsoft SQL server,” Microsoft, Seattle, WA, White Paper.
MicroStrategy Inc., “The case for relational OLAP,” MicroStrategy, White Paper.
P. O'Neil and D. Quass, “Improved query performance with variant indexes,” SIGMOD Record (ACMSpecial Interest Group on Management of Data), vol. 26, pp. 38–49, 1997.
P.E. O'Neil, “Model 204 architecture and performance,” 2nd International Workshop on High Performance Transaction Systems, Asilomar, CA, 1987.
Oracle Corp., “Oracle express OLAP technology,” http://www.oracle.com/olap/index.html.
Oracle Corp., “Oracle express server documentation,” Oracle Corporation, Redwood Shores, CA, Documentation, August 2001.
Pilot Software Inc., “An introduction to OLAP multidimensional terminology and technology,” Pilot Software, Cambridge, MA, White Paper.
Redbrick Systems, “Decision-makers, business data and RISQL,” Informix, Los Gatos, CA, White Paper, 1997.
N. Roussopoulos, Y. Kotidis, and M. Roussopoulos, “Cubetree: Organization of and bulk updates on the data cube,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 89–99, 1997.
N. Roussopoulos and D. Leifker, “Direct spatial search on pictorial databases using packed R-trees,” in 1985 ACM SIGMOD International Conference on Management of Data, Austin, Texas, 1985.
R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” in 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, 1996.
J. Srivastava, J.S.E. Tan, and V.Y. Lum, “TBSAM: An access method for efficient processing of statistical queries,” IEEE Transactions on Knowledge and Data Engineering, vol. 1, pp. 414–423, 1989.
Transaction Processing Performance Council, “The TPC BenchmarkTMH,” Transaction Processing Council, 2001, http://www.tpc.org/tpch/.
Transaction Processing Performance Council, “Transaction processing performance council,” 2001, http://www.tpc.org/.
W.P. Yan and P. Larson, “Eager aggregation and lazy aggregation,” in 21th International Conference on Very Large Data Bases, Zurich, Switzerland, 1995.
Y. Zhao, P.M. Deshpande, and J.F. Naughton, “An array-based algorithm for simultaneous multidimensional aggregates,” SIGMOD Record (ACM Special Interest Group on Management of Data), vol. 26, pp. 159–170, 1997.
Y. Zhuge, H. Garcia-Molina, and J.L. Wiener, “Consistency algorithms for multi-source warehouse view maintenance,” Distributed and Parallel Databases, vol. 6, pp. 7–40, 1998.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Hammer, J., Fu, L. CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees. Distributed and Parallel Databases 14, 221–254 (2003). https://doi.org/10.1023/A:1025537315785
Issue Date:
DOI: https://doi.org/10.1023/A:1025537315785