Abstract
Data mining, or Knowledge Discovery in Databases (KDD), is of little benefit to commercial enterprises unless it can be carried out efficiently on realistic volumes of data. Operational factors also dictate that KDD should be performed within the context of a standard DBMS. Fortunately, relational DBMSs have a declarative query interface (SQL) that has allowed designers of parallel hardware to exploit data parallelism efficiently. Thus, an effective approach to the problem of efficient KDD is to arrange for KDD tasks to execute on a parallel SQL server. In this paper we devise generic KDD primitives, map these to SQL, and present some results of running these primitives on a commercially available parallel SQL server.
Supported by the Brazilian government's CNPq, grant number 200384/93-7.
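The abstract does not spell out the primitives themselves, so the following is only an illustrative sketch of what mapping a KDD task to SQL might look like. It assumes a counting primitive (rows per attribute value and class value, as many rule-induction and classification algorithms need) expressed as a single GROUP BY query. The function name `class_counts`, the table `customers` and its columns are hypothetical, and Python's built-in `sqlite3` stands in for the server; it is a serial engine, not the parallel SQL server used in the paper.

```python
import sqlite3


def class_counts(conn, table, attribute, class_column):
    """Count rows for each (attribute value, class value) pair.

    Hypothetical primitive, not taken from the paper. The whole count is
    one declarative GROUP BY query, so the database engine decides how to
    evaluate it (a parallel server could partition the scan and aggregate).
    """
    # Identifiers cannot be bound as query parameters, so they are quoted
    # here; callers are assumed to pass trusted table and column names.
    query = (
        f'SELECT "{attribute}", "{class_column}", COUNT(*) '
        f'FROM "{table}" '
        f'GROUP BY "{attribute}", "{class_column}" '
        f'ORDER BY "{attribute}", "{class_column}"'
    )
    return conn.execute(query).fetchall()


def demo():
    # In-memory toy relation; the data is made up for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (region TEXT, buys TEXT)")
    rows = [
        ("north", "yes"),
        ("north", "yes"),
        ("north", "no"),
        ("south", "no"),
        ("south", "no"),
        ("south", "yes"),
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return class_counts(conn, "customers", "region", "buys")


if __name__ == "__main__":
    for region, buys, count in demo():
        print(region, buys, count)
```

Because the query only states what to count and not how, the same statement could in principle be sent unchanged to a parallel server, which is the property the abstract relies on.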
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Freitas, A.A., Lavington, S.H. (1996). Parallel data mining for very large relational databases. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1996. Lecture Notes in Computer Science, vol 1067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61142-8_542
DOI: https://doi.org/10.1007/3-540-61142-8_542
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61142-4
Online ISBN: 978-3-540-49955-8
eBook Packages: Springer Book Archive