Abstract
An automated data mining service offers an outsourced, cost-effective analysis option for clients who want to leverage their data resources for decision support and operational improvement. Under the service model, the client typically provides the service with data and other information likely to aid the analysis (e.g., domain knowledge). In return, the service provides analysis results to the client. We describe the processes, issues, and challenges involved in automating data mining and analysis when the high-level goals are (1) to provide the client with a high-quality, pertinent analysis result and (2) to automate the data mining service, minimizing both the human analyst effort required and the cost of delivering the service. We argue that both goals may be realized by focusing on client problems within market sectors.
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Bradley, P.S. (2003). Data Mining as an Automated Service. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_1
DOI: https://doi.org/10.1007/3-540-36175-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive