Abstract
Many enterprises have accumulated large amounts of data over time. To gain a competitive advantage, they need effective ways to analyze and understand this vast body of raw data. Various methods and techniques have been used to reduce data volume to a manageable level and to help enterprises identify the business value in their data sets. Segmentation methods, in particular, are widely used in data mining. In this paper, we present a new data segmentation algorithm that can be used to build time-dependent customer behavior models. The proposed model has the potential to solve the optimization problem in data segmentation.
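Segmentation is commonly carried out by clustering records into groups. The abstract does not describe the proposed algorithm, so the sketch below is not it. It shows a conventional baseline for comparison, Lloyd's k-means iteration, and assumes each customer is described by a numeric feature vector. The function name `kmeans_segments` and the example features are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_segments(points: Sequence[Point], k: int,
                    max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Assign each point to one of k segments using Lloyd's k-means iteration.

    Hypothetical helper illustrating a standard baseline, not the paper's algorithm.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialisation: the first k points act as initial centroids.
    # This is a simplification; k-means++ seeding is the more common choice.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the segment with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the iteration has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a segment becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical customers described by (recency in days, purchases per month).
customers = [(2.0, 10.0), (3.0, 12.0), (40.0, 1.0), (45.0, 0.5), (5.0, 9.0)]
labels, centroids = kmeans_segments(customers, k=2)
print(labels)  # [0, 0, 1, 1, 0]: recent frequent buyers vs. lapsed customers
```

The result depends on the initial centroids, and features on very different scales should usually be standardized first so that no single feature dominates the distance.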
Cite this article
Bulysheva, L., Bulyshev, A. Segmentation modeling algorithm: a novel algorithm in data mining. Inf Technol Manag 13, 263–271 (2012). https://doi.org/10.1007/s10799-012-0136-7
DOI: https://doi.org/10.1007/s10799-012-0136-7