Abstract
The clustering of high-dimensional data presents a critical computational problem. Therefore, it is convenient to arrange the cluster centres on a grid with a small dimensional space that reduces computational cost and can be easily visualized. Moreover, in real applications there is no sharp boundary between classes, real datasets are naturally defined in a fuzzy context. Therefore, fuzzy clustering fits better for complex real datasets to determine the best distribution. Self-organizing map (SOM) technique is appropriate for clustering and vector quantization of high-dimensional data. In this paper we present a new fuzzy learning approach called FMIG (fuzzy multilevel interior growing self-organizing maps). The proposed algorithm is a fuzzy version of MIGSOM (multilevel interior growing self-organizing maps). The main contribution of FMIG is to define a fuzzy process of data mapping and to take into account the fuzzy aspect of high-dimensional real datasets. This new algorithm is able to auto-organize the map accordingly to the fuzzy training property of the nodes. In the second step, the introduced scheme for FMIG is clustered by means of fuzzy C-means (FCM) to discover the interior class distribution of the learned data. The validation of FCM partitions is directed by applying six validity indexes. Superiority of the new method is demonstrated by comparing it with crisp MIGSOM, GSOM (growing SOM) and FKCN (fuzzy Kohonen clustering network) techniques. Thus, our new method shows improvement in term of quantization error, topology preservation and clustering ability.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Abonyi J, Migaly S, Szeifert F (2002) Fuzzy self-organizing map based on regularized fuzzy c-means clustering. Advances in soft computing, Engineering Design and Manufacturing
Abraham A, Nath B (2001) A neuro-fuzzy approach for modelling electricity demand in Victoria. Appl Soft Comput 1(2):127–138
Aguilera PA, Frenich AG (2001) Application of the Kohonen neural network in coastal water management: methodological development for the assessment and prediction of water quality. Water Res 35:4053–4062
Alimi AM, Hassine R, Selmi M (2003) Beta fuzzy logic systems: approximation properties in the MIMO case. Int J Appl Math Comput Sci 13(2):225–238
Alimi AM (2000) The beta system: toward a change in our use of neuro-fuzzy systems. Int J Manag, Invited Paper, no. June, pp 15–19
Alimi AM (2003) Beta neuro-fuzzy systems. TASK Q J, Special Issue on Neural Networks edited by Duch W and Rutkowska D, vol 7, no 1, pp 23–41
Amarasiri R, Alahakoon D, Smith KA (2004) HDGSOM: a modified growing self-organizing map for high dimensional data clustering. Fourth international coriference on hybrid intelligent systems (HIS’04), pp 216–221
Ayadi T, Ellouze M, Hamdani TM, Alimi AM (2012) Movie scenes detection with MIGSOM based on shots semi-supervised clustering. Neural Comput Appl. doi 10.1007/s00521-012-0930-5
Ayadi T, Hamdani TM, Alimi AM, and Khabou MA (2007) 2IBGSOM: interior and irregular boundaries growing self-organizing maps. IEEE sixth international conference on machine learning and applications, pp 387–392
Ayadi T, Hamdani TM, Alimi AM (2010) A new data topology matching technique with multilevel interior growing self-organizing maps. IEEE international conference on systems, man, and cybernetics, pp 2479–2486
Ayadi T, Hamdani TM, Alimi AM (2011) On the use of cluster validity for evaluation of MIGSOM clustering. ISCIII: 5th international symposium on computational intelligence and intelligent informatics, pp 121–126
Ayadi T, Hamdani TM, Alimi AM (2012) MIGSOM: multilevel interior growing self-organizing maps for high dimensional data clustering. In: Neural processing letters, vol 36, pp 235 256
Barrea A (2011) Local fuzzy c-means clustering for medical spectroscopy images. Appl Math Sci 5(30):1449–1458
Bezdek JC (1974) Cluster validity with fuzzy sets. J Cybern 58–72
Bezdek JC (1987) Convergence theory for fuzzy c-means. IEEE Trans SMC, pp 873–877
Bezdek JC, Harris JD (1978) Fuzzy partitions and relation: an axiomatic basic for clustering fuzzy sets and systems. Academic Press, Massachusetts
Bezdek JC, Castelaz PF (1977) Prototype classification and feature selection with fuzzy sets. IEEE Trans Syst Man Cybern 7(2):87–92
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York
Blake CL, Merz CJ (1998) UCI repository of machine learning databases, [http://www.ics.uci.edu/learn/MLRepository.html]. University of California, Department of Information and Computer Science, Irvine, CA
Chang PC, Liao TW (2006) Combining SOM and fuzzy rule base for flow time prediction in semiconductor manufacturing factory. Appl Soft Comput 6(2):198–206
Das S, Abraham A, Konar A (2008) Automatic kernel clustering with a Multi-Elitist Particle Swarm Optimization Algorithm. Patt Recogn Lett 29:688–699
Das S, Abraham A (2008) Automatic clustering using an improved differential evolution algorithm. IEEE Trans SMC, vol 38, no 1, pp 218–237
Denaïa MA, Palisb F, Zeghbibb A (2007) Modeling and control of non-linear systems using soft computing techniques. Appl Soft Comput 7(3):728–738
Dhahri H, Alimi AM (2005) Hierarchical learning algorithm for the beta basis function neural network. In: Proceedings of third international conference on systems, signals & devices: SSD’2005, Sousse, Tunisia, March 2005
Dunn JC (1974) Well separated clusters and optimal fuzzy partitions. J Cybern 4:95–104
El Malek J, Alimi AM, Tourki R (2002) Problems in pattern classification in high dimensional spaces: behavior of a class of combined neuro-fuzzy classifiers. Fuzzy Sets Syst 128(1):15–33
Fritzke B (1994) Growing cell structure: a self organizing network for supervised and un-supervised learning. Neural Netw 7:1441–1460
Hamdani TM,. Alimi AM, Khabou MA (2011) An iterative method for deciding SVM and single layer neural network structures. Neural Process Lett 33(2):171–186
Hamdani TM, Alimi AM, Karray F (2008) Enhancing the structure and parameters of the centers for BBF fuzzy neural network classifier construction based on data structure. In: Proceedings of IEEE international join conference on neural networks, Hong Kong, IJCNN, art. no. 4634247, pp 3174–3180
Hamdani TM, Won JM, Alimi AM, Karray F (2011) Hierarchical genetic algorithm with new evaluation function and bi-coded representation for features selection with confidence rate. Appl Soft Comput 11(1):2501–2509, ISSN 1568–4946
Hsu AL, Halgarmuge SK (2001) Enhanced topology preservation of dynamic self-organising maps for data visualization. IFSA world congress and 20th NAFIPS international conference, vol 3, pp 1786–1791
Hu W, Xie D, Tan T, Maybank S (2004) Learning activity patterns using fuzzy self-organizing neural network. IEEE Trans Syst Man Cybern B 34(3):1618–1626
Kim DW, Lee KH, Lee D (2004) On cluster validity index for estimation of the optimal number of fuzzy clusters. Patt Recogn 37(10):2009–2025
Kim M, Ramakrishna RS (2005) New indices for cluster validity assessment. Patt Recogn Lett 26(15):2353–2363
Kohonen T (1998) Statistical pattern recognition with neural networks: benchmark studies. In: Proceedings of the second annual IEEE international conference on neural networks, vol 1
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43:59–69
Kohonen T (1984) Self-organization and associative memory. Springer, Berlin
Kohonen T (1997) Self-organizing maps, 2nd edn. Springer, Berlin
Kohonen T (1998) The self-organizing map. Neurocomputing 21:1–6
Kohonen T (2001) Self-organizing maps. Springer, Berlin
Li Hui-Ya, Hwang Wen-Jyi, Chang Chia-Yen (2011) Efficient fuzzy C-means architecture for image segmentation. Sensors 2011(11):6697–6718. doi:10.3390/s110706697
Mingoti SA, Lima JO (2006) Comparing SOM neural network with Fuzzy c-means, K-means and traditional hierarchical clustering algorithms. Eur J Oper Res 174:1742–1759
Pal NR, Bezdek JC (1992) On cluster validity for fuzzy c-means model. IEEE Trans Fuzzy Syst 3:370–379
Pascual A, Barcena M, Merelo JJ, Carazo JM (2000) Mapping and fuzzy classification of macromolecular images using self-organizing neural networks. Ultramicroscopy 84:85–99
Petterssona F, Chakrabortib N, Saxéna H (2007) A genetic algorithms based multi-objective neural net applied to noisy blast furnace data. Appl Soft Comput 7(1):387–397
Ravia V, Kurniawana H, Thaia PN, Kumarb PR (2008) Soft computing system for bank performance prediction. Appl Soft Comput 8(1):305–315
Theodoridis Y (1996) Spatial datasets—an unofficial collection. Available from: http://www.dias.cti.gr/~ytheod/research/datasets/spatial.html
Tlili M, Ayadi T, Hamdani TM, Alimi AM (2012) FMIG: fuzzy multilevel interior growing self-organizing maps. IEEE international conference on tools with artificial intelligence (ICTAI)
Tlili M, Hamdani TM, Alimi AM, Abraham A (2014) FVINOS: fuzzy validity index with noise-overlap separation for clustering algorithms. Patt Recogn Lett (in press)
Tsekouras GE, Sarimveis H (2004) A new approach for measuring the validity of the fuzzy c-means algorithm. Adv Eng Softw 35(8–9):567–575
Velmurugan T, Santhanam T (2010) Performance evaluation of K-means and fuzzy C-means clustering algorithms for statistical distributions of input data points. Eur J Sci Res ISSN 1450–216X, 46(3):320–330
Wu KL, Yang MS (2005) A cluster validity index for fuzzy clustering. Patt Recogn Lett 26(9):1275–1291
Xie XL, Beni G (1991) A validity measure for fuzzy clustering. Pattern Analysis and Machine Intelligence, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(8):841–847
Yuksel ME, Besdok E (2004) A simple neuro-fuzzy impulse detector for efficient blur reduction of impulse noise removal operators for digital images. IEEE Trans Fuzzy Syst 12(6):854–865
Acknowledgments
The authors would like to acknowledge the financial support of this work by grants from General Direction of Scientific Research (DGRST), REGIM-Lab (LR11ES48) Tunisia, under the ARUB program.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by E. Lughofer.
Appendix
Appendix
1.1 Description of the used validity indexes
Let \(X=\left\{ {x_1 ,x_2 ,...,x_n } \right\} \) be a dataset in an m-dimensional Euclidean space Rm with its ordinary Euclidean norm \(\left\| . \right\| \) and let C, be the matrix of cluster centres with \(c_i \), the centre of the cluster \(C_i \) (\(1\le i\le \mathrm{{nbc}})\). In case of fuzzy clustering, a pattern (point) may belong to all the clusters with a certain fuzzy membership grade. Consider the matrix \(U=\left| {u_{ik} } \right| \) where \(u_{ik} \) is the value of membership of the point k in cluster \(i\), and \(u_{ik} \) is in the interval [0,1] (\(1\le k\le n)\).
We present below the validity indices studied in this work depending on the geometric aspects of clusters (compactness, separation) and the data properties (noise, overlap) evaluated by each index.
1.1.1 SVI index
SVI (Separation Validity Index) (Wu and Yang 2005) is given by
\(\varvec{\pi }\) is a global compactness of the partition, it makes the sum of \(\pi _i \), \(1\le i\le \mathrm{{nbc}}\).
The compactness of the cluster c\(^i\) is defined by
The variation \(\sigma _i \) and the cardinality \(n_{i} \) of the cluster c\(_i\) are given, respectively, as
with \({u}_{ik}\), the membership degree of the vector \(x_k\) to the cluster c\(_i \) and \(\nu _i\) the centre of c\(_i \).
\(S\) is the global separation between the nbc clusters defined as
\(\mathrm{de}\nu _{ij} \) is the deviation between the centres of \(c_i\) and \(c_j \).
\(\left[ {z_1 ,z_2 ,...,z_{\mathrm{{nbc}},} z_{\mathrm{{nbc}}+1} } \right] =\left[ {\nu _1 ,\nu _2 ,...,\nu _{\mathrm{{nbc}},} \overline{x} } \right] ^T,\) where \(\overline{x} \) is the mean of X.
\(\mu _{ij} \) is the degree of membership of \(z_j \) to the centre \(z_i \).
1.1.2 XBI index
This index is normalized and gives the best number of clusters at its minimum value (Mingoti and Lima 2006) as shown:
where
\(\max _{k=1,..,\mathrm{{nbc}}} \left\{ {\sum \limits _{j=1}^n {\frac{u_{kj}^2 \left\| {x_j -c_k } \right\| ^2}{n_k }} } \right\} \) gives the max fuzzy distance between points \(x_j \) with membership degree \(u_{kj}\) to the center \(c_k \).
This index introduced the \(\max \mathrm{{Diff}}(\mathrm{{nbc}})\) factor in order to compare the nbc-partitions obtained by increasing nbc. Thus, we can find out the best partition.
with \(dw\) intra-cluster distance measured for different values of nbc.
1.1.3 PCAESN index
Called Partition Coefficient And Exponential Separation (PCAES) (Hu et al. 2004), this validity index takes into account two factors with a normalized partition coefficient and an exponential separation measure to validate separately each fuzzy cluster \(i\). PCAES is then defined as:
The validity index of cluster i is measured by:
where\(\sum \limits _{j=1}^n {{\mu _{ij}^2 }/{\mu _M }} \): the compactness of the cluster i compared to the most compact cluster with its value of compactness \(\mu _M \)measured by:
\(\mu _{ij}\) is the membership degree of vector j (\(1\le j\le n)\) to cluster i (\(1\le k\le \mathrm{{nbc}})\),
\( \text{ exp } ( {{-\min \{ }\left\| {a_\mathrm{i} -a_k } \right\| ^{2}{\} } / {\beta _\mathrm{T} }})\) is the separation measure relative to the total average separation \(\beta _\mathrm{T} \) of the nbc clusters given by:
\(a_\mathrm{i}\) is the center of cluster \(i\), and \(\overline{a} =\sum \nolimits _{i=1}^{\mathrm{{nbc}}} {a_i /nbc} \), the average of center vectors \(a_\mathrm{i} \).
Normalisation of PCAES index:
Basically, each validity index measure PCAESi (\(i =1{\ldots }{\mathrm{nbc}}\)) is obtained by a subtraction between the compactness and separation values which are defined in the interval [0 1]. Consequently, we present PCAES values as
Thus, to obtain the value of PCAES in [0 1], we have specified PCAESN Tlili et al. (2014) as following:
1.1.4 VOS index
Validity Overlap Separation (VOS) (Pal and Bezdek 1992) gives its values in the interval [0 1] and reaches its best partition for the minimum value.
This index is given by
where \(Overlap^N(nbc,U)\) gives an inter-cluster overlap for different values of nbc, normalized by the max overlap for nbc = 2...nbcmax.
\(Sep^N(nbc,U)\) calculates the separation between clusters for different values of nbc, normalized by the max separation for nbc = 2...nbcmax.
1.2 Overlap function
\(P(\overline{F} _p ,\overline{F} _q )\) defines the total overlap between two fuzzy clusters \(\overline{F} _p \) and \(\overline{F} _q \):
The function \(f(\mu :\overline{F} _p ,\overline{F} _q )\) calculates the overlap degree between two fuzzy clusters \(\overline{F} _p \) and \(\overline{F} _q \) at a membership degree \(\mu \) given by
\(\delta (x_j , \mu :\overline{F} _p ,\overline{F} _q )\) indicates whether two clusters are overlapping at the membership degree \(\mu \) for data point \(x_j \). It returns an overlap value of \(\omega (x_j )\) when the membership degrees of two clusters are both greater than \(\mu \); otherwise it returns 0.0:
\(\omega (x_j )\) is a weight factor for each point \(x_j \) between clusters. \(\omega (x_j )\) is a value in [0 1].
1.3 Separation function
The separation measure is obtained using the similarity distance \(S(\overline{F} _p ,\overline{F} _q )\) between two fuzzy clusters \(\overline{F} _p \) and \(\overline{F} _q \). It is defined as
\(S(\overline{F} _p ,\overline{F} _q )\) is the maximum membership degree between two clusters \(\overline{F} _p \) and \(\overline{F} _q \) in the interval [0 1]:
where \(\mu _{\overline{F} p} \text{( }x\text{) }\), \(\mu _{\overline{F} q} (x)\) are the membership degrees of vector \(x\), respectively, in \(\overline{F} _p \) and \(\overline{F} _q \).
1.3.1 Dunn index
The validity indice Dunn (1974) is based on inter-cluster distance and the diameter of a hyperspheric cluster. Dunn index is given as
\(diam(c_k )\) is the diameter of the cluster \(c_k \).
1.3.2 FVINOS index
Fuzzy Validity Index with Noise-Overlap Separation (Tlili et al. 2014) (FVINOS) is inspired from the Davies–Bouldin validity index Mingoti and Lima (2006). FVINOS is defined as
\(\min _{l=1,...,\mathrm{{nbc}},l\ne i} \left\{ {d_{i,l} } \right\} \) calculates the minimum separation between two clusters \(i\) and l.
max Diff(nbc) factor is given to compare each two successive obtained partitions. In consequence, the best partition of the dataset is found.
\(\mathrm{diff}_i (\mathrm{nbc})\) calculates the difference between max sums of compactness in a pair of clusters \(i\) and k; this is calculated for the obtained partition at nbc and nbc+1.
We define the average of fuzzy compactness relative to cluster \(i\) as
where \(u_i (x)\) is the membership degree of point \(x\) to cluster \(i\), \(m\) is the fuzzifier factor.
\(d(x,c_i )\) is the Euclidean distance separating \(x\) from the cluster centre \(c_i \).
Rights and permissions
About this article
Cite this article
Tlili, M., Ayadi, T., Hamdani, T.M. et al. Performance evaluation of FMIG clustering using fuzzy validity indexes. Soft Comput 19, 3515–3528 (2015). https://doi.org/10.1007/s00500-014-1478-3
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00500-014-1478-3