A new kernel fuzzy based feature extraction method using attraction points


Abstract

This paper introduces a novel supervised feature extraction method for small sample size situations. The proposed approach considers the class membership of samples and exploits a nonlinear mapping in order to extract the relevant features and to mitigate the Hughes phenomenon. The proposed objective function is composed of three terms, namely an attraction function, a repulsion function, and the between-feature scatter matrix, where the last term increases the difference between extracted features. Subsequently, the attraction and repulsion functions are redefined by incorporating the membership degrees of samples. Finally, the proposed method is extended using the kernel trick to capture the inherent nonlinearity of the original data. To evaluate the accuracy of the proposed feature extraction method, four remote sensing images are used in the experiments. The experiments indicate that the proposed feature extraction method is an appropriate choice for classification of hyperspectral images using limited training samples.
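To make the abstract's attraction/repulsion idea concrete, the following is a minimal, hypothetical NumPy sketch of such an objective. It is not the paper's formulation: taking class means as attraction points, inverse-distance fuzzy memberships, and the `gamma` trade-off are assumptions made purely for illustration.

```python
# Hypothetical sketch of a fuzzy attraction/repulsion objective -- NOT the paper's
# exact method.  Attraction points (class means), the inverse-distance membership
# rule, and the gamma weighting are illustrative assumptions.
import numpy as np

def fuzzy_memberships(X, y, n_classes):
    """Membership degree of each sample to every class, from inverse distances
    to the class means (assumed membership rule); rows sum to 1."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2) + 1e-12
    u = 1.0 / d
    return u / u.sum(axis=1, keepdims=True)

def attraction_repulsion_value(X, y, T, n_classes, gamma=1.0):
    """Toy objective: membership-weighted attraction of projected samples to their
    own class's attraction point, minus repulsion from the other classes."""
    Z = X @ T.T                                  # extracted features
    means = np.stack([Z[y == c].mean(axis=0) for c in range(n_classes)])
    U = fuzzy_memberships(X, y, n_classes)
    d2 = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    own = np.eye(n_classes)[y]                   # one-hot class indicator
    attraction = (U * own * d2).sum()            # pull towards own attraction point
    repulsion = (U * (1 - own) * d2).sum()       # push away from other classes
    return gamma * repulsion - attraction        # larger is better

# Tiny usage example with random data
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10)); y = rng.integers(0, 3, size=60)
T = rng.normal(size=(4, 10))                     # 4 extracted features
print(attraction_repulsion_value(X, y, T, n_classes=3))
```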


References

  • Baudat, G., & Anouar, F. (2000). Generalized discriminant analysis using a kernel approach. Neural Computation, 12(10), 2385–2404.
  • Camps-Valls, G., Shervashidze, N., & Borgwardt, K. M. (2010). Spatio-spectral remote sensing image classification with graph kernels. IEEE Geoscience and Remote Sensing Letters, 7(4), 741–745.
  • Chang, C. C., & Lin, C. J. (2008). LIBSVM—A library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • Chen, L. F., Mark Liao, H. Y., Ko, M. T., Lin, J. C., & Yu, G. J. (2000). A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition, 33, 1713–1726.
  • Cui, Y., & Fan, L. (2012). Feature extraction using fuzzy maximum margin criterion. Neurocomputing, 86, 52–58.
  • Dehghani, H., & Ghassemian, H. (2006). Measurement of uncertainty by the entropy: Application to the classification of MSS data. International Journal of Remote Sensing, 27(18), 4005–4014.
  • Ding, S., Meng, L., Han, Y., & Xue, Y. (2017a). A review on feature binding theory and its functions observed in perceptual process. Cognitive Computation, 9(2), 194–206.
  • Ding, S., Zhang, X., An, Y., & Xue, Y. (2017b). Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification. Pattern Recognition, 67, 32–46.
  • Foody, G. M. (2004). Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy. Photogrammetric Engineering and Remote Sensing, 70, 627–633.
  • Gao, F., Lv, W., Zhang, Y., Sun, J., Wang, J., & Yang, E. (2016). A novel semisupervised support vector machine classifier based on active learning and context information. Multidimensional Systems and Signal Processing, 27(4), 969–988.
  • Hastie, T., Buja, A., & Tibshirani, R. (1995). Penalized discriminant analysis. Annals of Statistics, 23(1), 73–102.
  • Howland, P., & Park, H. (2004). Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 995–1006.
  • Imani, M., & Ghassemian, H. (2014a). Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation. IEEE Geoscience and Remote Sensing Letters, 11(11), 1986–1990.
  • Imani, M., & Ghassemian, H. (2014b). Band clustering-based feature extraction for classification of hyperspectral images using limited training samples. IEEE Geoscience and Remote Sensing Letters, 11(8), 1325–1329.
  • Imani, M., & Ghassemian, H. (2015). Feature space discriminant analysis for hyperspectral data feature reduction. ISPRS Journal of Photogrammetry and Remote Sensing, 102, 1–13.
  • Ji, S. W., & Ye, J. P. (2008). Generalized linear discriminant analysis: A unified framework and efficient model selection. IEEE Transactions on Neural Networks, 19(10), 1768–1782.
  • Kamandar, M., & Ghassemian, H. (2013). Linear feature extraction for hyperspectral images based on information theoretic learning. IEEE Geoscience and Remote Sensing Letters, 10(4), 702–706.
  • Kathrin, S. (2004). On the Kronecker product. Master's thesis, University of Waterloo.
  • Kwak, K., & Pedrycz, W. (2005). Face recognition using a fuzzy fisherface classifier. Pattern Recognition, 38, 1717–1732.
  • Landgrebe, D. A. (2002). Hyperspectral image data analysis. IEEE Signal Processing Magazine, 19(1), 17–28.
  • Li, H. F., Jiang, T., & Zhang, K. S. (2006). Efficient and robust feature extraction by maximum margin criterion. IEEE Transactions on Neural Networks, 17(1), 157–165.
  • Li, J., et al. (2015). Multiple feature learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 53(3), 1592–1606.
  • Liang, Y. X., Li, C. R., Gong, W. G., & Pan, Y. J. (2007). Uncorrelated linear discriminant analysis based on weighted pairwise Fisher criterion. Pattern Recognition, 40, 3606–3615.
  • Liu, S., Feng, L., Liu, Y., Wu, J., Sun, M., & Wang, W. (2016). Robust discriminative extreme learning machine for relevance feedback in image retrieval. Multidimensional Systems and Signal Processing, 1, 1–19.
  • Lotlikar, R., & Kothari, R. (2000). Fractional-step dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6), 623–627.
  • Lu, J., Plataniotis, K. N., & Venetsanopoulos, A. N. (2005). Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recognition Letters, 26(2), 181–191.
  • Marconcini, M., Camps-Valls, G., & Bruzzone, L. (2009). A composite semisupervised SVM for classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 6(2), 234–238.
  • Melgani, F., & Bruzzone, L. (2004). Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8), 1778–1790.
  • Pekalska, E., & Haasdonk, B. (2009). Kernel discriminant analysis for positive definite and indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 1017–1032.
  • Prasad, B. K., & Sanyal, G. (2016). Novel features and a cascaded classifier based Arabic numerals recognition system. Multidimensional Systems and Signal Processing, 1, 1–18.
  • Price, R., & Gee, F. (2005). Face recognition using direct, weighted linear discriminant analysis and modular subspaces. Pattern Recognition, 38, 209–219.
  • Schölkopf, B., Smola, A. J., & Müller, K. R. (1997). Kernel principal component analysis. In Lecture notes in computer science.
  • Shahdoosti, H. R., & Javaheri, N. (2017). Pansharpening of clustered MS and Pan images considering mixed pixels. IEEE Geoscience and Remote Sensing Letters, 14(6), 826–830.
  • Shahdoosti, H. R., & Javaheri, N. (2018a). A fast algorithm for feature extraction of hyperspectral images using the first order statistics. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-018-5695-0.
  • Shahdoosti, H. R., & Javaheri, N. (2018b). A new hybrid feature extraction method in a dyadic scheme for classification of hyperspectral data. International Journal of Remote Sensing, 39(1), 101–130.
  • Shahdoosti, H. R., & Mirzapour, F. (2017). Spectral–spatial feature extraction using orthogonal linear discriminant analysis for classification of hyperspectral data. European Journal of Remote Sensing, 50(1), 111–124.
  • Shahshahani, B. M., & Landgrebe, D. A. (1994). The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Transactions on Geoscience and Remote Sensing, 32(5), 1087–1095.
  • Wang, J. G., Lin, Y. S., Yang, W. K., & Yang, J. Y. (2008). Kernel maximum scatter difference based feature extraction and its application to face recognition. Pattern Recognition Letters, 29, 1832–1835.
  • Xia, J., Chanussot, J., Du, P., & He, X. (2014). (Semi-)supervised probabilistic principal component analysis for hyperspectral remote sensing image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6), 2224–2236.
  • Xue, B., Zhang, M., & Browne, W. N. (2013). Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics, 43(6), 1656–1671.
  • Yang, W. K., Wang, J. G., Ren, M. W., Zhang, L., & Yang, J. Y. (2009). Feature extraction using fuzzy inverse FDA. Neurocomputing, 72, 3384–3390.
  • Ye, J. P. (2006). Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. The Journal of Machine Learning Research, 7, 1183–1204.
  • Ye, J. P., & Li, Q. (2005). A two-stage linear discriminant analysis via QR-decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 929–941.
  • Yu, H., & Yang, J. (2001). A direct LDA algorithm for high-dimensional data—With application to face recognition. Pattern Recognition, 34, 2067–2070.
  • Zhang, J., Ding, S., Zhang, N., & Shi, Z. (2016). Incremental extreme learning machine based on deep feature embedded. International Journal of Machine Learning and Cybernetics, 7(1), 111–120.
  • Zhu, M., & Martinez, A. M. (2006). Selecting principal components in a two-stage LDA algorithm. In 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06) (Vol. 1, pp. 132–137).


Author information

Correspondence to Hamid Reza Shahdoosti.

Appendices

Appendix A

Considering \( {\mathbf{A}} = 4{\mathbf{UPU}}^{T} \), \( {\mathbf{B}} = 2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}} \), and \( {\mathbf{C}} = {\mathbf{UU}}^{T} \), and applying the vec operator to Eq. (7) yield:

$$ \text{vec} ({\mathbf{TA}}) + \text{vec} ({\mathbf{BTC}}) - \lambda \text{vec} ({\mathbf{T}}) = 0 $$
(17)

Substituting \( \text{vec} ({\mathbf{TA}}) \) with \( \text{vec} ({\mathbf{I}}_{m \times m} {\mathbf{TA}}) \), where \( {\mathbf{I}}_{m \times m} \) is an m × m identity matrix, and using the equality \( \text{vec} ({\mathbf{abc}}) = ({\mathbf{c}}^{T} \otimes {\mathbf{a}})\text{vec} ({\mathbf{b}}) \) (Kathrin 2004), where \( \otimes \) is the Kronecker product, one can rewrite Eq. (17) as:

$$ ({\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} )\text{vec} ({\mathbf{T}}) + ({\mathbf{C}}^{T} \otimes {\mathbf{B}})\text{vec} ({\mathbf{T}}) - \lambda \text{vec} ({\mathbf{T}}) = 0 $$
(18)

which is equal to Eq. (8).
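As a quick numerical sanity check on this derivation, the following sketch verifies that \( \text{vec} ({\mathbf{TA}}) + \text{vec} ({\mathbf{BTC}}) \) equals \( ({\mathbf{A}}^{T} \otimes {\mathbf{I}} + {\mathbf{C}}^{T} \otimes {\mathbf{B}})\text{vec} ({\mathbf{T}}) \) for random matrices, using the column-stacking vec operator of the cited Kronecker-product notes. The matrix sizes are arbitrary assumptions.

```python
# Numerical check of the identity behind Eqs. (17)-(18), with column-major vec,
# so that vec(ABC) = (C^T kron A) vec(B).  Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
m, d = 3, 5                                   # T is m x d (assumed sizes)
T = rng.normal(size=(m, d))
A = rng.normal(size=(d, d))                   # stands in for 4*U P U^T
B = rng.normal(size=(m, m))                   # stands in for 2*gamma*Q^T Q
C = rng.normal(size=(d, d))                   # stands in for U U^T

vec = lambda M: M.flatten(order="F")          # column-stacking vectorisation

lhs = vec(T @ A) + vec(B @ T @ C)
rhs = (np.kron(A.T, np.eye(m)) + np.kron(C.T, B)) @ vec(T)
print(np.allclose(lhs, rhs))                  # True
```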

Appendix B

Considering Eq. (6), one should maximize the following equation under the normalization constraint:

$$ 2\text{tr} ({\mathbf{TUPU}}^{T} {\mathbf{T}}^{T} ) + \gamma \text{tr} ({\mathbf{U}}^{T} {\mathbf{T}}^{T} {\mathbf{Q}}^{T} {\mathbf{QTU}}) $$
(19)

Using the circular property of trace, one may write:

$$ 2\text{tr} ({\mathbf{T}}^{T} {\mathbf{TUPU}}^{T} ) + \gamma \text{tr} ({\mathbf{T}}^{T} {\mathbf{Q}}^{T} {\mathbf{QTUU}}^{T} ) $$
(20)

Using the equality \( \text{tr} ({\mathbf{a}}^{T} {\mathbf{b}}) = \text{vec} ({\mathbf{a}})^{T} \text{vec} ({\mathbf{b}}) \) (Kathrin 2004), one can rewrite Eq. (20) as:

$$ 2\text{vec} ({\mathbf{T}})^{T} \text{vec} ({\mathbf{TUPU}}^{T} ) + \gamma \text{vec} ({\mathbf{T}})^{T} \text{vec} ({\mathbf{Q}}^{T} {\mathbf{QTUU}}^{T} ) $$
(21)

Using the equality \( \text{vec} ({\mathbf{abc}}) = ({\mathbf{c}}^{T} \otimes {\mathbf{a}})\text{vec} ({\mathbf{b}}) \) (Kathrin 2004) and defining \( {\mathbf{A}} = 4{\mathbf{UPU}}^{T} \), \( {\mathbf{B}} = 2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}} \), and \( {\mathbf{C}} = {\mathbf{UU}}^{T} \), Eq. (21) can be rewritten as:

$$ \begin{aligned} & \text{vec} ({\mathbf{T}})^{T} ({\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} )\text{vec} ({\mathbf{T}}) + \text{vec} ({\mathbf{T}})^{T} ({\mathbf{C}}^{T} \otimes {\mathbf{B}})\text{vec} ({\mathbf{T}}) \\ & \quad = \text{vec} ({\mathbf{T}})^{T} ({\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}})\text{vec} ({\mathbf{T}}) \\ \end{aligned} $$
(22)

Due to the fact that \( \text{vec} ({\mathbf{T}}) \) is the eigenvector of \( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \) (see Eq. (8)), one can conclude:

$$ \text{vec} ({\mathbf{T}})^{T} ({\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}})\text{vec} ({\mathbf{T}}) = \lambda \text{vec} ({\mathbf{T}})^{T} \text{vec} ({\mathbf{T}}) $$
(23)

Using the equality \( \text{tr} ({\mathbf{a}}^{T} {\mathbf{b}}) = \text{vec} ({\mathbf{a}})^{T} \text{vec} ({\mathbf{b}}) \) and considering the normalization constraint, i.e., \( \text{tr} ({\mathbf{TT}}^{T} ) = 1 \), yields:

$$ \lambda \text{vec} ({\mathbf{T}})^{T} \text{vec} ({\mathbf{T}}) = \lambda \text{tr} ({\mathbf{TT}}^{T} ) = \lambda $$
(24)

So, the maximum of Eq. (19) is obtained if \( \text{vec} ({\mathbf{T}}) \) is the eigenvector corresponding to the largest eigenvalue of \( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \).
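For completeness, the sketch below shows how this conclusion could be used in practice: build \( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \), take the eigenvector of its largest eigenvalue, and reshape it back into \( {\mathbf{T}} \) with the same column-major vec convention. The matrices U, P, Q, the dimensions, and gamma are random placeholders, not the paper's actual constructions.

```python
# Sketch (placeholder U, P, Q) of recovering T from the leading eigenvector of
# A^T kron I + C^T kron B, as concluded in Appendix B.
import numpy as np

rng = np.random.default_rng(2)
m, d, n = 3, 5, 20                             # extracted features, bands, samples (assumed)
U = rng.normal(size=(d, n))                    # placeholder for U
P = rng.normal(size=(n, n)); P = P + P.T       # placeholder symmetric P
Q = rng.normal(size=(n, m))                    # placeholder Q (so Q^T Q is m x m)
gamma = 0.5                                    # placeholder trade-off

A = 4 * U @ P @ U.T                            # definitions from Appendix A
B = 2 * gamma * Q.T @ Q
C = U @ U.T

M = np.kron(A.T, np.eye(m)) + np.kron(C.T, B)  # matrix of Eq. (8)/(18)
eigvals, eigvecs = np.linalg.eigh(M)           # M is symmetric here (A, B, C symmetric)
t = eigvecs[:, -1]                             # eigenvector of the largest eigenvalue
T = t.reshape((m, d), order="F")               # undo the column-major vec
# t is unit-norm, so tr(T T^T) = 1 holds automatically
print(T.shape)
```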


Cite this article

Shahdoosti, H.R., Javaheri, N. A new kernel fuzzy based feature extraction method using attraction points. Multidim Syst Sign Process 30, 1009–1027 (2019). https://doi.org/10.1007/s11045-018-0592-2
