Abstract
This paper introduces a novel supervised feature extraction method for small sample size situations. The proposed approach considers the class membership of samples and exploits a nonlinear mapping in order to extract the relevant features and to mitigate the Hughes phenomenon. The proposed objective function comprises three terms, namely an attraction function, a repulsion function, and the between-feature scatter matrix, where the last term increases the difference between extracted features. Subsequently, the attraction function and the repulsion function are redefined by incorporating the membership degrees of samples. Finally, the proposed method is extended using the kernel trick to capture the inherent nonlinearity of the original data. To evaluate the accuracy of the proposed feature extraction method, four remote sensing images are used in our experiments. The experiments indicate that the proposed feature extraction method is an appropriate choice for classification of hyperspectral images using limited training samples.
References
Baudat, G., & Anouar, F. (2000). Generalized discriminant analysis using a kernel approach. Neural Computation, 12(10), 2385–2404.
Camps-Valls, G., Shervashidze, N., & Borgwardt, K. M. (2010). Spatio-spectral remote sensing image classification with graph kernels. IEEE Geoscience and Remote Sensing Letters, 7(4), 741–745.
Chang, C. C., & Lin, C. J. (2008). LIBSVM—A library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, L. F., Liao, H. Y. M., Ko, M. T., Lin, J. C., & Yu, G. J. (2000). A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition, 33, 1713–1726.
Cui, Y., & Fan, L. (2012). Feature extraction using fuzzy maximum margin criterion. Neurocomputing, 86, 52–58.
Dehghani, H., & Ghassemian, H. (2006). Measurement of uncertainty by the entropy: Application to the classification of MSS data. International Journal of Remote Sensing, 27(18), 4005–4014.
Ding, S., Meng, L., Han, Y., & Xue, Y. (2017a). A review on feature binding theory and its functions observed in perceptual process. Cognitive Computation, 9(2), 194–206.
Ding, S., Zhang, X., An, Y., & Xue, Y. (2017b). Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification. Pattern Recognition, 67, 32–46.
Foody, G. M. (2004). Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy. Photogrammetric Engineering and Remote Sensing, 70, 627–633.
Gao, F., Lv, W., Zhang, Y., Sun, J., Wang, J., & Yang, E. (2016). A novel semisupervised support vector machine classifier based on active learning and context information. Multidimensional Systems and Signal Processing, 27(4), 969–988.
Hastie, T., Buja, A., & Tibshirani, R. (1995). Penalized discriminant analysis. Annals of Statistics, 23(1), 73–102.
Howland, P., & Park, H. (2004). Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 995–1006.
Imani, M., & Ghassemian, H. (2014a). Feature extraction using attraction points for classification of hyperspectral images in a small sample size situation. Geoscience and Remote Sensing Letters, 11(11), 1986–1990.
Imani, M., & Ghassemian, H. (2014b). Band clustering-based feature extraction for classification of hyperspectral images using limited training samples. Geoscience and Remote Sensing Letters, 11(8), 1325–1329.
Imani, M., & Ghassemian, H. (2015). Feature space discriminant analysis for hyperspectral data feature reduction. ISPRS Journal of Photogrammetry and Remote Sensing, 102, 1–13.
Ji, S. W., & Ye, J. P. (2008). Generalized linear discriminant analysis: A unified framework and efficient model selection. IEEE Transactions on Neural Networks, 19(10), 1768–1782.
Kamandar, M., & Ghassemian, H. (2013). Linear feature extraction for hyperspectral images based on information theoretic learning. IEEE Geoscience and Remote Sensing Letters, 10(4), 702–706.
Kathrin, S. (2004). On the Kronecker product. Master's thesis, University of Waterloo.
Kwak, K., & Pedrycz, W. (2005). Face recognition using a fuzzy fisherface classifier. Pattern Recognition, 38, 1717–1732.
Landgrebe, D. A. (2002). Hyperspectral image data analysis. IEEE Signal Processing Magazine, 19(1), 17–28.
Li, H. F., Jiang, T., & Zhang, K. S. (2006). Efficient and robust feature extraction by maximum margin criterion. IEEE Transactions on Neural Networks, 17(1), 157–165.
Li, J., et al. (2015). Multiple feature learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 53(3), 1592–1606.
Liang, Y. X., Li, C. R., Gong, W. G., & Pan, Y. J. (2007). Uncorrelated linear discriminant analysis based on weighted pairwise Fisher criterion. Pattern Recognition, 40, 3606–3615.
Liu, S., Feng, L., Liu, Y., Wu, J., Sun, M., & Wang, W. (2016). Robust discriminative extreme learning machine for relevance feedback in image retrieval. Multidimensional Systems and Signal Processing, 1, 1–19.
Lotlikar, R., & Kothari, R. (2000). Fractional-step dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(6), 623–627.
Lu, J., Plataniotis, K. N., & Venetsanopoulos, A. N. (2005). Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recognition Letters, 26(2), 181–191.
Marconcini, M., Camps-Valls, G., & Bruzzone, L. (2009). A composite semisupervised SVM for classification of hyperspectral images. IEEE Geoscience and Remote Sensing Letters, 6(2), 234–238.
Melgani, F., & Bruzzone, L. (2004). Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42(8), 1778–1790.
Pekalska, E., & Haasdonk, B. (2009). Kernel discriminant analysis for positive definite and indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 1017–1032.
Prasad, B. K., & Sanyal, G. (2016). Novel features and a cascaded classifier based Arabic numerals recognition system. Multidimensional Systems and Signal Processing, 1, 1–18.
Price, R., & Gee, F. (2005). Face recognition using direct, weighted linear discriminant analysis and modular subspaces. Pattern Recognition, 38, 209–219.
Scholkopf, B., Smola, A. J., & Muller, K. R. (1997). Kernel principal component analysis. In Lecture notes in computer science.
Shahdoosti, H. R., & Javaheri, N. (2017). Pansharpening of clustered MS and Pan images considering mixed pixels. IEEE Geoscience and Remote Sensing Letters, 14(6), 826–830.
Shahdoosti, H. R., & Javaheri, N. (2018a). A fast algorithm for feature extraction of hyperspectral images using the first order statistics. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-018-5695-0.
Shahdoosti, H. R., & Javaheri, N. (2018b). A new hybrid feature extraction method in a dyadic scheme for classification of hyperspectral data. International Journal of Remote Sensing, 39(1), 101–130.
Shahdoosti, H. R., & Mirzapour, F. (2017). Spectral–spatial feature extraction using orthogonal linear discriminant analysis for classification of hyperspectral data. European Journal of Remote Sensing, 50(1), 111–124.
Shahshahani, B. M., & Landgrebe, D. A. (1994). The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Transactions on Geoscience and Remote Sensing, 32(5), 1087–1095.
Wang, J. G., Lin, Y. S., Yang, W. K., & Yang, J. Y. (2008). Kernel maximum scatter difference based feature extraction and its application to face recognition. Pattern Recognition Letters, 29, 1832–1835.
Xia, J., Chanussot, J., Du, P., & He, X. (2014). (Semi-) supervised probabilistic principal component analysis for hyperspectral remote sensing image classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6), 2224–2236.
Xue, B., Zhang, M., & Browne, W. N. (2013). Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics, 43(6), 1656–1671.
Yang, W. K., Wang, J. G., Ren, M. W., Zhang, L., & Yang, J. Y. (2009). Feature extraction using fuzzy inverse FDA. Neurocomputing, 72, 3384–3390.
Ye, J. P. (2006). Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. The Journal of Machine Learning Research, 7, 1183–1204.
Ye, J. P., & Li, Q. (2005). A two-stage linear discriminant analysis via QR-decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 929–941.
Yu, H., & Yang, J. (2001). A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recognition, 34, 2067–2070.
Zhang, J., Ding, S., Zhang, N., & Shi, Z. (2016). Incremental extreme learning machine based on deep feature embedded. International Journal of Machine Learning and Cybernetics, 7(1), 111–120.
Zhu, M., & Martinez, A. M. (2006). Selecting principal components in a two-stage LDA algorithm. In 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06) (vol. 1, pp. 132–137).
Appendices
Appendix A
Considering \( {\mathbf{A}} = 4{\mathbf{UPU}}^{T} \), \( {\mathbf{B}} = 2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}} \), and \( {\mathbf{C}} = {\mathbf{UU}}^{T} \), and applying the vec operator to Eq. (7) yield:
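Assuming Eq. (7) has the stationarity form \( {\mathbf{TA}} + {\mathbf{BTC}} = \lambda {\mathbf{T}} \), which is the form consistent with Eq. (8) below, the vectorized equation reads:

\[ \text{vec} ({\mathbf{TA}}) + \text{vec} ({\mathbf{BTC}}) = \lambda \,\text{vec} ({\mathbf{T}}). \tag{17} \]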
Substituting \( \text{vec} ({\mathbf{TA}}) \) with \( \text{vec} ({\mathbf{I}}_{m \times m} {\mathbf{TA}}) \), where \( {\mathbf{I}}_{m \times m} \) is an m × m identity matrix, and using the equality \( \text{vec} ({\mathbf{abc}}) = ({\mathbf{c}}^{T} \otimes {\mathbf{a}})\text{vec} ({\mathbf{b}}) \) (Kathrin 2004), where \( \otimes \) is the Kronecker product, one can rewrite Eq. (17) as:
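Applying both identities to Eq. (17) gives:

\[ \left( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \right) \text{vec} ({\mathbf{T}}) = \lambda \,\text{vec} ({\mathbf{T}}), \tag{18} \]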
which is equal to Eq. (8).
Appendix B
Considering Eq. (6), one should maximize the following objective under the normalization constraint:
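A reconstruction of the objective, written with the quantities that define \( {\mathbf{A}} \), \( {\mathbf{B}} \), and \( {\mathbf{C}} \) below and consistent with the steps that follow:

\[ \mathop{\max }\limits_{{\mathbf{T}}} \; \text{tr} \left( {\mathbf{T}}(4{\mathbf{UPU}}^{T}){\mathbf{T}}^{T} \right) + \text{tr} \left( (2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}}){\mathbf{T}}({\mathbf{UU}}^{T}){\mathbf{T}}^{T} \right) \quad \text{subject to} \quad \text{tr} ({\mathbf{TT}}^{T}) = 1. \tag{19} \]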
Using the circular property of trace, one may write:
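Cycling \( {\mathbf{T}}^{T} \) to the front of each trace term, Eq. (19) presumably becomes:

\[ \text{tr} \left( {\mathbf{T}}^{T} {\mathbf{T}}(4{\mathbf{UPU}}^{T}) \right) + \text{tr} \left( {\mathbf{T}}^{T} (2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}}){\mathbf{T}}({\mathbf{UU}}^{T}) \right) \tag{20} \]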
Using the equality \( \text{tr} ({\mathbf{a}}^{T} {\mathbf{b}}) = \text{vec} ({\mathbf{a}})^{T} \text{vec} ({\mathbf{b}}) \) (Kathrin 2004), one can rewrite Eq. (20) as:
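With \( {\mathbf{a}} = {\mathbf{T}} \) in both terms, this gives:

\[ \text{vec} ({\mathbf{T}})^{T} \,\text{vec} \left( {\mathbf{T}}(4{\mathbf{UPU}}^{T}) \right) + \text{vec} ({\mathbf{T}})^{T} \,\text{vec} \left( (2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}}){\mathbf{T}}({\mathbf{UU}}^{T}) \right) \tag{21} \]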
Using the equality \( \text{vec} ({\mathbf{abc}}) = ({\mathbf{c}}^{T} \otimes {\mathbf{a}})\text{vec} ({\mathbf{b}}) \) (Kathrin 2004) and defining \( {\mathbf{A}} = 4{\mathbf{UPU}}^{T} \), \( {\mathbf{B}} = 2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}} \), and \( {\mathbf{C}} = {\mathbf{UU}}^{T} \), Eq. (21) can be rewritten as:
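That is, the objective collapses to a single quadratic form in \( \text{vec} ({\mathbf{T}}) \):

\[ \text{vec} ({\mathbf{T}})^{T} \left( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \right) \text{vec} ({\mathbf{T}}) \tag{22} \]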
Since \( \text{vec} ({\mathbf{T}}) \) is the eigenvector of \( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \) (see Eq. (8)), one can conclude:
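Substituting the eigenvalue relation into Eq. (22):

\[ \text{vec} ({\mathbf{T}})^{T} \left( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \right) \text{vec} ({\mathbf{T}}) = \lambda \,\text{vec} ({\mathbf{T}})^{T} \,\text{vec} ({\mathbf{T}}) \tag{23} \]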
Using the equality \( \text{tr} ({\mathbf{a}}^{T} {\mathbf{b}}) = \text{vec} ({\mathbf{a}})^{T} \text{vec} ({\mathbf{b}}) \) and considering the normalization constraint, i.e., \( \text{tr} ({\mathbf{TT}}^{T}) = 1 \), yield:
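So the objective value equals the eigenvalue itself:

\[ \lambda \,\text{vec} ({\mathbf{T}})^{T} \,\text{vec} ({\mathbf{T}}) = \lambda \,\text{tr} ({\mathbf{TT}}^{T}) = \lambda . \tag{24} \]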
So, the maximum of Eq. (19) is obtained if \( \text{vec} ({\mathbf{T}}) \) is the eigenvector corresponding to the largest eigenvalue of \( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \).
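As a minimal numerical sketch of this result (not from the paper; the function name and shapes are illustrative), one can assemble the Kronecker system directly and take the dominant eigenvector, assuming \( {\mathbf{A}} \) and \( {\mathbf{C}} \) are \( n \times n \), \( {\mathbf{B}} \) is \( m \times m \), and all three are symmetric, so that \( {\mathbf{A}}^{T} \otimes {\mathbf{I}}_{m \times m} + {\mathbf{C}}^{T} \otimes {\mathbf{B}} \) is symmetric:

```python
import numpy as np

def dominant_transform(A, B, C):
    """Solve (A^T kron I_m + C^T kron B) vec(T) = lambda vec(T) for the
    eigenvector with the largest eigenvalue and reshape it into the
    m-by-n transform T. Assumes A, C are n-by-n and B is m-by-m."""
    n, m = A.shape[0], B.shape[0]
    M = np.kron(A.T, np.eye(m)) + np.kron(C.T, B)
    w, V = np.linalg.eigh(M)             # symmetric eigenproblem
    v = V[:, np.argmax(w)]               # eigenvector of largest eigenvalue
    v = v / np.linalg.norm(v)            # enforces tr(T T^T) = 1
    return v.reshape((m, n), order="F")  # column-major, matching vec(.)
```

Here \( {\mathbf{A}} = 4{\mathbf{UPU}}^{T} \), \( {\mathbf{B}} = 2\gamma {\mathbf{Q}}^{T} {\mathbf{Q}} \), and \( {\mathbf{C}} = {\mathbf{UU}}^{T} \) would be assembled from the method's \( {\mathbf{U}} \), \( {\mathbf{P}} \), \( {\mathbf{Q}} \), and \( \gamma \); the column-major reshape matches the column-stacking convention of the vec operator.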
Cite this article
Shahdoosti, H.R., Javaheri, N. A new kernel fuzzy based feature extraction method using attraction points. Multidim Syst Sign Process 30, 1009–1027 (2019). https://doi.org/10.1007/s11045-018-0592-2