Abstract
This paper proposes a new k Nearest Neighbor (kNN) algorithm based on sparse learning, which overcomes two drawbacks of the conventional kNN algorithm: the fixed k value used for every test sample and the neglect of the correlation among samples. Specifically, the paper reconstructs each test sample from the training samples to learn an optimal k value for that sample, and then applies the kNN algorithm with the learned k value to various tasks, such as classification, regression, and missing value imputation. The rationale of the proposed method is that different test samples should be assigned different k values, and that learning the optimal k value for each test sample should take the correlation of the data into account. To this end, the reconstruction process is designed to minimize the reconstruction error via a least square loss function, and employs an ℓ1-norm regularization term to induce element-wise sparsity in the reconstruction coefficients, i.e., zeros appearing in individual elements of the coefficient matrix. To further improve effectiveness, Locality Preserving Projection (LPP) is employed to preserve the local structure of the data. Finally, experimental results on real datasets show that the proposed kNN algorithm outperforms state-of-the-art algorithms on different learning tasks, including classification, regression, and missing value imputation.
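To make the data-driven k mechanism concrete, the sketch below reconstructs each test sample from the training samples with an ℓ1-penalized least-squares fit and takes the number of nonzero reconstruction coefficients as that sample's k. This is a minimal sketch, assuming scikit-learn's Lasso as a stand-in solver for the paper's objective; it omits the LPP term, and the names learn_k, predict_with_learned_k, and alpha are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import KNeighborsClassifier


def learn_k(X_train, x_test, alpha=0.1):
    """Reconstruct one test sample from the training samples via an
    l1-penalized least-squares fit; the number of nonzero coefficients
    serves as that sample's k. (Hypothetical helper, not from the paper.)"""
    # Columns of X_train.T are the training samples, so this solves
    # min_w ||x_test - X_train.T w||^2 + alpha * ||w||_1.
    lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
    lasso.fit(X_train.T, x_test)
    k = int(np.count_nonzero(lasso.coef_))
    return max(k, 1)  # guard against an all-zero coefficient vector


def predict_with_learned_k(X_train, y_train, X_test, alpha=0.1):
    """Classify each test sample with its own learned k."""
    preds = []
    for x in X_test:
        k = min(learn_k(X_train, x, alpha), len(X_train))
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train, y_train)
        preds.append(clf.predict(x.reshape(1, -1))[0])
    return np.array(preds)


if __name__ == "__main__":
    # Toy usage example on synthetic data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = (X[:, 0] > 0).astype(int)
    print(predict_with_learned_k(X[:80], y[:80], X[80:]))
```

In practice, alpha would be tuned (e.g., by cross-validation), since it controls the sparsity level and hence the learned k for each test sample.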
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cheng, D., Zhang, S., Deng, Z., Zhu, Y., Zong, M. (2014). kNN Algorithm with Data-Driven k Value. In: Luo, X., Yu, J.X., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2014. Lecture Notes in Computer Science, vol 8933. Springer, Cham. https://doi.org/10.1007/978-3-319-14717-8_39
DOI: https://doi.org/10.1007/978-3-319-14717-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14716-1
Online ISBN: 978-3-319-14717-8