Machine Learning | Implementing KNN (K-Nearest Neighbors) in MATLAB: fitcknn Parameter Optimization

Contents

  • Machine Learning | Implementing KNN (K-Nearest Neighbors) in MATLAB: fitcknn Parameter Optimization
  • Basic Introduction
  • Program Design
  • Parameter Descriptions
  • Summary
  • References

Basic Introduction

MATLAB's built-in function for KNN classification is fitcknn; this article covers its hyperparameter optimization.
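
Before turning to optimization, here is a minimal sketch of plain fitcknn usage; the NumNeighbors value and the probe observation are illustrative assumptions, not part of the original article:

load fisheriris                            % built-in Fisher iris data: meas (150x4), species (labels)
Mdl = fitcknn(meas,species,'NumNeighbors',5,'Standardize',true);   % fixed K = 5 (assumed for illustration)
label = predict(Mdl,[5.1 3.5 1.4 0.2])     % classify one new observation (made-up measurements)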

Program Design

  • This example shows how to use fitcknn to optimize hyperparameters automatically. It uses the Fisher iris data; load the data first.
  • Use automatic hyperparameter optimization to find the hyperparameters that minimize the five-fold cross-validation loss.
  • For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.
load fisheriris          % Fisher iris data: meas (150x4 measurements), species (class labels)
X = meas;
Y = species;

rng(1)                   % set the random seed for reproducibility
Mdl = fitcknn(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName','expected-improvement-plus'))
  • Run results:
|=====================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | NumNeighbors |     Distance |
|      | result |             | runtime     | (observed)  | (estim.)    |              |              |
|=====================================================================================================|
|    1 | Best   |    0.026667 |     0.63205 |    0.026667 |    0.026667 |           30 |       cosine |
|    2 | Accept |        0.04 |     0.29254 |    0.026667 |    0.027197 |            2 |    chebychev |
|    3 | Accept |     0.19333 |     0.42726 |    0.026667 |    0.030324 |            1 |      hamming |
|    4 | Accept |     0.33333 |     0.34219 |    0.026667 |    0.033313 |           31 |     spearman |
|    5 | Best   |        0.02 |     0.40167 |        0.02 |    0.020648 |            6 |       cosine |
|    6 | Accept |    0.073333 |     0.27227 |        0.02 |    0.023082 |            1 |  correlation |
|    7 | Accept |        0.06 |     0.61176 |        0.02 |    0.020875 |            2 |    cityblock |
|    8 | Accept |        0.04 |     0.29312 |        0.02 |    0.020622 |            1 |    euclidean |
|    9 | Accept |        0.24 |     0.35135 |        0.02 |    0.020562 |           74 |  mahalanobis |
|   10 | Accept |        0.04 |     0.31685 |        0.02 |    0.020649 |            1 |    minkowski |
|   11 | Accept |    0.053333 |     0.40512 |        0.02 |    0.020722 |            1 |   seuclidean |
|   12 | Accept |     0.19333 |     0.60398 |        0.02 |    0.020701 |            1 |      jaccard |
|   13 | Accept |        0.04 |     0.28822 |        0.02 |    0.029203 |            1 |       cosine |
|   14 | Accept |        0.04 |     0.52468 |        0.02 |    0.031888 |           75 |       cosine |
|   15 | Accept |        0.04 |     0.24879 |        0.02 |    0.020076 |            1 |       cosine |
|   16 | Accept |    0.093333 |     0.40397 |        0.02 |    0.020073 |           75 |    euclidean |
|   17 | Accept |    0.093333 |     0.25712 |        0.02 |     0.02007 |           75 |    minkowski |
|   18 | Accept |         0.1 |     0.32964 |        0.02 |    0.020061 |           75 |    chebychev |
|   19 | Accept |     0.15333 |     0.18694 |        0.02 |    0.020044 |           75 |   seuclidean |
|   20 | Accept |         0.1 |     0.31301 |        0.02 |    0.020044 |           75 |    cityblock |
|=====================================================================================================|
| Iter | Eval   | Objective   | Objective   | BestSoFar   | BestSoFar   | NumNeighbors |     Distance |
|      | result |             | runtime     | (observed)  | (estim.)    |              |              |
|=====================================================================================================|
|   21 | Accept |    0.033333 |     0.34193 |        0.02 |    0.020046 |           75 |  correlation |
|   22 | Accept |    0.033333 |     0.35383 |        0.02 |     0.02656 |            9 |       cosine |
|   23 | Accept |    0.033333 |     0.41993 |        0.02 |     0.02854 |            9 |       cosine |
|   24 | Accept |        0.02 |     0.31331 |        0.02 |    0.028607 |            1 |    chebychev |
|   25 | Accept |        0.02 |      0.3881 |        0.02 |    0.022264 |            1 |    chebychev |
|   26 | Accept |        0.02 |     0.31359 |        0.02 |    0.021439 |            1 |    chebychev |
|   27 | Accept |        0.02 |     0.42224 |        0.02 |    0.020999 |            1 |    chebychev |
|   28 | Accept |     0.66667 |     0.39723 |        0.02 |    0.020008 |           75 |      hamming |
|   29 | Accept |        0.04 |      0.3714 |        0.02 |    0.020008 |           12 |  correlation |
|   30 | Best   |    0.013333 |     0.35841 |    0.013333 |    0.013351 |            6 |    euclidean |

[Figures: Bayesian optimization plots produced during training: the objective function model, and the minimum objective versus the number of function evaluations]

__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 67.0041 seconds
Total objective function evaluation time: 11.1825

Best observed feasible point:

    NumNeighbors    Distance
    ____________    _________

         6          euclidean

Observed objective function value = 0.013333
Estimated objective function value = 0.013351
Function evaluation time = 0.35841

Best estimated feasible point (according to models):

    NumNeighbors    Distance
    ____________    _________

         6          euclidean

Estimated objective function value = 0.013351
Estimated function evaluation time = 0.35395
Mdl =

  ClassificationKNN
                        ResponseName: 'Y'
               CategoricalPredictors: []
                          ClassNames: {'setosa'  'versicolor'  'virginica'}
                      ScoreTransform: 'none'
                     NumObservations: 150
   HyperparameterOptimizationResults: [1x1 BayesianOptimization]
                            Distance: 'euclidean'
                        NumNeighbors: 6


  Properties, Methods
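
The returned Mdl is an ordinary ClassificationKNN model, so it can be queried like any other classifier. A brief sketch (the sample indices are arbitrary assumptions; predict, resubLoss, and bestPoint are standard methods of ClassificationKNN and BayesianOptimization):

labels = predict(Mdl,X(1:5,:))                    % predicted labels for the first five observations
trainErr = resubLoss(Mdl)                         % resubstitution (training) error of the final model
results = Mdl.HyperparameterOptimizationResults;  % BayesianOptimization object recorded above
bestPoint(results)                                % best (NumNeighbors, Distance) pair found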

Parameter Descriptions

  • Tbl - Sample data
  • ResponseVarName - Response variable name
  • formula - Explanatory model of the response variable and a subset of the predictor variables
  • Y - Class labels
  • X - Predictor data
  • BreakTies - Tie-breaking algorithm
  • BucketSize - Maximum number of data points in a node
  • CategoricalPredictors - Categorical predictor flag
  • ClassNames - Names of the classes used for training
  • Cost - Cost of misclassification
  • Cov - Covariance matrix
  • Distance - Distance metric
  • DistanceWeight - Distance weighting function
  • Exponent - Minkowski distance exponent
  • IncludeTies - Tie inclusion flag
  • NSMethod - Nearest neighbor search method
  • NumNeighbors - Number of nearest neighbors to find
  • PredictorNames - Predictor variable names
  • Prior - Prior probabilities
  • ResponseName - Response variable name
  • Scale - Distance scale
  • ScoreTransform - Score transformation
  • Standardize - Flag to standardize the predictors
  • Weights - Observation weights
  • CrossVal - Cross-validation flag
  • CVPartition - Cross-validation partition
  • Holdout - Fraction of the data used for holdout validation
  • KFold - Number of folds
  • Leaveout - Leave-one-out cross-validation flag
  • OptimizeHyperparameters - Parameters to optimize
  • HyperparameterOptimizationOptions - Options for optimization
  • Mdl - Trained k-nearest neighbor classification model (see the sketch below for combining several of these options)
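
As a hedged illustration of how several of the name-value pairs above combine in a single call (the particular values are assumptions chosen for demonstration, not recommendations):

load fisheriris
Mdl = fitcknn(meas,species, ...
    'NumNeighbors',4, ...            % K, the number of nearest neighbors (assumed value)
    'Distance','minkowski', ...      % distance metric
    'Exponent',3, ...                % Minkowski exponent p (assumed value)
    'DistanceWeight','inverse', ...  % weight each neighbor by 1/distance
    'Standardize',true, ...          % z-score each predictor column
    'BreakTies','nearest');          % resolve voting ties with the single nearest neighbor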

Summary

For KNN there are two further questions to settle: how to choose K, and which distance metric to use. When the data have a clearly linear structure, a linear classifier will usually beat KNN; when the relationship is not obviously linear, KNN tends to do better. Either way, these are two of the simplest and most fundamental models in machine learning. As the dimensionality grows, both models, and KNN in particular, begin to fail; this is the so-called curse of dimensionality.
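
To make the choice of K concrete, here is a minimal sketch (not from the original article) that sweeps candidate K values by hand and compares their five-fold cross-validation losses; the candidate grid is an arbitrary assumption:

load fisheriris
rng(1)                                    % reproducible cross-validation partitions
kValues = 1:2:15;                         % candidate neighbor counts (assumption)
cvLoss = zeros(size(kValues));
for i = 1:numel(kValues)
    Mdl = fitcknn(meas,species,'NumNeighbors',kValues(i));
    cvLoss(i) = kfoldLoss(crossval(Mdl,'KFold',5));   % five-fold CV error for this K
end
[bestLoss,idx] = min(cvLoss);
fprintf('Best K = %d (CV loss = %.4f)\n',kValues(idx),bestLoss)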