Machine Learning | Implementing KNN (K-Nearest Neighbors) in MATLAB: fitcknn Parameter Optimization
Contents
- Machine Learning | Implementing KNN (K-Nearest Neighbors) in MATLAB: fitcknn Parameter Optimization
- Basic Introduction
- Program Design
- Parameter Description
- Summary
- References
Basic Introduction
MATLAB's built-in function for KNN classification is fitcknn. This article covers its hyperparameter optimization.
Program Design
- This example shows how to automatically optimize hyperparameters by using fitcknn. The example uses the Fisher iris data; first, load the data.
- Use automatic hyperparameter optimization to find hyperparameters that minimize the five-fold cross-validation loss.
- For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.
load fisheriris                          % load the Fisher iris data set
X = meas;                                % predictors: 150 x 4 measurements
Y = species;                             % class labels: setosa/versicolor/virginica
rng(1)                                   % fix the random seed for reproducibility
Mdl = fitcknn(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName','expected-improvement-plus'))
|=====================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | NumNeighbors | Distance |
| | result | | runtime | (observed) | (estim.) | | |
|=====================================================================================================|
| 1 | Best | 0.026667 | 0.63205 | 0.026667 | 0.026667 | 30 | cosine |
| 2 | Accept | 0.04 | 0.29254 | 0.026667 | 0.027197 | 2 | chebychev |
| 3 | Accept | 0.19333 | 0.42726 | 0.026667 | 0.030324 | 1 | hamming |
| 4 | Accept | 0.33333 | 0.34219 | 0.026667 | 0.033313 | 31 | spearman |
| 5 | Best | 0.02 | 0.40167 | 0.02 | 0.020648 | 6 | cosine |
| 6 | Accept | 0.073333 | 0.27227 | 0.02 | 0.023082 | 1 | correlation |
| 7 | Accept | 0.06 | 0.61176 | 0.02 | 0.020875 | 2 | cityblock |
| 8 | Accept | 0.04 | 0.29312 | 0.02 | 0.020622 | 1 | euclidean |
| 9 | Accept | 0.24 | 0.35135 | 0.02 | 0.020562 | 74 | mahalanobis |
| 10 | Accept | 0.04 | 0.31685 | 0.02 | 0.020649 | 1 | minkowski |
| 11 | Accept | 0.053333 | 0.40512 | 0.02 | 0.020722 | 1 | seuclidean |
| 12 | Accept | 0.19333 | 0.60398 | 0.02 | 0.020701 | 1 | jaccard |
| 13 | Accept | 0.04 | 0.28822 | 0.02 | 0.029203 | 1 | cosine |
| 14 | Accept | 0.04 | 0.52468 | 0.02 | 0.031888 | 75 | cosine |
| 15 | Accept | 0.04 | 0.24879 | 0.02 | 0.020076 | 1 | cosine |
| 16 | Accept | 0.093333 | 0.40397 | 0.02 | 0.020073 | 75 | euclidean |
| 17 | Accept | 0.093333 | 0.25712 | 0.02 | 0.02007 | 75 | minkowski |
| 18 | Accept | 0.1 | 0.32964 | 0.02 | 0.020061 | 75 | chebychev |
| 19 | Accept | 0.15333 | 0.18694 | 0.02 | 0.020044 | 75 | seuclidean |
| 20 | Accept | 0.1 | 0.31301 | 0.02 | 0.020044 | 75 | cityblock |
|=====================================================================================================|
| Iter | Eval | Objective | Objective | BestSoFar | BestSoFar | NumNeighbors | Distance |
| | result | | runtime | (observed) | (estim.) | | |
|=====================================================================================================|
| 21 | Accept | 0.033333 | 0.34193 | 0.02 | 0.020046 | 75 | correlation |
| 22 | Accept | 0.033333 | 0.35383 | 0.02 | 0.02656 | 9 | cosine |
| 23 | Accept | 0.033333 | 0.41993 | 0.02 | 0.02854 | 9 | cosine |
| 24 | Accept | 0.02 | 0.31331 | 0.02 | 0.028607 | 1 | chebychev |
| 25 | Accept | 0.02 | 0.3881 | 0.02 | 0.022264 | 1 | chebychev |
| 26 | Accept | 0.02 | 0.31359 | 0.02 | 0.021439 | 1 | chebychev |
| 27 | Accept | 0.02 | 0.42224 | 0.02 | 0.020999 | 1 | chebychev |
| 28 | Accept | 0.66667 | 0.39723 | 0.02 | 0.020008 | 75 | hamming |
| 29 | Accept | 0.04 | 0.3714 | 0.02 | 0.020008 | 12 | correlation |
| 30 | Best | 0.013333 | 0.35841 | 0.013333 | 0.013351 | 6 | euclidean |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 67.0041 seconds
Total objective function evaluation time: 11.1825
Best observed feasible point:
NumNeighbors Distance
____________ _________
6 euclidean
Observed objective function value = 0.013333
Estimated objective function value = 0.013351
Function evaluation time = 0.35841
Best estimated feasible point (according to models):
NumNeighbors Distance
____________ _________
6 euclidean
Estimated objective function value = 0.013351
Estimated function evaluation time = 0.35395
Mdl =
ClassificationKNN
ResponseName: 'Y'
CategoricalPredictors: []
ClassNames: {'setosa' 'versicolor' 'virginica'}
ScoreTransform: 'none'
NumObservations: 150
HyperparameterOptimizationResults: [1x1 BayesianOptimization]
Distance: 'euclidean'
NumNeighbors: 6
Properties, Methods
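With the optimized model in hand, classifying new data follows the standard ClassificationKNN workflow. A minimal sketch, where the new observation below is an assumed example rather than part of the original post:
newX = [5.0 3.4 1.5 0.2];            % hypothetical flower measurements (assumed example)
[label, score] = predict(Mdl, newX)  % predicted class and per-class posterior scores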
Parameter Description
The main inputs, name-value arguments, and output of fitcknn (a combined usage sketch follows this list):
- Tbl - Sample data
- ResponseVarName - Response variable name
- formula - Explanatory model of the response variable and a subset of the predictor variables
- Y - Class labels
- X - Predictor data
- BreakTies - Tie-breaking algorithm
- BucketSize - Maximum number of data points per node
- CategoricalPredictors - Categorical predictor flag
- ClassNames - Names of the classes used for training
- Cost - Misclassification cost
- Cov - Covariance matrix
- Distance - Distance metric
- DistanceWeight - Distance weighting function
- Exponent - Minkowski distance exponent
- IncludeTies - Tie-inclusion flag
- NSMethod - Nearest-neighbor search method
- NumNeighbors - Number of nearest neighbors to find
- PredictorNames - Predictor variable names
- Prior - Prior class probabilities
- ResponseName - Response variable name
- Scale - Distance scale
- ScoreTransform - Score transformation
- Standardize - Flag to standardize the predictors
- Weights - Observation weights
- CrossVal - Cross-validation flag
- CVPartition - Cross-validation partition
- Holdout - Fraction of data held out for validation
- KFold - Number of folds
- Leaveout - Leave-one-out cross-validation flag
- OptimizeHyperparameters - Parameters to optimize
- HyperparameterOptimizationOptions - Options for the optimization
- Mdl - Trained k-nearest-neighbor classification model
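A minimal sketch, not from the original post, showing how several of the name-value arguments above combine in a single fitcknn call; the specific values chosen here are assumptions for illustration:
load fisheriris
Mdl2 = fitcknn(meas, species, ...
    'NumNeighbors', 5, ...           % number of nearest neighbors to find
    'Distance', 'minkowski', ...     % distance metric
    'Exponent', 3, ...               % Minkowski distance exponent
    'Standardize', true);            % standardize predictors before training
CVMdl = crossval(Mdl2, 'KFold', 5);  % five-fold cross-validated model
cvLoss = kfoldLoss(CVMdl)            % estimated misclassification rate
Note that Exponent applies only when Distance is 'minkowski', and Standardize cannot be combined with Scale or Cov.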
Summary
For KNN, two further questions must be answered: how to choose the value of K, and how to measure distance. If the data exhibit a clearly linear relationship, a linear classifier will outperform KNN; if the relationship is not clearly linear, KNN tends to do better. Overall, both are among the simplest and most fundamental models in machine learning. As the dimensionality grows, both models, and KNN in particular, begin to break down; this is the so-called curse of dimensionality. A small sketch of choosing K by cross-validation follows.
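A minimal sketch of selecting K by cross-validation loss, complementing the Bayesian optimization shown earlier; the candidate range of K and the five-fold split are assumptions:
load fisheriris
ks = 1:2:15;                        % candidate (odd) values of K, an assumed range
losses = zeros(size(ks));
for i = 1:numel(ks)
    m = fitcknn(meas, species, 'NumNeighbors', ks(i));
    losses(i) = kfoldLoss(crossval(m, 'KFold', 5));   % five-fold CV loss
end
[~, best] = min(losses);
fprintf('Best K = %d (CV loss = %.4f)\n', ks(best), losses(best))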