Python support vector machine code, Python support vector machine parameters

Reposted

香奈儿 2023-11-17 21:02:09


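Support Vector Machine Algorithm

1 Overview
2 Algorithm characteristics
3 Algorithm principles

3.1 Distance computation
3.2 Solving and optimizing the classifier

3.2.1 The optimization goal
3.2.2 The objective function

3.3 Soft-margin maximization
3.4 Kernel functions

4 Summary
5 Python implementation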

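1 Overview

The support vector machine (SVM) is primarily a binary classification model. Its strength is that it can serve as either a linear or a nonlinear classifier.

2 Algorithm characteristics

Advantages: low generalization error, modest computational cost, and results that are easy to interpret.
Disadvantages: sensitive to parameter tuning and to the choice of kernel function; without modification, the classifier only handles binary classification.
Applicable data types: numeric and nominal data.

3 Algorithm principles

Consider a training set that is linearly separable. The goal is to find the decision boundary that lies as far as possible from the nearest points of the two classes. When the data are not linearly separable, the main difference is an extra first step. The data are mapped into a new space by a transformation (mapping) function, and a linear classifier is then applied in that new space.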


3.1 Distance computation

Consider a line $Ax + By + C = 0$ in the plane. How do we compute its distance to a given point? For example, the distance from the point $(x_0, y_0)$ to the line $Ax + By + C = 0$ is $\frac{|Ax_0 + By_0 + C|}{\sqrt{A^2 + B^2}}$. In $n$-dimensional space, the separating hyperplane can be written as $w^Tx + b = 0$. The distance from a point $A$ to this hyperplane is the length of the normal (perpendicular) segment from the point to the hyperplane, which is $\frac{|w^TA + b|}{\|w\|}$.
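Here is a minimal plain-Python sketch of the distance formula $\frac{|w^Tx + b|}{\|w\|}$. The function name `distance_to_hyperplane` is our own, and points and weights are represented as plain tuples. The same function also covers the 2-D line case, with $w = (A, B)$ and $b = C$.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w^T x + b = 0, i.e. |w^T x + b| / ||w||."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b  # w^T x + b
    norm_w = math.sqrt(sum(wi * wi for wi in w))       # ||w||
    return abs(value) / norm_w


# 2-D case: the line 3x + 4y - 2 = 0 (A=3, B=4, C=-2) and the point (1, 1)
print(distance_to_hyperplane((3, 4), -2, (1, 1)))  # |3 + 4 - 2| / 5 = 1.0
```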

3.2 Solving and optimizing the classifier

Data set: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
When $x_i$ is a positive example, $y_i = +1$; when $x_i$ is a negative example, $y_i = -1$. The decision function is $y(x) = w^T\Phi(x) + b$, where $\Phi(x)$ is a transformation applied to the data (see 3.4). It follows that

$y(x_i) > 0 \Leftrightarrow y_i = +1, \quad y(x_i) < 0 \Leftrightarrow y_i = -1 \quad \Longrightarrow \quad y_i\, y(x_i) > 0$
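The sign rule $y_i\, y(x_i) > 0$ can be checked directly. This sketch assumes no feature transform ($\Phi(x) = x$) and uses the three points from the worked example in step (4) below. The helper names `decision` and `separates` are our own.

```python
def decision(w, b, x):
    """y(x) = w^T x + b, taking Phi(x) = x (no feature transform)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def separates(w, b, samples):
    """True if y_i * y(x_i) > 0 for every (x_i, y_i), i.e. every point is on its own side."""
    return all(y * decision(w, b, x) > 0 for x, y in samples)


samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]
print(separates((1, 1), -4, samples))    # True: 2 > 0 and 3 > 0 for the positives, -2 < 0 for the negative
print(separates((1, 0), -3.5, samples))  # False: y(x) = 3 - 3.5 < 0 for the positive point (3, 3)
```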

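3.2.1 The optimization goal

Find a line that is as far as possible from the nearest points of the two classes.
Distance from a point to the line: since $y_i\, y(x_i) > 0$, the absolute value in $\frac{|w^T\Phi(x_i) + b|}{\|w\|}$ can be dropped. The distance can therefore be written as $\frac{y_i\,(w^T\Phi(x_i) + b)}{\|w\|}$.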

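3.2.2 The objective function

Optimization goal: $\arg\max_{w,b}\left\{\frac{1}{\|w\|}\min_i\left[y_i\,(w^T\Phi(x_i) + b)\right]\right\}$
First, we know that $y_i\,(w^T\Phi(x_i) + b) > 0$. Rescaling $w$ and $b$ by the same positive constant does not change the hyperplane. We can therefore scale them so that $y_i\,(w^T\Phi(x_i) + b) \ge 1$, with equality for the closest points. Under this condition we only need to maximize $\frac{1}{\|w\|}$. For easier solving and simplification, we convert this into the minimization problem $\min_{w,b}\frac{1}{2}\|w\|^2$.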
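This sketch shows why the rescaling is allowed. Multiplying $w$ and $b$ by a positive constant $k$ leaves the geometric margin unchanged; it only multiplies $y_i(w^T x_i + b)$ by $k$. We may therefore pick $k$ so that the smallest such value equals 1. The sketch again assumes $\Phi(x) = x$ and uses the worked-example points; the function names are our own.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def functional_margin(w, b, samples):
    """min_i y_i (w^T x_i + b): depends on the scale of (w, b)."""
    return min(y * (dot(w, x) + b) for x, y in samples)


def geometric_margin(w, b, samples):
    """min_i y_i (w^T x_i + b) / ||w||: the distance the SVM maximizes."""
    return functional_margin(w, b, samples) / math.sqrt(dot(w, w))


samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]  # Phi(x) = x assumed
for k in (1, 2, 10):
    w, b = (0.5 * k, 0.5 * k), -2.0 * k
    # The functional margin grows with k; the geometric margin stays at sqrt(2)
    print(k, functional_margin(w, b, samples), geometric_margin(w, b, samples))
```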
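Next, we solve the objective:
$\min_{w,b}\ \frac{1}{2}\|w\|^2$
$\text{s.t.}\ \ y_i\,(w^T\Phi(x_i) + b) \ge 1, \quad i = 1, \dots, n$
Solving this objective requires the method of Lagrange multipliers.
A constrained optimization problem:
$\min_x\ f(x)$
subject to:
$g_i(x) \le 0,\ i = 1, \dots, m$ $\quad h_j(x) = 0,\ j = 1, \dots, q$ has the Lagrangian:
$L(x, \alpha, \beta) = f(x) + \sum_{i=1}^{m}\alpha_i g_i(x) + \sum_{j=1}^{q}\beta_j h_j(x)$
Applying the Lagrangian to the SVM:
the constraint $y_i\,(w^T\Phi(x_i) + b) \ge 1$ is equivalent to $g_i(w, b) = 1 - y_i\,(w^T\Phi(x_i) + b) \le 0$,
which gives
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n}\alpha_i\left(y_i\,(w^T\Phi(x_i) + b) - 1\right)$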

The method of Lagrange multipliers:
Lagrangian and KKT conditions:
$\min_x\ f(x)$, subject to: $h(x) = 0$
Generalizing to multiple constraints:
$\min_x\ f(x)$
subject to:
$g_i(x) \le 0,\ i = 1, \dots, m$ $\quad h_j(x) = 0,\ j = 1, \dots, q$
the Lagrangian is:
$L(x, \alpha, \beta) = f(x) + \sum_{i=1}^{m}\alpha_i g_i(x) + \sum_{j=1}^{q}\beta_j h_j(x)$
The KKT conditions introduced are:

$\nabla_x L(x, \alpha, \beta) = 0$ $\quad \alpha_i \ge 0$ $\quad g_i(x) \le 0,\ h_j(x) = 0$ $\quad \alpha_i\, g_i(x) = 0$
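For the SVM, $g_i(w, b) = 1 - y_i(w^T\Phi(x_i) + b)$ and there are no equality constraints $h_j$. Stationarity means $\partial L / \partial w = 0$ and $\partial L / \partial b = 0$, which are derived in the next step. Below is a hedged sketch of a KKT checker. It assumes $\Phi(x) = x$ and a small numerical tolerance; the function name and tolerance are our own. The check is run on the solution of the worked example below.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kkt_violations(alpha, w, b, samples, tol=1e-9):
    """Check the hard-margin SVM KKT conditions with g_i = 1 - y_i (w . x_i + b).

    Returns a list of violated conditions (empty if all hold); indices are 0-based.
    """
    problems = []
    # Stationarity in w: w = sum_i alpha_i y_i x_i
    w_from_alpha = [sum(a * y * x[k] for a, (x, y) in zip(alpha, samples)) for k in range(len(w))]
    if any(abs(wk - vk) > tol for wk, vk in zip(w, w_from_alpha)):
        problems.append("stationarity in w")
    # Stationarity in b: sum_i alpha_i y_i = 0
    if abs(sum(a * y for a, (_, y) in zip(alpha, samples))) > tol:
        problems.append("stationarity in b")
    for i, (a, (x, y)) in enumerate(zip(alpha, samples)):
        g = 1 - y * (dot(w, x) + b)
        if a < -tol:
            problems.append(f"alpha_{i} < 0")                 # dual feasibility
        if g > tol:
            problems.append(f"g_{i} > 0")                     # primal feasibility
        if abs(a * g) > tol:
            problems.append(f"alpha_{i} * g_{i} != 0")        # complementary slackness
    return problems


samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]
print(kkt_violations((0.25, 0.0, 0.25), (0.5, 0.5), -2.0, samples))  # []
print(kkt_violations((0.5, 0.0, 0.5), (0.5, 0.5), -2.0, samples))    # ['stationarity in w']
```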
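Solving the SVM
(1) By Lagrangian duality, the dual of the original problem is a max-min problem:
$\max_{\alpha}\ \min_{w,b}\ L(w, b, \alpha)$
First minimize $L(w, b, \alpha)$ with respect to $w$ and $b$, then maximize the result with respect to $\alpha$.
Differentiate
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n}\alpha_i\left(y_i\,(w^T\Phi(x_i) + b) - 1\right)$
and set the derivatives to zero, which gives:
$\frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n}\alpha_i y_i \Phi(x_i)$ $\quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n}\alpha_i y_i = 0$
Substituting these conditions back gives:
$\min_{w,b} L(w, b, \alpha) = \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,\Phi(x_i)^T\Phi(x_j)$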
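As a numerical check of this formula, the sketch below evaluates $\sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j K(x_i, x_j)$ with a pluggable kernel $K$. The kernel is linear here, i.e. $\Phi(x) = x$, and the function names are our own. With the data and the $\alpha$ from the worked example below, the dual value equals the primal value $\frac{1}{2}\|w\|^2$, as duality predicts for this problem.

```python
def dual_objective(alpha, samples, kernel):
    """sum_i alpha_i - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    quad = 0.0
    for ai, (xi, yi) in zip(alpha, samples):
        for aj, (xj, yj) in zip(alpha, samples):
            quad += ai * aj * yi * yj * kernel(xi, xj)
    return sum(alpha) - 0.5 * quad


def linear_kernel(u, v):
    # Phi(x) = x, so Phi(x_i)^T Phi(x_j) is the ordinary dot product
    return sum(a * b for a, b in zip(u, v))


samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]
alpha = (0.25, 0.0, 0.25)
w = (0.5, 0.5)  # w = sum_i alpha_i y_i x_i for this alpha
print(dual_objective(alpha, samples, linear_kernel))  # 0.25
print(0.5 * linear_kernel(w, w))                      # 0.25, the primal value 1/2 ||w||^2
```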
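(2) Maximizing $\min_{w,b} L(w, b, \alpha)$ with respect to $\alpha$ gives the dual problem:
$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,\Phi(x_i)^T\Phi(x_j)$ $\quad \text{s.t.}\ \sum_{i=1}^{n}\alpha_i y_i = 0,\ \alpha_i \ge 0$
Converting the maximization into a minimization:
$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,\Phi(x_i)^T\Phi(x_j) - \sum_{i=1}^{n}\alpha_i$ $\quad \text{s.t.}\ \sum_{i=1}^{n}\alpha_i y_i = 0,\ \alpha_i \ge 0$
(3) Suppose solving the problem above yields $\alpha^*$. Then, for any $j$ with $\alpha_j^* > 0$:
$w^* = \sum_{i=1}^{n}\alpha_i^* y_i \Phi(x_i)$ $\quad b^* = y_j - \sum_{i=1}^{n}\alpha_i^* y_i\,\Phi(x_i)^T\Phi(x_j)$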
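The two recovery formulas translate almost line for line into code. This is a sketch assuming a linear kernel ($\Phi(x) = x$). It picks the first index with $\alpha_j^* > 0$ to compute $b^*$, and the function name is our own. It is checked against the worked example below.

```python
def recover_w_b(alpha, samples, tol=1e-12):
    """w* = sum_i alpha_i y_i x_i;  b* = y_j - sum_i alpha_i y_i (x_i . x_j) for some alpha_j > 0."""
    dim = len(samples[0][0])
    w = [sum(a * y * x[k] for a, (x, y) in zip(alpha, samples)) for k in range(dim)]
    # Any support vector (alpha_j > 0) gives the same b; take the first one
    j = next(i for i, a in enumerate(alpha) if a > tol)
    xj, yj = samples[j]
    b = yj - sum(a * y * sum(u * v for u, v in zip(x, xj)) for a, (x, y) in zip(alpha, samples))
    return w, b


samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]
print(recover_w_b((0.25, 0.0, 0.25), samples))  # ([0.5, 0.5], -2.0)
```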
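(4) Example
Now that we have the objective, how do we solve it? Take the training set shown in the figure below. The positive examples are $x_1 = (3, 3)^T$ and $x_2 = (4, 3)^T$, and the negative example is $x_3 = (1, 1)^T$.
Solve: $\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3}\alpha_i\alpha_j y_i y_j\,(x_i \cdot x_j) - \sum_{i=1}^{3}\alpha_i$
subject to:
$\alpha_1 + \alpha_2 - \alpha_3 = 0$ $\quad \alpha_i \ge 0,\ i = 1, 2, 3$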

[Figure: the training data set for this example]

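Solution:

$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3}\alpha_i\alpha_j y_i y_j\,(x_i \cdot x_j) - \sum_{i=1}^{3}\alpha_i = \frac{1}{2}\left(18\alpha_1^2 + 25\alpha_2^2 + 2\alpha_3^2 + 42\alpha_1\alpha_2 - 12\alpha_1\alpha_3 - 14\alpha_2\alpha_3\right) - \alpha_1 - \alpha_2 - \alpha_3$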
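The coefficients in this expansion are the entries $Q_{ij} = y_i y_j (x_i \cdot x_j)$. Each off-diagonal entry appears twice in the double sum, which is why $42 = 2 \cdot 21$, $-12 = 2 \cdot (-6)$ and $-14 = 2 \cdot (-7)$. A short sketch that computes them follows; the helper name `dot` is our own.

```python
samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Q_ij = y_i y_j (x_i . x_j): the coefficient of alpha_i alpha_j in the double sum
Q = [[yi * yj * dot(xi, xj) for xj, yj in samples] for xi, yi in samples]
for row in Q:
    print(row)
# [18, 21, -6]
# [21, 25, -7]
# [-6, -7, 2]
```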

Since $\alpha_3 = \alpha_1 + \alpha_2$, this simplifies to:

$s(\alpha_1, \alpha_2) = 4\alpha_1^2 + \frac{13}{2}\alpha_2^2 + 10\alpha_1\alpha_2 - 2\alpha_1 - 2\alpha_2$
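The substitution $\alpha_3 = \alpha_1 + \alpha_2$ can be verified with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`. It checks that the full and reduced objectives agree at a few arbitrary test points; the function names are our own.

```python
from fractions import Fraction as F


def full_objective(a1, a2, a3):
    # 1/2 (18 a1^2 + 25 a2^2 + 2 a3^2 + 42 a1 a2 - 12 a1 a3 - 14 a2 a3) - a1 - a2 - a3
    return F(1, 2) * (18*a1**2 + 25*a2**2 + 2*a3**2 + 42*a1*a2 - 12*a1*a3 - 14*a2*a3) - a1 - a2 - a3


def reduced_objective(a1, a2):
    # s(a1, a2) = 4 a1^2 + 13/2 a2^2 + 10 a1 a2 - 2 a1 - 2 a2
    return 4*a1**2 + F(13, 2)*a2**2 + 10*a1*a2 - 2*a1 - 2*a2


# Exact fractions, so equality is tested without rounding error
for a1, a2 in [(F(0), F(0)), (F(1, 4), F(0)), (F(3, 2), F(-1)), (F(2, 7), F(5, 3))]:
    assert full_objective(a1, a2, a1 + a2) == reduced_objective(a1, a2)
print("substitution a3 = a1 + a2 checked")
```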

Set the partial derivatives with respect to $\alpha_1$ and $\alpha_2$ to zero. This gives $\alpha_1 = 1.5$, $\alpha_2 = -1$, but $\alpha_2 = -1 < 0$ violates the constraint $\alpha_i \ge 0$, so the minimum lies on the boundary. When $\alpha_1 = 0$, setting the partial derivative with respect to $\alpha_2$ to zero gives $\alpha_2 = \frac{2}{13}$, where $s = -\frac{2}{13}$. When $\alpha_2 = 0$, setting the partial derivative with respect to $\alpha_1$ to zero gives $\alpha_1 = \frac{1}{4}$, where $s = -\frac{1}{4}$. The minimum is therefore attained at $\alpha_1 = \frac{1}{4}$, $\alpha_2 = 0$, which gives $\alpha_3 = \frac{1}{4}$.
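The boundary search above can be reproduced exactly. The sketch below uses the reduced objective $s$ from this example and `fractions.Fraction`. It first confirms that the unconstrained stationary point solves $8\alpha_1 + 10\alpha_2 = 2$ and $10\alpha_1 + 13\alpha_2 = 2$. It then compares the two boundary candidates; the candidate labels are our own.

```python
from fractions import Fraction as F


def s(a1, a2):
    # Reduced dual objective from the example (a3 = a1 + a2 already substituted)
    return 4*a1**2 + F(13, 2)*a2**2 + 10*a1*a2 - 2*a1 - 2*a2


# Unconstrained stationary point: ds/da1 = 8 a1 + 10 a2 - 2 = 0, ds/da2 = 10 a1 + 13 a2 - 2 = 0
stationary = (F(3, 2), F(-1))
assert 8*stationary[0] + 10*stationary[1] == 2 and 10*stationary[0] + 13*stationary[1] == 2

# a2 = -1 violates a2 >= 0, so minimize along each boundary line instead
candidates = {
    "a1 = 0": (F(0), F(2, 13)),  # 13 a2 - 2 = 0
    "a2 = 0": (F(1, 4), F(0)),   # 8 a1 - 2 = 0
}
for name, (a1, a2) in candidates.items():
    print(name, (a1, a2), s(a1, a2))
best = min(candidates.values(), key=lambda p: s(*p))
print("minimum at", best)  # minimum at (Fraction(1, 4), Fraction(0, 1))
```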

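Substituting $\alpha_1 = \frac{1}{4}$, $\alpha_2 = 0$, $\alpha_3 = \frac{1}{4}$ into $w = \sum_{i=1}^{n}\alpha_i y_i \Phi(x_i)$ gives $w = \frac{1}{4}\cdot 1\cdot(3, 3)^T + \frac{1}{4}\cdot(-1)\cdot(1, 1)^T = \left(\frac{1}{2}, \frac{1}{2}\right)^T$

$b = y_j - \sum_{i=1}^{n}\alpha_i y_i\,\Phi(x_i)^T\Phi(x_j)$; taking $j = 1$: $b = 1 - \left(\frac{1}{4}\cdot 1\cdot 18 + \frac{1}{4}\cdot(-1)\cdot 6\right) = -2$

The resulting hyperplane is $0.5\,x^{(1)} + 0.5\,x^{(2)} - 2 = 0$, where $x^{(1)}$ and $x^{(2)}$ are the two coordinates.

Note that $\alpha_2 = 0$, so the point $x_2$ plays no role. The decision boundary depends only on the nearest points of the two classes, here $x_1$ and $x_3$ (the support vectors). This of course assumes the data are perfectly linearly separable.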
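Evaluating $y_i(w \cdot x_i + b)$ for each training point with the hyperplane found above confirms this. $x_1$ and $x_3$ sit exactly on the margin (value 1), while $x_2$ lies beyond it, consistent with $\alpha_2 = 0$. Below is a small sketch of that check.

```python
samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1)]
w, b = (0.5, 0.5), -2.0

# Functional margin y_i (w . x_i + b) of each training point
margins = [y * (w[0] * x[0] + w[1] * x[1] + b) for x, y in samples]
print(margins)  # [1.0, 1.5, 1.0]

# Points with margin exactly 1 lie on the margin boundary; here these are the support vectors
support = [x for (x, _), m in zip(samples, margins) if abs(m - 1) < 1e-9]
print(support)  # [(3, 3), (1, 1)]
```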


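3.3 Soft-margin maximization

In real problems the training data are often not linearly separable, for example because the samples contain noise or outliers. In that case we introduce soft-margin maximization.
Objective function:
$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i$ $\quad \text{s.t.}\ \ y_i\,(w^T\Phi(x_i) + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$
Here $C$ is the penalty parameter. A larger $C$ makes the classification stricter and requires fewer misclassifications. A smaller $C$ makes it more lenient and tolerates relatively more misclassifications.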
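To make the role of $C$ concrete, the sketch below compares the soft-margin objective of two fixed candidate hyperplanes; it is not an optimizer. The data are the worked-example points plus one hypothetical noisy positive point at $(2, 2)$. The slacks are set to their smallest feasible values, $\xi_i = \max(0, 1 - y_i(w \cdot x_i + b))$, and $\Phi(x) = x$ is assumed. The candidates and function names are our own.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_margin_objective(w, b, samples, C):
    """1/2 ||w||^2 + C * sum_i xi_i with xi_i = max(0, 1 - y_i (w . x_i + b))."""
    slacks = [max(0.0, 1 - y * (dot(w, x) + b)) for x, y in samples]
    return 0.5 * dot(w, w) + C * sum(slacks)


# Worked-example points plus one hypothetical noisy positive point at (2, 2)
samples = [((3, 3), +1), ((4, 3), +1), ((1, 1), -1), ((2, 2), +1)]
candidates = {
    "wide margin, 1 violation": ((0.5, 0.5), -2.0),     # 1/2 ||w||^2 = 0.25, slack 1 at (2, 2)
    "narrow margin, no violation": ((1.0, 1.0), -3.0),  # 1/2 ||w||^2 = 1.0, no slack
}


def preferred(C):
    """Name of the candidate with the lower soft-margin objective for this C."""
    return min(candidates, key=lambda name: soft_margin_objective(*candidates[name], samples, C))


print(preferred(0.1))   # wide margin, 1 violation    (0.25 + 0.1 < 1.0)
print(preferred(10.0))  # narrow margin, no violation (0.25 + 10 > 1.0)
```

Small $C$ favors the wider margin despite the violation; large $C$ favors the hyperplane that classifies every point correctly.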

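3.4 Kernel functions

For linear classification problems, a linear support vector machine is sufficient. Nonlinear problems call for the kernel trick. Data that are not linearly separable in a low-dimensional space are mapped into a higher-dimensional space, which turns the nonlinear problem into a linear one.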


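How do we map low-dimensional data into a higher-dimensional space? This is done with a transformation function. Because the SVM only needs inner products of transformed points, we introduce kernel functions that compute those inner products directly. Common kernels include the polynomial kernel and the Gaussian kernel.

Polynomial kernel:

$K(x, z) = (x \cdot z + c)^d$

Gaussian kernel:

$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$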
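Both kernels are one-liners in plain Python. This sketch uses the common textbook forms shown above. The parameter names `c`, `d` and `sigma` and their defaults are our own choice, and no library API is implied.

```python
import math


def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (x . z + c)^d"""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d


def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))


print(polynomial_kernel((1, 2), (3, 4)))           # (11 + 1)^2 = 144.0
print(gaussian_kernel((1, 2), (1, 2)))             # identical points give 1.0
print(gaussian_kernel((0, 0), (1, 0), sigma=1.0))  # exp(-1/2), about 0.6065
```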

Kernel computation: let $x = (x_1, x_2)$ and $z = (z_1, z_2)$ be two points in two-dimensional space, and let the kernel be $K(x, z) = (x \cdot z)^2$. First map the data from two dimensions to three with the transformation $f(x) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$.

Applying the transformation:

$f(x) \cdot f(z) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2) \cdot (z_1^2, \sqrt{2}\,z_1z_2, z_2^2)$ $= x_1^2z_1^2 + 2x_1x_2z_1z_2 + x_2^2z_2^2$ $= (x_1z_1 + x_2z_2)^2$ $= (x \cdot z)^2$ $= K(x, z)$
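The identity $f(x) \cdot f(z) = (x \cdot z)^2$ can be checked numerically. This sketch uses two arbitrary 2-D points and the map $f$ above; because of $\sqrt{2}$, the explicit value matches only up to floating-point rounding. The helper names are our own.

```python
import math


def feature_map(x):
    """f(x) = (x1^2, sqrt(2) x1 x2, x2^2): explicit map from 2-D to 3-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1, 2), (3, 4)
explicit = dot(feature_map(x), feature_map(z))  # inner product computed in 3-D
kernel = dot(x, z) ** 2                          # K(x, z) = (x . z)^2 computed in 2-D
print(explicit, kernel)  # both 121, up to floating-point rounding in the explicit value
```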

Computing $f(x) \cdot f(z)$ explicitly requires first building the higher-dimensional vectors, so the cost grows with their dimension. Consider the analogous map from $n$ dimensions to all $n^2$ products $x_ix_j$; for two dimensions this is the four-dimensional map $(x_1x_1, x_1x_2, x_2x_1, x_2x_2)$. With that map, the explicit inner product costs $O(n^2)$. However, $K(x, z)$ is simply the square of the low-dimensional inner product $x \cdot z$. In practice we therefore compute $x \cdot z$ first and then raise it to the power. This costs $O(n)$ rather than $O(n^2)$, which greatly improves efficiency.
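The cost argument can be illustrated with the full degree-2 map in $n = 3$ dimensions. The explicit feature vector has $n^2 = 9$ entries, while the kernel needs only one 3-term dot product and a square. The sketch below uses `itertools.product` to build all pairs $(x_i, x_j)$; the helper names are our own.

```python
import itertools


def full_quadratic_map(x):
    """All n^2 products x_i x_j: the explicit feature space of K(x, z) = (x . z)^2."""
    return [a * b for a, b in itertools.product(x, repeat=2)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1, 2, 3), (4, 5, 6)
fx, fz = full_quadratic_map(x), full_quadratic_map(z)
print(len(fx))         # 9 = 3^2 features: the explicit inner product costs O(n^2)
print(dot(fx, fz))     # 1024
print(dot(x, z) ** 2)  # 1024 = (4 + 10 + 18)^2 = 32^2, computed in O(n)
```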

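4 Summary

The SVM decision boundary is best when it lies as far as possible from the nearest samples of the two classes. If the data themselves are not linearly separable, try mapping them into a higher-dimensional space. With a kernel function, the inner product can be computed in the original low-dimensional space and then transformed, instead of being computed in the high-dimensional space. This improves computational efficiency.

5 Python implementation

Git repository: https://github.com/lingxiaaisuixin/MarchineLearning/tree/master/SVM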

