[lightGBM] Using LightGBM for malware detection
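© Copyright belongs to the author: this is an original work by 51CTO blog author 是念. Please contact the author for reprint permission; otherwise legal action will be taken.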
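I have been working with tree models recently, so I put together some example code that you can adapt for your own experiments. The data is a tabular malware dataset, which can be downloaded from: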
https:///chihebchebbi/Mastering-Machine-Learning-for-Penetration-Testing/blob/master/Chapter03/MalwareData.csv.gz
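The file at that link is gzip-compressed (.csv.gz). Below is a minimal sketch of the unzip step using only Python's standard library. It assumes the archive has been saved locally under a name ending in `.gz` (for example `MalwareData.csv.gz`). The helper name `decompress_to_data_dir` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def decompress_to_data_dir(gz_path, data_dir="data"):
    """Decompress a .gz file into data_dir and return the output path.

    Hypothetical helper: assumes gz_path ends in '.gz',
    e.g. MalwareData.csv.gz -> data/MalwareData.csv.
    """
    gz_path = Path(gz_path)
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create data/ if it does not exist yet
    out_path = out_dir / gz_path.with_suffix("").name  # strip the trailing '.gz'
    # Stream-decompress so the whole file never has to be held in memory
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Usage: `decompress_to_data_dir("MalwareData.csv.gz")` writes `data/MalwareData.csv`. As an alternative, `pandas.read_csv` infers gzip compression from a `.gz` extension by default. So `pd.read_csv('data/MalwareData.csv.gz', sep='|')` should also work without unzipping first.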
After downloading, unzip the file into the data directory. The LightGBM example code (lgb_demo.py) is:
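```python
# lgb_demo.py: train a LightGBM classifier on the malware dataset
# Install dependencies first: pip install lightgbm pandas scikit-learn
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# The CSV is pipe-separated; 'legitimate' is the label (1 = legitimate, 0 = malware)
MalwareDataset = pd.read_csv('data/MalwareData.csv', sep='|')

# Separate legitimate and malicious samples by label rather than by row position
Legit = MalwareDataset[MalwareDataset['legitimate'] == 1].drop(['legitimate'], axis=1)
Malware = MalwareDataset[MalwareDataset['legitimate'] == 0].drop(['legitimate'], axis=1)
# Column count after dropping the label; this still includes the 'Name' and 'md5' identifiers
print('The Number of important features is %i \n' % Legit.shape[1])

# Features: drop the file name, MD5 hash and label; target: the 'legitimate' label
Data = MalwareDataset.drop(['Name', 'md5', 'legitimate'], axis=1).values
Target = MalwareDataset['legitimate'].values

# 80/20 split; no random_state is set, so the split and the exact score vary between runs
X_train, X_test, y_train, y_test = train_test_split(Data, Target, test_size=0.2)

# Train LightGBM with its default hyperparameters
clf = lgb.LGBMClassifier()
clf.fit(X_train, y_train)

# Predict on the held-out set and evaluate (scikit-learn expects y_true first)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('LightGBM Model accuracy score: {0:0.4f}'.format(accuracy))
print(classification_report(y_test, y_pred))
```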
LightGBM trains quickly. The output is:
```
The Number of important features is 56

LightGBM Model accuracy score: 0.9935
              precision    recall  f1-score   support

           0       1.00      0.99      1.00     19296
           1       0.99      0.99      0.99      8314

    accuracy                           0.99     27610
   macro avg       0.99      0.99      0.99     27610
weighted avg       0.99      0.99      0.99     27610
```
The accuracy is quite high: 0.9935 on the held-out 20% test split.
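In the classification report above, each row gives per-class figures for a class c:

- precision = TP / (TP + FP), the share of samples predicted as c that really are c.
- recall = TP / (TP + FN).
- f1-score is the harmonic mean of precision and recall.
- support is the number of true samples of c.

Of the summary rows, `macro avg` is the plain mean over classes and `weighted avg` weights each class by its support. Below is a minimal pure-Python sketch of these standard definitions, which is how scikit-learn's `classification_report` defines them. The function name `report` is hypothetical, and it returns 0.0 when a denominator is zero.

```python
from collections import Counter


def report(y_true, y_pred):
    """Per-class (precision, recall, f1, support) plus accuracy, macro and weighted averages."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    rows = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted as c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted as c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # true c that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[c] = (precision, recall, f1, support[c])
    n = len(pairs)
    accuracy = sum(1 for t, p in pairs if t == p) / n
    # macro avg: unweighted mean over classes; weighted avg: mean weighted by support
    macro = tuple(sum(r[i] for r in rows.values()) / len(rows) for i in range(3))
    weighted = tuple(sum(r[i] * r[3] for r in rows.values()) / n for i in range(3))
    return rows, accuracy, macro, weighted


# Toy example: four samples of class 0, two of class 1
rows, acc, macro, weighted = report([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
print(rows)  # {0: (0.75, 0.75, 0.75, 4), 1: (0.5, 0.5, 0.5, 2)}
```

The classes in the test split are imbalanced: 19296 samples of class 0 versus 8314 of class 1. A model that always predicted class 0 would therefore already reach 19296 / 27610 ≈ 0.699 accuracy. That is the baseline against which 0.9935 is best compared.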