Py之imblearn:imblearn/imbalanced-learn库的简介、安装、使用方法之详细攻略
目录
imblearn/imbalanced-learn库的使用方法
imblearn/imbalanced-learn库的简介
imblearn/imbalanced-learn是一个python包,它提供了许多重采样技术,常用于显示强烈类间不平衡的数据集中。它与scikit learn兼容,是 scikit-learn-contrib 项目的一部分。
在python3.6+下测试了imbalanced-learn。依赖性要求基于上一个scikit学习版本:
- scipy(>=0.19.1)
- numpy(>=1.13.3)
- scikit-learn(>=0.22)
- joblib(>=0.11)
- keras 2 (optional)
- tensorflow (optional)
imblearn/imbalanced-learn库的安装
pip install imblearn
pip install imbalanced-learn
pip install -U imbalanced-learn
conda install -c conda-forge imbalanced-learn
imblearn/imbalanced-learn库的使用方法
大多数分类算法只有在每个类的样本数量大致相同的情况下才能达到最优。高度倾斜的数据集,其中少数被一个或多个类大大超过,已经证明是一个挑战,但同时变得越来越普遍。
解决这个问题的一种方法是通过重新采样数据集来抵消这种不平衡,希望得到一个比其他方法更健壮和公平的决策边界。
Re-sampling techniques are divided in two categories:
- Under-sampling the majority class(es).
- Over-sampling the minority class.
- Combining over- and under-sampling.
- Create ensemble balanced sets.
Below is a list of the methods currently implemented in this module.
-
Under-sampling
- Random majority under-sampling with replacement
- Extraction of majority-minority Tomek links [1]
- Under-sampling with Cluster Centroids
- NearMiss-(1 & 2 & 3) [2]
- Condensed Nearest Neighbour [3]
- One-Sided Selection [4]
- Neighboorhood Cleaning Rule [5]
- Edited Nearest Neighbours [6]
- Instance Hardness Threshold [7]
- Repeated Edited Nearest Neighbours [14]
- AllKNN [14]
-
Over-sampling
- Random minority over-sampling with replacement
- SMOTE - Synthetic Minority Over-sampling Technique [8]
- SMOTENC - SMOTE for Nominal Continuous [8]
- bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2 [9]
- SVM SMOTE - Support Vectors SMOTE [10]
- ADASYN - Adaptive synthetic sampling approach for imbalanced learning [15]
- KMeans-SMOTE [17]
-
Over-sampling followed by under-sampling
-
Ensemble classifier using samplers internally
- Mini-batch resampling for Keras and Tensorflow