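Check whether a dataset contains null or inf values, and quickly locate the columns and rows where they occur so the cause can be analyzed.
Handling positive and negative infinity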
import pandas as pd
import numpy as np
# Create dataframe using dictionary
data = {'Student ID': [10, 11, 12, 13, 14],
        'Age': [23, 22, 24, 22, np.nan],
        'Weight': [66, 72, np.inf, 68, -np.inf]}
df = pd.DataFrame(data)
df
  | Student ID | Age | Weight |
0 | 10 | 23.0 | 66.0 |
1 | 11 | 22.0 | 72.0 |
2 | 12 | 24.0 | inf |
3 | 13 | 22.0 | 68.0 |
4 | 14 | NaN | -inf |
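Checking for inf
Note that np.isfinite also treats null (NaN) as non-finite, so NaN cells are flagged together with inf and -inf.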
d = np.isfinite(df)
display(d)
  | Student ID | Age | Weight |
0 | True | True | True |
1 | True | True | True |
2 | True | True | False |
3 | True | True | True |
4 | True | False | False |
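Because np.isfinite flags NaN and ±inf alike, it can help to separate the two cases. The following is a minimal sketch using np.isinf and DataFrame.isna on the same example data; the names inf_mask, nan_mask, inf_per_column and nan_per_column are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'Student ID': [10, 11, 12, 13, 14],
    'Age': [23, 22, 24, 22, np.nan],
    'Weight': [66, 72, np.inf, 68, -np.inf],
})

inf_mask = np.isinf(df)   # True only where the value is +inf or -inf
nan_mask = df.isna()      # True only where the value is NaN (inf is not NA by default)

# Per-column counts of each kind of problem value
inf_per_column = inf_mask.sum()
nan_per_column = nan_mask.sum()
print(inf_per_column)
print(nan_per_column)
```

With this split, the Weight column shows up only in the inf count and the Age column only in the NaN count.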
Are all values finite?
np.all(np.isfinite(df))
False
Is at least one value finite?
np.any(np.isfinite(df))
True
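Locating rows and columns
Locating the columns (first the number of finite values per column, then whether each column is entirely finite)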
np.isfinite(df).sum()
Student ID 5
Age 4
Weight 3
dtype: int64
np.isfinite(df).all()
Student ID True
Age False
Weight False
dtype: bool
Locating the rows
np.isfinite(df).T.all()
0 True
1 True
2 False
3 True
4 False
dtype: bool
df[list(np.isfinite(df).all()[np.isfinite(df).all() == False].index)].loc[list(
np.isfinite(df).T.all()[np.isfinite(df).T.all() == False].index), :]
  | Age | Weight |
2 | 24.0 | inf |
4 | NaN | -inf |
Locating the rows and columns that hold non-finite values this way works, but it is rather cumbersome.
Handling NA values
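A shorter variant of the same lookup is sketched below. It assumes the same example DataFrame and relies on boolean masks passed to .loc; the names finite, bad_rows, bad_cols, subset and positions are illustrative. np.where is used here as one possible way to list the exact (row label, column name) pairs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Student ID': [10, 11, 12, 13, 14],
    'Age': [23, 22, 24, 22, np.nan],
    'Weight': [66, 72, np.inf, 68, -np.inf],
})

finite = np.isfinite(df)

# Rows / columns that contain at least one non-finite value
bad_rows = ~finite.all(axis=1)
bad_cols = ~finite.all(axis=0)

# Same selection as the long expression above, in one step
subset = df.loc[bad_rows, bad_cols]
print(subset)

# Exact cell positions as (row label, column name) pairs
row_idx, col_idx = np.where(~finite.to_numpy())
positions = [(df.index[r], df.columns[c]) for r, c in zip(row_idx, col_idx)]
print(positions)
```

Using axis=1 instead of transposing with .T avoids building a transposed copy of the mask, and keeping the mask in a variable avoids recomputing np.isfinite(df) several times.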
df.isna().sum()
Student ID 0
Age 1
Weight 0
dtype: int64
df.isnull().sum()
Student ID 0
Age 1
Weight 0
dtype: int64
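Both counts are identical because isnull is an alias of isna, and neither counts inf as missing under default settings. If infinite values should be treated as missing as well, one option is to replace them with NaN first. The sketch below assumes the same example data; cleaned and na_counts are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Student ID': [10, 11, 12, 13, 14],
    'Age': [23, 22, 24, 22, np.nan],
    'Weight': [66, 72, np.inf, 68, -np.inf],
})

# Replace +inf and -inf with NaN so that isna() picks them up too
cleaned = df.replace([np.inf, -np.inf], np.nan)

na_counts = cleaned.isna().sum()
print(na_counts)
```

The original df is left unchanged, since replace returns a new DataFrame by default.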
The logic is the same as for inf, so we go straight to the code.
np.isnan(df)
  | Student ID | Age | Weight |
0 | False | False | False |
1 | False | False | False |
2 | False | False | False |
3 | False | False | False |
4 | False | True | False |
df[list(np.isnan(df).any()[np.isnan(df).any() == True].index)].loc[list(
np.isnan(df).T.any()[np.isnan(df).T.any() == True].index), :]
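The same selection can be written more compactly with DataFrame.isna, which, unlike np.isnan, also works on non-numeric columns. This is a sketch on the same example data; na_mask, na_subset and na_rows_by_column are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Student ID': [10, 11, 12, 13, 14],
    'Age': [23, 22, 24, 22, np.nan],
    'Weight': [66, 72, np.inf, 68, -np.inf],
})

na_mask = df.isna()

# Keep only the rows and columns that contain at least one NaN
na_subset = df.loc[na_mask.any(axis=1), na_mask.any(axis=0)]
print(na_subset)

# For each affected column, list the row labels where NaN occurs
na_rows_by_column = {
    col: df.index[na_mask[col].to_numpy()].tolist()
    for col in df.columns
    if na_mask[col].any()
}
print(na_rows_by_column)
```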