Python data analysis: deleting certain rows, and deleting the first n rows in Python

Reposted

mob64ca14163a4f 2023-08-11 19:14:06

Required imports

import numpy as np
import pandas as pd
from pandas import Series,DataFrame

1. Series

pandas.Series() returns a data object that has index and values attributes

s = pd.Series([4,5,-7,3])
s

0    4
1    5
2   -7
3    3
dtype: int64

s.index

RangeIndex(start=0, stop=4, step=1)

s.values

array([ 4,  5, -7,  3], dtype=int64)

Custom index

s1 = Series( [4,7,6,5],index=['a','b','c','d'] ,dtype = float)
s1

a    4.0
b    7.0
c    6.0
d    5.0
dtype: float64

s1.index

Index(['a', 'b', 'c', 'd'], dtype='object')

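Reading a Series: 1. by 0..n position; 2. by custom label
# Reading by numeric position (recent pandas versions deprecate plain [] for this when the index is not integer-based; s1.iloc[0] is the explicit form)

s1[0]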

4.0

Reading by custom label

s1['a']

4.0

If the custom index consists purely of numbers, plain [] no longer reads by 0..n position; it looks up the custom labels instead

s2 = Series( [4,7,6,5],index=[3,5,7,9] ,dtype = float)
s2

3    4.0
5    7.0
7    6.0
9    5.0
dtype: float64

s2[3]

4.0

Series.loc[custom label] versus Series.iloc[position in 0..n]

s2.loc[7]

6.0

s2.iloc[2]

6.0
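
The difference between label lookup and position lookup is easiest to see when the labels are integers that do not start at 0. The following minimal sketch reuses the data of s2; the variable names are illustrative.

```python
import pandas as pd

# Same data as s2 in the text: integer labels that are not 0..n-1
s2 = pd.Series([4, 7, 6, 5], index=[3, 5, 7, 9], dtype=float)

by_label = s2.loc[3]      # label 3 is the first element
by_position = s2.iloc[3]  # position 3 is the fourth (last) element

# With an integer index, plain s2[0] is a label lookup, and label 0 does not exist
try:
    s2[0]
    plain_lookup_failed = False
except KeyError:
    plain_lookup_failed = True

print(by_label, by_position, plain_lookup_failed)
```

Using .loc and .iloc explicitly avoids having to remember which rule plain [] applies for a given index type.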

s1

a    4.0
b    7.0
c    6.0
d    5.0
dtype: float64

s1.loc['e']=8
s1

a    4.0
b    7.0
c    6.0
d    5.0
e    8.0
dtype: float64

xxx.loc['yy'] = new modifies an existing entry or adds a new one

s1['a'] = 40
s1

a    40.0
b     7.0
c     6.0
d     5.0
e     8.0
dtype: float64

Reading multiple values returns a new object

s1
a    40.0
b     7.0
c     6.0
d     5.0
e     8.0
dtype: float64

Reading selected labels

s1[ ['a','c'] ]

a    40.0
c     6.0
dtype: float64

Range reading: Series[start:end:step] (when slicing by label, the end label is included)

s1['a':'c']

a    40.0
b     7.0
c     6.0
dtype: float64

s1['a'::2]

a    40.0
c     6.0
e     8.0
dtype: float64
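
One detail of range reading is worth a quick check: a slice by label includes its end label, while a slice by position excludes its end position. A minimal sketch, assuming the same s1 values as above:

```python
import pandas as pd

s1 = pd.Series([40, 7, 6, 5, 8], index=list("abcde"), dtype=float)

label_slice = s1.loc["a":"c"]   # labels a, b, c: the end label is included
position_slice = s1.iloc[0:2]   # positions 0 and 1: the end position is excluded
every_other = s1["a"::2]        # from label a to the end, in steps of 2

print(list(label_slice.index))
print(list(position_slice.index))
print(list(every_other.index))
```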

Building a Series from a dict: the keys become the index and the values become the values

build_price ={'beijing':68000,'shanghai':54000,'guangzhou':35000,'shenzhen':72000}
s3 = Series(build_price)
s3

beijing      68000
shanghai     54000
guangzhou    35000
shenzhen     72000
dtype: int64

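If an index is given, the dict keys are matched against it and the data is shown in the given order; labels with no matching key get NaN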

index4 = ['beijing','guangzhou','shanghai','shenzhen']
s4 = Series(build_price,index=index4)
s4

beijing      68000
guangzhou    35000
shanghai     54000
shenzhen     72000
dtype: int64

Series(build_price,index = ['beijing','guangzhou','shanghai','shenzheng','haerbin'])

beijing      68000.0
guangzhou    35000.0
shanghai     54000.0
shenzheng        NaN
haerbin          NaN
dtype: float64
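
The unmatched labels can be found and removed afterwards. The sketch below is one possible way to do it, not something shown in the original article; `wanted`, `missing` and `present` are illustrative names.

```python
import pandas as pd

build_price = {"beijing": 68000, "shanghai": 54000,
               "guangzhou": 35000, "shenzhen": 72000}

# 'shenzheng' (misspelled) and 'haerbin' have no matching key in the dict
wanted = ["beijing", "guangzhou", "shanghai", "shenzheng", "haerbin"]
s = pd.Series(build_price, index=wanted)

# Labels that received NaN because no dict key matched
missing = s[s.isna()].index.tolist()

# Drop the unmatched labels and restore the integer dtype
present = s.dropna().astype("int64")

print(missing)
print(present)
```

The dtype switches to float64 in the output above because NaN cannot be stored in an int64 Series.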

The index can also be changed after creation

s3.index = list('abcd')
s3

a    68000
b    54000
c    35000
d    72000
dtype: int64
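
Assigning to .index replaces every label at once, so the new list must have exactly one label per element. When only some labels should change, Series.rename with a mapping is an alternative. A minimal sketch, with a shortened dict and the label "bj" chosen purely for illustration:

```python
import pandas as pd

s3 = pd.Series({"beijing": 68000, "shanghai": 54000})

# rename returns a new Series and only touches the labels listed in the mapping
renamed = s3.rename({"beijing": "bj"})

# Replacing the whole index requires a matching length
try:
    s3.index = ["a", "b", "c"]
    length_checked = False
except ValueError:
    length_checked = True

print(renamed)
```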

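2. DataFrame: tabular data with rows and columns, like an Excel sheet
2.1 Converting a 2-D NumPy array to a DataFrame

By default both the row index and the column index are 0..n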

df = DataFrame(np.arange(10,22).reshape(3,4))
df

0	1	2	3
0	10	11	12	13
1	14	15	16	17
2	18	19	20	21

Specifying the row and column indexes

df1 = DataFrame(np.arange(10,22).reshape(3,4) ,index=list('abc'),columns='one,two,three,four'.split(',') )
df1

one	two	three	four
a	10	11	12	13
b	14	15	16	17
c	18	19	20	21

2.2 Creating a DataFrame from a dict

df_dict = {
    'city':'北京,上海,广州,深圳,台北'.split(','),
    'price':(68000,54000,35000,72000,50000),
    'year':np.arange(2015,2020)
}
df2 = DataFrame(df_dict)
df2

city	price	year
0	北京	68000	2015
1	上海	54000	2016
2	广州	35000	2017
3	深圳	72000	2018
4	台北	50000	2019

df2.values

array([['北京', 68000, 2015],
       ['上海', 54000, 2016],
       ['广州', 35000, 2017],
       ['深圳', 72000, 2018],
       ['台北', 50000, 2019]], dtype=object)

df2.index

RangeIndex(start=0, stop=5, step=1)

df2.columns

Index(['city', 'price', 'year'], dtype='object')

The order of the columns can be changed

DataFrame(df2,columns=['year','city','price'])

year	city	price
0	2015	北京	68000
1	2016	上海	54000
2	2017	广州	35000
3	2018	深圳	72000
4	2019	台北	50000
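
Selecting with a list of column names is another way to get the same reordering, and reindex shows what happens when a requested column does not exist. This is a sketch under assumptions: the column "gdp" is hypothetical and is used only to show the NaN fill.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "city": "北京,上海,广州,深圳,台北".split(","),
    "price": (68000, 54000, 35000, 72000, 50000),
    "year": np.arange(2015, 2020),
})

# Select the columns in a new order; df2 itself is unchanged
reordered = df2[["year", "city", "price"]]

# reindex accepts names that do not exist ("gdp" is hypothetical) and fills NaN
with_extra = df2.reindex(columns=["year", "city", "gdp"])

print(reordered.columns.tolist())
print(with_extra)
```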

Reading a DataFrame
df['column'] reads a column, and can also modify or add one
df.column_name reads a column
df.loc[] reads a row

df2['city']

0    北京
1    上海
2    广州
3    深圳
4    台北
Name: city, dtype: object

df2.city

0    北京
1    上海
2    广州
3    深圳
4    台北
Name: city, dtype: object

type(df2['city'])

pandas.core.series.Series

df2['city'].name

'city'

Reading a row

df2.loc[1]

city        上海
price    54000
year      2016
Name: 1, dtype: object

Adding a new column
Only df['columnName'] = value works; df.xxx = value does not add a column

df2['new1']= 5
df2

city	price	year	new1
	0	北京	68000	2015	5
	1	上海	54000	2016	5
	2	广州	35000	2017	5
	3	深圳	72000	2018	5
	4	台北	50000	2019	5

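If the column already exists, the assignment modifies it

df2['new1'] = 100

Adding and modifying rows: df.loc[label] = xxx

# The code for this step is missing from the source. The outputs that follow imply
# that a row labelled 'def' filled with '行' was added, presumably like this:
df2.loc['def'] = '行'
# new2 needs one value per row, so np.arange(6) only fits once the sixth row exists
df2['new2'] = np.arange(6)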

df2

city	price	year	new1	new2
	0	北京	68000	2015	100	0
	1	上海	54000	2016	100	1
	2	广州	35000	2017	100	2
	3	深圳	72000	2018	100	3
	4	台北	50000	2019	100	4
	def	行	行	行	行	5

df2.iloc[-1]

city     行
price    行
year     行
new1     行
new2     5
Name: def, dtype: object
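
The row assignment pattern named in this section, df.loc[label] = values, can be sketched on a small frame as follows. This is a minimal illustration rather than the original author's code; the frame `df` and the label "extra" are made up for the example.

```python
import pandas as pd

# Small illustrative frame with the default row labels 0 and 1
df = pd.DataFrame({"city": ["北京", "上海"], "price": [68000, 54000]})

# A label that does not exist yet: the row is appended
df.loc["extra"] = ["广州", 35000]

# A label that already exists: the row is overwritten
df.loc[0] = ["北京", 70000]

print(df)
```

Assigning a single scalar instead of a list, as in df.loc['def'] = '行', fills every column of that row with the same value.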

Two ways to get the number of rows (both give 6 here)

len(df2)

df2.index.values.size

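Reading a single value: df2.loc[row, column]

# The source wrote df2.loc[2][2], which reads the same cell through chained indexing
df2.loc[2, 'year']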

Getting the value 2019

df2.year[4]

df2.loc[4]['year']

Modifying

df2.loc[4,'year'] =2000
df2

city	price	year	new1	new2
	0	北京	68000	2015	100	0
	1	上海	54000	2016	100	1
	2	广州	35000	2017	100	2
	3	深圳	72000	2018	100	3
	4	台北	50000	2000	100	4
	def	行	行	行	行	5

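When a new column is added from a plain number or an array, the values are matched to the rows automatically by position. If the data is a Series, which carries its own index, pandas aligns on that index instead: labels that match receive the value, and rows with no matching label get NaN.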

spe_s = Series(np.arange(6)*3 ,index=[0,1,54,3,7,'new'])
spe_s

0       0
1       3
54      6
3       9
7      12
new    15
dtype: int32

df2['new4'] = spe_s
df2

city	price	year	new1	new2	new4
	0	北京	68000	2015	100	0	0.0
	1	上海	54000	2016	100	1	3.0
	2	广州	35000	2017	100	2	NaN
	3	深圳	72000	2018	100	3	9.0
	4	台北	50000	2000	100	4	NaN
	def	行	行	行	行	5	NaN
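
The alignment rule described above can be reproduced on a smaller frame. This is a minimal sketch under the assumption of a three-row frame; the names `df`, `s` and `unmatched_rows` are illustrative and not part of the original article.

```python
import pandas as pd

# Three rows with the default labels 0, 1, 2
df = pd.DataFrame({"city": ["北京", "上海", "广州"]})

# A Series whose index only partly overlaps the frame: label 9 has no row
s = pd.Series([10, 20, 30], index=[0, 2, 9])

# Assignment aligns on labels: rows 0 and 2 get values, row 1 gets NaN,
# and the value stored under label 9 is silently dropped
df["aligned"] = s

# Rows that found no matching label in the Series
unmatched_rows = df.index[df["aligned"].isna()].tolist()
print(df)
print(unmatched_rows)
```

Because NaN is a float, the new column is stored as float64 even though the Series held integers, which matches the 0.0 / 3.0 / 9.0 values in the output above.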

Giving the Series the same index at creation

spe_s2 = Series(np.arange(6)*5 ,index=df2.index)
spe_s2

0       0
1       5
2      10
3      15
4      20
def    25
dtype: int32

df2['new5'] = spe_s2
df2

city	price	year	new1	new2	new4	new5
	0	北京	68000	2015	100	0	0.0	0
	1	上海	54000	2016	100	1	3.0	5
	2	广州	35000	2017	100	2	NaN	10
	3	深圳	72000	2018	100	3	9.0	15
	4	台北	50000	2000	100	4	NaN	20
	def	行	行	行	行	5	NaN	25

Assigning the raw values of spe_s2 to a new column directly

df2['new6'] = spe_s2.values
df2

city	price	year	new1	new2	new4	new5	new6
	0	北京	68000	2015	100	0	0.0	0	0
	1	上海	54000	2016	100	1	3.0	5	5
	2	广州	35000	2017	100	2	NaN	10	10
	3	深圳	72000	2018	100	3	9.0	15	15
	4	台北	50000	2000	100	4	NaN	20	20
	def	行	行	行	行	5	NaN	25	25
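
Passing the raw values skips index alignment entirely, so only the length has to match. The sketch below illustrates this under assumed data: `df`, `s` and the column names are hypothetical, and `to_numpy()` is used as the equivalent of `.values`.

```python
import pandas as pd

df = pd.DataFrame({"city": ["北京", "上海", "广州"]})

# A Series whose labels share nothing with the frame's row labels
s = pd.Series([0, 5, 10], index=["x", "y", "z"])

df["by_label"] = s                 # aligned on labels: nothing matches, all NaN
df["by_position"] = s.to_numpy()   # index ignored: values are filled in order

# Without an index to align on, the length must equal the number of rows
try:
    df["too_short"] = s.to_numpy()[:2]
    length_checked = False
except ValueError:
    length_checked = True

print(df)
```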

Deleting: del df[column_name] deletes a column

del df2['new6']
df2

city	price	year	new1	new2	new4	new5
	0	北京	68000	2015	100	0	0.0	0
	1	上海	54000	2016	100	1	3.0	5
	2	广州	35000	2017	100	2	NaN	10
	3	深圳	72000	2018	100	3	9.0	15
	4	台北	50000	2000	100	4	NaN	20
	def	行	行	行	行	5	NaN	25

df.drop(labels, axis=0, inplace=False)
labels: the label or labels to drop
axis: whether to drop rows (0) or columns (1)
inplace: whether to delete from the original object

df2.drop('def')

city	price	year	new1	new2	new4	new5
	0	北京	68000	2015	100	0	0.0	0
	1	上海	54000	2016	100	1	3.0	5
	2	广州	35000	2017	100	2	NaN	10
	3	深圳	72000	2018	100	3	9.0	15
	4	台北	50000	2000	100	4	NaN	20

df2

city	price	year	new1	new2	new4	new5
	0	北京	68000	2015	100	0	0.0	0
	1	上海	54000	2016	100	1	3.0	5
	2	广州	35000	2017	100	2	NaN	10
	3	深圳	72000	2018	100	3	9.0	15
	4	台北	50000	2000	100	4	NaN	20
	def	行	行	行	行	5	NaN	25
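
As the two outputs above show, drop returns a new object and leaves df2 untouched unless inplace=True is passed. A short sketch of a few common variants follows; the frame `df` and its labels are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"price": [68000, 54000, 35000, 72000]}, index=list("abcd"))

# Several row labels at once; the result is a new frame
trimmed = df.drop(["a", "c"])

# columns= is an alternative spelling of drop("price", axis=1)
no_price = df.drop(columns="price")

# By default a missing label raises KeyError; errors="ignore" skips it
ignored = df.drop("zzz", errors="ignore")

print(trimmed)
print(len(df))  # the original frame still has all of its rows
```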

Deleting new5 in place on the original data

df2.drop('new5',inplace = True ,axis=1)
df2

city	price	year	new1	new2	new4
	0	北京	68000	2015	100	0	0.0
	1	上海	54000	2016	100	1	3.0
	2	广州	35000	2017	100	2	NaN
	3	深圳	72000	2018	100	3	9.0
	4	台北	50000	2000	100	4	NaN
	def	行	行	行	行	5	NaN
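
The title mentions deleting the first n rows, which the walkthrough above does not show directly. The following sketch gives two common ways to do it, plus deletion by condition. The frame, the value of `n` and the price threshold are all assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["北京", "上海", "广州", "深圳", "台北"],
    "price": [68000, 54000, 35000, 72000, 50000],
})

n = 2  # number of leading rows to remove (illustrative)

# Option 1: keep everything from position n onward
without_first_n = df.iloc[n:]

# Option 2: drop the labels of the first n rows
same_result = df.drop(df.index[:n])

# Deleting "certain rows" by condition: keep only the rows that pass the test
expensive_only = df[df["price"] >= 50000]

print(without_first_n)
print(expensive_only)
```

All three return new frames; add .reset_index(drop=True) if the remaining rows should be renumbered from 0.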
