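I. Process pools
1. Every time a process is started, the operating system sets up a memory space for it (registers, stack, files). The more processes there are, the more often the operating system has to schedule them.
2. Process pool:
(1) In Python, first create a pool that holds processes
(2) Specify how many processes the pool can hold
(3) Create those processes in advance
3. More advanced process pools (with a lower and an upper limit on the number of processes)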
from multiprocessing import Pool, Process
import time

def func(n):
    for i in range(10):
        print(n + 1)

if __name__ == '__main__':
    start = time.time()
    pool = Pool(5)  # a pool with five processes
    # 100 tasks; the second argument must be an iterable.
    # map blocks until every task has finished, so no explicit close/join is needed here.
    pool.map(func, range(100))
    t1 = time.time() - start

    # For comparison: the same 100 tasks, one process per task
    start = time.time()
    p_list = []
    for i in range(100):
        p = Process(target=func, args=(i,))
        p_list.append(p)
        p.start()
    for p in p_list:
        p.join()
    t2 = time.time() - start
    print(t1, t2)
Output (the final timing line only; exact values depend on the machine):
0.28822946548461914 5.494309902191162
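The speed-up comes from reusing a small, fixed set of processes instead of starting one per task. A minimal sketch that makes the reuse visible is shown here; the names worker_pid and count_distinct are hypothetical helpers, not part of the notes above.

```python
import os
from multiprocessing import Pool


def worker_pid(_):
    """Return the PID of the process that ran this task."""
    return os.getpid()


def count_distinct(pids):
    """Count how many different PIDs appear in the list."""
    return len(set(pids))


if __name__ == '__main__':
    # Five worker processes are created up front and shared by all 100 tasks.
    with Pool(5) as pool:
        pids = pool.map(worker_pid, range(100))
    print(len(pids), 'tasks ran in', count_distinct(pids), 'processes')
```

The number of distinct PIDs should never exceed the pool size, no matter how many tasks are submitted.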
4. p.apply (synchronous) and p.apply_async (asynchronous) calls
from multiprocessing import Pool
import os, time

def func(n):
    print('start func%s' % n, os.getpid())
    time.sleep(1)
    print('end func%s' % n, os.getpid())

if __name__ == '__main__':
    p = Pool(5)
    for i in range(10):
        # p.apply(func, args=(i,))  # synchronous call (not recommended): waits for this task to finish before submitting the next one
        p.apply_async(func, args=(i,))  # asynchronous call: does not block the main process, so close and join must be called by hand
    p.close()
    p.join()
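To see the difference between the two calls in numbers, the following sketch times the same five short tasks submitted both ways. The helpers nap, timed, run_sync and run_async are hypothetical names introduced only for this illustration, and the measured times will vary by machine.

```python
import time
from multiprocessing import Pool


def nap(seconds):
    """Stand-in for real work: sleep, then return the argument."""
    time.sleep(seconds)
    return seconds


def timed(run):
    """Call run() and return how long it took, in seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def run_sync(pool, n, seconds):
    for _ in range(n):
        pool.apply(nap, args=(seconds,))  # blocks until this one task is done


def run_async(pool, n, seconds):
    handles = [pool.apply_async(nap, args=(seconds,)) for _ in range(n)]
    for handle in handles:
        handle.get()  # wait for every submitted task


if __name__ == '__main__':
    with Pool(5) as pool:
        t_sync = timed(lambda: run_sync(pool, 5, 0.2))
        t_async = timed(lambda: run_async(pool, 5, 0.2))
    print('apply: %.2fs, apply_async: %.2fs' % (t_sync, t_async))
```

With apply the tasks run one after another, so the total is roughly the sum of the sleeps; with apply_async the five workers can sleep at the same time.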
5. Getting return values
from multiprocessing import Pool
import time

def func(i):
    time.sleep(0.5)
    return i * i

if __name__ == '__main__':
    p = Pool(5)
    ret_lst = []
    for i in range(10):
        # res = p.apply(func, args=(i,))  # the result of apply is func's return value
        res = p.apply_async(func, args=(i,))  # apply_async returns a result object, not func's return value
        ret_lst.append(res)
    for res in ret_lst:
        print(res.get())  # get(): blocks until the result is available
Output:
0
1
4
9
16
25
36
49
64
81
Return value of map()
from multiprocessing import Pool
import time

def func(i):
    time.sleep(0.5)
    return i * i

if __name__ == '__main__':
    p = Pool(5)
    ret = p.map(func, range(10))  # a list with one result per input
    print(ret)
Output:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
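map returns its results in the order of the inputs, not in the order in which the tasks happen to finish. A small sketch of this, using a hypothetical slow_square function in which earlier inputs deliberately take longer:

```python
import time
from multiprocessing import Pool


def slow_square(i):
    # Earlier inputs sleep longer, so later inputs tend to finish first.
    time.sleep(0.05 * (4 - i))
    return i * i


if __name__ == '__main__':
    with Pool(5) as p:
        # The list still follows the input order 0, 1, 2, 3, 4.
        print(p.map(slow_square, range(5)))
```

The printed list is [0, 1, 4, 9, 16] even though the task for input 4 completes before the task for input 0.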
6. Callback functions
import os
from multiprocessing import Pool

def func1(n):
    print('in func1', os.getpid())
    return n * n

def func2(nn):
    print('in func2', os.getpid())
    print(nn)

if __name__ == '__main__':
    p = Pool(5)
    # func1's return value is passed to the callback func2, which then runs
    p.apply_async(func1, args=(10,), callback=func2)
    p.close()
    p.join()
Output (the "in func1" line appears only if the environment displays output from child processes):
in func1 12333
in func2 10296
100
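II. Review
1. apply: synchronous. The code that follows runs only after func has finished. In ret = apply(func, args=()), the return value is whatever func returns.
2. map: blocks until every task has finished, so no separate close and join are needed to wait for it; all results are returned together as one list.
3. apply_async: asynchronous. Once func has been submitted to the pool, the program continues with the next line. apply_async returns an object obj; call obj.get() to obtain func's return value, and get blocks until that func has finished and its result is available. When tasks are handed to the pool with apply_async, call close first and then join so that the main process waits for the worker processes.
4. Callback functions: executed in the main process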
from multiprocessing import Pool

def func1(n):
    return n + 1

def func2(m):
    print(m)

if __name__ == '__main__':
    p = Pool(5)
    for i in range(0, 5):
        p.apply_async(func1, args=(i,), callback=func2)
    p.close()
    p.join()
Output (the order follows task completion and is not guaranteed):
1
2
3
4
5
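One way to check that the callback really runs in the main process is to compare process IDs. The sketch below is only an illustration; work, record and the seen list are hypothetical names.

```python
import os
from multiprocessing import Pool

seen = []  # filled in by the callback, which runs in the main process


def work(n):
    """Runs in a worker: return the result together with the worker's PID."""
    return n + 1, os.getpid()


def record(result):
    """Callback: store the result, the worker's PID and the PID running this callback."""
    value, worker_pid = result
    seen.append((value, worker_pid, os.getpid()))


if __name__ == '__main__':
    p = Pool(5)
    for i in range(5):
        p.apply_async(work, args=(i,), callback=record)
    p.close()
    p.join()
    main_pid = os.getpid()
    for value, worker_pid, callback_pid in sorted(seen):
        print(value, 'computed in', worker_pid,
              '| callback in main process:', callback_pid == main_pid)
```

Because record appends to a list that lives in the main process, the results are available there after join without any extra communication.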
Example 1: fetching pages with a callback
import requests
from multiprocessing import Pool

def gets(url):
    ret = requests.get(url)
    if ret.status_code == 200:
        return url, ret.content.decode('utf-8')

def call_back(args):
    if args is None:  # gets returns None for a non-200 response
        return
    url, content = args
    print(url, len(content))

if __name__ == '__main__':
    url_list = [
        'http://www.baidu.com',
        'javascript:void(0)',  # not a fetchable URL: the request fails in the worker, so call_back is not called for it
        'javascript:void(0)',
        'http://www.sohu.com/',
    ]
    p = Pool(5)
    for url in url_list:
        p.apply_async(gets, args=(url,), callback=call_back)
    p.close()
    p.join()
Output:
http://www.baidu.com 2287
http://www.sohu.com/ 179481
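When a task raises an exception, the callback is not called and, unless get() is called on the result, the error passes silently. apply_async also accepts an error_callback argument for that case. The following is a minimal sketch with a hypothetical risky function standing in for a failing request, so that it runs without network access.

```python
from multiprocessing import Pool


def risky(n):
    """Stand-in for a task that sometimes fails."""
    if n % 2:
        raise ValueError('odd input: %d' % n)
    return n


def on_success(value):
    print('ok', value)


def on_error(exc):
    # Receives the exception instance raised inside the worker.
    print('failed:', repr(exc))


if __name__ == '__main__':
    p = Pool(2)
    for n in range(4):
        p.apply_async(risky, args=(n,),
                      callback=on_success, error_callback=on_error)
    p.close()
    p.join()
```

Like the normal callback, on_error runs in the main process, so it is a convenient place to log which inputs failed.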
Example 2: scraping movie information
import re
from urllib.request import urlopen
from multiprocessing import Pool

def get_page(url, pattern):
    response = urlopen(url).read().decode('utf-8')
    return pattern, response

def parse_page(info):
    pattern, page_content = info
    res = re.findall(pattern, page_content)
    for item in res:
        dic = {
            'index': item[0].strip(),
            'title': item[1].strip(),
            'actor': item[2].strip(),
            'time': item[3].strip(),
        }
        print(dic)

if __name__ == '__main__':
    regex = r'<dd>.*?<.*?class="board-index.*?>(\d+)</i>.*?title="(.*?)".*?class="movie-item-info".*?<p class="star">(.*?)</p>.*?<p class="releasetime">(.*?)</p>'
    pattern1 = re.compile(regex, re.S)
    url_dic = {'http://maoyan.com/board/7': pattern1}
    p = Pool()
    res_l = []
    for url, pattern in url_dic.items():
        res = p.apply_async(get_page, args=(url, pattern), callback=parse_page)
        res_l.append(res)
    for i in res_l:
        i.get()  # block until each task has finished
Output:
{'index': '1', 'title': '海王', 'actor': '主演:杰森·莫玛,艾梅柏·希尔德,妮可·基德曼', 'time': '上映时间:2018-12-07'}
{'index': '2', 'title': '印度合伙人', 'actor': '主演:阿克谢·库玛尔,拉迪卡·艾普特,索娜姆·卡普尔', 'time': '上映时间:2018-12-14'}
{'index': '3', 'title': '叶问外传:张天志', 'actor': '主演:张晋,戴夫·巴蒂斯塔,柳岩', 'time': '上映时间:2018-12-21'}
{'index': '4', 'title': '龙猫', 'actor': '主演:秦岚,糸井重里,岛本须美', 'time': '上映时间:2018-12-14'}
{'index': '5', 'title': '毒液:致命守护者', 'actor': '主演:汤姆·哈迪,米歇尔·威廉姆斯,里兹·阿迈德', 'time': '上映时间:2018-11-09'}
{'index': '6', 'title': '无名之辈', 'actor': '主演:陈建斌,任素汐,潘斌龙', 'time': '上映时间:2018-11-16'}
{'index': '7', 'title': '生活万岁', 'actor': '主演:李安甫,胡兆翠,康昕', 'time': '上映时间:2018-11-27'}
{'index': '8', 'title': '恐龙王', 'actor': '主演:王衡,吕佩玉,孙晔', 'time': '上映时间:2018-11-10'}
{'index': '9', 'title': '蜘蛛侠:平行宇宙', 'actor': '主演:彭昱畅,沙梅克·摩尔,杰克·M·约翰森', 'time': '上映时间:2018-12-21'}
{'index': '10', 'title': '绿毛怪格林奇', 'actor': '主演:本尼迪克特·康伯巴奇,卡梅伦·丝蕾,拉什达·琼斯', 'time': '上映时间:2018-12-14'}
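5. Using a process pool with sockets
Server:
import socket
from multiprocessing import Pool

def func(conn):
    conn.send(b'hello')
    ret = conn.recv(1024).decode('utf-8')
    print(ret)
    conn.close()  # the worker closes the connection once it is done with it

if __name__ == '__main__':
    p = Pool(5)
    sk = socket.socket()
    sk.bind(('127.0.0.1', 8080))
    sk.listen(5)
    try:
        while True:
            conn, addr = sk.accept()
            p.apply_async(func, args=(conn,))  # hand the connection to a pool worker
    finally:
        sk.close()  # reached only when the loop is interrupted
Client: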
import socket
sk = socket.socket()
sk.connect(('127.0.0.1',8080))
ret = sk.recv(1024).decode('utf-8')
print(ret)
msg = input('>>').encode('utf-8')
sk.send(msg)