Introduction
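Python lets you write programs quickly, but its multithreading support is weak: in CPython, threads cannot run Python code in parallel (see the GIL below), so Python 2 programs more often rely on multiple processes. Python 3.2 added the concurrent.futures package, which makes multithreaded and multiprocess development easier.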
Python GIL
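Python code runs under the control of a Python interpreter. Several interpreters exist; well-known ones include CPython, PyPy, and Jython. CPython, the original implementation written in C, is the most widely used.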
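The operating system can run multiple threads at the same time. However, CPython's interpreter main loop was not designed to be entered by several threads at once. CPython therefore uses a Global Interpreter Lock (GIL) to control access to the interpreter: a Python thread must acquire the GIL before it can execute.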
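As a result, on both single-core and multi-core CPUs, only one thread executes Python bytecode at any given moment, so CPU-bound Python code cannot run in parallel across threads. This is the root cause of Python multithreading often being inefficient on multi-core CPUs. Blocking I/O calls, and some C extensions, release the GIL while they wait, which is why threads still help with I/O-bound work.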
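The following assumes CPython 3 on a standard (GIL) build. It offers a small way to see the GIL at work. The interpreter asks the running thread to release the GIL at a fixed switch interval, which `sys.getswitchinterval()` reports (0.005 seconds by default). The two counting threads below take turns holding the lock rather than running side by side. The thread count and loop size are arbitrary choices made for illustration.

```python
import sys
import threading

def count(n):
    # CPU-bound loop: the thread holds the GIL while executing bytecode
    while n > 0:
        n -= 1

# How often (in seconds) CPython 3 asks the running thread to give up the GIL
interval = sys.getswitchinterval()
print("GIL switch interval:", interval)

threads = [threading.Thread(target=count, args=(200000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    # Both threads complete, but on a GIL build they never ran bytecode at the same instant
    t.join()
print("all threads done:", all(not t.is_alive() for t in threads))
```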
High-performance approaches in Python 2
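There are three main ways to run multiple tasks in Python:
- Start multiple processes, each with a single thread, and spread the tasks across the processes;
- Start a single process with multiple threads, and spread the tasks across the threads;
- Start multiple processes, each running multiple threads, to run even more tasks at once. This is complicated, usually works no better in practice, and is rarely used.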
Using multiple processes
Multiprocessing is provided by the multiprocessing package.
First, let's look at the Process class.
# multiprocessing/__init__.py (Python 2):
from multiprocessing.process import Process, current_process, active_children

# multiprocessing/process.py (Python 2):
class Process(object):
    '''
    Process objects represent activity that is run in a separate process

    The class is analagous to `threading.Thread`
    '''
    _Popen = None

    def __init__(self, group=None, target=None, name=None, args=(), kwargs={}):
        assert group is None, 'group argument must be None for now'
        count = _current_process._counter.next()
        self._identity = _current_process._identity + (count,)
        self._authkey = _current_process._authkey
        self._daemonic = _current_process._daemonic
        self._tempdir = _current_process._tempdir
        self._parent_pid = os.getpid()
        self._popen = None
        self._target = target
        self._args = tuple(args)
        self._kwargs = dict(kwargs)
        self._name = name or type(self).__name__ + '-' + \
                     ':'.join(str(i) for i in self._identity)
A simple example of using Process:
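from __future__ import print_function  # print() behaves the same on Python 2 and 3
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    # The __main__ guard is required where children re-import the main module (e.g. Windows)
    p = Process(target=f, args=('bob',))
    p.start()  # run f('bob') in a new process
    p.join()   # wait for the child process to finish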
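Process does not hand a return value back to the parent. One common approach is to pass a `multiprocessing.Queue` to each child and have the child put its result there. The parent then reads the queue before calling `join()`. The sketch below shows this pattern; the helper `square_into` is hypothetical.

```python
from multiprocessing import Process, Queue

def square_into(q, x):
    # Runs in the child process: send (input, result) back through the queue
    q.put((x, x * x))

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=square_into, args=(q, x)) for x in range(4)]
    for p in procs:
        p.start()
    # Read one result per child before join(), so no child blocks on a full pipe
    results = dict(q.get() for _ in procs)
    for p in procs:
        p.join()
    print(sorted(results.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```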
Multithreading
Threading is provided by the threading package.
First, a quick look at the Thread class (from the Python 2 threading module):
# Main class for threads

class Thread(_Verbose):
    """A class that represents a thread of control.

    This class can be safely subclassed in a limited fashion.

    """
    __initialized = False
    # Need to store a reference to sys.exc_info for printing
    # out exceptions when a thread tries to use a global var. during interp.
    # shutdown and thus raises an exception about trying to perform some
    # operation on/with a NoneType
    __exc_info = _sys.exc_info
    # Keep sys.exc_clear too to clear the exception just before
    # allowing .join() to return.
    __exc_clear = _sys.exc_clear

    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        """This constructor should always be called with keyword arguments. Arguments are:

        *group* should be None; reserved for future extension when a ThreadGroup
        class is implemented.

        *target* is the callable object to be invoked by the run()
        method. Defaults to None, meaning nothing is called.

        *name* is the thread name. By default, a unique name is constructed of
        the form "Thread-N" where N is a small decimal number.

        *args* is the argument tuple for the target invocation. Defaults to ().

        *kwargs* is a dictionary of keyword arguments for the target
        invocation. Defaults to {}.

        If a subclass overrides the constructor, it must make sure to invoke
        the base class constructor (Thread.__init__()) before doing anything
        else to the thread.

        """
A simple example
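#!/usr/bin/python
from __future__ import print_function  # print() behaves the same on Python 2 and 3
from threading import Thread

def count(n):
    print("begin count...")
    while n > 0:
        n -= 1
    print("done.")

def test_ThreadCount():
    t1 = Thread(target=count, args=(1000000,))
    print("start thread.")
    t1.start()  # count() now runs in the new thread
    print("join thread.")
    t1.join()   # block until the thread finishes

if __name__ == '__main__':
    test_ThreadCount()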
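Output. The order of "begin count..." and "join thread." can vary between runs, because the main thread and the new thread run concurrently: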
start thread.
begin count...
join thread.
done.
Performance comparison of multiprocessing and multithreading
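The test code comes from another developer and uses timeit. timeit is part of the standard library, so nothing needs to be installed.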
#!/usr/bin/python
from __future__ import print_function
from threading import Thread
from multiprocessing import Process
from timeit import timeit

def count(n):
    # Pure CPU-bound busy loop
    while n > 0:
        n -= 1

def test_normal():
    # Both counts run one after another in the main thread
    count(1000000)
    count(1000000)

def test_Thread():
    # Both counts run in two threads of the same process
    t1 = Thread(target=count, args=(1000000,))
    t2 = Thread(target=count, args=(1000000,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

def test_Process():
    # Both counts run in two separate processes
    t1 = Process(target=count, args=(1000000,))
    t2 = Process(target=count, args=(1000000,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == '__main__':
    print("test_normal", timeit('test_normal()', 'from __main__ import test_normal', number=10))
    print("test_Thread", timeit('test_Thread()', 'from __main__ import test_Thread', number=10))
    print("test_Process", timeit('test_Process()', 'from __main__ import test_Process', number=10))
Output:
test_normal 1.0291161
test_Thread 7.5084157
test_Process 1.6441867
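Calling the function directly is the fastest, Process comes second, and Thread is the slowest. The threads gain nothing because of the GIL, and contention for the lock adds overhead on top of the sequential work. Processes avoid the GIL, but the cost of starting them likely outweighs the gain for such a short workload. The exact numbers depend on the machine and the interpreter version. Note, however, that this test only measures pure computation. For slow I/O-bound operations, Process or Thread is still the right choice.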
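To illustrate the I/O-bound case, the sketch below uses `time.sleep()` in place of a slow network or disk call. This is an assumption; real I/O behaves less predictably. A sleeping thread releases the GIL, so four threads that each wait 0.2 seconds should finish in roughly 0.2 seconds rather than 0.8.

```python
import time
from threading import Thread

def fake_io(delay):
    # Simulated blocking I/O: sleep releases the GIL while waiting
    time.sleep(delay)

def run_sequential(n, delay):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start

def run_threaded(n, delay):
    start = time.perf_counter()
    threads = [Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

seq = run_sequential(4, 0.2)
par = run_threaded(4, 0.2)
print("sequential: %.2fs, threaded: %.2fs" % (seq, par))
```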
The concurrent.futures package in Python 3
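Developers coming from Java or C# will already be familiar with futures. A future is an object representing the result of an asynchronous computation that may not have finished yet, and it serves as a building block for concurrency.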
The concurrent.futures package was added in Python 3.2. On Python 2.7, install the backported futures module with pip install futures.
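The concurrent.futures module gives developers a high-level interface for executing calls asynchronously. It is essentially an easier-to-use abstraction layer built on top of Python's threading and multiprocessing modules. This abstraction simplifies working with those modules, but it also gives up a good deal of flexibility.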
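The sketch below shows that high-level interface through `ThreadPoolExecutor.map`. The word list and the `word_length` helper are made up for illustration. `map()` submits one call per item and yields the results in input order, so the caller never creates or joins a thread directly.

```python
from concurrent.futures import ThreadPoolExecutor

def word_length(word):
    return len(word)

words = ["thread", "process", "future"]
# Leaving the with-block shuts the pool down and waits for its threads
with ThreadPoolExecutor(max_workers=3) as executor:
    lengths = list(executor.map(word_length, words))
print(lengths)  # [6, 7, 6]
```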
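The most important class here is Executor. Executor is an abstract class with two concrete implementations, ThreadPoolExecutor and ProcessPoolExecutor. As their names suggest, they are pools of threads and of processes respectively.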
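Both classes implement the same Executor interface, so code written against `submit()` and `Future.result()` works with either pool. The sketch below uses a hypothetical `powers` helper. It calls the built-in `pow` so that the calls can also be pickled and sent to worker processes.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, ProcessPoolExecutor

def powers(executor: Executor, base, exponents):
    # Works with any Executor subclass: submit() returns one Future per call
    futures = [executor.submit(pow, base, e) for e in exponents]
    # result() blocks until that particular call has finished
    return [f.result() for f in futures]

if __name__ == "__main__":
    # The guard keeps process-pool children from re-running this block on import
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        with cls(max_workers=2) as ex:
            print(cls.__name__, powers(ex, 2, range(5)))  # ... [1, 2, 4, 8, 16]
```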
Here is the definition of ProcessPoolExecutor. By default, the maximum number of workers matches the number of CPUs:
class ProcessPoolExecutor(_base.Executor):
    def __init__(self, max_workers=None):
        """Initializes a new ProcessPoolExecutor instance.

        Args:
            max_workers: The maximum number of processes that can be used to
                execute the given calls. If None or not given then as many
                worker processes will be created as the machine has processors.
        """
        _check_system_limits()

        if max_workers is None:
            self._max_workers = multiprocessing.cpu_count()
        else:
            if max_workers <= 0:
                raise ValueError("max_workers must be greater than 0")

            self._max_workers = max_workers
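The validation rule in this listing can be checked directly: a non-positive `max_workers` raises `ValueError`. The sketch below assumes a current Python 3, where the same check still exists. It only constructs executors with invalid sizes, so no worker processes are started. `rejects` is a hypothetical helper.

```python
import os
from concurrent.futures import ProcessPoolExecutor

print("CPUs reported by os.cpu_count():", os.cpu_count())

def rejects(workers):
    # Return True if the constructor refuses this max_workers value
    try:
        ProcessPoolExecutor(max_workers=workers)
    except ValueError as exc:
        print("max_workers=%r rejected: %s" % (workers, exc))
        return True
    return False

rejected_zero = rejects(0)
rejected_negative = rejects(-1)
```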
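Next, the definition of ThreadPoolExecutor. A thread pool is typically used for overlapped I/O (compare Windows I/O completion ports), so in this version the default maximum number of workers is 5 times the number of CPUs. Since Python 3.8, the default has been min(32, os.cpu_count() + 4).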
class ThreadPoolExecutor(_base.Executor):
    def __init__(self, max_workers=None):
        """Initializes a new ThreadPoolExecutor instance.

        Args:
            max_workers: The maximum number of threads that can be used to
                execute the given calls.
        """
        if max_workers is None:
            # Use this number because ThreadPoolExecutor is often
            # used to overlap I/O instead of CPU work.
            max_workers = (cpu_count() or 1) * 5
        if max_workers <= 0:
            raise ValueError("max_workers must be greater than 0")

        self._max_workers = max_workers
        self._work_queue = queue.Queue()
        self._threads = set()
        self._shutdown = False
        self._shutdown_lock = threading.Lock()
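Because the thread pool is sized for overlapping waits, blocking tasks finish in the order their waits end, not the order they were submitted. The sketch below again simulates I/O with `time.sleep` (an assumption). It uses `as_completed()` to collect results as they arrive, so `finished` should typically come out shortest delay first.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def wait_and_return(delay):
    # Simulated I/O call: sleeping releases the GIL for the other threads
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]
with ThreadPoolExecutor(max_workers=len(delays)) as executor:
    futures = [executor.submit(wait_and_return, d) for d in delays]
    # as_completed() yields each future as soon as it is done
    finished = [f.result() for f in as_completed(futures)]
print(finished)
```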
Here is a simple example, adapted from another developer's program:
#!/usr/bin/env python
from __future__ import print_function
import os
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import as_completed

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen          # Python 2 (with pip install futures)

def downloader(url):
    filename = os.path.basename(url)
    ext = os.path.splitext(url)[1]
    if not ext:
        # Validate before opening the connection, so no request is wasted
        raise RuntimeError("URL does not contain an extension")
    req = urlopen(url)
    print("begin down", url)
    try:
        with open(filename, "wb") as file_handle:
            # Copy the response to disk in 1 KB chunks
            while True:
                chunk = req.read(1024)
                if not chunk:
                    break
                file_handle.write(chunk)
    finally:
        req.close()
    return "Finished downloading {filename}".format(filename=filename)

def mainProcess(urls):
    # Each download runs in one of up to 5 worker processes
    with ProcessPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(downloader, url) for url in urls]
        # as_completed() yields each future as soon as it finishes
        for future in as_completed(futures):
            print(future.result())

def mainThread(urls):
    # Same as above, but with up to 5 worker threads
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(downloader, url) for url in urls]
        for future in as_completed(futures):
            print(future.result())

if __name__ == "__main__":
    urls1 = [
        "http://www.irs.gov/pub/irs-pdf/f1040.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf"]
    urls2 = [
        "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
        "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
    mainProcess(urls1)
    mainThread(urls2)
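`downloader()` raises `RuntimeError` for a URL without an extension, and the executor does not swallow that error. The exception is stored on the Future and re-raised by `result()`. The network-free sketch below shows how to handle that per task. `check_extension()` is a hypothetical stand-in for `downloader()`, and the example.com URLs are placeholders.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_extension(url):
    # Hypothetical stand-in for downloader(): same validation, no network access
    if not os.path.splitext(url)[1]:
        raise RuntimeError("URL does not contain an extension")
    return os.path.basename(url)

urls = ["http://example.com/a.pdf", "http://example.com/noext"]
ok, failed = [], []
with ThreadPoolExecutor(max_workers=2) as executor:
    future_to_url = {executor.submit(check_extension, u): u for u in urls}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            ok.append(future.result())  # re-raises the worker's exception here
        except RuntimeError as exc:
            failed.append((url, str(exc)))
print(ok, failed)
```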
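Running it three times produced the following output. The completion order differs from run to run. The broken line in run 2 ("...f1040.pdfb" followed by "egin down ...") appears to be output from two worker processes interleaving on stdout: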
----1
begin down http://www.irs.gov/pub/irs-pdf/f1040ez.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040a.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040.pdf
Finished downloading f1040ez.pdf
Finished downloading f1040.pdf
Finished downloading f1040a.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040es.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040sb.pdf
Finished downloading f1040sb.pdf
Finished downloading f1040es.pdf
----2
begin down http://www.irs.gov/pub/irs-pdf/f1040.pdfb
egin down http://www.irs.gov/pub/irs-pdf/f1040ez.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040a.pdf
Finished downloading f1040ez.pdf
Finished downloading f1040a.pdf
Finished downloading f1040.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040es.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040sb.pdf
Finished downloading f1040sb.pdf
Finished downloading f1040es.pdf
----3
begin down http://www.irs.gov/pub/irs-pdf/f1040.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040a.pdf
Finished downloading f1040.pdf
Finished downloading f1040a.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040ez.pdf
Finished downloading f1040ez.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040sb.pdf
begin down http://www.irs.gov/pub/irs-pdf/f1040es.pdf
Finished downloading f1040sb.pdf
Finished downloading f1040es.pdf