文章目录
- 简介
- 对比
- 基准
- _thread
- Thread类
- Lock
- Queue
- multiprocessing.dummy
- 线程池(推荐)
- 进度条
- 参考文献
简介
- 多进程 Process:multiprocessing
- 优点:使用多核 CPU 并行运算
- 缺点:占用资源最多、可启动数目比线程少
- 适用场景:CPU 密集型
- 多线程 Thread:threading
- 优点:相比进程,更轻量级、占用资源少
- 缺点:
- 相比进程:多线程并发执行时只能同时使用一个 CPU,不能利用多 CPU(因为 GIL,但因为有 IO 存在,多线程依然可以加速运行)
- 相比协程:启动数目有限,占用内存资源,有线程切换开销
- 适用场景:IO 密集型、同时运行任务数不多
- 多协程 Coroutine:asyncio
- 优点:内存开销最小、启动协程数量多
- 缺点:支持的库少、实现复杂
- 适用场景:IO 密集型、需要超多任务运行
IO 指输入输出,有文件 IO 和网络 IO,如文件读写、数据库读写、网络请求(爬虫)
好用的多线程目标:
- 速度快
- 有返回值
- 数据同步
对比
方案 | 优点 | 缺点 | 耗时/s |
基准 | 33.05 | ||
_thread | 1. 后台运行 2. 适合 GUI | 1. 需要程序一直运行 2. 难以获取返回值 | 142.75 |
Thread类 | 1. 获取返回值有点麻烦 2. 数据同步需要用到 Lock 或 Queue | 29.22 | |
multiprocessing.dummy | 1. 启动方便 2. 有返回值 3. 数据同步 | 需先收集参数,编写逻辑有点不同 | 28.81 |
线程池 | 1. 启动方便 2. 有返回值 3. 数据同步 | 需先收集参数,编写逻辑有点不同 | 30.09 |
基准
以简单的文件读写为例,模拟 IO 操作
def benchmark(n):
"""多线程基准函数"""
i = 0
with open('{}.txt'.format(n), 'w') as f:
for i in range(n * 1000000):
f.write(str(i) + '\n')
return i
if __name__ == '__main__':
from timeit import timeit
def f():
for n in range(10):
print(benchmark(n))
print(timeit(f, number=1))
_thread
import _thread
from tool import benchmark
def f():
for n in range(10):
print(_thread.start_new_thread(benchmark, (n,)))
if __name__ == '__main__':
f()
while True:
pass
缺点:
- 需要程序一直运行
- 难以获取返回值
Thread类
import threading
from tool import benchmark
class MyThread(threading.Thread):
def run(self):
if self._target is not None:
self._return = self._target(*self._args, **self._kwargs)
def join(self):
super().join()
return self._return
def f():
threads = []
for n in range(10):
threads.append(MyThread(target=benchmark, args=(n,)))
for thread in threads:
thread.start()
for thread in threads:
print(thread.join())
if __name__ == '__main__':
from timeit import timeit
print(timeit(f, number=1))
缺点:
- 获取返回值有点麻烦
- 数据同步需要用到 Lock 或 Queue
Lock
import time
import threading
from threading import Thread, Lock
lock = Lock()
class Account:
def __init__(self, balance):
self.balance = balance
def draw(account, amount):
with lock:
if account.balance >= amount:
time.sleep(0.1)
print(threading.current_thread().name, '取钱成功')
account.balance -= amount
print(threading.current_thread().name, '余额', account.balance)
else:
print(threading.current_thread().name, '取钱失败,余额不足')
if __name__ == '__main__':
account = Account(1000)
ta = Thread(target=draw, args=(account, 800), name='ta')
tb = Thread(target=draw, args=(account, 800), name='tb')
ta.start()
tb.start()
Queue
import threading
from queue import Queue
from tool import benchmark
def f(queue):
n = queue.get()
print(benchmark(n))
if __name__ == '__main__':
queue = Queue()
for n in range(10):
queue.put(n)
for n in range(10):
thread = threading.Thread(target=f, args=(queue,))
thread.start()
这种写法数据不同步
耗时:26.44
multiprocessing.dummy
from multiprocessing.dummy import Pool
from tool import benchmark
def f():
n_list = [n for n in range(10)]
pool = Pool(processes=8)
results = pool.map(benchmark, n_list)
pool.close()
pool.join()
print(results)
if __name__ == '__main__':
from timeit import timeit
print(timeit(f, number=1))
线程池(推荐)
线程池
from concurrent.futures import ThreadPoolExecutor
from tool import benchmark
def f():
n_list = [n for n in range(10)]
with ThreadPoolExecutor() as executor:
results = list(executor.map(benchmark, n_list))
print(results)
if __name__ == '__main__':
from timeit import timeit
print(timeit(f, number=1))
要用多个参数时,可用 lambda 函数进行封装,如
import time
from concurrent.futures import ThreadPoolExecutor
def f(x=1, y=2):
time.sleep(1)
return x * y
x_list = [1, 2, 3]
y_list = [4, 5, 6]
with ThreadPoolExecutor() as executor:
results = list(executor.map(f, x_list, y_list))
print(results) # [4, 10, 18]
results = list(executor.map(lambda y: f(y=y), y_list))
print(results) # [4, 5, 6]
进度条
from concurrent.futures import ThreadPoolExecutor
from tool import benchmark
def f():
n_list = [n for n in range(10)]
with ThreadPoolExecutor() as executor:
results = list(executor.map(benchmark, n_list))
print(results)
if __name__ == '__main__':
from timeit import timeit
print(timeit(f, number=1))