1. Sharing data across different processes
Reference: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Value
# ===== The main process creates the shared variable and passes it to the child =====
activate_flag = Value("i", 0)
user_process = Process(target=user.run_model, args=[activate_flag, recv_port, request_records])
user_process.start()

# ===== The main process updates the shared value =====
activate_flag.value = 4

# ===== The child process keeps polling the latest value of the shared variable =====
while True:
    if activate_flag.value > 0:
        pass
    elif activate_flag.value == 0:
        time.sleep(0.5)
    else:
        break
Notes:
1. Updates to the shared variable propagate with some lag; account for that lag explicitly and design a fallback strategy, e.g. close the port and kill the process outright.
2. Sharing a multiprocessing.Value directly between processes is more efficient than sharing a Manager.Value.
for i, user in enumerate(user_list):
    # 0. log out users by killing the process they live in
    # 1. release the recv_port
    try:
        recv_port = recv_port_list[i].value
        os.system('fuser -k -n tcp ' + str(recv_port))    # free the TCP port
        os.kill(user.run_model_pid, signal.SIGKILL)       # kill the user's process
        print("#################logout users##################", user.user_id, user.model_name)
    except Exception as e:
        print("error happens when logging out users", e)
2. Sharing data across threads within the same process
from queue import Queue   # Python 3 imports from queue, Python 2 from Queue
data_queue = Queue()
request_records = {}
Thread(target=self.recv_data, args=[activate_flag, recv_port, request_records]).start()
Thread(target=self.send_data, args=[activate_flag, data_queue, recv_port, request_records]).start()
Tips:
1. Individual operations on Python's built-in containers (dict, list) are made atomic by the GIL, so single reads and writes do not corrupt the data; compound read-modify-write sequences, however, still need a Lock.
2. A child thread can modify data shared from the main thread, but creating or rebinding a same-named variable inside the child thread does not change the shared data.
data = []
def write_list(data, name):
    data.append(name)     # mutates the shared list
    print("thread", name, data)
    data = "wujing"       # rebinds only the local name; the shared list is untouched
for i in range(5):
    Thread(target=write_list, args=[data, str(i)]).start()
The output is:
thread 0 ['0']
thread 1 ['0', '1']
thread 2 ['0', '1', '2']
thread 3 ['0', '1', '2', '3']
thread 4 ['0', '1', '2', '3', '4']
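To illustrate the caveat in tip 1: a read-modify-write such as `counter = counter + 1` spans several bytecode steps, so concurrent increments can lose updates unless the whole sequence is protected by a Lock. A minimal sketch, with all names illustrative:

```python
from threading import Thread, Lock

counter = 0
lock = Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:                    # protect the read-modify-write sequence
            counter = counter + 1

threads = [Thread(target=add_many, args=[10_000]) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 40000: the locked increments never lose an update
```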
3. Sharing the main thread's data with sub-threads inside a child process (no workable method found)
Attempt 1
def execute(shared_data, name):
    shared_data[name] = name
    print("sub", name, shared_data)

def sub_process(shared_data):
    sub_thread = Thread(target=execute, args=[shared_data, "a"])
    sub_thread.start()
    sub_thread2 = Thread(target=execute, args=[shared_data, "b"])
    sub_thread2.start()

# 1. The main thread creates the shared variable
shared_data = Manager().dict()
print("main", shared_data)
# 2. Share the variable with the child process
p = Process(target=sub_process, args=[shared_data])
p.start()
# p.join()
print("main", shared_data)
Run 1: with p.join() commented out, the output is:
main {}
main {}
Run 2: with p.join() added, the output is:
main {}
sub a sub b {'a': 'a', 'b': 'b'}
{'a': 'a', 'b': 'b'}
main {'a': 'a', 'b': 'b'}
or
main {}
sub a {'a': 'a'}
sub b main {'a': 'a', 'b': 'b'}
The output is unstable across runs, so this approach is not reliable.
4. Where a Process's run() and __init__ execute
class ProcessSon(Process):
    def __init__(self):
        super().__init__()
        print("sub process pid of init", os.getpid())

    def change(self, data, i):
        data["b"] = i
        print("sub thread pid of change", os.getpid(), i, data)

    def run(self):
        # 1. Spawn sub-threads A and B here; this thread shares a queue with A,
        #    while A and B share a socket object and a plain variable.
        # 2. The two sub-threads share the queue; one thread saves all data before exiting.
        data = {"a": 1}
        print("sub process pid of run", os.getpid())
        for i in range(3):
            Thread(target=self.change, args=[data, i]).start()

print("main process pid", os.getpid())
p = ProcessSon()
p.start()
Output:
main process pid 19728
sub process pid of init 19728
sub process pid of run 19744
sub thread pid of change 19744 0 {'a': 1, 'b': 0}
sub thread pid of change 19744 1 {'a': 1, 'b': 1}
sub thread pid of change 19744 2 {'a': 1, 'b': 2}
Summary:
The Process object's __init__ executes in the parent process, while run() and the sub-threads it creates execute in the child process.
6. Cooperative processing with worker pools
Data parallelism based on multiple processes:
https://stackoverflow.com/questions/20776189/concurrent-futures-vs-multiprocessing-in-python-3
https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing
https://docs.python.org/3/library/concurrent.futures.html
Reading the contents of manager.dict():
https://docs.python.org/3/library/stdtypes.html#dict
https://docs.python.org/3/library/stdtypes.html#dict-views
Data types that support pickle:
https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled
Note:
1. Multiple processes neatly sidestep the Global Interpreter Lock, achieving parallelism in the true sense.
2. The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.
Data shared between processes is limited to specific types, and any data or arguments passed into or returned from process calls must be fully picklable objects (see the links above for details).
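A quick check of the pickling constraint: a module-level function pickles by reference, while a lambda does not, which is one reason ProcessPoolExecutor tasks should be top-level callables. The function `task` here is a made-up example.

```python
import pickle

def task(x):
    return x * 2

# A module-level function is pickled by reference to its qualified name.
picklable = True
try:
    pickle.dumps(task)
except Exception:
    picklable = False

# A lambda has no importable name, so pickling it fails.
lambda_picklable = True
try:
    pickle.dumps(lambda x: x * 2)
except Exception:
    lambda_picklable = False

print(picklable, lambda_picklable)
```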
def update_solo_edge_result(model_index, future):
    # Callback: record this model's allocation. model_index is bound per task
    # via functools.partial, avoiding the late-binding pitfall of closing over
    # the loop variable.
    result = future.result()
    try:
        model_info["model_" + str(model_index)] = {"type": result[0][0],
                                                   "headroom": result[1][0],
                                                   "rate": result[2][0]}
        result_dict['total_mps'] = result[3][0] + result_dict['total_mps']
    except Exception as e:
        print("update the allocation of each solo edges in greedy allocation fails", e)

result_dict = mp.Manager().dict({"total_mps": 0, "model_info": None})
model_info = mp.Manager().dict()
executor = concurrent.futures.ProcessPoolExecutor(max_workers=self.MAX_WORKERS)
unmated_model_index = np.setdiff1d(np.arange(0, self.model_nums), np.array(select_nodes))
for model_index in unmated_model_index:
    model = self.model_list_org[model_index]
    # evaluate the demanded resources
    future = executor.submit(self.pipe.get_gslice_multi_ins_baseline, model[0],
                             self.TOTAL_POINTS[model_name], model[2], model[1], model_name)
    future.add_done_callback(functools.partial(update_solo_edge_result, model_index))
executor.shutdown()
result_dict["model_info"] = model_info.items()
Summary
1. A process is the smallest unit of OS resource allocation, while a thread is the smallest unit of OS scheduling. Threads within the same child process can effectively operate on (read and write) that child process's data (e.g. a dict), but they can only read the parent process's data, not write it (note: no exception is raised, because the child process works on its own copy, so the child thread's writes never become visible to the parent).
2. A Process object's __init__ executes in the parent process, while run() and the sub-threads it creates execute in the child process.