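Producer-Consumer Pattern Framework

Purpose

Implement a parallel, distributed producer-consumer framework so that distributed services can process data in real time.

In other words, the producer side generates data in real time, and the consumer side processes each item as soon as it is produced. The work runs in parallel, without the strict one-after-another ordering of a for loop.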

The multiprocessing module provides a Process class that can be used to create and manage processes.
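
As a minimal sketch of what "create and manage" means here, the snippet below starts one child process, waits for it, and reads its exit code. The function names describe and work are made up for illustration; Process, start(), join(), and exitcode are the standard multiprocessing API.

```python
import os
from multiprocessing import Process


def describe(label):
    # Build a message that includes the PID of whichever process calls this.
    return "{} running in pid {}".format(label, os.getpid())


def work(label):
    # Target function: executed inside the child process.
    print(describe(label))


if __name__ == "__main__":
    child = Process(target=work, args=("child",))
    child.start()   # create the child process and run work("child") in it
    child.join()    # block until the child has finished
    print(describe("parent"))
    print("child exit code: {}".format(child.exitcode))
```

The two printed PIDs should differ, which shows that work really ran in a separate process, and the exit code is 0 when the child returns normally.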

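Shortcomings of Python multithreading

The producer-consumer model is usually implemented with multiple threads. In Python (CPython), however, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so the threads of a CPU-bound program effectively take turns instead of running simultaneously. A multithreaded Python program therefore cannot exploit a multi-core CPU the way a C++ program can: the computation gets roughly one core's worth of throughput while the other cores sit idle.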
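
A rough way to see this effect is to run the same CPU-bound function in several threads and then in several processes, and compare the wall-clock time. This is only a sketch: count_down and run_all are hypothetical helper names, the loop size is arbitrary, and the actual timings depend on the machine and on running a standard CPython build with the GIL.

```python
import time
from multiprocessing import Process
from threading import Thread


def count_down(n):
    # Pure-Python CPU-bound loop; a thread running it holds the GIL.
    while n > 0:
        n -= 1
    return n


def run_all(worker_cls, n_workers, n):
    # Start n_workers workers (Thread or Process), wait for all of them,
    # and return the elapsed wall-clock time in seconds.
    workers = [worker_cls(target=count_down, args=(n,)) for _ in range(n_workers)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("threads:   {:.2f}s".format(run_all(Thread, 4, 2000000)))
    print("processes: {:.2f}s".format(run_all(Process, 4, 2000000)))
```

On a multi-core machine the process version is expected to finish noticeably faster, because each process has its own interpreter and its own GIL.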


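A general example of Python multiprocessing

In Python, the mechanism that can really use multiple CPU cores is multi-process parallelism. Processes are typically created with multiprocessing.Process, and the producer and consumer processes exchange data through multiprocessing.Queue.


Code demo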

#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import sys

curPath = os.path.abspath(os.path.dirname(__file__))
rootPath = os.path.split(curPath)[0]
sys.path.append(rootPath)
import time
from multiprocessing import Queue as multiQueue
from multiprocessing import Process
import json


class Msg(object):
    def __init__(self, num):
        self.num = num


def producer(msg_queue):
    # Each producer puts 5 messages on the queue, one every 0.5 s.
    for i in range(5):
        m = Msg(i)
        msg_queue.put(m)
        print('==>producer is processing ==> {}'.format(i))
        time.sleep(0.5)


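def consumer1(q, send_q):
    # First stage: take one message, square its number, and pass it on.
    # The function handles a single message and then returns.
    msg = q.get()

    def func1(n):
        return n ** 2

    msg_2 = func1(msg.num)
    # Print msg.num; printing msg itself would only show the object's repr.
    print("{} is processed to {} by consumer1".format(msg.num, msg_2))

    send_q.put(msg_2)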


def consumer2(send_q):
    # Second stage: take one result from consumer1 and subtract 1.
    msg = send_q.get()

    def func2(n):
        return n - 1

    msg_2 = func2(msg)
    print("{} is processed to {} by consumer2".format(msg, msg_2))


if __name__ == '__main__':

    msg_queue = multiQueue()
    send_msg_queue = multiQueue()

    producer_process_list = []
    consumer1_process_list = []
    consumer2_process_list = []
    for i in range(2):
        producer_process_list.append(Process(target=producer, args=(msg_queue,)))

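    # 2 producers x 5 messages = 10 messages; every consumer handles exactly
    # one message, so each stage needs 10 consumer processes.
    for i in range(10):
        consumer1_process_list.append(Process(target=consumer1, args=(msg_queue, send_msg_queue)))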

    for i in range(10):
        consumer2_process_list.append(Process(target=consumer2, args=(send_msg_queue,)))

    for producer_process in producer_process_list:
        producer_process.start()
    for consumer1_process in consumer1_process_list:
        consumer1_process.start()
    for consumer2_process in consumer2_process_list:
        consumer2_process.start()

    for producer_process in producer_process_list:
        producer_process.join()
    for consumer1_process in consumer1_process_list:
        consumer1_process.join()
    for consumer2_process in consumer2_process_list:
        consumer2_process.join()

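In this example, the main process creates 2 producers and 10 consumers for each stage, and data is shared and passed along through Queue objects. When the program runs, starting each Process creates a separate producer or consumer process, and the operating system can schedule these processes on different cores, so multiple producers and multiple consumers run in parallel. However, the producer and consumer here are ordinary functions rather than the classes I had in mind. That does not fit an object-oriented style, and it is practical only for very simple tasks. When collecting and processing data from complex devices or objects, an object-oriented design can greatly simplify the program and make code reusable. Building producer and consumer classes is also an important bridge to understanding and applying the producer-consumer mechanism.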

A producer-consumer model based on Process subclasses

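from multiprocessing import Queue, Process


class producer(Process):   # could serve as a signal-detection or data-producing module
    def __init__(self, name, my_q):
        super(producer, self).__init__()
        # Data members are initialized in the constructor. More complex
        # objects can be set up here as well, which a plain function cannot do.
        self.name = name
        self.q = my_q  # queue used for data sharing and communication

    def run(self):   # run() is executed in the child process after start()
        for i in range(1000000):  # produce 1,000,000 items
            info = "%s's doll %d" % (self.name, i)
            # put the item on the queue
            self.q.put(info)


class consumer(Process):  # could serve as a data-analysis/processing module
    def __init__(self, name, my_q):
        super(consumer, self).__init__()
        self.name = name
        self.q = my_q

    def run(self):
        while True:
            info = self.q.get()
            if info is None:  # sentinel: no more data, stop this consumer
                break
            print("%s took %s" % (self.name, info))


if __name__ == '__main__':
    my_q = Queue(10)   # bounded queue: put() blocks while 10 items are waiting
    # Build the producer and consumer instances
    producer_process_list = []
    consumer2_process_list = []
    for i in range(2):
        producer_process_list.append(producer('producer' + str(i), my_q))

    for i in range(10):
        consumer2_process_list.append(consumer('consumer' + str(i), my_q))

    for producer_process in producer_process_list:
        producer_process.start()
    for consumer2_process in consumer2_process_list:
        consumer2_process.start()

    for producer_process in producer_process_list:
        producer_process.join()
    # All producers have finished. Send one None sentinel per consumer so that
    # every consumer leaves its loop; with fewer sentinels than consumers,
    # the remaining consumers would block in get() forever.
    for _ in consumer2_process_list:
        my_q.put(None)
    for consumer2_process in consumer2_process_list:
        consumer2_process.join()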
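
The class-based model can also be chained into two stages, like consumer1 and consumer2 in the function-based demo. The following is a minimal sketch, not a definitive design: the Stage class and the helpers square and minus_one are hypothetical names introduced here, and it assumes None is never a valid data item so that it can serve as the stop signal.

```python
from multiprocessing import Process, Queue


class Stage(Process):
    # Generic worker: read items from in_q, apply func, forward to out_q.
    def __init__(self, func, in_q, out_q):
        super(Stage, self).__init__()
        self.func = func
        self.in_q = in_q
        self.out_q = out_q

    def run(self):
        while True:
            item = self.in_q.get()
            if item is None:   # sentinel: stop this worker
                break
            self.out_q.put(self.func(item))


def square(n):
    return n ** 2


def minus_one(n):
    return n - 1


if __name__ == "__main__":
    q1, q2, results = Queue(), Queue(), Queue()
    stage1 = [Stage(square, q1, q2) for _ in range(3)]
    stage2 = [Stage(minus_one, q2, results) for _ in range(3)]
    for p in stage1 + stage2:
        p.start()

    for i in range(10):        # the main process acts as the producer
        q1.put(i)
    for _ in stage1:           # one sentinel per stage-1 worker
        q1.put(None)
    for p in stage1:
        p.join()
    for _ in stage2:           # stage 1 is done, now stop stage 2
        q2.put(None)

    # Read all results before joining stage 2 so no worker is left
    # holding unflushed queue data when it is joined.
    out = sorted(results.get() for _ in range(10))
    for p in stage2:
        p.join()
    print(out)
```

Each stage is shut down only after the stage before it has finished, so no item can arrive behind a sentinel. The printed list holds n ** 2 - 1 for n from 0 to 9, sorted because the workers may finish in any order.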