Multithreaded Programming in Python

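Running several threads is similar to running several different programs at the same time, but with the following advantages:

  • Multiple threads within a process share the same data space as the main thread, so they can exchange information and communicate with each other more easily than separate processes can.
  • Threads are sometimes called lightweight processes: they do not need a separate memory space of their own, so they cost less to create and run than processes.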
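
The data sharing described above can be sketched in a few lines. This is only an illustration: it uses the threading module, which is covered later in this article, and the names results and worker are made up for the example. The code is written so that it runs on both Python 2.6+ and Python 3.

```python
import threading

results = []                 # one list, visible to every thread in the process

def worker(n):
    # list.append is atomic in CPython, so this small demo needs no lock
    results.append(n * n)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # wait until every worker has finished

print(sorted(results))       # [0, 1, 4, 9, 16]
```

The worker threads write into the same list that the main thread reads afterwards; separate processes would need pipes, sockets, or shared memory to do the same.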

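A thread has a beginning, an execution sequence, and an end. It also has an instruction pointer that keeps track of where it is currently running.

  • A thread can be preempted (interrupted).
  • A thread can be put on hold temporarily (for example, by sleeping) while other threads run; this is called yielding.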

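Starting a New Thread

To spawn a thread, call the following function from the thread module (a Python 2 module; it was renamed _thread in Python 3):

thread.start_new_thread(function, args[, kwargs])

This call is a fast, efficient way to create new threads on both Linux and Windows.

The call returns immediately. The child thread starts and calls function with the given arguments; when function returns, the thread terminates.

Here args is a tuple of arguments; pass an empty tuple to call function without any arguments. kwargs is an optional dictionary of keyword arguments.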

#!/usr/bin/env python
# -*- encoding: utf8 -*-
import thread
import time

# Define a function for the thread
def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print "%s: %s" % (threadName, time.ctime(time.time()))

# Create two threads as follows
try:
    thread.start_new_thread(print_time, ("Thread-1", 2))
    thread.start_new_thread(print_time, ("Thread-2", 4))
except thread.error:
    print "Error: unable to start thread"

# Keep the main thread alive (stop with Ctrl-C);
# the child threads are killed as soon as the main thread exits
while 1:
    time.sleep(1)

Running the program above produces the following output:

Thread-1: Sat Aug 9 14:29:56 2014
Thread-2: Sat Aug 9 14:29:58 2014
Thread-1: Sat Aug 9 14:29:58 2014
Thread-1: Sat Aug 9 14:30:00 2014
Thread-2: Sat Aug 9 14:30:02 2014
Thread-1: Sat Aug 9 14:30:02 2014
Thread-1: Sat Aug 9 14:30:04 2014
Thread-2: Sat Aug 9 14:30:06 2014
Thread-2: Sat Aug 9 14:30:10 2014
Thread-2: Sat Aug 9 14:30:14 2014
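
The listing above passes only positional arguments and then keeps the main thread alive with an endless loop. As a rough sketch of the kwargs parameter, and of one way to wait for a single thread without looping forever, the following uses a lock as a completion signal. The names worker, done, log and repeat are invented for this example, and the import falls back from the Python 3 module name _thread to the Python 2 name thread.

```python
try:
    import _thread as thread   # Python 3 name
except ImportError:
    import thread              # Python 2 name

done = thread.allocate_lock()
done.acquire()                 # held until the worker signals completion
log = []

def worker(name, repeat=1):
    log.extend([name] * repeat)
    done.release()             # tell the main thread we are finished

# args is a tuple, kwargs is a dictionary of keyword arguments
thread.start_new_thread(worker, ("Thread-1",), {"repeat": 3})

done.acquire()                 # blocks until worker() releases the lock
print(log)                     # ['Thread-1', 'Thread-1', 'Thread-1']
```

This hand-rolled signalling is exactly the kind of bookkeeping that the higher-level threading module takes care of with join().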

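Although the thread module works well for low-level threading, it is quite limited compared with the newer threading module.

The threading Module

The threading module, included with Python 2.4 and later, provides more powerful, higher-level support for threads. It exposes everything the thread module offers and adds some functions of its own:

threading.activeCount() returns the number of Thread objects that are currently active
threading.currentThread() returns the Thread object for the calling thread
threading.enumerate() returns a list of all currently active Thread objects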
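
A small sketch of these three functions follows. It uses the underscore spellings active_count() and current_thread(), which are available from Python 2.6 onwards and are the preferred names in Python 3; the thread name idle-worker and the function idle are made up for the example.

```python
import threading
import time

def idle():
    time.sleep(0.2)            # stay alive long enough to be counted

t = threading.Thread(target=idle, name="idle-worker")
t.start()

count = threading.active_count()                  # same as activeCount()
names = sorted(th.name for th in threading.enumerate())
current = threading.current_thread().name         # same as currentThread()

print("%d active: %s (current: %s)" % (count, names, current))
t.join()
```

The exact count depends on what else is running in the interpreter, but it includes at least the main thread and idle-worker.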

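In addition, the threading module provides the Thread class, which implements threading. The Thread class offers the following methods:

run() the entry point of the thread
start() starts the thread, which then calls run()
join([time]) waits for the thread to terminate, for at most time seconds if time is given
isAlive() checks whether the thread is still running
getName() returns the name of the thread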
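
The methods above might be exercised as in the sketch below. It uses is_alive() and the name attribute rather than isAlive() and getName(), because those spellings work on Python 2.6+ as well as Python 3 (isAlive() was removed in Python 3.9). The function work and the thread name worker-1 are illustrative only.

```python
import threading
import time

def work():
    time.sleep(0.3)

t = threading.Thread(target=work, name="worker-1")
alive_before_start = t.is_alive()    # False: start() has not been called yet

t.start()                            # run() executes work() in a new thread
t.join(0.05)                         # wait at most 0.05 s; work() needs about 0.3 s
alive_after_timeout = t.is_alive()   # normally still True at this point

t.join()                             # no timeout: block until work() returns
alive_after_join = t.is_alive()      # False

print(t.name)
```

Note that join() with a timeout returns silently when the timeout expires, so is_alive() is the way to find out whether the thread actually finished.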

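Creating Threads with the threading Module

To implement a new thread with the threading module, do the following:

Define a new subclass of Thread.

Override the __init__(self[, args]) method to add any extra arguments.

Override the run(self) method to implement what the thread should do when it starts.

Once you have created the Thread subclass, create an instance of it and call start() to launch the new thread, which in turn executes run().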

#!/usr/bin/env python
# -*- encoding: utf8 -*-
import threading
import time

exitFlag = 0

class newthread(threading.Thread):
    def __init__(self, threadID, name, delay, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.delay = delay
        self.counter = counter

    def run(self):
        print "Starting: %s" % self.name
        print_time(self.name, self.delay, self.counter)
        print "Exiting: %s" % self.name

def print_time(threadName, delay, counter):
    while counter:
        if exitFlag:
            return
        time.sleep(delay)
        print "%s: %s" % (threadName, time.ctime(time.time()))
        counter -= 1

# Create new threads
thread1 = newthread(1, "Thread-1", 1, 5)
thread2 = newthread(2, "Thread-2", 2, 5)

# Start new threads
thread1.start()
thread2.start()
print "Exiting Main Thread"

Running the code above produces the following output:

Starting: Thread-1
Starting: Thread-2
Exiting Main Thread
Thread-1: Sat Aug 9 15:22:14 2014
Thread-2: Sat Aug 9 15:22:15 2014
Thread-1: Sat Aug 9 15:22:15 2014
Thread-1: Sat Aug 9 15:22:16 2014
Thread-2: Sat Aug 9 15:22:17 2014
Thread-1: Sat Aug 9 15:22:17 2014
Thread-1: Sat Aug 9 15:22:18 2014
Exiting: Thread-1
Thread-2: Sat Aug 9 15:22:19 2014
Thread-2: Sat Aug 9 15:22:21 2014
Thread-2: Sat Aug 9 15:22:23 2014
Exiting: Thread-2
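
The listing checks exitFlag on every iteration but never sets it. One way such a flag might be used to stop a thread early is sketched here with threading.Event, which is a substitute for the plain integer flag, not what the listing itself does. The names stop_event, ticker and ticks are invented for the example.

```python
import threading
import time

stop_event = threading.Event()   # plays the role of exitFlag
ticks = []

def ticker(delay):
    # Keep working until the main thread asks us to stop
    while not stop_event.is_set():
        ticks.append(time.time())
        stop_event.wait(delay)   # like sleep(delay), but wakes early once set

t = threading.Thread(target=ticker, args=(0.05,))
t.start()

time.sleep(0.2)                  # let the ticker run for a moment
stop_event.set()                 # request shutdown
t.join()                         # returns promptly because wait() wakes up

print("ticks recorded: %d" % len(ticks))
```

Compared with a bare global variable, an Event lets the worker sleep and still react to the stop request immediately.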

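Synchronizing Threads

The threading module includes a simple locking mechanism that lets you synchronize threads. A new lock is created by calling Lock().

The lock object's acquire(blocking) method forces threads to run synchronously. The optional blocking argument controls whether the thread waits to acquire the lock.

If blocking is set to 0, acquire() returns immediately: it returns 0 (False) if the lock could not be acquired and 1 (True) if it was. If blocking is set to 1 (the default), the thread blocks until the lock is released and can be acquired.

When the lock is no longer needed, call its release() method to release it.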

#!/usr/bin/env python
# -*- encoding: utf8 -*-
import threading
import time

class newthread(threading.Thread):
    def __init__(self, threadID, name, delay, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.delay = delay
        self.counter = counter

    def run(self):
        print "Starting: %s" % self.name
        # Get lock to synchronize threads
        threadLock.acquire()
        try:
            print_time(self.name, self.delay, self.counter)
        finally:
            # Free lock to release the next thread
            threadLock.release()

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print "%s:%s" % (threadName, time.ctime(time.time()))
        counter -= 1

threadLock = threading.Lock()
threads = []

# Create new threads
thread1 = newthread(1, "Thread-1", 1, 5)
thread2 = newthread(2, "Thread-2", 2, 5)

# Start new threads
thread1.start()
thread2.start()

# Add threads to thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to complete
for t in threads:
    t.join()
print "Exiting Main Thread"

Running the code above produces the following output:

Starting: Thread-1
Starting: Thread-2
Thread-1:Sat Aug 9 16:11:56 2014
Thread-1:Sat Aug 9 16:11:57 2014
Thread-1:Sat Aug 9 16:11:58 2014
Thread-1:Sat Aug 9 16:11:59 2014
Thread-1:Sat Aug 9 16:12:00 2014
Thread-2:Sat Aug 9 16:12:02 2014
Thread-2:Sat Aug 9 16:12:04 2014
Thread-2:Sat Aug 9 16:12:06 2014
Thread-2:Sat Aug 9 16:12:08 2014
Thread-2:Sat Aug 9 16:12:10 2014
Exiting Main Thread
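
The blocking argument of acquire() is easier to see in isolation. The sketch below first probes a lock with non-blocking calls and then protects a shared counter with the with statement, which acquires the lock on entry and releases it on exit. The names state and add_many are illustrative, and the code runs on Python 2.6+ and Python 3.

```python
import threading

lock = threading.Lock()

first = lock.acquire()          # blocking call on a free lock: returns True
second = lock.acquire(False)    # non-blocking: the lock is held, so False
lock.release()
third = lock.acquire(False)     # the lock is free again, so True
lock.release()

state = {"counter": 0}          # shared data protected by the lock

def add_many(n):
    for _ in range(n):
        with lock:              # acquire() on entry, release() on exit
            state["counter"] += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("%s %s %s %d" % (first, second, third, state["counter"]))
# True False True 40000
```

Without the lock, the read-modify-write in add_many could interleave between threads and lose updates.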

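Multithreaded Priority Queue

The Queue module lets you create a queue object that can hold a given number of items. The following methods control the queue:

get() removes an item from the queue and returns it
put() adds an item to the queue
qsize() returns the number of items currently in the queue
empty() returns True if the queue is empty; otherwise False
full() returns True if the queue is full; otherwise False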
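
A minimal sketch of these methods on a small bounded queue follows. The import tries the Python 3 module name queue first and falls back to the Python 2 name Queue; the variable names are made up for the example.

```python
try:
    import queue              # Python 3 name
except ImportError:
    import Queue as queue     # Python 2 name

q = queue.Queue(maxsize=2)    # holds at most two items
was_empty = q.empty()         # True: nothing has been added yet

q.put("a")
q.put("b")
was_full = q.full()           # True: maxsize has been reached
size = q.qsize()              # 2

first = q.get()               # "a": a plain Queue is first-in, first-out
print("%s %s %d %s" % (was_empty, was_full, size, first))
# True True 2 a
```

When several threads use the queue at once, the values returned by qsize(), empty() and full() can already be out of date by the time they are inspected, so they are best treated as hints.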

The following example puts a list of URLs on a Queue and lets a fixed number of worker threads fetch them:

#!/usr/bin/env python
# -*- encoding: utf8 -*-

import urllib2
import threading
from Queue import Queue, Empty


class threadpool(object):
    def __init__(self):
        self.que = Queue()

    def task(self, host):
        resp = urllib2.urlopen(host)
        print "%s,%s" % (resp.code, host)

    def worker(self):
        # Queue is thread-safe, so no extra lock is needed around get()
        while True:
            try:
                host = self.que.get_nowait()
            except Empty:
                break
            self.task(host)

    def mtasks(self, nums, hosts):
        for host in hosts:
            self.que.put(host)

        # Start nums workers, then wait for all of them to finish
        threads = [threading.Thread(target=self.worker) for i in range(nums)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def main():
    x = []
    with open('list.csv') as f:
        for line in f:
            # The URL is expected in the second column
            x.append(line.split(',')[1].strip())

    tp = threadpool()
    tp.mtasks(20, x)

if __name__ == '__main__':
    main()
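
The listing above uses a plain first-in, first-out Queue, although this section's title mentions a priority queue. The same module also offers PriorityQueue (Python 2.6 and later), which hands out the smallest item first. Here is a rough sketch with a single consumer thread; the priorities, the job names and the consumer function are invented for the example.

```python
import threading
try:
    import queue              # Python 3 name
except ImportError:
    import Queue as queue     # Python 2 name

pq = queue.PriorityQueue()
for priority, name in [(3, "low"), (1, "urgent"), (2, "normal")]:
    pq.put((priority, name))  # tuples sort by their first element

handled = []

def consumer():
    # Drain the queue; every job was queued before this thread started
    while True:
        try:
            priority, name = pq.get_nowait()
        except queue.Empty:
            return
        handled.append(name)

t = threading.Thread(target=consumer)
t.start()
t.join()

print(handled)                # ['urgent', 'normal', 'low']
```

With more than one consumer, items are still taken from the queue in priority order, but the order in which the threads finish handling them is no longer guaranteed.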

The third-party threadpool package provides a ready-made thread pool: https://pypi.python.org/pypi/threadpool/

#!/usr/bin/env python
# -*- coding: utf8 -*-

import time
import threadpool


def do_something(data):
    print "do_something: %s" % data
    return data.replace("http://", "https://")


def callback(request, data):
    # Called with the return value of do_something()
    print "Callback: %s: %s" % (request.requestID, data)


def exp_callback(request, exc_info):
    # Called when do_something() raises; exceptions are ignored here
    pass


if __name__ == "__main__":
    data = ["http://demo-%s.com" % i for i in range(10000)]

    requests = threadpool.makeRequests(do_something,
                                       data,
                                       callback,
                                       exp_callback)

    pool = threadpool.ThreadPool(5)

    for req in requests:
        pool.putRequest(req)

    # Poll for results until every request has been processed
    while True:
        try:
            time.sleep(0.5)
            pool.poll()
        except KeyboardInterrupt:
            break
        except threadpool.NoResultsPending:
            break

    if pool.dismissedWorkers:
        pool.joinAllDismissedWorkers()
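
For comparison, a similar pool can be sketched with concurrent.futures instead of the threadpool package. This is a different library from the one used above: it is part of the standard library from Python 3.2 onwards, and on Python 2 it is only available through the separately installed futures backport. The example reuses the do_something idea from the listing with a much shorter URL list.

```python
from concurrent.futures import ThreadPoolExecutor

def do_something(url):
    return url.replace("http://", "https://")

urls = ["http://demo-%s.com" % i for i in range(10)]

# Five worker threads; leaving the with block waits for all of them
with ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns the results in the same order as the inputs
    results = list(executor.map(do_something, urls))

print(results[0])             # https://demo-0.com
```

Here the return values are collected directly from map(), so no callback or polling loop is needed.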

References:

http://pymotw.com/2/threading/
https://docs.python.org/2/library/threading.html