This article compares the performance of multiprocessing, threading, and gevent under Python 3, and likewise compares process pools, thread pools, and coroutine pools. Many newcomers find these trade-offs unclear, so the comparison below is worked through in detail; hopefully anyone who needs it will come away with something useful.
Programs today generally run into two kinds of I/O: disk I/O and network I/O. This article looks at the network I/O case and compares the efficiency of processes, threads, and coroutines under Python 3. Processes use a multiprocessing.Pool process pool, threads use a thread pool wrapped by hand, and coroutines use the gevent library. Python 3's built-in urllib.request is also compared against the open-source requests library. The code is as follows:
import urllib.request
import requests
import time
import multiprocessing
import threading
import queue
def startTimer():
    return time.time()

def ticT(startTime):
    useTime = time.time() - startTime
    return round(useTime, 3)

#def tic(startTime, name):
#    useTime = time.time() - startTime
#    print('[%s] use time: %1.3f' % (name, useTime))

def download_urllib(url):
    req = urllib.request.Request(url,
                                 headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data

def download_requests(url):
    req = requests.get(url,
                       headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text

class threadPoolManager:
    def __init__(self, urls, workNum=10000, threadNum=20):
        self.workQueue = queue.Queue()
        self.threadPool = []
        self.__initWorkQueue(urls)
        self.__initThreadPool(threadNum)

    def __initWorkQueue(self, urls):
        for i in urls:
            self.workQueue.put((download_requests, i))

    def __initThreadPool(self, threadNum):
        for i in range(threadNum):
            self.threadPool.append(work(self.workQueue))

    def waitAllComplete(self):
        for i in self.threadPool:
            if i.is_alive():  # is_alive(); the old isAlive() was removed in Python 3.9
                i.join()

class work(threading.Thread):
    def __init__(self, workQueue):
        threading.Thread.__init__(self)
        self.workQueue = workQueue
        self.start()

    def run(self):
        while True:
            try:
                # non-blocking get: an empty queue means all work is done
                do, args = self.workQueue.get(block=False)
            except queue.Empty:
                break
            do(args)
            self.workQueue.task_done()
urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
multiPool = []
threadPool = []
N = 20
PoolNum = 100
for i in range(N):
    print('start %d try' % i)

    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    #for status, data in jobs:
    #    print(status, data[:10])
    #tic(urllibT, 'urllib.request')
    urllibL.append(ticT(urllibT))
    print('1')

    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    #for status, data in jobs:
    #    print(status, data[:10])
    #tic(requestsT, 'requests')
    requestsL.append(ticT(requestsT))
    print('2')

    requestsT = startTimer()
    pool = multiprocessing.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    pool.close()
    pool.join()
    multiPool.append(ticT(requestsT))
    print('3')

    requestsT = startTimer()
    pool = threadPoolManager(urls, threadNum=PoolNum)
    pool.waitAllComplete()
    threadPool.append(ticT(requestsT))
    print('4')
import matplotlib.pyplot as plt
x = list(range(1, N+1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, multiPool, label='requests MultiPool')
plt.plot(x, threadPool, label='requests threadPool')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()

The results are as follows:
As the resulting plot shows, Python 3's built-in urllib.request is still less efficient than the open-source requests library. The multiprocessing process pool improves performance noticeably, but it remains slower than the hand-rolled thread pool; part of the reason is that creating and scheduling processes costs more than creating threads (the test deliberately includes that creation cost inside each timed run).
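To see how much of that gap is pure start-up cost, the following minimal sketch (my own illustration, not part of the original benchmark; it repeats the same download_requests logic so it runs on its own) creates the Pool before the timer starts, so only pool.map() is measured:

import multiprocessing
import time
import requests

def download_requests(url):
    req = requests.get(url, headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text

if __name__ == '__main__':
    urls = ['http://www.ustchacker.com'] * 10
    pool = multiprocessing.Pool(10)    # pool created outside the timed region
    t0 = time.time()
    pool.map(download_requests, urls)  # only the downloads are timed
    print('map only: %.3f s' % (time.time() - t0))
    pool.close()
    pool.join()

Comparing this number with the timings in the plot gives a rough idea of how much of the multiprocessing result is process start-up rather than download time.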
The gevent test code follows:
import gevent.monkey
gevent.monkey.patch_all()  # patch the standard library before importing modules that use sockets

import urllib.request
import requests
import time
import gevent.pool
def startTimer():
    return time.time()

def ticT(startTime):
    useTime = time.time() - startTime
    return round(useTime, 3)

#def tic(startTime, name):
#    useTime = time.time() - startTime
#    print('[%s] use time: %1.3f' % (name, useTime))

def download_urllib(url):
    req = urllib.request.Request(url,
                                 headers={'user-agent': 'Mozilla/5.0'})
    res = urllib.request.urlopen(req)
    data = res.read()
    try:
        data = data.decode('gbk')
    except UnicodeDecodeError:
        data = data.decode('utf8', 'ignore')
    return res.status, data

def download_requests(url):
    req = requests.get(url,
                       headers={'user-agent': 'Mozilla/5.0'})
    return req.status_code, req.text
urls = ['http://www.ustchacker.com'] * 10
urllibL = []
requestsL = []
reqPool = []
reqSpawn = []
N = 20
PoolNum = 100
for i in range(N):
    print('start %d try' % i)

    urllibT = startTimer()
    jobs = [download_urllib(url) for url in urls]
    #for status, data in jobs:
    #    print(status, data[:10])
    #tic(urllibT, 'urllib.request')
    urllibL.append(ticT(urllibT))
    print('1')

    requestsT = startTimer()
    jobs = [download_requests(url) for url in urls]
    #for status, data in jobs:
    #    print(status, data[:10])
    #tic(requestsT, 'requests')
    requestsL.append(ticT(requestsT))
    print('2')

    requestsT = startTimer()
    pool = gevent.pool.Pool(PoolNum)
    data = pool.map(download_requests, urls)
    #for status, text in data:
    #    print(status, text[:10])
    #tic(requestsT, 'requests with gevent.pool')
    reqPool.append(ticT(requestsT))
    print('3')

    requestsT = startTimer()
    jobs = [gevent.spawn(download_requests, url) for url in urls]
    gevent.joinall(jobs)
    #for i in jobs:
    #    print(i.value[0], i.value[1][:10])
    #tic(requestsT, 'requests with gevent.spawn')
    reqSpawn.append(ticT(requestsT))
    print('4')
import matplotlib.pyplot as plt
x = list(range(1, N+1))
plt.plot(x, urllibL, label='urllib')
plt.plot(x, requestsL, label='requests')
plt.plot(x, reqPool, label='requests geventPool')
plt.plot(x, reqSpawn, label='requests Spawn')
plt.xlabel('test number')
plt.ylabel('time(s)')
plt.legend()
plt.show()

The results are as follows:
As the resulting plot shows, gevent gives a large performance boost on I/O-bound tasks. Because coroutines (greenlets) are much cheaper to create and schedule than threads, there is little difference between gevent's spawn mode and its pool mode.
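To make the creation-cost claim concrete, a micro-benchmark along these lines (my own sketch, not from the article; absolute numbers depend on the machine) starts 1000 OS threads and 1000 greenlets that each run a no-op:

import time
import threading
import gevent

def noop():
    pass

t0 = time.time()
threads = [threading.Thread(target=noop) for _ in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('1000 threads  : %.3f s' % (time.time() - t0))

t0 = time.time()
greenlets = [gevent.spawn(noop) for _ in range(1000)]
gevent.joinall(greenlets)
print('1000 greenlets: %.3f s' % (time.time() - t0))

Greenlets are plain Python objects scheduled inside a single OS thread, which is why spawning them in bulk stays cheap.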
gevent depends on its monkey patch to reach this performance, but the patch interferes with how multiprocessing runs. If both are used in one program, the patch has to be restricted like this:
gevent.monkey.patch_all(thread=False, socket=False, select=False)
Restricted that way, however, gevent cannot play to its strengths, so the multiprocessing Pool, threading Pool, and gevent Pool cannot be compared fairly inside a single program. Still, putting the two plots side by side supports a conclusion: the thread pool and gevent perform best, followed by the process pool. A side conclusion is that the requests library also performs somewhat better than urllib.request. :-)
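One possible workaround, sketched below under the assumption that each benchmark lives in its own script (the names bench_pools.py and bench_gevent.py are hypothetical placeholders), is to run the tests in separate interpreters so the monkey patch never touches the multiprocessing test:

import subprocess
import sys

for script in ('bench_pools.py', 'bench_gevent.py'):
    # each child gets a clean interpreter, so patch_all() inside
    # bench_gevent.py cannot affect multiprocessing in bench_pools.py
    subprocess.run([sys.executable, script], check=True)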