In Python, several commonly used libraries can help you build a proxy-IP crawler:
1. requests: pass a proxies dict to the request call (you can also set the http_proxy/https_proxy environment variables). For example:

import requests

proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080',
}
response = requests.get('http://example.com', proxies=proxies)
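In practice you usually rotate through a pool of proxies rather than hard-coding a single one. A minimal sketch, assuming a hypothetical PROXY_POOL list (the addresses below are placeholders, not real proxies):

```python
import random

# Hypothetical proxy pool; replace these placeholder addresses
# with proxies you actually control or have leased.
PROXY_POOL = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
    'http://proxy3.example.com:8080',
]

def pick_proxies(pool):
    """Pick one proxy at random and build the dict that requests expects."""
    proxy = random.choice(pool)
    # requests wants the same proxy mapped for both schemes here.
    return {'http': proxy, 'https': proxy}

# Usage: requests.get('http://example.com', proxies=pick_proxies(PROXY_POOL))
```

Picking a fresh proxy per request spreads traffic across the pool, which helps avoid per-IP rate limits.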
2. Scrapy: set HTTP_PROXY and DOWNLOADER_MIDDLEWARES in the project's settings.py file. For example:

HTTP_PROXY = 'http://proxy.example.com:8080'

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.ProxyMiddleware': 100,
}

Then implement the proxy middleware in middlewares.py:

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = spider.settings.get('HTTP_PROXY')
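The same middleware hook can rotate proxies instead of always using one. A sketch under the assumption that you maintain your own proxy list (the class name and addresses are illustrative, not part of Scrapy):

```python
import random

class RotatingProxyMiddleware(object):
    # Hypothetical proxy list; fill in with your own proxies.
    PROXIES = [
        'http://proxy1.example.com:8080',
        'http://proxy2.example.com:8080',
    ]

    def process_request(self, request, spider):
        # Scrapy calls process_request for every outgoing request;
        # setting request.meta['proxy'] routes that request through
        # the chosen proxy.
        request.meta['proxy'] = random.choice(self.PROXIES)
```

Register it in DOWNLOADER_MIDDLEWARES just like the single-proxy middleware above.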
3. urllib: build an opener with a ProxyHandler that takes a proxies dict. For example:

import urllib.request

proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080',
}
url = 'http://example.com'
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
response = opener.open(url)
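If every request in your script should go through the proxy, you can install the opener globally with urllib.request.install_opener, so plain urlopen() calls pick it up. A minimal sketch (the proxy address is a placeholder):

```python
import urllib.request

# Placeholder proxy address; substitute your own.
proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080',
}

opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

# install_opener makes every subsequent urllib.request.urlopen()
# call use this opener, so you don't have to pass it around.
urllib.request.install_opener(opener)
```

This is convenient in small scripts; in larger code it can be clearer to keep calling opener.open() explicitly.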
Any of these libraries can give you proxy-IP crawling; choose the one that fits your needs and the scale of your project.