In Python, you can monitor the task status of a multithreaded crawler in several ways. A common approach using only the standard library:
1. Use a thread-safe queue (`queue.Queue`) to store task status updates. All threads can then report and read task status without race conditions:

```python
import queue
import threading

# Thread-safe queue of (task_id, task) status updates from worker threads.
task_queue = queue.Queue()

def add_task(task_id, task):
    task_queue.put((task_id, task))

def monitor_tasks():
    # Consume status updates and report each task's outcome.
    while True:
        task_id, task = task_queue.get()
        if task.is_completed():
            print(f"Task {task_id} completed.")
        elif task.is_failed():
            print(f"Task {task_id} failed.")
        task_queue.task_done()
```
2. In each crawler thread, call the task's `complete()` method when it finishes successfully, or its `fail()` method when it fails. A simple `Task` class that tracks this status:

```python
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.status = "pending"

    def complete(self):
        self.status = "completed"

    def fail(self):
        self.status = "failed"

    def is_completed(self):
        return self.status == "completed"

    def is_failed(self):
        return self.status == "failed"
```
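In the pattern above, each task is updated by a single worker, so plain attribute assignment is safe. If your design ever lets several threads update the same task (an assumption beyond the original example), a locked variant such as this hypothetical `SafeTask` avoids conflicting writes:

```python
import threading

class SafeTask:
    """Sketch of a lock-guarded task: the first complete()/fail() call
    wins, later calls are ignored. Only needed if multiple threads may
    touch the same task object."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.status = "pending"
        self._lock = threading.Lock()

    def complete(self):
        with self._lock:
            if self.status == "pending":
                self.status = "completed"

    def fail(self):
        with self._lock:
            if self.status == "pending":
                self.status = "failed"

    def is_completed(self):
        with self._lock:
            return self.status == "completed"

    def is_failed(self):
        with self._lock:
            return self.status == "failed"
```

Because the status transition is guarded, a late `fail()` cannot overwrite an earlier `complete()`.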
3. In the main thread, wait for the crawler threads to finish, then call `task_queue.join()` so the monitor thread can process all remaining status updates before the program exits:

```python
def crawl_task(task):
    # Placeholder crawl logic: do the work, mark the task,
    # then report its status to the monitor queue.
    try:
        # ... perform the actual crawl here ...
        task.complete()
    except Exception:
        task.fail()
    add_task(task.task_id, task)

def main():
    # Create the tasks to crawl
    tasks = [Task(1), Task(2)]

    # Start the monitor thread (daemon, so it exits with the main thread)
    monitor_thread = threading.Thread(target=monitor_tasks, daemon=True)
    monitor_thread.start()

    # Start one crawler thread per task
    crawl_threads = []
    for task in tasks:
        thread = threading.Thread(target=crawl_task, args=(task,))
        thread.start()
        crawl_threads.append(thread)

    # Wait for all crawler threads to finish
    for thread in crawl_threads:
        thread.join()

    # Wait for the monitor thread to process the remaining status updates
    task_queue.join()

if __name__ == "__main__":
    main()
```
With this approach you can easily monitor the task status of a multithreaded crawler and react appropriately when a task completes or fails.
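For comparison, the standard library's `concurrent.futures` can handle the thread management and status tracking for you. This is a minimal sketch, not the original example's method; the `crawl` function and URLs here are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def crawl(url):
    # Placeholder crawl: a real implementation would fetch the URL.
    if "bad" in url:
        raise ValueError(f"cannot crawl {url}")
    return f"ok: {url}"

urls = ["https://example.com/a", "https://example.com/bad"]
results = {}

with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each Future back to its URL so we can report per-task status.
    futures = {pool.submit(crawl, u): u for u in urls}
    for fut in as_completed(futures):
        url = futures[fut]
        try:
            results[url] = ("completed", fut.result())
        except Exception as exc:
            # fut.result() re-raises the worker's exception here.
            results[url] = ("failed", str(exc))
```

Each `Future` carries its task's outcome, so no separate monitor thread or status queue is needed.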