Published: 2022-01-13 15:57:42 | Source: 億速云 | Views: 259 | Author: 小新 | Category: Big Data

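# How to Scrape Weather Data with Python

## Introduction

Weather data matters to data-driven industries such as agriculture, transportation, and tourism. Python's rich library ecosystem makes it a common first choice for writing web scrapers. This article walks through the full process of scraping weather data with Python, from setting up the environment to storing the results.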

## 1. Preparation

### 1.1 Setting up the development environment

Make sure Python is installed (3.7+ recommended), then install the required libraries:

```bash
pip install requests beautifulsoup4 pandas selenium pymysql sqlalchemy apscheduler matplotlib
```

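### 1.2 Choosing a target site

Common sources of weather data:

- China Weather (www.weather.com.cn)
- National Meteorological Center (www.nmc.cn)
- World Weather Online (www.worldweatheronline.com)

Note: before scraping, always check the site's robots.txt file and terms of use.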
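The robots.txt check mentioned above can be automated with the standard library. The following is a minimal sketch: the helper name `is_allowed` and the sample rules are illustrative assumptions, not taken from any of the sites listed.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical rules, used here only to demonstrate the check offline
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "https://example.com/weather/101010100.shtml"))  # True
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data"))             # False
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` downloads the real file instead of parsing a string.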

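## 2. Scraping static pages (using China Weather as an example)

### 2.1 Analyzing the page structure

1. Open the page for the target city (for example, Beijing).
2. Inspect the elements with the browser's developer tools (F12).
3. Locate the HTML tags that hold the key data, such as temperature and humidity.

### 2.2 Implementation with Requests + BeautifulSoup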

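```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

def get_weather(city_code):
    """Fetch the 7-day forecast for a city and return it as a DataFrame."""
    url = f"http://www.weather.com.cn/weather/{city_code}.shtml"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    }

    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # treat HTTP 4xx/5xx responses as failures
        response.encoding = 'utf-8'
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract the 7-day forecast
        weather_list = []
        for item in soup.select(".t.clearfix li"):
            date = item.select_one("h1")
            weather = item.select_one(".wea")
            temp = item.select_one(".tem")
            wind = item.select_one(".win em span")
            if date is None or weather is None or temp is None:
                continue  # skip list items that are not forecast entries

            # Column names are kept in Chinese because later sections use them:
            # 日期 = date, 天氣 = weather, 溫度 = temperature, 風向 = wind direction
            weather_list.append({
                "日期": date.get_text(strip=True),
                "天氣": weather.get_text(strip=True),
                "溫度": temp.get_text().replace("\n", ""),
                "風向": wind.get("title", "") if wind is not None else ""
            })

        return pd.DataFrame(weather_list)

    except requests.RequestException as e:
        print(f"Scrape failed: {e}")
        return None

# Example usage
df = get_weather('101010100')  # city code for Beijing
if df is not None:
    print(df.head())
```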

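### 2.3 Handling anti-scraping measures

- User-Agent rotation: prepare several common browser User-Agent strings.
- Request spacing: pause between requests with `time.sleep(random.uniform(1, 3))`.
- Proxy IP pool: a way to cope with IP blocking.
- Cookie handling: keep session state across requests.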
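The measures above can be combined in one small helper. This is a sketch under stated assumptions: `polite_get` and the `USER_AGENTS` pool are hypothetical names introduced for illustration, and `session` is expected to be a `requests.Session` (which also keeps cookies between calls).

```python
import random
import time

# Hypothetical pool of desktop User-Agent strings; extend as needed
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def polite_get(session, url, proxies=None, delay_range=(1.0, 3.0), sleep=time.sleep):
    """Sleep for a random interval, then fetch url with a randomly chosen User-Agent.

    session is expected to be a requests.Session, so cookies persist across calls.
    proxies, if given, is a requests-style mapping such as {"http": "http://host:port"}.
    """
    sleep(random.uniform(*delay_range))
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return session.get(url, headers=headers, proxies=proxies, timeout=10)

# Usage (requires the requests package):
#   import requests
#   with requests.Session() as session:
#       response = polite_get(session, "http://www.weather.com.cn/weather/101010100.shtml")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.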

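## 3. Scraping dynamic pages (using World Weather Online as an example)

### 3.1 Browser automation with Selenium

When the data is loaded dynamically by JavaScript, a browser automation tool is needed: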

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

def get_dynamic_weather():
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # headless mode
    driver = webdriver.Chrome(options=chrome_options)

    try:
        driver.get("https://www.worldweatheronline.com/beijing-weather/beijing/cn.aspx")
        time.sleep(5)  # wait for the page to load

        # Locate elements with XPath. Selenium 4 removed find_element_by_xpath,
        # so use find_element(By.XPATH, ...) instead.
        temp = driver.find_element(By.XPATH, '//div[@class="temp"]').text
        condition = driver.find_element(By.XPATH, '//div[@class="condition"]').text

        print(f"Current temperature: {temp}, conditions: {condition}")

    finally:
        driver.quit()

get_dynamic_weather()
```

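### 3.2 Advanced techniques

- Explicit waits: use `WebDriverWait` instead of a fixed sleep.
- Screenshots for debugging: `driver.save_screenshot('debug.png')`.
- Evading headless-browser detection: add the `--disable-blink-features=AutomationControlled` argument.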

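## 4. Calling an API (recommended)

### 4.1 Finding an open API

- QWeather (和風天氣): a commercial API with a free quota
- OpenWeatherMap: the free tier has limits
- The national meteorological administration's open platform

### 4.2 Example: the QWeather API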

```python
import requests
import json

def get_weather_by_api(location="101010100", key="YOUR_API_KEY"):
    url = "https://devapi.qweather.com/v7/weather/now"
    # Let requests build and URL-encode the query string
    params = {"location": location, "key": key}

    response = requests.get(url, params=params, timeout=10)
    data = response.json()

    # The API reports its status as a string in the "code" field
    if data.get('code') == '200':
        weather_info = {
            'Updated': data['updateTime'],
            'Temperature': f"{data['now']['temp']}°C",
            'Feels like': f"{data['now']['feelsLike']}°C",
            'Condition': data['now']['text'],
            'Wind direction': data['now']['windDir'],
            'Wind speed': f"{data['now']['windSpeed']}km/h"
        }
        return weather_info
    else:
        return None

# Example usage (prints "null" if the request was rejected)
result = get_weather_by_api()
print(json.dumps(result, indent=2, ensure_ascii=False))
```

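## 5. Storing and managing data

### 5.1 Saving to CSV

```python
# utf_8_sig writes a BOM so that Excel displays the Chinese text correctly
df.to_csv('weather_data.csv', index=False, encoding='utf_8_sig')
```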

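### 5.2 Saving to a database (MySQL example)

```python
import pymysql  # not called directly; it is the driver behind the mysql+pymysql URL
from sqlalchemy import create_engine

def save_to_mysql(df, table_name='weather_data'):
    # Replace user, password, and weather_db with your own settings
    engine = create_engine('mysql+pymysql://user:password@localhost:3306/weather_db')
    df.to_sql(table_name, engine, if_exists='append', index=False)

save_to_mysql(df)
```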

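### 5.3 Scheduled scraping (APScheduler)

```python
from apscheduler.schedulers.blocking import BlockingScheduler

def job():
    print("Starting scheduled scrape...")
    df = get_weather('101010100')
    # get_weather returns None on failure, so only save real results
    if df is not None and not df.empty:
        save_to_mysql(df)

scheduler = BlockingScheduler()
scheduler.add_job(job, 'interval', hours=3)  # run every 3 hours
scheduler.start()  # blocks the current process
```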

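## 6. Data analysis and visualization

### 6.1 Analysis with Pandas

```python
import pandas as pd

# Load the data
df = pd.read_csv('weather_data.csv')

# The '溫度' column holds strings rather than numbers, so extract the first
# number in each entry (normally the daily high) before computing statistics
high = df['溫度'].str.extract(r'(-?\d+)', expand=False).astype(float)
print(f"Mean temperature: {high.mean():.1f}°C")
print(f"Highest temperature: {high.max():.1f}°C")
```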
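### 6.2 Visualization with Matplotlib

```python
import matplotlib.pyplot as plt

# Continues from 6.1: df is the DataFrame loaded from weather_data.csv.
# The scraped '日期' values are typically labels such as "13日（今天）" rather than
# full dates, which pd.to_datetime cannot parse, so they are plotted as categories.
# Displaying them requires a CJK-capable font to be configured in Matplotlib.
# The raw-string regex also accepts negative temperatures.
df['最高溫'] = df['溫度'].str.extract(r'(-?\d+)', expand=False).astype(float)

plt.figure(figsize=(10, 5))
plt.plot(df['日期'], df['最高溫'], marker='o')
plt.title('Recent temperature trend in Beijing')
plt.xlabel('Date')
plt.ylabel('Temperature (°C)')
plt.grid()
plt.show()
```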
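Because the scraped temperature field is a string that may contain both a high and a low, it can help to split it before analysis. The sketch below assumes entries shaped like `5/-3℃`; the helper name `parse_temp_range` is hypothetical.

```python
import re

def parse_temp_range(text):
    """Return (high, low) as integers parsed from a scraped temperature string.

    If only one number is present it is used for both values;
    if none is present the result is (None, None).
    """
    nums = [int(n) for n in re.findall(r'-?\d+', text)]
    if not nums:
        return (None, None)
    return (max(nums), min(nums))

print(parse_temp_range("5/-3℃"))  # (5, -3)
print(parse_temp_range("-3℃"))    # (-3, -3)
```

Applied to the DataFrame, `df['溫度'].apply(parse_temp_range)` yields one tuple per row, which can then be split into separate high and low columns.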

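## 7. Project optimization suggestions

- Error handling: add a retry mechanism for network requests.
- Logging: record the run state with the `logging` module.
- Distributed scraping: the Scrapy-Redis framework.
- Data cleaning: handle missing values and outliers.
- Legal compliance: limit the request rate so the target server is not put under load.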
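The first two suggestions can be sketched together as a retry wrapper that logs each failure. `with_retries` and the logger name are hypothetical; the backoff schedule (1 s, 2 s, 4 s, ...) is one reasonable choice, not a requirement.

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("weather_crawler")  # hypothetical logger name

def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception, log it and retry with exponential backoff.

    The last exception is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))

# Usage (requires the requests package):
#   response = with_retries(lambda: requests.get(url, timeout=10))
```

The wrapper only helps around calls that raise on failure, so wrap the raw request rather than a function that already swallows exceptions and returns `None`.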

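## 8. Example project structure

```
weather_crawler/
├── config.py         # configuration such as API keys
├── crawler.py        # main scraper program
├── requirements.txt  # dependencies
├── utils/            # helper functions
│   ├── logger.py     # logging configuration
│   └── proxy.py      # proxy management
└── data/             # data storage
    ├── raw/          # raw data
    └── processed/    # processed data
```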
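As one possible shape for `config.py`, the sketch below reads the API key from an environment variable instead of hard-coding it. The variable name `QWEATHER_API_KEY` and the function `load_api_key` are assumptions made for illustration.

```python
# config.py (sketch)
import os

def load_api_key(env=os.environ, name="QWEATHER_API_KEY"):
    """Return the API key stored in the named environment variable.

    Raises RuntimeError if the variable is missing or blank.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the crawler")
    return key

# Usage in crawler.py:
#   from config import load_api_key
#   result = get_weather_by_api(key=load_api_key())
```

Keeping the key out of the source tree means the project can be shared or committed without leaking credentials.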

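## Conclusion

This article covered several ways to collect weather data with Python: scraping static pages, handling dynamic pages, and calling APIs. In practice, prefer an official API where one exists, and always follow good scraping etiquette. With sensible storage and analysis, weather data can be a useful input for business decisions and everyday life.

Note: the sample code in this article is for learning and reference only. In real use, follow the terms of use of the sites involved and keep the request rate reasonable.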
