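# How to Implement Text Recognition in Python
Text recognition (OCR, Optical Character Recognition) is an important application of computer vision, and Python's rich library ecosystem makes it one of the preferred languages for implementing it. This article walks through several ways to do OCR in Python, covering library selection, code implementation, and performance optimization techniques.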
## I. Common OCR Libraries
### 1. Tesseract OCR
- **Features**: an open-source OCR engine long sponsored by Google, supporting more than 100 languages
- Installation:
```bash
pip install pytesseract
sudo apt install tesseract-ocr  # Linux
brew install tesseract          # macOS
```
### 2. EasyOCR
- Installation:
```bash
pip install easyocr
```
### 3. PaddleOCR
- Installation:
```bash
pip install paddleocr
```
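## II. Basic Usage
### 1. Simple recognition with Tesseract
```python
import pytesseract
from PIL import Image


def basic_ocr(image_path):
    """Run Tesseract on one image and return the recognized text."""
    img = Image.open(image_path)
    # 'chi_sim+eng' requires the Simplified Chinese and English language data
    text = pytesseract.image_to_string(img, lang='chi_sim+eng')
    return text


print(basic_ocr('sample.png'))
```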
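### 2. Multi-language recognition with EasyOCR
```python
import easyocr

# Load the Simplified Chinese and English models
reader = easyocr.Reader(['ch_sim', 'en'])
# detail=0 returns only the recognized strings, without boxes or confidences
result = reader.readtext('multi_lang.jpg', detail=0)
print('\n'.join(result))
```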
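With the default `detail=1`, `readtext` returns `(bbox, text, confidence)` entries instead of bare strings, which makes it possible to discard uncertain detections. The following is a minimal sketch of that idea; the helper name `filter_by_confidence` and the 0.5 cutoff are illustrative assumptions, not part of EasyOCR, and the sample data is hand-written rather than real OCR output.

```python
from typing import List, Sequence, Tuple

# One detection in the shape readtext() returns with detail=1:
# (four [x, y] corner points, recognized text, confidence between 0 and 1).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def filter_by_confidence(results: List[Detection], min_conf: float = 0.5) -> List[str]:
    """Return the text of detections whose confidence is at least min_conf.

    Hypothetical helper: the name and the default cutoff are assumptions.
    """
    return [text for _bbox, text, conf in results if conf >= min_conf]


if __name__ == "__main__":
    # Hand-written sample standing in for reader.readtext('multi_lang.jpg')
    sample = [
        ([[0, 0], [40, 0], [40, 10], [0, 10]], "Hello", 0.98),
        ([[0, 20], [40, 20], [40, 30], [0, 30]], "w0rld", 0.31),
    ]
    print(filter_by_confidence(sample))
```

A suitable cutoff depends on the images, so it is worth inspecting a few real results before fixing a value.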
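## III. Advanced Techniques
### 1. Table recognition (PaddleOCR example)
```python
from paddleocr import PaddleOCR

# use_angle_cls enables the text-direction classifier (PaddleOCR 2.x API)
ocr = PaddleOCR(use_angle_cls=True)
result = ocr.ocr('table.png', cls=True)
# Newer 2.x releases return one list of lines per input image
for page in result:
    for line in page or []:
        print(line[1][0])  # each line is [box, (text, score)]
```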
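Printing each recognized line loses the table layout. One simple way to recover rows is to group the detected boxes by their vertical position and then sort each row from left to right. The sketch below assumes each entry has the `[box, (text, score)]` shape used in the example above; the function name `group_into_rows` and the 10-pixel tolerance are hypothetical choices, and real tables with merged or wrapped cells need more than this.

```python
from typing import List, Sequence, Tuple

# Assumed entry shape: (four [x, y] corner points, (text, score))
Line = Tuple[Sequence[Sequence[float]], Tuple[str, float]]


def group_into_rows(lines: List[Line], y_tolerance: float = 10.0) -> List[List[str]]:
    """Group text boxes into rows by vertical center; cells sorted left to right."""
    cells = []
    for box, (text, _score) in lines:
        y_center = sum(pt[1] for pt in box) / len(box)
        x_left = min(pt[0] for pt in box)
        cells.append((y_center, x_left, text))
    cells.sort()  # top to bottom

    rows = []      # each row is a list of (x_left, text)
    row_y = None   # vertical center of the first cell in the current row
    for y_center, x_left, text in cells:
        if row_y is None or abs(y_center - row_y) > y_tolerance:
            rows.append([])  # too far below the current row: start a new one
            row_y = y_center
        rows[-1].append((x_left, text))
    return [[text for _x, text in sorted(row)] for row in rows]


def make_box(x: float, y: float, w: float = 40, h: float = 10):
    """Build a rectangular box as four corner points (test/demo helper)."""
    return [[x, y], [x + w, y], [x + w, y + h], [x, y + h]]


if __name__ == "__main__":
    demo = [
        (make_box(100, 1), ("Qty", 0.99)),
        (make_box(0, 0), ("Name", 0.99)),
        (make_box(0, 31), ("Apple", 0.97)),
        (make_box(100, 30), ("3", 0.95)),
    ]
    for row in group_into_rows(demo):
        print(" | ".join(row))
```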
### 2. Scene text recognition
```python
import easyocr
import matplotlib.pyplot as plt

reader = easyocr.Reader(['en'])
results = reader.readtext('street_sign.jpg')

# Visualize the results: outline each detection and label it
img = plt.imread('street_sign.jpg')
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(img)
for (bbox, text, prob) in results:
    # Repeat the first corner at the end to close the polygon
    xs = [pt[0] for pt in bbox] + [bbox[0][0]]
    ys = [pt[1] for pt in bbox] + [bbox[0][1]]
    ax.plot(xs, ys, 'r')
    ax.text(bbox[0][0], bbox[0][1], f'{text} ({prob:.2f})', color='blue')
plt.show()
```
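## IV. Performance Optimization
### 1. Image preprocessing
```python
import cv2


def preprocess_image(image_path):
    """Convert an image to grayscale and binarize it with Otsu's method."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # With THRESH_OTSU the threshold argument (0) is ignored and chosen automatically
    thresh = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    return thresh
```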
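To make the `cv2.THRESH_OTSU` flag less of a black box, here is a pure-Python sketch of the idea behind Otsu's method: try every possible threshold and keep the one that maximizes the between-class variance $\omega_0 \omega_1 (\mu_0 - \mu_1)^2$ of the two resulting pixel groups. This is an illustration only, not OpenCV's implementation, and the function name `otsu_threshold` is an assumption; tie-breaking may differ from what OpenCV returns.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Counts are used instead of proportions; this only scales the
        # variance by a constant, so the best threshold is unchanged.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


if __name__ == "__main__":
    # Synthetic bimodal data: dark text pixels and a bright background
    sample = [10] * 50 + [200] * 50
    print(otsu_threshold(sample))
```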
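### 2. Multithreaded batch processing
```python
from concurrent.futures import ThreadPoolExecutor


def batch_ocr(image_paths):
    """OCR several images concurrently using basic_ocr from the Tesseract example."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(basic_ocr, image_paths))
    return results
```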
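One limitation of `executor.map` is that an exception raised for a single image surfaces when the results are collected, and the remaining results are lost. A possible variant, sketched below, submits each image separately and records failures per path. The name `batch_apply` and the two-dictionary return value are assumptions made for illustration; the OCR function is passed in as a parameter so the sketch runs without any OCR library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def batch_apply(func: Callable[[str], str],
                paths: Iterable[str],
                max_workers: int = 4) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run func on every path in a thread pool.

    Returns (results, errors), both keyed by path, so one bad image
    does not discard the output of the others.
    """
    results: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(func, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; report the failure by path
                errors[path] = exc
    return results, errors


def fake_ocr(path: str) -> str:
    """Stand-in for a real OCR function; fails on one hypothetical file name."""
    if path == "broken.png":
        raise ValueError("cannot read image")
    return path.upper()


if __name__ == "__main__":
    ok, failed = batch_apply(fake_ocr, ["a.png", "broken.png", "b.png"])
    print(ok)
    print(failed)
```

In real use, `fake_ocr` would be replaced by a function such as `basic_ocr`.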
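## V. Common Problems

1. **Low recognition accuracy**: use the `--psm` parameter to change Tesseract's page segmentation mode, for example:
```python
pytesseract.image_to_string(img, config='--psm 6')
```
2. **Skewed text**:
```python
import cv2
import numpy as np


def deskew(image):
    """Rotate a binarized image so that its text lines become horizontal."""
    # minAreaRect expects int32 or float32 points
    coords = np.column_stack(np.where(image > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # Assumes the [-90, 0) angle range of older OpenCV releases;
    # newer releases report a different range, so verify on your version
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = image.shape[:2]  # image size for the rotation center and output
    M = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    return rotated
```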
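The Tesseract calls in this article pass their options as a raw `config` string, which is easy to mistype. A small helper can assemble that string from named arguments. This is only a sketch: `build_tesseract_config` is a hypothetical name, and the 0 to 13 range check reflects the page segmentation modes of Tesseract 4 and 5 as an assumption about the installed version.

```python
from typing import Optional


def build_tesseract_config(psm: Optional[int] = None,
                           whitelist: Optional[str] = None) -> str:
    """Assemble a Tesseract config string from named options.

    psm:       page segmentation mode (6 = single uniform block of text,
               8 = single word)
    whitelist: characters Tesseract is allowed to output
    """
    parts = []
    if psm is not None:
        if not 0 <= psm <= 13:
            raise ValueError("psm must be between 0 and 13")
        parts.append(f"--psm {psm}")
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_tesseract_config(psm=6))
    print(build_tesseract_config(psm=8, whitelist="0123456789"))
```

The returned string can be passed as the `config` argument of `pytesseract.image_to_string`.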
## VI. Application Examples
### 1. Document digitization system
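```python
from pdf2image import convert_from_path


def pdf_to_text(pdf_path):
    """Render each PDF page to a JPEG, OCR it, and save the text per page."""
    pages = convert_from_path(pdf_path, 500)  # render at 500 DPI
    for i, page in enumerate(pages):
        page.save(f'page_{i}.jpg', 'JPEG')
        text = basic_ocr(f'page_{i}.jpg')
        # Write UTF-8 explicitly so Chinese text is saved correctly
        with open(f'output_{i}.txt', 'w', encoding='utf-8') as f:
            f.write(text)
```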
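### 2. License plate recognition
```python
import cv2
import pytesseract


def license_plate_recognition(image_path):
    """Detect a plate with a Haar cascade and OCR it; return '' if none is found."""
    plate_cascade = cv2.CascadeClassifier('haarcascade_russian_plate_number.xml')
    img = cv2.imread(image_path)
    plates = plate_cascade.detectMultiScale(img, 1.1, 4)
    for (x, y, w, h) in plates:
        plate_img = img[y:y+h, x:x+w]
        # --psm 8 treats the crop as a single word; the whitelist limits output to 0-9, A-Z
        text = pytesseract.image_to_string(
            plate_img,
            config='--psm 8 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
        return text.strip()  # only the first detected plate is read
    return ''
```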
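Even with a character whitelist, raw OCR output for a plate can contain stray whitespace or be implausibly short or long. A light post-processing step can normalize the string and reject obvious misreads. The sketch below uses only the standard library; the function names and the 5 to 8 character length range are illustrative assumptions, since real plate formats vary by country.

```python
import re


def clean_plate_text(raw: str) -> str:
    """Uppercase the OCR output and drop everything except A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def is_plausible_plate(text: str, min_len: int = 5, max_len: int = 8) -> bool:
    """Rough sanity check on length; the bounds are assumed, not a standard."""
    return min_len <= len(text) <= max_len


if __name__ == "__main__":
    raw = " ab-123 cd\n"  # hand-written stand-in for OCR output
    plate = clean_plate_text(raw)
    print(plate, is_plausible_plate(plate))
```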
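## VII. Summary
Text recognition in Python can be built on a traditional OCR engine such as Tesseract or on a modern deep-learning-based solution. Choosing the tool that fits the scenario and combining it with image preprocessing and post-processing can significantly improve recognition accuracy. As newer architectures such as Transformers are adopted, OCR is expected to gain stronger semantic understanding.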
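Note: The code examples in this article require the corresponding dependencies to be installed beforehand, and parameters should be adjusted to the actual environment when running them. It is advisable to test the different OCR options in a virtual environment to find the one that performs best.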