Configuring PyTorch multi-GPU training on CentOS involves the following steps:
First, make sure your system has the NVIDIA GPU driver installed, along with CUDA and cuDNN. If not, install them as follows.
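You can check that an NVIDIA GPU is actually present before installing anything:
lspci | grep -i nvidia
If a device is listed, install the driver: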
sudo yum install epel-release
sudo yum install nvidia-driver-latest-dkms
sudo reboot
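After the reboot, confirm that the driver loaded and the GPUs are visible:
nvidia-smi
The output lists each GPU along with the driver version and the highest CUDA version it supports.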
Download and install the CUDA Toolkit:
wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.2.89-1.x86_64.rpm
sudo yum localinstall cuda-repo-rhel7-10.2.89-1.x86_64.rpm
sudo yum clean all
sudo yum install cuda
Configure the environment variables:
echo 'export PATH=/usr/local/cuda-10.2/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
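You can verify that the toolkit is on your PATH by checking the compiler version, which should report release 10.2 if the installation above succeeded:
nvcc --version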
Download the cuDNN library (this requires a free NVIDIA developer account), then extract and install it:
tar -xzvf cudnn-10.2-linux-x64-v8.0.5.39.tgz
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
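To confirm the cuDNN installation, print the version macros (for cuDNN 8.x they live in cudnn_version.h):
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2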
Install PyTorch with pip, making sure to choose a build that matches your CUDA version (cu102 below matches CUDA 10.2):
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu102
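A quick way to confirm that pip pulled a CUDA-enabled build is to print the CUDA version the wheel was compiled against:
python -c "import torch; print(torch.__version__, torch.version.cuda)"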
Then verify inside Python that CUDA works and the GPUs are detected:
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
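As an optional sanity check that every GPU is actually usable, you can run a small computation on each device; this is a minimal sketch:
import torch

for i in range(torch.cuda.device_count()):
    # Allocate a tensor on GPU i and run a trivial matrix multiply
    x = torch.randn(1000, 1000, device=f'cuda:{i}')
    y = x @ x
    print(f'cuda:{i} OK, mean = {y.mean().item():.4f}')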
In PyTorch, you can use torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel for multi-GPU training.
DataParallel
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Assume you have a model and a dataset
model = YourModel()
model = nn.DataParallel(model)  # replicate the model across all visible GPUs
model.to('cuda')

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

train_loader = DataLoader(your_dataset, batch_size=your_batch_size, shuffle=True)

for data, target in train_loader:
    data, target = data.to('cuda'), target.to('cuda')
    optimizer.zero_grad()
    output = model(data)
    loss = nn.CrossEntropyLoss()(output, target)
    loss.backward()
    optimizer.step()
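Note that DataParallel splits each input batch along dimension 0 across the visible GPUs and gathers the outputs back on the default device, so batch_size here is the total across all GPUs. It is the simplest option, but it runs in a single process, which is why PyTorch recommends DistributedDataParallel for better performance.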
DistributedDataParallel
DistributedDataParallel is typically used for larger or multi-node training jobs, and it is also the approach PyTorch recommends over DataParallel even on a single machine.
Initialize the distributed environment and run the training loop in each process:
import torch
import torch.nn as nn
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader

def train(rank, world_size):
    # One process per GPU; NCCL is the recommended backend for GPU training
    dist.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)
    torch.cuda.set_device(rank)

    model = YourModel().to(rank)
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # DistributedSampler gives each process a distinct shard of the dataset
    train_sampler = torch.utils.data.distributed.DistributedSampler(your_dataset, num_replicas=world_size, rank=rank)
    train_loader = DataLoader(your_dataset, batch_size=your_batch_size, sampler=train_sampler)

    for epoch in range(num_epochs):
        train_sampler.set_epoch(epoch)  # reshuffle with a different seed each epoch
        for data, target in train_loader:
            data, target = data.to(rank), target.to(rank)
            optimizer.zero_grad()
            output = ddp_model(data)
            loss = nn.CrossEntropyLoss()(output, target)
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
Because the script above launches its own workers with mp.spawn, you only need to set the rendezvous environment variables and run it directly:
export MASTER_ADDR='localhost'
export MASTER_PORT='12345'
python YOUR_TRAINING_SCRIPT.py
If instead your script expects one process per GPU to be started for it (reading the rank from the environment rather than calling mp.spawn), use the launcher:
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE YOUR_TRAINING_SCRIPT.py
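Alternatively, because init_method='env://' just reads these variables from the environment, you can set them inside the script before spawning; a minimal sketch of the __main__ block, assuming the train function above:
import os
import torch
import torch.multiprocessing as mp

if __name__ == '__main__':
    # Rendezvous address for init_method='env://'
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '12345')
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)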
With these steps, you should be able to configure and run PyTorch multi-GPU training on CentOS.