
How to Configure PyTorch Multi-GPU Training on CentOS

小樊
2025-09-16 02:50:15

Configuring PyTorch multi-GPU training on CentOS involves the following steps:

1. Install CUDA and cuDNN

First, make sure the system has the NVIDIA GPU driver installed, along with CUDA and cuDNN.

Install the NVIDIA driver

sudo yum install epel-release
sudo yum install nvidia-driver-latest-dkms
sudo reboot

After the reboot, running nvidia-smi should list the installed GPU(s).

Install CUDA

  1. Download and install the CUDA Toolkit:

    wget https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.2.89-1.x86_64.rpm
    sudo yum localinstall cuda-repo-rhel7-10.2.89-1.x86_64.rpm
    sudo yum clean all
    sudo yum install cuda
    
  2. Configure the environment variables:

    echo 'export PATH=/usr/local/cuda-10.2/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc

After sourcing, running nvcc --version should report the installed toolkit version.
    

Install cuDNN

  1. Download the cuDNN library (requires a free NVIDIA developer account):

  2. Extract and install:

    tar -xzvf cudnn-10.2-linux-x64-v8.0.5.39.tgz
    sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
    sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
    

2. Install PyTorch

Install PyTorch with pip, making sure to pick a build that matches your installed CUDA version (the cu102 wheel below matches CUDA 10.2).

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu102
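
As a quick sanity check that the installed wheel was built against the expected CUDA version, the following minimal sketch can be run (the exact version strings depend on the wheel you installed):

import torch

# Version of the PyTorch build itself.
print(torch.__version__)
# CUDA version this wheel was compiled against; should print '10.2' here.
print(torch.version.cuda)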

3. Verify the installation

Verify that CUDA and PyTorch are installed correctly and that the GPUs are detected:

import torch

# True if PyTorch can see a CUDA-capable device.
print(torch.cuda.is_available())
# Number of visible GPUs; should match what nvidia-smi reports.
print(torch.cuda.device_count())
# Name of the first GPU.
print(torch.cuda.get_device_name(0))
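
Optionally, run a small computation on a GPU to confirm the stack works end to end; torch.backends.cudnn.version() also shows whether cuDNN was picked up (a minimal sketch):

import torch

# cuDNN version PyTorch loaded, e.g. 8005 for cuDNN 8.0.5.
print(torch.backends.cudnn.version())

# Small matrix multiply on the GPU as a smoke test.
x = torch.randn(1024, 1024, device='cuda')
y = x @ x
print(y.device)  # expect cuda:0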

4. Configure multi-GPU training

In PyTorch, multi-GPU training is done with either torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel.

Using DataParallel

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Assume you have a model and a dataset (YourModel and your_dataset
# are placeholders for your own code).
model = YourModel()
model = nn.DataParallel(model)  # replicate the model across all visible GPUs
model.to('cuda')

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Data loader
train_loader = DataLoader(your_dataset, batch_size=your_batch_size, shuffle=True)

for data, target in train_loader:
    data, target = data.to('cuda'), target.to('cuda')
    optimizer.zero_grad()  # clear gradients from the previous step
    output = model(data)   # the batch is split across GPUs along dim 0
    loss = nn.CrossEntropyLoss()(output, target)
    loss.backward()
    optimizer.step()
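
One practical detail: nn.DataParallel wraps the original model, so its parameters live under model.module. When checkpointing, it is common to save the unwrapped weights so they can later be loaded into a plain single-GPU model (a short sketch; the filename checkpoint.pth is illustrative):

# Save the underlying model's weights, not the DataParallel wrapper's.
torch.save(model.module.state_dict(), 'checkpoint.pth')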

Using DistributedDataParallel

DistributedDataParallel (DDP) is typically used for more demanding distributed training scenarios; it runs one process per GPU and generally scales better than DataParallel.

  1. Initialize the distributed environment:

    import torch
    import torch.nn as nn
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    
    def train(rank, world_size):
        # One process per GPU; rank doubles as the GPU index here.
        dist.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank)
        torch.cuda.set_device(rank)
        model = YourModel().to(rank)
        ddp_model = DDP(model, device_ids=[rank])
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    
        # The DistributedSampler gives each process a disjoint shard of the dataset.
        train_sampler = torch.utils.data.distributed.DistributedSampler(your_dataset, num_replicas=world_size, rank=rank)
        train_loader = DataLoader(your_dataset, batch_size=your_batch_size, sampler=train_sampler)
    
        for epoch in range(num_epochs):
            train_sampler.set_epoch(epoch)  # reshuffle differently each epoch
            for data, target in train_loader:
                data, target = data.to(rank), target.to(rank)
                optimizer.zero_grad()
                output = ddp_model(data)
                loss = nn.CrossEntropyLoss()(output, target)
                loss.backward()  # DDP averages gradients across processes here
                optimizer.step()
    
        dist.destroy_process_group()
    
    if __name__ == '__main__':
        world_size = torch.cuda.device_count()
        mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)
    
  2. Set the rendezvous environment variables and run the script directly (it spawns one worker per GPU itself via mp.spawn, so a separate launcher such as torch.distributed.launch is not needed here and would only multiply the processes):

    export MASTER_ADDR='localhost'
    export MASTER_PORT='12345'
    python YOUR_TRAINING_SCRIPT.py

(Scripts that read their rank from the environment instead of calling mp.spawn can be started with PyTorch's torchrun launcher instead.)
    

With these steps in place, you should be able to configure and run PyTorch multi-GPU training on CentOS.
