1. Online install: first create a PyTorch environment: conda create -n ljj_torch112 python=3.8


First check your CUDA version (the most authoritative check is nvcc --version, which reports the installed CUDA toolkit):
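The same check can be scripted. This is a minimal sketch that assumes `nvcc` is on `PATH` and that its output contains a `release X.Y` fragment (for example `Cuda compilation tools, release 10.2, V10.2.89`). Note that `nvcc` reports the installed toolkit, while `nvidia-smi` shows the highest CUDA version the driver supports.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the 'release X.Y' number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def local_cuda_toolkit_version() -> Optional[str]:
    """Run nvcc if it is on PATH; return None when nvcc is not installed."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA toolkit (nvcc):", local_cuda_toolkit_version())
```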

Activate the environment: conda activate ljj_torch112


Get the install command from the PyTorch website (Start Locally | PyTorch):

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=10.2 -c pytorch
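The three pins must stay in step (torch 1.12.0 with torchvision 0.13.0 and torchaudio 0.12.0). A sketch for confirming what actually got installed. It assumes the packages expose Python distribution metadata, which conda builds of these packages normally do. Local version suffixes such as `+cu102` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Versions pinned in the conda command above
PINNED = {"torch": "1.12.0", "torchvision": "0.13.0", "torchaudio": "0.12.0"}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or different."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Compare only the public part, e.g. "1.12.0" of "1.12.0+cu102"
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    print(check_pins(PINNED) or "all pins match")
```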

 1.2.1 Check the versions of torch, CUDA, etc.
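The following is a minimal check of whether the GPU build of torch is installed and usable. `torch.version.cuda` is `None` for a CPU-only build, and `torch.cuda.is_available()` also requires a working driver. The sketch returns a dict instead of printing, so it still runs where torch is absent.

```python
from typing import Any, Dict, List


def torch_gpu_report() -> Dict[str, Any]:
    """Summarise whether the installed torch build can use a GPU."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False, "devices": []}
    available = torch.cuda.is_available()
    devices: List[str] = []
    if available:
        devices = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None means a CPU-only build was installed
        "cuda_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

Run it inside the activated ljj_torch112 environment. `cuda_available: True` together with a non-empty `devices` list indicates the GPU build works.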

Install DGL (see Deep Graph Library (dgl.ai) and the dglteam package page, Linux 64 :: Anaconda.org):

 conda install -c dglteam/label/cu102 dgl
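The channel label (`cu102`) has to match the CUDA build of torch, which here comes from `cudatoolkit=10.2`. A sketch that converts such a label and compares it with `torch.version.cuda`. The rule "last digit is the minor version" is an assumption that holds for labels like cu102 and cu113.

```python
from typing import Optional


def cuda_label_to_version(label: str) -> str:
    """Convert a label such as 'cu102' to '10.2' (last digit = minor version)."""
    digits = label[2:]
    if len(label) < 4 or not label.lower().startswith("cu") or not digits.isdigit():
        raise ValueError(f"unexpected CUDA label: {label!r}")
    return f"{int(digits[:-1])}.{digits[-1]}"


def label_matches_torch(label: str) -> Optional[bool]:
    """True/False when torch is importable, None when it is not installed."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.cuda is None for CPU-only builds, which never match a cuXXX label
    return torch.version.cuda == cuda_label_to_version(label)


if __name__ == "__main__":
    print("dgl label cu102 matches torch:", label_matches_torch("cu102"))
```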

#conda install cudatoolkit
pip install scipy
pip install normflows==1.4
pip install tensorboardx==2.5.1
pip install tqdm

pip install torchtext==0.4.0

pip install scikit-learn
pip install pandas

pip install wandb
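After the pip installs, you can confirm that each package is importable. Some import names differ from the pip names: scikit-learn imports as `sklearn` and tensorboardx as `tensorboardX`. The mapping below is written for this list only and has to be extended by hand. `find_spec` locates modules without importing them.

```python
import importlib.util
from typing import Dict

# pip name -> module name used in `import`
PACKAGES = {
    "scipy": "scipy",
    "normflows": "normflows",
    "tensorboardx": "tensorboardX",
    "tqdm": "tqdm",
    "torchtext": "torchtext",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "wandb": "wandb",
}


def importable(packages: Dict[str, str]) -> Dict[str, bool]:
    """Map each pip name to whether its top-level module can be found."""
    return {dist: importlib.util.find_spec(module) is not None for dist, module in packages.items()}


if __name__ == "__main__":
    missing = [name for name, ok in importable(PACKAGES).items() if not ok]
    print("missing:", missing or "none")
```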

script -f a.log
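`script -f a.log` records the whole terminal session to `a.log` and flushes after each write, so the log can be followed with `tail -f`. If you only need to keep the Python program's own output, a minimal in-process alternative is to duplicate `sys.stdout`. Unlike `script`, this sketch does not capture output from child processes or C extensions.

```python
import io
import sys
from typing import TextIO, Tuple


class Tee:
    """Forward every write to several text streams (e.g. terminal + log file)."""

    def __init__(self, *streams: TextIO) -> None:
        self.streams = streams

    def write(self, data: str) -> int:
        for stream in self.streams:
            stream.write(data)
            stream.flush()  # flush on every write, similar in spirit to `script -f`
        return len(data)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


def demo() -> Tuple[str, str]:
    """In-memory check: one write lands in both streams."""
    terminal, log = io.StringIO(), io.StringIO()
    Tee(terminal, log).write("epoch 1 done\n")
    return terminal.getvalue(), log.getvalue()


if __name__ == "__main__":
    original = sys.stdout
    with open("a.log", "a", encoding="utf-8") as log_file:
        sys.stdout = Tee(original, log_file)
        try:
            print("training started")  # goes to the terminal and to a.log
        finally:
            sys.stdout = original
```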


sudo apt install git

git clone https://github.com/NVIDIA/apex.git (I used git here; it did not work on the school server, so there I used the wget method below instead)


Reference: linux安装nvidia/apex (Installing nvidia/apex on Linux), Zhihu (zhihu.com)

(Downloading apex) If git clone does not work, try wget instead:

wget https://codeload.github.com/NVIDIA/apex/zip/refs/heads/master -O master.zip
# extract the archive
unzip master.zip -d /mnt/hdd1/ljj/apex
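If neither git nor wget is available but Python can reach the network, the standard library can do the same download and extraction. This is a sketch: the URL is the one from the wget command, and the archive's top-level folder is expected to be `apex-master`, which matches the `cd` path that follows.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path
from typing import List

APEX_ZIP_URL = "https://codeload.github.com/NVIDIA/apex/zip/refs/heads/master"


def download(url: str, dest: Path) -> Path:
    """Rough equivalent of `wget URL -O dest`."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive: Path, target_dir: Path) -> List[str]:
    """Rough equivalent of `unzip archive -d target_dir`; returns top-level entries."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return sorted({name.split("/")[0] for name in zf.namelist()})


def demo() -> List[str]:
    """Offline self-check with a tiny archive shaped like GitHub's master.zip."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "master.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("apex-master/setup.py", "# placeholder\n")
        return extract(archive, Path(tmp) / "apex")


# Real use (needs network access):
# extract(download(APEX_ZIP_URL, Path("master.zip")), Path("/mnt/hdd1/ljj/apex"))
```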

cd /mnt/hdd1/allusers/ljj/4other/apex/apex-master 

 python3 setup.py install

 Source: linux安装nvidia/apex (Installing nvidia/apex on Linux), Zhihu (zhihu.com)

Check whether apex was installed successfully: start python, then run import apex
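A successful `import apex` only shows that the Python package is present. A plain `python3 setup.py install` without `--cpp_ext`/`--cuda_ext` produces apex's Python-only build. The sketch below also looks for the compiled extension modules. It assumes the module names `apex_C` and `amp_C`, which apex's setup.py uses for those two options. `find_spec` is used so that nothing is actually imported.

```python
import importlib.util
from typing import Dict


def _found(module: str) -> bool:
    """True if the module can be located on sys.path."""
    return importlib.util.find_spec(module) is not None


def apex_build_report() -> Dict[str, bool]:
    """Report which parts of apex are installed, without importing them."""
    return {
        "apex": _found("apex"),
        "apex_C": _found("apex_C"),  # assumption: built only with --cpp_ext
        "amp_C": _found("amp_C"),    # assumption: built only with --cuda_ext
    }


if __name__ == "__main__":
    print(apex_build_report())
```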

Reference: apex安装方法 (How to install apex), 51CTO blog

If your code needs torchscale, install it:

pip install torchscale

pip install torchtext==0.4.0

1.3.1 Use conda list to check whether DGL was installed successfully
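`conda list` prints one row per package (name, version, build, channel), so a script can check its output too. Below is a sketch of a parser. The column layout is assumed from conda's usual output, and `conda` must be on `PATH` for the helper that runs it.

```python
import subprocess
from typing import Dict, Optional


def parse_conda_list(text: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Map package name -> version/build/channel, skipping '#' header lines."""
    packages: Dict[str, Dict[str, Optional[str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        packages[fields[0]] = {
            "version": fields[1],
            "build": fields[2] if len(fields) > 2 else None,
            # the channel column is typically empty for the default channel
            "channel": fields[3] if len(fields) > 3 else None,
        }
    return packages


def conda_list(env: str) -> str:
    """Raw `conda list -n ENV` output."""
    return subprocess.run(["conda", "list", "-n", env], capture_output=True, text=True, check=True).stdout


# Usage: parse_conda_list(conda_list("ljj_torch112")).get("dgl")
```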

2.1 Offline install: upload the torch and dgl files (dgl is placed in the highlighted folder, so it does not need pip install)


 2.2 Download the torch installation package (again, dgl does not need pip install)
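Once the wheel files are on the server, pip can install them without contacting PyPI. A sketch, assuming the uploaded `.whl` files sit in one directory. The directory name and the demo file name are placeholders. `--no-index` stops pip from reaching the network and `--find-links` points it at the local files, so every dependency must be present there as well.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def local_wheels(wheel_dir: Path) -> List[str]:
    """Names of the .whl files uploaded to wheel_dir."""
    return sorted(p.name for p in wheel_dir.glob("*.whl"))


def offline_install_cmd(wheel_dir: Path, packages: List[str]) -> List[str]:
    """pip command that resolves packages only from wheel_dir."""
    return [sys.executable, "-m", "pip", "install", "--no-index", "--find-links", str(wheel_dir), *packages]


def offline_install(wheel_dir: Path, packages: List[str]) -> None:
    """Run the offline install; raises CalledProcessError if pip fails."""
    subprocess.run(offline_install_cmd(wheel_dir, packages), check=True)


def demo() -> List[str]:
    """Check the wheel listing against a throwaway folder with placeholder files."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "torch-1.12.0-cp38-cp38-linux_x86_64.whl").touch()  # placeholder, not a real wheel
        (folder / "notes.txt").touch()
        return local_wheels(folder)
```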

conda install cudatoolkit
pip install scipy
pip install normflows==1.4
pip install tensorboardx==2.5.1
pip install tqdm

script -f a.log

CUDA_VISIBLE_DEVICES="1"  python main.py --dataset Wiki-One --data_path ./Wiki --few 5 --data_form Pre-Train --prefix np_rgcn_attn_planar_wiki_5shot_intrain_g_batch_1024_eval_8 --device 1 --batch_size 32 --flow Planar -dim 50 --g_batch 1024 --eval_batch 8 --eval_epoch 2000
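`CUDA_VISIBLE_DEVICES="1"` hides every GPU except physical GPU 1, and inside the process that GPU is renumbered as device 0. Whether `--device 1` still works therefore depends on how `main.py` maps that flag, which is not shown here. You can apply the same restriction from Python, as long as it happens before torch initialises CUDA. A sketch (`pin_gpu` is a hypothetical helper):

```python
import os


def pin_gpu(physical_index: int) -> str:
    """Expose a single physical GPU to this process; call before importing torch."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    # The only visible GPU is always addressed as device 0 inside the process
    return "cuda:0"


if __name__ == "__main__":
    device = pin_gpu(1)
    print("use", device)  # e.g. model.to(device) after `import torch`
```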


Running it failed with: OSError: libcublas.so.11: cannot open shared object file: No such file or directory

Reference: OSError: libcublas.so.11: cannot open shared object file: No such file or directory [error on import onnx], 墨理学AI's blog, CSDN

 Fix: conda install cudatoolkit
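`libcublas.so.11` is the cuBLAS library shipped with CUDA 11.x. The error means something in the environment was built against CUDA 11, but the library could not be found. A sketch for checking two things: whether the dynamic loader can open the library through its normal search path (`LD_LIBRARY_PATH`, the ld.so cache), and whether the file exists under the active conda environment's `lib` folder. `CONDA_PREFIX` is set by `conda activate`.

```python
import ctypes
import os
from pathlib import Path
from typing import Optional


def can_load(soname: str) -> bool:
    """True if the dynamic loader can open the shared library from its search path."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def find_in_conda_env(soname: str) -> Optional[Path]:
    """Path of the library inside $CONDA_PREFIX/lib, or None."""
    prefix = os.environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    candidate = Path(prefix) / "lib" / soname
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    name = "libcublas.so.11"
    print("loadable:", can_load(name), "in conda env:", find_in_conda_env(name))
```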

RuntimeError: CUDA out of memory. Tried to allocate 1.25 GiB (GPU 0; 14.76 GiB total capacity; 10.37 GiB already allocated; 1.14 GiB free; 12.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
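Here is what the numbers in this message mean. PyTorch has reserved 12.74 GiB but allocated only 10.37 GiB to tensors, so about 2.37 GiB sits in PyTorch's cache. Meanwhile only 1.14 GiB is free on the device, less than the 1.25 GiB requested. The message suggests `max_split_size_mb` for cases where reserved memory greatly exceeds allocated memory. You set it through the `PYTORCH_CUDA_ALLOC_CONF` environment variable, and it must be in place before the first CUDA allocation. A sketch follows. The value 128 is an arbitrary starting point, not taken from the source, and the usual alternative is to reduce the batch size.

```python
import os


def set_max_split_size(mb: int = 128) -> str:
    """Ask PyTorch's caching allocator not to split blocks larger than `mb` MiB."""
    value = f"max_split_size_mb:{mb}"
    # Must be set before torch makes its first CUDA allocation (safest: before `import torch`)
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value
```

The shell equivalent is to prefix the command: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python main.py ...`.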


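import torch

def edge_message(edges):  # hypothetical name for the DGL edge UDF
    # Concatenate source-node state, edge features and destination-node
    # features along dim=1; this is where the CUDA out-of-memory error was
    # raised, since the result holds one row per edge
    x = torch.cat([edges.src['h'], edges.data['feat'], edges.dst['feat']], dim=1)
    return {'x': x}

# Release GPU memory after the step that consumes the concatenated tensor:
#   del x                     # drop the last reference so the memory can be reused
#   torch.cuda.empty_cache()  # release unused cached blocks back to the driver

# Continue with the rest of your code
# ...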
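When the OOM comes from a tensor whose size grows with the batch, as with the per-edge concatenation, a common workaround is to retry with a smaller batch. The sketch below is framework-agnostic: `step` is a hypothetical callable that runs one batch of the given size. torch 1.12 raises OOM as a `RuntimeError` whose message contains "out of memory". Newer versions use `torch.cuda.OutOfMemoryError`, a subclass of `RuntimeError`, so the handler catches it too.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(step: Callable[[int], T], batch_size: int, min_batch_size: int = 1) -> T:
    """Call step(batch_size); on a CUDA out-of-memory error halve the batch and retry."""
    while True:
        try:
            return step(batch_size)
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or when the batch cannot shrink further
            if "out of memory" not in str(err) or batch_size // 2 < min_batch_size:
                raise
            batch_size //= 2
            # In a real training loop, torch.cuda.empty_cache() could be called here


def demo() -> int:
    """Fake step that only 'fits' batches of 8 or fewer."""
    def fake_step(batch_size: int) -> int:
        if batch_size > 8:
            raise RuntimeError("CUDA out of memory. Tried to allocate 1.25 GiB")
        return batch_size
    return run_with_backoff(fake_step, 32)  # 32 -> 16 -> 8
```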


Reference: wandb使用前提 - 注册,登陆 (wandb prerequisites: sign up and log in), 无脑敲代码,bug漫天飞's blog, CSDN
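On a server without reliable internet access, such as the school server, you can switch wandb to offline mode through environment variables. `WANDB_MODE=offline` keeps runs on local disk so they can be uploaded later with `wandb sync`, and `WANDB_API_KEY` replaces the interactive `wandb login`. A sketch; the key value is a placeholder you must supply.

```python
import os
from typing import Dict, Optional


def configure_wandb(offline: bool, api_key: Optional[str] = None) -> Dict[str, str]:
    """Set wandb environment variables before wandb.init() is called."""
    settings: Dict[str, str] = {}
    if offline:
        settings["WANDB_MODE"] = "offline"   # runs are written locally; upload later with `wandb sync`
    if api_key:
        settings["WANDB_API_KEY"] = api_key  # non-interactive alternative to `wandb login`
    os.environ.update(settings)
    return settings
```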


import wandb
wandb.watch_called = False  # Re-run the model without restarting the runtime, unnecessary after our next release


wandb.log({ "Examples": example_images, "Test Accuracy": 100. * correct / len(test_loader.dataset), "Test Loss": test_loss })
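The scalar part of that `wandb.log` call is plain arithmetic, so it can be factored out and checked on its own. In this sketch, `correct`, the dataset size and `test_loss` come from the evaluation loop (names as in the snippet). The function name `eval_metrics` is hypothetical.

```python
from typing import Dict


def eval_metrics(correct: int, dataset_size: int, test_loss: float) -> Dict[str, float]:
    """Scalar metrics passed to wandb.log after evaluation."""
    return {
        "Test Accuracy": 100.0 * correct / dataset_size,
        "Test Loss": test_loss,
    }
```

Usage: `wandb.log({"Examples": example_images, **eval_metrics(correct, len(test_loader.dataset), test_loss)})`.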


wandb.init()  # a run must be started before wandb.config can be set
config = wandb.config  # Initialize config
config.batch_size = 4  # input batch size for training (default: 64)
config.test_batch_size = 10  # input batch size for testing (default: 1000)
config.epochs = 50  # number of epochs to train (default: 10)
config.lr = 0.1  # learning rate (default: 0.01)
config.momentum = 0.1  # SGD momentum (default: 0.5)
config.no_cuda = False  # disables CUDA training
config.seed = 42  # random seed (default: 42)
config.log_interval = 10  # how many batches to wait before logging training status
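Setting attributes one by one works. wandb also accepts the whole configuration at start-up through `wandb.init(config=...)`, which records every value before training begins. A sketch with the same values. `mode="offline"` is an assumption suited to a server without internet access, and wandb is imported lazily so the module loads even where wandb is not installed.

```python
from typing import Any, Dict

# Same values as the attribute-by-attribute config above
HPARAMS: Dict[str, Any] = {
    "batch_size": 4,
    "test_batch_size": 10,
    "epochs": 50,
    "lr": 0.1,
    "momentum": 0.1,
    "no_cuda": False,
    "seed": 42,
    "log_interval": 10,
}


def start_run(hparams: Dict[str, Any], offline: bool = True):
    """Start a wandb run with all hyperparameters recorded up front."""
    import wandb  # lazy import: only needed when a run is actually started

    run = wandb.init(config=hparams, mode="offline" if offline else "online")
    return run, run.config
```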


wandb.watch(model, log="all")
for epoch in range(1, config.epochs + 1):
    train(config, model, device, train_loader, optimizer, epoch)
    test(config, model, device, test_loader, classes)
torch.save(model.state_dict(), 'model.h5')

For example, to download the torch_geometric .whl file on my own machine and then pip install it on the school server:
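`pip download` can fetch wheels for a different machine if you give it the target's platform and Python version; pip then requires `--only-binary=:all:`. The sketch below assumes the server is Linux x86_64 running Python 3.8, matching the `ljj_torch112` environment. Copy the resulting folder to the server and install from it with `pip install --no-index --find-links <folder> torch_geometric`.

```python
import subprocess
import sys
from typing import List


def pip_download_cmd(package: str, dest: str, python_version: str = "3.8",
                     platform: str = "manylinux2014_x86_64") -> List[str]:
    """Build a `pip download` command that fetches wheels for another machine."""
    return [
        sys.executable, "-m", "pip", "download", package,
        "--only-binary=:all:",  # pip requires this when --platform/--python-version are set
        "--platform", platform,
        "--python-version", python_version,
        "--dest", dest,
    ]


def download_wheels(package: str, dest: str) -> None:
    """Run on the machine that has internet access."""
    subprocess.run(pip_download_cmd(package, dest), check=True)
```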
