How to create environments with different Python and CUDA versions

Linux

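I use Ubuntu, or more precisely Windows Subsystem for Linux 2 (WSL2). This article uses conda to create separate Python environments. If you need to reproduce several papers whose environments differ, I strongly advise against installing CUDA directly on the system.

Setting the Python version with conda

Some ancient papers from 2020 ship with source code (in neural networks, "ancient" means about a year old). If you run that code in your own everyday Python environment, package conflicts are very likely.

Check the conda version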

conda -V

See also: what to do when the Miniconda version cannot be found

Create a new Python environment

conda create --name softnet_spotme -c anaconda python=3.7.16

Activate the environment

conda activate softnet_spotme

Leave the environment

conda deactivate

Delete the environment

conda remove -n softnet_spotme --all
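
What the prompt looks like once the environment is active (this matters: if a different environment is active, packages get installed somewhere else)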
(softnet_spotme) zhutianci@DESKTOP-M29UJV1:~$
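
The prompt prefix is easy to overlook in a long session, so a script can check it too. The sketch below assumes the environment was activated with `conda activate`, which exports the `CONDA_DEFAULT_ENV` variable; the helper names `active_conda_env` and `check_env` are made up for this illustration.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if there is none."""
    environ = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment name.
    return environ.get("CONDA_DEFAULT_ENV")


def check_env(expected, environ=None):
    """Return True only when the expected environment is the active one."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    print("active env:", active_conda_env())
    # sys.prefix shows which interpreter (and site-packages) is really in use.
    print("interpreter prefix:", sys.prefix)
```

Running it before `pip install` shows whether packages are about to land in `softnet_spotme` or somewhere else.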

Pitfalls in the source code

Platform-specific problems

The cause: I am on Linux and the author was on Windows.

ERROR: Could not find a version that satisfies the requirement pywin32==227 (from versions: none)
ERROR: No matching distribution found for pywin32==227

So I deleted pywin32 from requirement.txt.
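
If a project pins many Windows-only packages, editing the file by hand gets tedious. Here is a minimal sketch of doing the same thing in code; `requirement_name` and `drop_packages` are hypothetical helpers, and the name parsing is deliberately simple (it does not handle URLs or `-r` includes). An alternative that keeps the file usable on both platforms is a standard environment marker such as `pywin32==227; sys_platform == "win32"`.

```python
import re


def requirement_name(line):
    """Return the lower-cased package name of a requirement line, or None."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # The name ends at the first version specifier, extra, marker or space.
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", stripped)
    return match.group(0).lower() if match else None


def drop_packages(lines, unwanted):
    """Return the requirement lines with the unwanted packages removed."""
    unwanted = {name.lower() for name in unwanted}
    return [line for line in lines if requirement_name(line) not in unwanted]


if __name__ == "__main__":
    sample = ["numpy==1.19.5", "pywin32==227", "dlib==19.21.1"]
    print("\n".join(drop_packages(sample, ["pywin32"])))
```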

Packages that are not in the index at all

ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.4.1 (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.12.0)

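If pip cannot find some packages, your Python version may be too new. The code for this paper should be run with Python 3.7 to 3.8.

dlib problems

Running the following command may fail with a variety of errors.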
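
A quick guard at the top of a setup script can catch a wrong interpreter before pip starts failing. This is only a sketch: the 3.7 to 3.8 bounds come from the note above, and `python_in_range` is a hypothetical helper name.

```python
import sys


def python_in_range(version, lowest=(3, 7), highest=(3, 8)):
    """Return True if the (major, minor) part of `version` lies in [lowest, highest]."""
    major_minor = tuple(version[:2])
    return lowest <= major_minor <= highest


if __name__ == "__main__":
    if not python_in_range(sys.version_info):
        print("Python %d.%d is outside 3.7-3.8; pinned packages may be missing"
              % sys.version_info[:2])
```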

pip install dlib==19.21.1

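My fix is probably not general: the install worked once I removed the system-wide CUDA. Another solution is to install cmake.

CUDA pitfalls (mismatched CUDA versions can cause all sorts of errors)

Removing the system-wide CUDA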

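Cleaning up Ubuntu package sources (installing CUDA directly on the system is not recommended)

Some CUDA installation tutorials have you add extra package sources to Ubuntu, and these cause conflicts later on.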

https://askubuntu.com/questions/307/how-can-ppas-be-removed

This is the method I used. Take care not to remove the NVIDIA driver, although if you do, it should be easy to reinstall.

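I still recommend using conda, so that several different versions can be installed side by side.

Installing CUDA in the current environment

Remember: once you switch to a different environment, nothing installed below applies any more.

Install cudatoolkit; look up the version that matches your TensorFlow version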

conda install -c conda-forge cudatoolkit=11.1

Install cuDNN

pip install nvidia-cudnn-cu11==8.6.0.163

Install TensorFlow

pip install tensorflow==2.4.1

Create the directory for the activation script. If you want to be sure where it will go, run echo $CONDA_PREFIX first.

mkdir -p $CONDA_PREFIX/etc/conda/activate.d

Write CUDNN_PATH into it

echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn;print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

Then write LD_LIBRARY_PATH

echo 'export LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$CUDNN_PATH/lib:$LD_LIBRARY_PATH' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

Source it

source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
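
One side effect of the script above is that every `source` or re-activation prepends the same directories to LD_LIBRARY_PATH again. The sketch below shows the idea of prepending without duplicates; it is only an illustration of the logic, `prepend_paths` is a hypothetical helper, and it does not modify the shell environment by itself.

```python
import os


def prepend_paths(current, new_dirs, sep=os.pathsep):
    """Put new_dirs in front of a separator-joined search path, without duplicates."""
    result = []
    for entry in list(new_dirs) + current.split(sep):
        # Treat "/env/lib/" and "/env/lib" as the same directory.
        entry = entry.rstrip("/") if len(entry) > 1 else entry
        if entry and entry not in result:
            result.append(entry)
    return sep.join(result)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX", "")
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if prefix:
        print(prepend_paths(current, [os.path.join(prefix, "lib")]))
```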

nvcc will not run?

This requires installing cudatoolkit-dev

conda install -c conda-forge cudatoolkit-dev=11.1

Reasons the GPU may be unusable

Some libraries fail to load; search the whole filesystem for the file

sudo find / -name 'libcusolver.so.10'

Or search one specific directory

find /home/zhutianci/miniconda3/envs/softnet_spotme/lib -name 'libcusolver.so.10'
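
The same search can be limited to the directories the dynamic loader is actually told about. This is a minimal sketch, assuming the relevant directories are listed in LD_LIBRARY_PATH; `find_library` is a hypothetical helper, not part of any library.

```python
import os


def find_library(name, search_path, sep=os.pathsep):
    """Return the full path of `name` in every directory listed in search_path."""
    hits = []
    for directory in search_path.split(sep):
        candidate = os.path.join(directory, name)
        # Skip empty entries, which appear when the variable ends with a separator.
        if directory and os.path.exists(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_library("libcusolver.so.10", path))
```

An empty list means none of the listed directories contains the file, which matches the "cannot load" symptom.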

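If the file turns up, add its directory to $LD_LIBRARY_PATH. Usually it does not, in which case you can create a hard link in the environment's lib directory that points at the version you do have

cd "$CONDA_PREFIX/lib"
sudo ln libcusolver.so.11 libcusolver.so.10  # hard link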
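
The link step can also be scripted so that it is safe to repeat. The sketch below uses `os.link`, which creates a hard link like `ln` without `-s`; `ensure_alias` is a hypothetical helper, and whether the newer library is really compatible with code built against the older name is an assumption you have to test yourself.

```python
import os


def ensure_alias(lib_dir, existing, alias):
    """Hard-link `alias` to `existing` inside lib_dir; return True if a link was made."""
    source = os.path.join(lib_dir, existing)
    target = os.path.join(lib_dir, alias)
    if os.path.lexists(target):
        return False  # alias already present, leave it alone
    if not os.path.exists(source):
        raise FileNotFoundError(source)
    os.link(source, target)  # hard link, like `ln` without -s
    return True


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        lib_dir = os.path.join(prefix, "lib")
        try:
            print(ensure_alias(lib_dir, "libcusolver.so.11", "libcusolver.so.10"))
        except FileNotFoundError as missing:
            print("not found:", missing)
```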

Verify that the GPU is usable before running the model

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

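Make sure there are no real errors. My WSL2 kernel has no NUMA support, which is why the NUMA messages show up in the output below. Enabling NUMA looks like a hassle, since you would have to compile the kernel yourself.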

2023-07-30 13:28:14.671071: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-07-30 13:28:15.338384: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-07-30 13:28:15.466891: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:09:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-30 13:28:15.466941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: NVIDIA GeForce RTX 3070 computeCapability: 8.6
coreClock: 1.725GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-07-30 13:28:15.466963: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-07-30 13:28:15.468342: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-07-30 13:28:15.468382: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2023-07-30 13:28:15.468841: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-07-30 13:28:15.468974: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-07-30 13:28:15.470469: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2023-07-30 13:28:15.470803: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-07-30 13:28:15.470886: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-07-30 13:28:15.470959: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:09:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-30 13:28:15.471000: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:927] could not open file to read NUMA node: /sys/bus/pci/devices/0000:09:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-07-30 13:28:15.471022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
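
Logs like this are long, and the one line that matters is easy to miss. Here is a small sketch that pulls out which libraries loaded and which did not. The two message formats are taken from the logs quoted in this article ("Successfully opened dynamic library ..." and "Could not load dynamic library '...'"); other TensorFlow versions may word them differently, and `summarize_loader_log` is a hypothetical helper.

```python
import re

OPENED = re.compile(r"Successfully opened dynamic library (\S+)")
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")


def summarize_loader_log(text):
    """Return (opened, failed) library names, each listed once in first-seen order."""
    opened, failed = [], []
    for line in text.splitlines():
        match = OPENED.search(line)
        if match and match.group(1) not in opened:
            opened.append(match.group(1))
        match = FAILED.search(line)
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
    return opened, failed


if __name__ == "__main__":
    import sys
    # Example: python check.py < tf_output.log
    ok, bad = summarize_loader_log(sys.stdin.read())
    print("opened:", ok)
    print("failed:", bad)
```

Since TensorFlow writes these messages to stderr, redirect with `2> tf_output.log` when capturing them.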

Windows

A file cannot be found

Error:

Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

If you use conda, go to the following directory; the path differs from machine to machine

cd D:\Softwares\miniconda3\envs\SFAMNet\Library\bin

Open a terminal as administrator and run:

New-Item -ItemType SymbolicLink -Path .\cusolver64_10.dll -Target .\cusolver64_11.dll

Setting up the PyCharm terminal

cmd.exe "/K" "D:\Softwares\miniconda3\Scripts\activate.bat" "D:\Softwares\miniconda3"

Install nvcc

conda install -c "nvidia/label/cuda-11.3.0" cuda-nvcc

Verify PyTorch

python -c "import torch; print(torch.cuda.is_available())"
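
When the one-liner prints False, a little more detail helps to tell a CPU-only build from a driver problem. The following is a sketch, not a full diagnostic: `cuda_report` is a hypothetical helper, and it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def cuda_report():
    """Collect PyTorch/CUDA facts without raising if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    available = torch.cuda.is_available()
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
    }
    if available:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(key, "=", value)
```

If `built_with_cuda` is None, the installed wheel is CPU-only and no amount of driver or toolkit fixing will make `cuda_available` true.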