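Preface
When setting up a deep learning environment, a common problem is that different open-source projects expect different environments, CUDA in particular: CUDA 9.0, CUDA 10.0, CUDA 10.1 and so on, each with its own matching cuDNN version. This article adds CUDA 9.0 + cuDNN 7.3.1 to a machine that already has CUDA 10.0 + cuDNN 7.6.4 installed.
1. Downgrading gcc
CUDA 9.0 supports only gcc 6 and earlier, whereas Ubuntu 18.04 ships with gcc 7.3, so gcc has to be downgraded manually. Version 4.8 is used here.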
sudo apt-get install gcc-4.8
sudo apt-get install g++-4.8
After installation, change into the /usr/bin directory (cd /usr/bin) and list the gcc entries:
ls -l gcc*
The listing shows /usr/bin/gcc -> gcc-7, meaning gcc currently links to gcc-7. Relink it to gcc-4.8 as follows:
sudo mv gcc gcc.bak # back up the existing link
sudo ln -s gcc-4.8 gcc # relink to gcc-4.8
Do the same for g++, relinking it to g++-4.8:
sudo mv g++ g++.bak
sudo ln -s g++-4.8 g++
Then check the gcc and g++ versions:
gcc -v
g++ -v
Both should report gcc version 4.8, which confirms that gcc 4.8 is installed and active.
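The version check can also be scripted. The following is only a minimal sketch, assuming the gcc 6 limit stated above; the function name gcc_ok_for_cuda9 is made up for this example and is not part of any tool.

```shell
# Hypothetical helper: succeed if a gcc version string (e.g. "4.8.5" or "7")
# has a major version of 6 or lower, the limit quoted above for CUDA 9.0.
gcc_ok_for_cuda9() {
    major="${1%%.*}"    # keep everything before the first dot
    [ "$major" -le 6 ]
}

# Check whichever gcc is first on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    ver=$(gcc -dumpversion)
    if gcc_ok_for_cuda9 "$ver"; then
        echo "gcc $ver: usable with CUDA 9.0"
    else
        echo "gcc $ver: too new for CUDA 9.0"
    fi
fi
```

Running it before and after the relinking step should show the result flip from "too new" to "usable".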
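2. Installing CUDA 9.0
First download the runfile from the CUDA 9.0 download page on the NVIDIA website, selecting Ubuntu 16.04. (Note: the CUDA build for 16.04 also installs on an 18.04 system.)
Once the file is downloaded, install it with the following commands: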
sudo chmod a+x cuda_9.0.176_384.81_linux.run
sudo ./cuda_9.0.176_384.81_linux.run --no-opengl-libs
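Because the GPU driver was already installed before CUDA 10.0, answer no when asked whether to install the driver. For the other prompts, accept the default path or answer yes, with the exception of the /usr/local/cuda symbolic link prompt annotated in the transcript below.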
----------------------------------------------------------------------------------
Do you accept the previously read EULA?
accept/decline/quit: accept # accept the CUDA license agreement
You are attempting to install on an unsupported configuration. Do you wish to continue?
(y)es/(n)o [ default is no ]: y
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n # the GPU driver is already installed, so choose n
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-9.0 ]: # toolkit install location; press Enter to accept the default
Do you want to install a symbolic link at /usr/local/cuda?
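(y)es/(n)o/(q)uit: n # **Pay attention to this symlink prompt: CUDA 10.0 is already installed, so answering no is recommended, because yes would repoint /usr/local/cuda at the new version**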
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/ubuntu ]:
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Missing recommended library: libGL.so
Installing the CUDA Samples in /home/ubuntu ...
Copying samples to /home/ubuntu/NVIDIA_CUDA-9.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/ubuntu, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 9.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_14444.log
# ***Installation complete***
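The warning in the transcript says CUDA 9.0 needs a driver of version 384.00 or newer. Since the driver step was skipped, it may be worth confirming that the existing driver meets that bar. A minimal sketch, assuming nvidia-smi is on PATH; the function name driver_ok_for_cuda9 is hypothetical.

```shell
# Hypothetical helper: succeed if a driver version string such as "410.48"
# is at least 384.00, the minimum quoted by the installer for CUDA 9.0.
driver_ok_for_cuda9() {
    major="${1%%.*}"    # the part before the first dot, e.g. 410
    [ "$major" -ge 384 ]
}

# Query the installed driver version, if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if driver_ok_for_cuda9 "$ver"; then
        echo "driver $ver: new enough for CUDA 9.0"
    else
        echo "driver $ver: older than 384.00 (or not detected)"
    fi
fi
```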
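After installation, update the environment variables. Add the following lines to ~/.bashrc (or, to apply them to all users, to /etc/profile via sudo vim /etc/profile), then run source ~/.bashrc to apply them: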
export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH
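export CUDA_HOME=/usr/local/cuda-9.0
Alternatively:
[Recommended] Point directly at the /usr/local/cuda symlink. This makes later CUDA version switches easier, since only the symlink has to be recreated and the paths stay unchanged: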
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
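Because ~/.bashrc is re-read by every new shell and on each source, plain export lines can pile up duplicate entries. One way around that is to prepend only when the entry is missing. This is a sketch rather than a required step; the helper name prepend_once is made up for this example.

```shell
# Hypothetical helper: print LIST with DIR prepended, unless DIR is already
# one of its colon-separated entries. Usage: prepend_once DIR LIST
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;            # already present: unchanged
        *) if [ -n "$2" ]; then
               printf '%s\n' "$1:$2"
           else
               printf '%s\n' "$1"                  # empty list: no trailing colon
           fi ;;
    esac
}

# CUDA_HOME is a single directory, here the symlink recommended above.
CUDA_HOME=/usr/local/cuda
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export CUDA_HOME PATH LD_LIBRARY_PATH
```

Handling the empty case separately also avoids a trailing colon in LD_LIBRARY_PATH, which the dynamic loader treats as the current directory.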
3. Installing cuDNN
Download the matching cuDNN release, e.g. cudnn-9.0-linux-x64-v7.3.1.20.tgz.
Extract it:
tar -zxvf cudnn-9.0-linux-x64-v7.3.1.20.tgz
Copy the relevant files into the CUDA 9.0 directory:
sudo cp cuda/include/cudnn.h /usr/local/cuda-9.0/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/
sudo chmod a+r /usr/local/cuda-9.0/include/cudnn.h
sudo chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*
4. Selecting the CUDA version
(1) Point the cuda symlink at the newly installed CUDA 9.0:
cd /usr/local
sudo rm -rf cuda # remove the existing symlink
sudo ln -s cuda-9.0 cuda # recreate the symlink
Check the current CUDA version:
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
Check the cuDNN version information:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
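The grep above prints the raw #define lines. If a single version string is more convenient, the three macros can be joined with awk. A minimal sketch, assuming the cuDNN 7.x header layout used in this article, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h; the function name cudnn_version is made up for this example.

```shell
# Hypothetical helper: print "MAJOR.MINOR.PATCHLEVEL" from a cudnn.h file.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pa = $3 }
         END { print ma "." mi "." pa }' "$1"
}

# Report the version behind the current /usr/local/cuda symlink, if present.
if [ -f /usr/local/cuda/include/cudnn.h ]; then
    cudnn_version /usr/local/cuda/include/cudnn.h
fi
```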
(2) Switch to CUDA 10.0
sudo rm -rf /usr/local/cuda # remove the existing symlink
sudo ln -s /usr/local/cuda-10.0/ /usr/local/cuda
nvcc -V # check the current CUDA version
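The two switching recipes differ only in the version number, so they can be folded into one small function. This is a sketch under a few assumptions: the toolkits live in directories named cuda-VERSION under one root (as in this article), and the function name switch_cuda and its optional root argument are made up for this example. It uses ln -sfn to replace the link in one step, and refuses to touch a cuda entry that is a real directory.

```shell
# Hypothetical helper: repoint ROOT/cuda at ROOT/cuda-VERSION.
# Usage: switch_cuda VERSION [ROOT]   (ROOT defaults to /usr/local)
switch_cuda() {
    version="$1"
    root="${2:-/usr/local}"
    if [ ! -d "$root/cuda-$version" ]; then
        echo "no such toolkit: $root/cuda-$version" >&2
        return 1
    fi
    # Refuse to touch a real directory; only a symlink (or nothing) is expected.
    if [ -e "$root/cuda" ] && [ ! -L "$root/cuda" ]; then
        echo "$root/cuda exists and is not a symlink" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not descend into the old target
    ln -sfn "cuda-$version" "$root/cuda"
}

# Example (writing to /usr/local normally requires a root shell):
#   switch_cuda 9.0 && nvcc -V
```

Since sudo cannot call a shell function directly, the function would have to be defined and run inside a root shell, or the ln line could simply be run by hand with sudo.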
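5. Postscript
(1) CUDA 9.0 also comes with four patch packages, which can be installed as needed. They mainly update cuBLAS. Since they are official patches, installing them is the safest choice.
(2) The cuDNN download for CUDA 9.0 also offers three additional packages, which can be installed as needed: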
cuDNN v7.3.1 Runtime Library for Ubuntu16.04 (Deb)
cuDNN v7.3.1 Developer Library for Ubuntu16.04 (Deb)
cuDNN v7.3.1 Code Samples and User Guide for Ubuntu16.04 (Deb)
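(3) Importing TensorFlow fails with ImportError: /usr/local/cuda-9.0/lib64/libcudnn.so.7: file too short
This means the shared-library links are broken.
First, in /usr/local/cuda/lib64 (with /usr/local/cuda pointing at cuda-9.0), run sudo rm libcudnn.so.7 libcudnn.so.7.3.1.
Then, in the lib64 subdirectory of the extracted cuDNN 7.3.1.20 archive (the archive extracts to a directory named cuda), run sudo cp libcudnn.so.7.3.1 /usr/local/cuda/lib64.
Finally, back in /usr/local/cuda/lib64, run sudo ln -s libcudnn.so.7.3.1 libcudnn.so.7.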
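To find which file is affected before deleting anything, the libcudnn entries can be scanned for empty files and dangling links. A minimal sketch; the function name check_cudnn_libs is made up for this example. It will not catch a file that is truncated but non-empty.

```shell
# Hypothetical helper: flag libcudnn.so* entries in DIR that are empty files
# or symlinks whose target is missing or empty. Returns 1 if any are found.
# Usage: check_cudnn_libs [DIR]   (DIR defaults to /usr/local/cuda/lib64)
check_cudnn_libs() {
    dir="${1:-/usr/local/cuda/lib64}"
    status=0
    for f in "$dir"/libcudnn.so*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # -s follows symlinks: false for empty files and dangling links.
        if [ ! -s "$f" ]; then
            echo "suspicious: $f"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   check_cudnn_libs /usr/local/cuda-9.0/lib64
```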