Table of Contents
- Preparation
- Step1: Install the Nvidia driver
- Step2: Install Cuda
- Step3: Install Cudnn
- Step4: Install the Tensorflow-gpu package
- Step5: Test case
- Issues
- Issue1
- Issue2
- Issue3
- Issue4
- Other: NVIDIA driver installation for Linux servers
- 1. Download Linux Server Version Driver
- 2. Install
- 3. nvidia-smi
Preparation
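The version combination used here is:
GTX1050Ti + Cuda9.0 + Cudnn7.0.5 + Tensorflow-gpu1.8.0
Before installing, you may want to check which Tensorflow, Cuda, and Cudnn versions are compatible with each other:
cite 1, cite 2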
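Step1: Install the Nvidia driver
1. Install the Nvidia driver; see my other blog post.
2. Download Cuda and Cudnn from the official download pages:
For Cuda, choose the Cuda9.0 deb package
For Cudnn, choose Cudnn v7.0.5 Library for Linux (the build for Cuda9.0)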
Step2: Install Cuda
sudo dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get update
sudo aptitude install cuda
If the output after the first command looks like this:
$ sudo dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb
[sudo] password for fong:
Selecting previously unselected package cuda-repo-ubuntu1704-9-0-local.
(Reading database ... 185992 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1704-9-0-local (9.0.176-1) ...
Setting up cuda-repo-ubuntu1704-9-0-local (9.0.176-1) ...
The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
then install the key with the command it suggests and run the first command again:
$ sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
$ sudo dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb
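OK, next configure the environment variables. Open ~/.bashrc (no sudo is needed for a file in your own home directory):
$ gedit ~/.bashrc
and add the following lines: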
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
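The exports only take effect in a new shell (or after running source ~/.bashrc). As a quick sanity check, here is a minimal Python sketch that looks for the three settings in an environment mapping. The function name missing_cuda_entries is my own and purely illustrative, and the paths are assumed to be exactly the ones exported above.

```python
import os

CUDA_HOME = "/usr/local/cuda"  # assumed: same path as in the exports above


def missing_cuda_entries(environ):
    """Return the names of the CUDA settings that are absent from `environ`."""
    problems = []
    path_dirs = environ.get("PATH", "").split(":")
    lib_dirs = environ.get("LD_LIBRARY_PATH", "").split(":")
    if CUDA_HOME + "/bin" not in path_dirs:
        problems.append("PATH")
    if CUDA_HOME + "/lib64" not in lib_dirs:
        problems.append("LD_LIBRARY_PATH")
    if environ.get("CUDA_HOME") != CUDA_HOME:
        problems.append("CUDA_HOME")
    return problems


if __name__ == "__main__":
    missing = missing_cuda_entries(os.environ)
    print("missing:", missing if missing else "nothing")
```

Run it from a freshly opened terminal; anything it lists as missing was not picked up from ~/.bashrc.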
Step3: Install Cudnn
Installing Cudnn is simple: extract the archive and copy the files.
tar -zxvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
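To confirm which Cudnn version actually landed in /usr/local/cuda, one option is to read the version macros out of the copied header. The sketch below assumes the Cudnn 7 header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL as plain integer macros; the helper name cudnn_version_from_header is hypothetical.

```python
import re


def cudnn_version_from_header(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None if not found."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7"
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header_path = "/usr/local/cuda/include/cudnn.h"  # destination of the cp command above
    try:
        with open(header_path) as f:
            print(cudnn_version_from_header(f.read()))
    except IOError:
        print("cudnn.h not found at", header_path)
```

For the combination used in this post, the result should correspond to Cudnn 7.0.5.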
Step4: Install the Tensorflow-gpu package
Install Tensorflow-gpu with pip, here through the Tsinghua PyPI mirror:
sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==1.8
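Step5: Test case
Run the test below. If the import succeeds and the log lists your GPU, congratulations, you are done.
import tensorflow as tf
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)
sess = tf.Session()  # creating the session triggers the GPU device log shown below
print(sess.run(product))  # evaluates the graph and prints [[12.]]
sess.close()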
2018-09-24 22:43:28.056291: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-24 22:43:28.139690: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-24 22:43:28.140009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.23GiB
2018-09-24 22:43:28.140024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-24 22:43:28.321486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-24 22:43:28.321529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-24 22:43:28.321551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-24 22:43:28.321709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2949 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
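If you would rather check for the GPU in code than read the log, a small sketch like the following may help. It relies on device_lib.list_local_devices() from tensorflow.python.client, which is available in Tensorflow 1.x; the wrapper name gpu_device_names is my own, and returning None when the import fails is just a convention chosen for this example.

```python
def gpu_device_names():
    """Return the names of the GPU devices Tensorflow sees, or None if Tensorflow cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    # Each entry describes one local device; keep only the GPUs.
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


if __name__ == "__main__":
    names = gpu_device_names()
    if names is None:
        print("Tensorflow could not be imported")
    elif not names:
        print("Tensorflow imported, but no GPU device is visible")
    else:
        print("GPU devices:", names)
```

An empty list usually means the CPU-only package is installed or the Cuda/Cudnn libraries were not found.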
Issues
If you are unlucky enough to hit one of the errors below, try the fixes listed.
Issue1
Error: cannot find Toolkit in /usr/local/cuda-9.0
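You most likely downloaded the Cuda .run installer, something like cuda_9.0.176_384.81_linux.run. My fix was to download the deb package instead and install it following the steps on the official site.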
Issue2
$ sudo apt-get install -f cuda
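Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda : Depends: cuda-9-0 (>= 9.0.176) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
First install aptitude:
sudo apt-get install aptitude
Then use sudo aptitude install -f cuda instead of sudo apt-get install -f cuda: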
sudo aptitude install -f cuda
Issue3
ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
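This error is common, so don't worry. It shows up at import tensorflow after Cuda, Cudnn, and Tensorflow-gpu are installed, and it means the Cudnn 7 shared library could not be loaded: you need a Cudnn 7.0.x release installed. First remove the existing Cudnn files:
sudo rm -rf /usr/local/cuda/include/cudnn.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*
Then download Cudnn again from the official site, extract it, and copy the files: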
tar -zxvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
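Before importing tensorflow again, you can check whether the dynamic loader finds the library at all. This is a minimal sketch using ctypes from the standard library; can_load_library is a hypothetical helper name, and the check depends on LD_LIBRARY_PATH as it was set when Python started.

```python
import ctypes


def can_load_library(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be opened
        return False
    return True


if __name__ == "__main__":
    print("libcudnn.so.7 loadable:", can_load_library("libcudnn.so.7"))
```

If this prints False after reinstalling, it is worth double-checking that /usr/local/cuda/lib64 is on LD_LIBRARY_PATH in the shell you are using.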
Issue4
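Loaded runtime CuDNN library: 7103 (compatibility version 7100)
The cause is the same as above: the Cudnn runtime that got loaded (7.1.3 here) is not the 7.0.x release this setup expects. See Issue3 and reinstall Cudnn 7.0.x.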
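The four-digit numbers in that message are packed version codes. In Cudnn 7 the code is computed as major * 1000 + minor * 100 + patch, so it can be unpacked with two divisions. A small sketch (the function name decode_cudnn_version is my own):

```python
def decode_cudnn_version(code):
    """Split a packed Cudnn 7 version code into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    print(decode_cudnn_version(7103))  # (7, 1, 3)
    print(decode_cudnn_version(7005))  # (7, 0, 5)
```

So 7103 is Cudnn 7.1.3, while the version used in this post, 7.0.5, would be reported as 7005.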
Other: NVIDIA driver installation for Linux servers
This section covers the Linux server edition only; it is not suitable for desktop systems.
1. Download Linux Server Version Driver
webpage: http://www.nvidia.cn/Download/index.aspx?lang=cn
2. Install
sh NVIDIA-Linux-x86_64-390.48.run
3. nvidia-smi
Run nvidia-smi to see details about the GPUs and the processes running on them.
nvidia-smi
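If you want the same information inside a script, nvidia-smi also has a machine-readable mode via --query-gpu and --format=csv,noheader. The sketch below is one way to wrap it, assuming you only care about the GPU name and total memory; parse_gpu_csv and query_gpus are illustrative names, not part of any NVIDIA tool.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' CSV lines into a list of (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, memory = [field.strip() for field in line.split(",", 1)]
        gpus.append((name, memory))
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_gpu_csv(output)
```

Calling query_gpus() on a machine with a working driver returns one tuple per GPU; on a machine without the driver it returns None.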