Table of Contents

  • Preparation
  • Step1: Install the Nvidia Driver
  • Step2: Install Cuda
  • Step3: Install Cudnn
  • Step4: Install the Tensorflow-gpu Package
  • Step5: Test Case
  • Issues
  • Issue1
  • Issue2
  • Issue3
  • Issue4
  • Other: NVIDIA Driver Installation for Linux Servers
  • 1. Download Linux Server Version Driver
  • 2. Install
  • 3. nvidia-smi


Preparation

The version combination used here:
GTX1050Ti + Cuda9.0 + Cudnn7.0.5 + Tensorflow-gpu1.8.0
Before installing, you may want to check which Tensorflow, Cuda, and Cudnn versions go together:
cite 1, cite 2
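
To make the version pairing concrete, here is a minimal sketch that compares whatever versions you find on your machine against the combination listed above. The helper names (`version_tuple`, `mismatches`) and the sample values under `__main__` are made up for illustration; only the expected versions come from this post.

```python
# Expected versions, taken from the combination listed above.
EXPECTED = {"cuda": "9.0", "cudnn": "7.0.5", "tensorflow-gpu": "1.8.0"}


def version_tuple(version):
    """Turn a dotted version string such as '7.0.5' into (7, 0, 5)."""
    return tuple(int(part) for part in version.split("."))


def mismatches(installed, expected=EXPECTED):
    """Return {name: (installed, expected)} for every component that differs.

    Only as many components as the expected version spells out are compared,
    so an installed '9.0.176' counts as matching an expected '9.0'.
    """
    problems = {}
    for name, want in expected.items():
        have = installed.get(name)
        want_parts = version_tuple(want)
        if have is None or version_tuple(have)[:len(want_parts)] != want_parts:
            problems[name] = (have, want)
    return problems


if __name__ == "__main__":
    # Hypothetical values; replace them with what your machine reports.
    found = {"cuda": "9.0.176", "cudnn": "7.1.3", "tensorflow-gpu": "1.8.0"}
    for name, (have, want) in mismatches(found).items():
        print("{}: found {}, expected {}".format(name, have, want))
```

With the sample values, only the Cudnn entry is reported, which is the kind of mismatch behind Issue3 and Issue4 further down.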

Step1: Install the Nvidia Driver

1. Install the Nvidia driver; see my other blog post.
2. Download Cuda and Cudnn from the following sites:

Cuda official site

Cudnn official site

Select the Cuda 9.0 deb package



Select Cudnn v7.0.5 Library for Linux, for Cuda 9.0


Step2: Install Cuda

sudo dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get update
sudo aptitude install cuda

If the first command prints something like this:

$ sudo dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb
[sudo] password for fong: 
Selecting previously unselected package cuda-repo-ubuntu1704-9-0-local.
(Reading database ... 185992 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1704-9-0-local (9.0.176-1) ...
Setting up cuda-repo-ubuntu1704-9-0-local (9.0.176-1) ...

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub

then install the key with the command it suggests and run the first command again:

$ sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
$ sudo dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64.deb

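OK, next configure the environment variables. Open ~/.bashrc in an editor (no sudo needed, since the file belongs to you):

$ gedit ~/.bashrc

Add the following lines, then open a new terminal (or run source ~/.bashrc) so that they take effect: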

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
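
If you want to double-check that the export lines above actually took effect, a small script can do it for you. This is only a sketch: the function name `check_cuda_env` is made up here, and it assumes the same /usr/local/cuda prefix as the exports.

```python
import os

CUDA_ROOT = "/usr/local/cuda"  # same prefix as in the export lines above


def check_cuda_env(env):
    """Return a list of human-readable problems found in an environment mapping."""
    problems = []
    # Linux uses ":" to separate entries in PATH-style variables.
    if CUDA_ROOT + "/bin" not in env.get("PATH", "").split(":"):
        problems.append("PATH does not contain " + CUDA_ROOT + "/bin")
    if CUDA_ROOT + "/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("LD_LIBRARY_PATH does not contain " + CUDA_ROOT + "/lib64")
    if env.get("CUDA_HOME") != CUDA_ROOT:
        problems.append("CUDA_HOME is not set to " + CUDA_ROOT)
    return problems


if __name__ == "__main__":
    issues = check_cuda_env(os.environ)
    print("\n".join(issues) if issues else "Cuda environment variables look fine")
```

Run it from a freshly opened terminal; a terminal that was already open before you edited ~/.bashrc will still report the old values.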

Step3: Install Cudnn

Installing Cudnn is simple: extract the archive and copy the files.

tar -zxvf cudnn-9.0-linux-x64-v7.tgz 
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/ 
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d 
sudo chmod a+r /usr/local/cuda/include/cudnn.h 
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
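
To confirm which Cudnn version ended up under /usr/local/cuda, you can read the version macros out of the copied header. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, as the Cudnn 7 header does; the function name `parse_cudnn_version` is just an illustrative choice.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None if missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines of the form "#define CUDNN_MAJOR 7".
        pattern = r"^\s*#define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    header = "/usr/local/cuda/include/cudnn.h"  # destination of the cp command above
    try:
        with open(header, errors="replace") as handle:
            print(parse_cudnn_version(handle.read()))
    except OSError:
        print("cannot read " + header)
```

For the setup in this post, the result you would hope to see is a 7.0.x version.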

Step4: Install the Tensorflow-gpu Package

Install Tensorflow-gpu:

sudo pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==1.8

Step5: Test Case

Run the test case below. If the import succeeds and the GPU shows up in the log, congratulations, you are done.

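import tensorflow as tf
# Two constant matrices: 1x2 and 2x1
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.], [2.]])
product = tf.matmul(matrix1, matrix2)
# Creating the session initializes the GPU and produces the log lines below
sess = tf.Session()
# Run the graph to get the value; print(product) alone only shows the Tensor object
print(sess.run(product))
sess.close()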

2018-09-24 22:43:28.056291: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-24 22:43:28.139690: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-24 22:43:28.140009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.23GiB
2018-09-24 22:43:28.140024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-24 22:43:28.321486: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-24 22:43:28.321529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-09-24 22:43:28.321551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-09-24 22:43:28.321709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2949 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
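
Instead of scanning the log by eye, you can also ask Tensorflow which devices it sees. The sketch below relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, which ships with Tensorflow 1.x; the helper `gpu_names` is a made-up name and only filters the returned list.

```python
def gpu_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


if __name__ == "__main__":
    try:
        # Imported here so that gpu_names() stays usable without Tensorflow.
        from tensorflow.python.client import device_lib
        names = gpu_names(device_lib.list_local_devices())
        print(names if names else "no GPU visible to Tensorflow")
    except ImportError as error:
        print("Tensorflow could not be imported:", error)
```

If the list comes back empty even though the driver works, the Issues section below is the place to look.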

Issues

If you are unlucky enough to hit one of the following errors, try the fixes below.

Issue1

Error: cannot find Toolkit in /usr/local/cuda-9.0

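You most likely downloaded the Cuda .run installer, something like cuda_9.0.176_384.81_linux.run. My solution was to download the deb package instead and install it the way the official site describes.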

Issue2

$ sudo apt-get install -f cuda
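Reading package lists... Done
Building dependency tree       
Reading state information... Done       
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-9-0 (>= 9.0.176) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.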

First install aptitude:

sudo apt-get install aptitude

Then use sudo aptitude install -f cuda in place of sudo apt-get install -f cuda:

sudo aptitude install -f cuda

Issue3

ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory

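This error is very common, so don't worry. It shows up at import tensorflow after you have installed Cuda, Cudnn, and Tensorflow-gpu, and it is telling you that you need the Cudnn 7.0.x series. So first remove the existing Cudnn files:

sudo rm -rf /usr/local/cuda/include/cudnn.h
sudo rm -rf /usr/local/cuda/lib64/libcudnn*

Then download Cudnn again from the official site, extract it, and copy the files: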

tar -zxvf cudnn-9.0-linux-x64-v7.tgz 
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/ 
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/ -d 
sudo chmod a+r /usr/local/cuda/include/cudnn.h 
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
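
Before importing Tensorflow again, you can check whether the dynamic loader finds the library at all. Here is a minimal sketch using the standard `ctypes` module; the helper name `can_load` is made up for this example.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the file is missing or not on the loader's search path.
        return False


if __name__ == "__main__":
    name = "libcudnn.so.7"  # the file named in the ImportError above
    print(name, "loadable:", can_load(name))
```

If this still reports False after the copy, the LD_LIBRARY_PATH setting from Step2 is worth a second look, since the loader searches the directories listed there.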

Issue4

Same approach as Issue3 above: remove the installed Cudnn files and reinstall the required version.

Loaded runtime CuDNN library: 7103 (compatibility version 7100)
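
The numbers in this message are easier to read once decoded. The sketch below assumes the Cudnn 7 convention, where the version number is major * 1000 + minor * 100 + patch; the function name `decode_cudnn_version` is just an illustrative choice.

```python
def decode_cudnn_version(number):
    """Split an integer such as 7103 into (major, minor, patch).

    Assumes the Cudnn 7 encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    for number in (7103, 7100):  # the two values from the message above
        print(number, "->", "{}.{}.{}".format(*decode_cudnn_version(number)))
```

Under that assumption, 7103 is Cudnn 7.1.3 and 7100 is 7.1.0, so the library loaded at runtime is from the 7.1 series rather than the 7.0.5 this setup expects.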

Other: NVIDIA Driver Installation for Linux Servers

NVIDIA driver installation for Linux server editions (not suitable for desktop editions)

1. Download Linux Server Version Driver

webpage: http://www.nvidia.cn/Download/index.aspx?lang=cn

2. Install

sh NVIDIA-Linux-x86_64-390.48.run

3. nvidia-smi

Run nvidia-smi to see details about the GPUs and what is currently running on them.

nvidia-smi
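
If you would rather collect this information from a script, nvidia-smi can print selected fields as CSV via its `--query-gpu` and `--format` options. The following is a rough sketch; the helper name `parse_gpu_lines` and the choice of fields (name and total memory) are my own picks for illustration.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_lines(text):
    """Turn 'name, memory' CSV lines into a list of (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, memory = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append((name, memory))
    return gpus


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        try:
            output = subprocess.check_output(QUERY).decode()
            for name, memory in parse_gpu_lines(output):
                print(name, memory)
        except subprocess.CalledProcessError as error:
            print("nvidia-smi failed:", error)
```

On a machine with several GPUs this prints one line per card, which is handy when deciding which device to use.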
