2019.02.19:
Check the GPU driver: nvidia-smi
Check the CUDA version: cat /usr/local/cuda/version.txt
Check the cuDNN version: cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
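The grep above prints the three #define lines. The helper below joins them into a single version string. It is a sketch, and cudnn_version is a hypothetical name. It assumes the header uses #define CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL lines, as the cuDNN 6/7 headers do. cuDNN 8 and later move these defines to cudnn_version.h, so pass that file instead.

```shell
#!/usr/bin/env bash
# cudnn_version: hypothetical helper, not part of cuDNN.
# Prints MAJOR.MINOR.PATCH read from the #define lines of a cuDNN header.
cudnn_version() {
  local header="${1:-/usr/local/cuda/include/cudnn.h}"
  if [ ! -r "$header" ]; then
    echo "cannot read $header" >&2
    return 1
  fi
  local major minor patch
  # Field 3 of a line such as "#define CUDNN_MAJOR 7" is the number; stop at the first match.
  major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3; exit}' "$header")
  minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3; exit}' "$header")
  patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3; exit}' "$header")
  echo "${major}.${minor}.${patch}"
}
```

Usage: cudnn_version with no argument reads /usr/local/cuda/include/cudnn.h. You can also pass another header path.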
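2018.10.02: update the TensorFlow version with pip: pip install tensorflow==1.5 (in TF 1.x this is the CPU package; the GPU build is tensorflow-gpu, e.g. pip install tensorflow-gpu==1.5)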
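# Environment setup on Windows
Create a Windows 10 USB boot drive.
BIOS setting: change Boot Device Control from Legacy OPROM only to UEFI only
After installation, change Boot Device Control to UEFI and Legacy OPROM (then choose which OS to boot in the BIOS)
1. CUDA installation: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows
Verification: run the following programs in a console window: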
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\demo_suite\deviceQuery.exe
Device 0: "GeForce GTX 1070"
CUDA Driver Version / Runtime Version 9.0/9.0
CUDA Capability Major/Minor version number: 6.1
.....
Result = PASS
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\extras\demo_suite\bandwidthTest.exe
Result = PASS
2. cuDNN installation:
#--------------------------------------------------------------------------
System configuration
OS: Ubuntu 16.04.3 LTS (Xenial Xerus) # command: cat /etc/os-release
Processor: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz # command: grep -m1 "model name" /proc/cpuinfo
Graphic card: GeForce GTX 1070 # command: nvidia-smi
#--------------------------------------------------------------------------
tensorflow-gpu 1.5 requires CUDA 9.0 (and cuDNN 7)
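A TensorFlow GPU build only works with the CUDA version it was compiled against, for example tensorflow-gpu 1.5 with CUDA 9.0. It is therefore worth checking the installed toolkit before running pip. The helper below is a sketch, and cuda_matches is a hypothetical name. It assumes /usr/local/cuda/version.txt contains a line such as "CUDA Version 9.0.176", which is the format used by the CUDA 8/9 toolkits. Much newer toolkits ship version.json instead.

```shell
#!/usr/bin/env bash
# cuda_matches: hypothetical helper, not an NVIDIA tool.
# Succeeds if the CUDA toolkit recorded in version.txt has the required
# MAJOR.MINOR version (assumed line format: "CUDA Version 9.0.176").
cuda_matches() {
  local required="$1"
  local file="${2:-/usr/local/cuda/version.txt}"
  local installed
  # Take the third field ("9.0.176") and keep only MAJOR.MINOR ("9.0").
  installed=$(awk '$1 == "CUDA" && $2 == "Version" {print $3; exit}' "$file" 2>/dev/null | cut -d. -f1,2)
  if [ -n "$installed" ] && [ "$installed" = "$required" ]; then
    echo "OK: CUDA $installed"
    return 0
  fi
  echo "mismatch: found CUDA '${installed}', need ${required}" >&2
  return 1
}
```

Usage: run cuda_matches 9.0 before pip install tensorflow-gpu==1.5. A non-zero exit status means the toolkit version differs.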
#--------------------------------------------------------------------------
Method 1: not the official method. It came from a colleague at my previous company; I tested it on one PC and it seems to work.
Step 1: install the NVIDIA graphics driver
First, update the Ubuntu 16.04 package sources. In a terminal, run:
cd /etc/apt/ # change current directory to apt directory
sudo cp sources.list sources.list.bak # backup sources.list file
sudo vi sources.list # edit sources.list and add the sources below (the USTC mirror speeds up downloads)
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
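The ten mirror lines differ only in entry type (deb or deb-src) and suite. A small loop can print them, which avoids the missing-space typos that are easy to make when typing them by hand. This is a sketch, and ustc_sources is a hypothetical name. Appending with tee -a assumes sources.list was backed up first, as shown above.

```shell
#!/usr/bin/env bash
# ustc_sources: hypothetical helper that prints the USTC mirror entries
# for Ubuntu 16.04 (xenial), one "deb" and one "deb-src" line per suite.
ustc_sources() {
  local mirror="http://mirrors.ustc.edu.cn/ubuntu/"
  local components="main restricted universe multiverse"
  local kind suite
  for kind in deb deb-src; do
    for suite in xenial xenial-security xenial-updates xenial-proposed xenial-backports; do
      echo "$kind $mirror $suite $components"
    done
  done
}
```

Usage: ustc_sources | sudo tee -a /etc/apt/sources.list appends the lines to sources.list. Afterwards, run sudo apt-get update.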
Finally, refresh the package index and upgrade the installed packages:
sudo apt-get update
sudo apt-get upgrade
Use the add-apt-repository script to add the NVIDIA driver PPA to the repository list; it also imports the PPA's public key automatically.
sudo add-apt-repository ppa:graphics-drivers/ppa
Press Enter to continue, then run:
sudo apt-get update
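sudo apt-get install nvidia-367 # pick the latest version listed at https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa; the CUDA 8.0.61 runfile in Step 2 needs driver 375.26 or newer
sudo apt-get install mesa-common-dev # OpenGL development headers; probably needed to build the graphical CUDA Samples
sudo apt-get install freeglut3-dev # GLUT development files; probably needed for the same reason
After installation, reboot so that the GTX 1070 driver takes effect. After rebooting, test it.
In a terminal, run: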
nvidia-smi
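nvidia-smi shows the driver version interactively. A script can parse the version from /proc/driver/nvidia/version instead and compare it with a minimum. For example, NVIDIA lists 375.26 as the minimum for CUDA 8.0.61, which is also the driver version in the runfile name cuda_8.0.61_375.26_linux.run. This is a sketch, and driver_at_least is a hypothetical name. It relies on GNU sort -V and assumes the "Kernel Module <version>" wording shown in the sample output later in these notes.

```shell
#!/usr/bin/env bash
# driver_at_least: hypothetical helper.
# Reads the kernel-module version from /proc/driver/nvidia/version (or a copy
# of it) and succeeds if it is greater than or equal to the required version.
driver_at_least() {
  local required="$1"
  local file="${2:-/proc/driver/nvidia/version}"
  local installed
  # Assumed line format: "NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.111  Tue Dec 19 ..."
  installed=$(sed -n 's/.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' "$file" 2>/dev/null | head -n 1)
  if [ -z "$installed" ]; then
    echo "no NVIDIA kernel module version found in $file" >&2
    return 1
  fi
  # sort -V orders version strings; if the required one sorts first (or ties), the driver is new enough.
  if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]; then
    echo "driver $installed >= $required"
    return 0
  fi
  echo "driver $installed is older than $required" >&2
  return 1
}
```

Usage: run driver_at_least 375.26 before starting the CUDA 8.0.61 runfile.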
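Step 2: install CUDA
Required file: cuda_8.0.61_375.26_linux.run, downloaded from the NVIDIA website
Install CUDA 8.0
Change to the directory containing cuda_8.0.61_375.26_linux.run and run:
sudo sh cuda_8.0.61_375.26_linux.run --override
The installer starts immediately. Answer its prompts as follows:
1. Keep pressing Space to reach the end of the license, then type accept.
2. Type n to skip installing the NVIDIA graphics driver, since it was already installed in Step 1.
3. Type y to install the CUDA 8.0 toolkit.
4. Press Enter to accept the default install path: /usr/local/cuda-8.0.
5. Type y to run the installation with sudo, and enter your password.
6. Type y or n to install or skip the symbolic link /usr/local/cuda (the later steps use /usr/local/cuda, so y is the safer choice).
7. Type y to install the CUDA 8.0 Samples for testing later.
8. Press Enter to accept the default Samples path /home/<your user name>; the samples can be deleted after testing.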
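Install cuDNN v6
Download the file from the NVIDIA website.
Change to the directory containing the cudnn....tgz file and extract it:
tar zxvf cudnn-8.0-linux-x64-v6.0.tgz
Extraction creates a cuda directory. Go into cuda/include and run:
cd cuda/include/
sudo cp cudnn.h /usr/local/cuda/include/ # copy the header file
Then go into cuda/lib64 and run:
cd ../lib64 # change to the lib64 directory
sudo cp lib* /usr/local/cuda/lib64/ # copy the library files (plain cp turns the symlinks into regular files; cp -P would keep them)
Give all users read permission on these files: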
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
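Create the symbolic links
In a terminal, run:
cd /usr/local/cuda/lib64/
sudo rm -rf libcudnn.so libcudnn.so.5
sudo ln -s libcudnn.so.5.1.5 libcudnn.so.5 # these numbers are from cuDNN v5.1; use the versions of the files you actually extracted (check with ls libcudnn.so.*)
sudo ln -s libcudnn.so.5 libcudnn.so
Set the environment variable. In a terminal, run: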
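The link names have to follow whatever version the archive actually contains. The sketch below derives them from the fully versioned library file instead of hard-coding 5.1.5. relink_cudnn is a hypothetical name. The helper assumes the directory holds a libcudnn.so.X.Y.Z file; if there are several, it takes the highest one according to GNU sort -V.

```shell
#!/usr/bin/env bash
# relink_cudnn: hypothetical helper. Rebuilds the chain
#   libcudnn.so -> libcudnn.so.<major> -> libcudnn.so.<major>.<minor>.<patch>
# Needs write access to the directory (root for /usr/local/cuda/lib64).
relink_cudnn() {
  local dir="${1:-/usr/local/cuda/lib64}"
  local real major
  # Highest fully versioned library, e.g. libcudnn.so.6.0.21
  real=$(cd "$dir" 2>/dev/null && ls libcudnn.so.*.*.* 2>/dev/null | sort -V | tail -n 1)
  if [ -z "$real" ]; then
    echo "no libcudnn.so.X.Y.Z found in $dir" >&2
    return 1
  fi
  major=${real#libcudnn.so.}   # e.g. "6.0.21"
  major=${major%%.*}           # e.g. "6"
  (
    cd "$dir" &&
      rm -f libcudnn.so "libcudnn.so.$major" &&
      ln -s "$real" "libcudnn.so.$major" &&
      ln -s "libcudnn.so.$major" libcudnn.so
  ) || return 1
  echo "libcudnn.so -> libcudnn.so.$major -> $real"
}
```

Usage for the system directory: sudo bash -c "$(declare -f relink_cudnn); relink_cudnn". The declare -f call copies the function definition into the root shell.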
sudo gedit /etc/profile
Append at the end:
PATH=/usr/local/cuda/bin:$PATH
export PATH
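Save, then create a linker configuration file:
sudo vim /etc/ld.so.conf.d/cuda.conf
Press a to enter insert mode and add the following line:
/usr/local/cuda/lib64 # lets ldconfig add the CUDA libraries to its cache; check with ldconfig -p
Finally, run sudo ldconfig in a terminal to apply the change.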
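After sudo ldconfig, ldconfig -p lists the libraries in the cache, one per line. The helper below checks that listing for a given library. It is a sketch, and in_ld_cache is a hypothetical name. It matches only on the library name prefix in the first column of each line.

```shell
#!/usr/bin/env bash
# in_ld_cache: hypothetical helper. Reads "ldconfig -p" output on stdin and
# succeeds if a library whose name starts with $1 is listed.
in_ld_cache() {
  local prefix="$1"
  # Cache lines look like:
  #   libcudart.so.8.0 (libc6,x86-64) => /usr/local/cuda/lib64/libcudart.so.8.0
  awk -v p="$prefix" 'index($1, p) == 1 { found = 1 } END { exit !found }'
}
```

Usage: ldconfig -p | in_ld_cache libcudnn.so && echo "cuDNN is in the linker cache".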
**************************2018.03.13****************************************
GTX1070 CUDA CUDNN installation
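$PATH   # typing the variable alone makes bash try to execute its value as a command, hence the error below; use echo $PATH to print it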
bash: /home/test/anaconda3/bin:/home/test/bin:/home/test/.local/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin: No such file or directory
Method:
test@test-ML:~$ sudo gedit /etc/profile
PATH=/usr/local/cuda/bin:$PATH
export PATH
test@test-ML:~$ sudo vim /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
**************************2018.03.13****************************************
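CUDA Samples test
Go to the default CUDA 8.0 Samples install directory. In a terminal, run:
cd /home/username/NVIDIA_CUDA-8.0_Samples # replace username with your own user name
sudo make all -j4 # 4 parallel jobs for a 4-core CPU
Wait for the build to finish.
Then run: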
cd bin/x86_64/linux/release
./deviceQuery
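Both deviceQuery and bandwidthTest end their output with a line "Result = PASS" when they succeed. The helper below runs the two samples from the release directory and stops at the first failure. It is a sketch, and run_samples is a hypothetical name. It relies only on that "Result = PASS" line.

```shell
#!/usr/bin/env bash
# run_samples: hypothetical helper. Runs deviceQuery and bandwidthTest from the
# given directory and succeeds only if each one prints "Result = PASS".
run_samples() {
  local dir="$1" prog
  for prog in deviceQuery bandwidthTest; do
    if "$dir/$prog" | grep -q 'Result = PASS'; then
      echo "$prog: PASS"
    else
      echo "$prog: FAIL" >&2
      return 1
    fi
  done
}
```

Usage: run_samples ~/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release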
- Check the GPU driver information:
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.111 Tue Dec 19 23:51:45 PST 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.6)
- After building the samples, run the following and check that each reports Result = PASS:
~/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release $ ./deviceQuery
~/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release $ ./bandwidthTest
- PATH contains the CUDA path /usr/local/cuda/bin, but LD_LIBRARY_PATH is not set
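Checking whether a directory is on PATH or LD_LIBRARY_PATH can be done by comparing the colon-separated entries exactly. A plain grep would also match prefixes such as /usr/local/cuda/bin2. The helper below is a sketch, and path_contains is a hypothetical name.

```shell
#!/usr/bin/env bash
# path_contains: hypothetical helper. Succeeds if directory $2 is one of the
# colon-separated entries of $1 (e.g. "$PATH" or "$LD_LIBRARY_PATH").
path_contains() {
  # Wrapping both sides in colons makes the first and last entries match too.
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: path_contains "$PATH" /usr/local/cuda/bin && echo "on PATH", or path_contains "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64 || echo "not on LD_LIBRARY_PATH".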
Hardware:
CPU: Intel i7-7700
GPU: GTX 1070
OS: Ubuntu 16.04
Target environment: TensorFlow in GPU mode
Overview:
Install the OS (Ubuntu 16.04)
Install the GTX 1070 driver
Install CUDA 8.0
Install cuDNN v6
Install Anaconda
Install the GPU version of TensorFlow
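1. OS installation
Install Ubuntu 16.04 in non-UEFI (legacy) boot mode
Set the BIOS to Legacy only
During installation, neither "download updates" nor "install third-party software" was selected
2. Follow NVIDIA's CUDA installation guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A
Some of the steps: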
Reboot into text mode (runlevel 3).
Method 1:
sudo systemctl set-default multi-user.target
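To restore: sudo systemctl set-default graphical.target
Show the current setting: sudo systemctl get-default
Method 2: sudo service lightdm stop
To confirm: Ctrl+Alt+F7 no longer switches to the graphical desktop
If CUDA or the NVIDIA driver was installed before and has to be uninstalled before reinstalling, use the following: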
Use the following command to uninstall a Toolkit runfile installation:
$ sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
Use the following command to uninstall a Driver runfile installation:
$ sudo /usr/bin/nvidia-uninstall
How to confirm that the Nouveau driver is disabled:
lsmod | grep nouveau
The command above should print nothing.
Install CUDA 8.0:
$ sudo sh cuda_<version>_linux.run --override --no-opengl-libs
Device node check:
Notes:
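The check above only confirms that Nouveau is off. NVIDIA's Linux installation guide disables it with a two-line file, /etc/modprobe.d/blacklist-nouveau.conf, followed by sudo update-initramfs -u and a reboot. The sketch below wraps both the file write and the lsmod check. write_nouveau_blacklist and nouveau_loaded are hypothetical names.

```shell
#!/usr/bin/env bash
# write_nouveau_blacklist: hypothetical helper. Writes the two lines from
# NVIDIA's installation guide to the given file (default: the modprobe.d path).
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# nouveau_loaded: hypothetical helper. Reads lsmod output on stdin and
# succeeds if the nouveau module is listed (first column).
nouveau_loaded() {
  awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

Usage: sudo bash -c "$(declare -f write_nouveau_blacklist); write_nouveau_blacklist", then sudo update-initramfs -u and reboot. After the reboot, lsmod | nouveau_loaded && echo "nouveau is still loaded".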
- Disable secure UEFI boot
Linux won't boot into desktop once graphics drivers are installed - Almost certainly this is due to the fact you didn't disable Secure UEFI Boot in your motherboard. You will need to disable this setting, uninstall the Nvidia drivers and then reinstall them. This will require you to use the built-in terminals which can be accessed by pressing Ctrl+Alt+F1 at the login screen. You will need to run the commands to uninstall the Nvidia drivers and reinstall the native drivers that come with Ubuntu first.
- Page listing the latest NVIDIA GPU driver versions:
https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
- Environment variables to add; after setting them, re-source the file or close and reopen the terminal:
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
- To confirm that the driver was installed successfully, the following command should print the correct GPU information:
nvidia-smi
- Check the driver version:
$ cat /proc/driver/nvidia/version
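- Verifying the CUDA installation:
~/NVIDIA_CUDA-X.X_Samples/bin/deviceQuery
./deviceQuery # tests the device
- The bandwidthTest program measures memory-copy bandwidth
- Anaconda download page: https://repo.continuum.io/archive/
Download with: wget http://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh (change the file name as needed)
- Create a virtual environment (requires Anaconda to be installed)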
$ conda create -n tensorflow python=3.6
$ source activate tensorflow # on Windows the command is: activate tensorflow
- Installing TensorFlow (if the download fails, look for another source; adjust the file name to your setup):
(tensorflow)$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.4.0-cp36-cp36m-linux_x86_64.whl
( CPU mode : pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp36-cp36m-linux_x86_64.whl )
- nvidia-smi cannot be found: most likely the environment variables were not added to your .bashrc file, or you did not re-source the file or reopen the terminal.
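Because the two export lines have to end up in ~/.bashrc, the helper below appends them only if they are not already there. Running it twice therefore does not duplicate them. It is a sketch, and add_cuda_env is a hypothetical name. The default CUDA directory /usr/local/cuda-8.0 matches the export lines above; pass a different one as the second argument if needed.

```shell
#!/usr/bin/env bash
# add_cuda_env: hypothetical helper. Appends the CUDA PATH and LD_LIBRARY_PATH
# export lines to a start-up file (default ~/.bashrc) unless already present.
add_cuda_env() {
  local rc="${1:-$HOME/.bashrc}"
  local cuda="${2:-/usr/local/cuda-8.0}"
  local line
  touch "$rc"
  # The escaped \$ keeps ${PATH:+:${PATH}} literal, so it expands at login, not now.
  for line in \
    "export PATH=$cuda/bin\${PATH:+:\${PATH}}" \
    "export LD_LIBRARY_PATH=$cuda/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"; do
    # -x: whole-line match, -F: fixed string (no regex interpretation).
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  done
}
```

Usage: run add_cuda_env && source ~/.bashrc, then check with nvidia-smi and nvcc --version.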
Thanks to the following articles:
https://www.quantstart.com/articles/installing-tensorflow-on-ubuntu-1604-with-an-nvidia-gpu
https://www.linkedin.com/pulse/installing-nvidia-cuda-80-ubuntu-1604-linux-gpu-new-victor/
2018.03.08: there are 4 GPU-related device files under /dev:
nvidia0
nvidiactl
nvidia-modeset
nvidia-uvm
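The helper below reports whether each of these four nodes exists, for example after a reboot. It is a sketch, and check_nvidia_nodes is a hypothetical name. It assumes a single GPU, so only nvidia0 is checked. On some systems nvidia-uvm may only appear once a CUDA program has run, so a missing nvidia-uvm right after boot is not necessarily an error.

```shell
#!/usr/bin/env bash
# check_nvidia_nodes: hypothetical helper. Reports which of the four NVIDIA
# device files exist in a directory (default /dev); fails if any is missing.
check_nvidia_nodes() {
  local dir="${1:-/dev}" node missing=0
  for node in nvidia0 nvidiactl nvidia-modeset nvidia-uvm; do
    if [ -e "$dir/$node" ]; then
      echo "found   $dir/$node"
    else
      echo "missing $dir/$node"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: check_nvidia_nodes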
CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).