Setting Up a Deep Learning Environment from Scratch: Ubuntu 16.04 + TensorFlow 0.11

Machine configuration: CPU: E3-1230 V5; GPU: EVGA GTX 1080 8G SC ACX 3.0; RAM: 2 x 8 GB DDR4-2133; motherboard: Gigabyte X150M-PRO-ECC

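I chose Ubuntu 16 LTS because it is a long-term support release, and because my hardware is fairly new I expected better driver support and compatibility from it. One reason I picked this motherboard was its M.2 SSD slot, but the compatibility problems turned out to be serious: plenty of people online complain about a double-boot problem, and I ended up hitting it too. In the end I gave up on M.2 and settled for plain SATA 3.0.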

Overview

Install Ubuntu 16.04

Configure the build environment

Build and install TensorFlow

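Install Ubuntu 16.04

I have two disks and wanted a dual-boot setup. I first installed Windows 10 on the first disk, then wrote the downloaded Ubuntu ISO to a USB stick, changed the BIOS boot order so the USB stick comes first, and rebooted. After the reboot, just follow the installer step by step. When asked for the system language, I recommend choosing English; it saves trouble later.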

Problems you may run into:

  • Black screen after the post-install reboot
  • Cannot reach the desktop after logging in

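Black screen after the post-install reboot

Because my machine is dual-boot, a boot menu is shown at startup. Press e there to edit the GRUB entry, find quiet splash and change it to quiet splash nomodeset (that is, append nomodeset at the end), then press F10 to boot. Once the login screen appears, press Ctrl+Alt+F1 to switch to a text console, log in with your username and password, and open the GRUB defaults file with sudo vi /etc/default/grub. Find the following line: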

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

Change it to:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"

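Save the file, regenerate the GRUB configuration so that the change takes effect, and reboot:

sudo update-grub
sudo reboot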
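
If you would rather not edit the file by hand, the same change can be scripted. The following is only a minimal sketch: add_grub_param is a helper name I made up, and it assumes the file contains a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line like the one shown above. Try it on a copy of the file first.

```shell
#!/bin/sh
# add_grub_param FILE PARAM
# Hypothetical helper (not part of GRUB): appends PARAM to the quoted value of
# GRUB_CMDLINE_LINUX_DEFAULT in FILE, and does nothing if it is already there.
add_grub_param() {
    file=$1
    param=$2
    # Skip the edit when the parameter is already present (keeps it idempotent).
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Insert " PARAM" just before the closing quote, then write the result back
    # in place so the file keeps its owner and permissions.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (on a scratch copy, so nothing under /etc is touched):
#   cp /etc/default/grub /tmp/grub.test
#   add_grub_param /tmp/grub.test nomodeset
```

On the real /etc/default/grub the function has to run as root, and sudo update-grub is still needed afterwards.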

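Cannot reach the desktop after installing Ubuntu 16.04

Don't panic. Press Ctrl+Alt+F1 to switch to a text console and log in. The very first thing to do is change the package sources; we are on the "big LAN" here, you know what I mean :)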

sudo vi /etc/apt/sources.list

If you are not comfortable with vi, or are new to Linux, you can use the nano editor instead:

sudo nano /etc/apt/sources.list

Add the mirrors.163.com sources. The codename of Ubuntu 16.04 is xenial:

deb http://mirrors.163.com/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.163.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.163.com/ubuntu/ xenial-backports main restricted universe multiverse

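My personal impression is that the 163 mirror is faster on a China Telecom broadband line; if you are on the education network, you can use the USTC mirror instead.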
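
Since the ten lines above differ only in the mirror URL and the suite suffix, they can also be generated. This is a rough sketch: gen_sources is a name I made up, and it only prints text, so nothing is changed until you redirect the output yourself.

```shell
#!/bin/sh
# gen_sources MIRROR CODENAME
# Hypothetical helper: prints deb and deb-src lines for the release suite plus
# the -updates, -security, -proposed and -backports suites of CODENAME.
gen_sources() {
    mirror=$1      # e.g. http://mirrors.163.com/ubuntu/
    codename=$2    # e.g. xenial
    for kind in deb deb-src; do
        for suite in "" -updates -security -proposed -backports; do
            printf '%s %s %s%s main restricted universe multiverse\n' \
                "$kind" "$mirror" "$codename" "$suite"
        done
    done
}

# Example: preview the list for the USTC mirror without writing anything.
#   gen_sources http://mirrors.ustc.edu.cn/ubuntu/ xenial
```

To actually use the output, back up /etc/apt/sources.list first and then pipe the result through sudo tee /etc/apt/sources.list.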

Save the file, then refresh the package lists and upgrade:

sudo apt-get update
sudo apt-get upgrade

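Next, upgrade the kernel (a fresh install comes with 4.4; I recommend upgrading to 4.6.7). This step is optional. First check the current kernel version: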

uname -r

Then download the 4.6.7 kernel packages, install them, and reboot:

wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6.7/linux-headers-4.6.7-040607_4.6.7-040607.201608160432_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6.7/linux-headers-4.6.7-040607-generic_4.6.7-040607.201608160432_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6.7/linux-image-4.6.7-040607-generic_4.6.7-040607.201608160432_amd64.deb
sudo dpkg -i linux-*.deb
sudo update-grub
sudo reboot now
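
To decide whether the upgrade is needed at all, the output of uname -r can be compared with the target version. A small sketch follows; kernel_older_than is a name I made up, and it only looks at the leading dotted numbers of the release string.

```shell
#!/bin/sh
# kernel_older_than RELEASE VERSION
# Hypothetical helper: succeeds (exit status 0) when the kernel release string
# RELEASE, e.g. "4.4.0-45-generic", is older than VERSION, e.g. "4.6.7".
kernel_older_than() {
    # Keep only the leading dotted numbers: "4.4.0-45-generic" -> "4.4.0".
    have=$(printf '%s\n' "$1" | sed 's/^\([0-9][0-9.]*\).*/\1/')
    want=$2
    awk -v a="$have" -v b="$want" 'BEGIN {
        n = split(a, x, "[.]"); m = split(b, y, "[.]")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            # Missing fields count as 0; "+ 0" forces a numeric comparison,
            # so 4.10 is correctly treated as newer than 4.6.
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1   # equal versions are not "older"
    }'
}

# Example:
#   if kernel_older_than "$(uname -r)" 4.6.7; then echo "kernel is older than 4.6.7"; fi
```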

After the reboot, install the graphics driver. (My card is a GTX 1080, so I chose the nvidia-367 driver.)

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-367
sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev
sudo reboot

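After this reboot you should be able to reach the desktop, and the screen resolution should be correct too. (Mine is an ultrawide 2560x1080 monitor.)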

Configure the build environment

Download and install CUDA 8.0.44 (from Nvidia, or from Baidu Netdisk):

sudo sh cuda_8.0.44_linux.run

The installer asks a series of questions about what to install. Pay close attention to this one:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.**?

(y)es/(n)o/(q)uit: n

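Be sure to answer no here; otherwise the newer graphics driver installed earlier will have been installed for nothing. (If you do fall into this trap, no harm done: uninstall the driver, then repeat the driver installation steps above.)

When the installer finishes, read its summary carefully. If something went wrong, this blog post may help (I did not run into any problems myself).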

Set up the environment variables. Open ~/.bashrc:

nano ~/.bashrc

and append the following two lines:

export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

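If you did this in a desktop Terminal, exit and open it again so that the variables take effect. If you are on a text console, you can simply run the two export lines above by hand.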
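
The ${PATH:+:${PATH}} part of those export lines may look odd. It appends ":" plus the old value only when the variable is already set and non-empty, so an unset LD_LIBRARY_PATH does not end up with a trailing colon (an empty entry is treated as the current directory). A tiny demonstration, where prepend_dir is just an illustrative name of mine:

```shell
#!/bin/sh
# prepend_dir DIR CURRENT
# Hypothetical helper that mirrors the ${VAR:+:${VAR}} idiom from the export lines.
prepend_dir() {
    dir=$1
    current=$2
    # ${current:+:$current} expands to ":$current" only if current is non-empty.
    printf '%s%s\n' "$dir" "${current:+:$current}"
}

prepend_dir /usr/local/cuda-8.0/lib64 ""         # prints /usr/local/cuda-8.0/lib64
prepend_dir /usr/local/cuda-8.0/bin /usr/bin     # prints /usr/local/cuda-8.0/bin:/usr/bin
```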

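Next, install cuDNN 5.1 (official download page, or Baidu Netdisk).

Once downloaded, extract the archive and copy the files into place (if CUDA 8.0 was installed to its default path, the paths below need no changes):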

tar xvf cudnn-8.0-linux-x64-v5.1.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
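
To double-check which cuDNN version was copied, the version macros in the header can be read back. This is a sketch under an assumption: it expects cudnn.h to contain plain "#define CUDNN_MAJOR n", "#define CUDNN_MINOR n" and "#define CUDNN_PATCHLEVEL n" lines. cudnn_header_version is a name I made up.

```shell
#!/bin/sh
# cudnn_header_version HEADER
# Hypothetical helper: prints MAJOR.MINOR.PATCHLEVEL read from the #define lines
# of the given cudnn.h, and fails if CUDNN_MAJOR is not found.
cudnn_header_version() {
    header=$1
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$header"
}

# Example:
#   cudnn_header_version /usr/local/cuda/include/cudnn.h
```

For the archive used here the printed version should start with 5.1.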

Now check that the graphics card information is displayed correctly:

nvidia-smi

Then go into the samples directory that the CUDA installer created (by default under ~/), run make, and once the build finishes run:

./NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery

It should print detailed device information like this:

CUDA Device Query (Runtime API) version (CUDART static linking)

  Detected 1 CUDA Capable device(s)

  Device 0: "GeForce GTX 1080"
    CUDA Driver Version / Runtime Version 8.0 / 8.0
    CUDA Capability Major/Minor version number: 6.1
    Total amount of global memory: 8110 MBytes (8504279040 bytes)
    (20) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
    GPU Max Clock rate: 1848 MHz (1.85 GHz)
    Memory Clock rate: 5005 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 2097152 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
    Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 2048
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 2 copy engine(s)
    Run time limit on kernels: Yes
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

  deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1080
  Result = PASS
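
If you want to use this check in a script rather than reading the output yourself, the two lines that matter can be tested for. A minimal sketch: devicequery_ok is a name I made up, and it relies only on the "Detected N CUDA Capable device(s)" and "Result = PASS" lines shown above.

```shell
#!/bin/sh
# devicequery_ok
# Hypothetical helper: reads deviceQuery output on stdin and succeeds only if
# a CUDA device was detected and the run reported "Result = PASS".
devicequery_ok() {
    out=$(cat)
    printf '%s\n' "$out" | grep -Eq 'Detected [0-9]+ CUDA Capable device' || return 1
    printf '%s\n' "$out" | grep -q 'Result = PASS'
}

# Example:
#   ~/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery | devicequery_ok \
#       && echo "CUDA looks usable"
```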

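Build and install TensorFlow

Now set up the build environment for TensorFlow. If you'd rather not build it yourself, you can download my prebuilt whl here.

This post is already getting long, so for installing and configuring Bazel please see the official manual (linked here).

The official download is rather slow; Bazel 0.3.1 can also be downloaded here.

Next, install the dependencies. For Python 2.7: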

sudo apt-get install python-numpy swig python-dev python-wheel python-pip

Or for Python 3.x:

sudo apt-get install python3-numpy swig python3-dev python3-wheel python3-pip
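
The two package lists differ only in the python / python3 prefix, so a setup script can pick the right one from the Python major version. A small sketch, where tf_build_deps is a name I made up:

```shell
#!/bin/sh
# tf_build_deps MAJOR
# Hypothetical helper: prints the apt packages listed above for Python 2 or 3.
tf_build_deps() {
    case $1 in
        2) prefix=python ;;
        3) prefix=python3 ;;
        *) echo "unsupported Python major version: $1" >&2; return 1 ;;
    esac
    echo "$prefix-numpy swig $prefix-dev $prefix-wheel $prefix-pip"
}

# Example (the unquoted expansion is intentional, one argument per package):
#   sudo apt-get install $(tf_build_deps 3)
```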

Clone the TensorFlow source:

git clone https://github.com/tensorflow/tensorflow

Enter the source directory and switch to the latest r0.11 branch:

cd tensorflow
git checkout r0.11

Run the configuration:

$ ./configure
Please specify the location of python. [Default is /usr/bin/python]: 
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]

/usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 6.1

...
...
...

Once configuration is done, build the GPU-enabled whl. If you'd rather not build it yourself, you can download my prebuilt whl here.

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

sudo pip install /tmp/tensorflow_pkg/tensorflow-0.11.0rc0-py2-none-any.whl
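
The exact whl file name depends on the TensorFlow and Python versions you built with, so it may not match the name in the pip command above. One way around that is to pick up whatever wheel was just written. A sketch follows; newest_wheel is a name I made up, and it assumes the file names contain no newlines.

```shell
#!/bin/sh
# newest_wheel DIR
# Hypothetical helper: prints the most recently modified tensorflow-*.whl in
# DIR, or fails if there is none.
newest_wheel() {
    dir=$1
    # ls -t lists newest first; the error from an unmatched glob is silenced.
    whl=$(ls -t "$dir"/tensorflow-*.whl 2>/dev/null | head -n 1)
    [ -n "$whl" ] || return 1
    printf '%s\n' "$whl"
}

# Example:
#   sudo pip install "$(newest_wheel /tmp/tensorflow_pkg)"
```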

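That's everything. Afterwards you can run Google's test set to verify the installation (linked here).

If you have any questions about the installation, feel free to leave me a comment :)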