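In production, Docker images leave data behind on the hosts in a resource pool, and as business systems are upgraded the old images need to be cleaned up. This post maps out where Docker stores image data on a Linux system, so that the cleanup can be targeted.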

 

On a 3.10 kernel, Docker manages storage with aufs.

 

The following command lists every image that has been pulled:

cat /var/lib/docker/repositories-aufs | python -m json.tool

The number of entries it returns matches the number of images reported by docker images.

 

cat /var/lib/docker/repositories-aufs | python -m json.tool
 {
     "Repositories": {
         "172.30.30.241:5000/centos1": {
             "latest": "e099197b794f459b777cc82ba53f2ecdcfb52c0a3245a9b010ca239b50fd72ad"
         },
         "centos": {
             "6": "b9aeeaeb5e17b5414e5caa9a6b2f99e9ccef50561bdfe137cd05956961f1cec6",
             "latest": "fd44297e2ddb050ec4fa9752b7a4e3a8439061991886e2091e7c1f007c906d75",
             "new": "8390535f8e613861a3715cf1af4a082ac80108c1d098944def5aa1391207e33a",
             "new2": "e099197b794f459b777cc82ba53f2ecdcfb52c0a3245a9b010ca239b50fd72ad"
         },
         "hello-world": {
             "latest": "91c95931e552b11604fea91c2f537284149ec32fff0f700a4769cfd31d7696ae"
         },
         "quay.io/coreos/etcd": {
             "v2.0.11": "c02fd8670851ce85ace68db5cff8694a3ed3656bedd9fa8054de8aff2f39e631"
         },
         "registry": {
             "latest": "204704ce31375bcf4afecf672563b4881bbef0d59135c68d273235bb7254fb4b"
         },
         "ubuntu": {
             "14.04": "07f8e8c5e66084bef8f848877857537ffe1c47edd01a93af27e7161672ad0e95"
         }
     }
 }

The image IDs above are globally unique and identical to the IDs on the image hub.
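As a rough sketch of how the IDs in that file could be collected for later comparison, the helper below pulls every 64-character hex string out of a file and removes duplicates. The function name list_tagged_ids is made up for this illustration, and it assumes the file has the JSON layout shown above. In the sample output, nine tags map to eight distinct IDs, because centos:new2 and 172.30.30.241:5000/centos1:latest point at the same image.

```shell
#!/bin/sh
# list_tagged_ids FILE (hypothetical helper name)
# Print each distinct 64-character hex image ID found in FILE, one per line.
# FILE is assumed to look like /var/lib/docker/repositories-aufs.
list_tagged_ids() {
    grep -o '[0-9a-f]\{64\}' "$1" | sort -u
}

# Example (needs read access to the Docker directory):
#   list_tagged_ids /var/lib/docker/repositories-aufs | wc -l
```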

 

The IMAGE ID column printed by docker images is simply the first 12 characters of the full IDs above.

docker images
 REPOSITORY                   TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
 quay.io/coreos/etcd          v2.0.11             c02fd8670851        11 days ago         12.83 MB
 172.30.30.241:5000/centos1   latest              e099197b794f        12 days ago         306.1 MB
 centos                       new2                e099197b794f        12 days ago         306.1 MB
 centos                       new                 8390535f8e61        12 days ago         306.1 MB
 registry                     latest              204704ce3137        13 days ago         413.9 MB
 ubuntu                       14.04               07f8e8c5e660        3 weeks ago         188.3 MB
 centos                       6                   b9aeeaeb5e17        5 weeks ago         202.6 MB
 centos                       latest              fd44297e2ddb        5 weeks ago         215.7 MB
 hello-world                  latest              91c95931e552        5 weeks ago         910 B
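The relation between the full ID and the IMAGE ID column can be reproduced with a one-line helper. This is only an illustration; short_id is a name invented here, not a Docker command.

```shell
#!/bin/sh
# short_id FULL_ID (hypothetical helper name)
# Print the first 12 characters of FULL_ID, the form shown by `docker images`.
short_id() {
    printf '%.12s\n' "$1"
}

# The ID tagged centos:new2 in the listing above.
short_id e099197b794f459b777cc82ba53f2ecdcfb52c0a3245a9b010ca239b50fd72ad
# prints e099197b794f
```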

 

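docker images -tree shows the layer hierarchy of the images. A Docker image is built up layer by layer, and the -tree option shows exactly how the layers are stacked.

-tree lists many more images than the output above, including many image IDs that have not appeared so far. The tagged layers in the tree (the lines with a Tags field), however, carry the same IDs that docker images prints.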

docker images -tree
 Warning: '-tree' is deprecated, it will be removed soon. See usage.
 ├─8093db4276d5 Virtual Size: 0 B
 │ └─f9c3a06edd7a Virtual Size: 6.642 MB
 │   └─546a4b0d3153 Virtual Size: 12.83 MB
 │     └─9caa77989e25 Virtual Size: 12.83 MB
 │       └─c02fd8670851 Virtual Size: 12.83 MB Tags: quay.io/coreos/etcd:v2.0.11
 ├─e9e06b06e14c Virtual Size: 188.1 MB
 │ └─a82efea989f9 Virtual Size: 188.3 MB
 │   └─37bea4ee0c81 Virtual Size: 188.3 MB
 │     └─07f8e8c5e660 Virtual Size: 188.3 MB Tags: ubuntu:14.04
 │       └─1f4ab7282e19 Virtual Size: 375.1 MB
 │         └─0e4483abe66b Virtual Size: 377.5 MB
 │           └─c6153b5d8f1f Virtual Size: 377.5 MB
 │             └─2bc4611f2ed7 Virtual Size: 389.1 MB
 │               └─30887473610f Virtual Size: 413.9 MB
 │                 └─3f8e22c413b1 Virtual Size: 413.9 MB
 │                   └─22b1c756fa19 Virtual Size: 413.9 MB
 │                     └─90607d8d09d1 Virtual Size: 413.9 MB
 │                       └─4f4a5acb19eb Virtual Size: 413.9 MB
 │                         └─204704ce3137 Virtual Size: 413.9 MB Tags: registry:latest
 ├─f1b10cd84249 Virtual Size: 0 B
 │ └─b9aeeaeb5e17 Virtual Size: 202.6 MB Tags: centos:6
 │   └─8390535f8e61 Virtual Size: 306.1 MB Tags: centos:new
 │     └─0570b3aa38fb Virtual Size: 306.1 MB
 │       └─e099197b794f Virtual Size: 306.1 MB Tags: 172.30.30.241:5000/centos1:latest, centos:new2
 ├─6941bfcbbfca Virtual Size: 0 B
 │ └─41459f052977 Virtual Size: 215.7 MB
 │   └─fd44297e2ddb Virtual Size: 215.7 MB Tags: centos:latest
 └─a8219747be10 Virtual Size: 910 B
   └─91c95931e552 Virtual Size: 910 B Tags: hello-world:latest

 

Next, look at the graph directory:

/var/lib/docker/graph

The number of image IDs in the graph directory matches what docker images -tree shows.

Each image ID has its own directory, and each directory contains two files:

json  layersize

The json file holds the metadata of the image. layersize holds a single value, the size of that image layer, which may be 0.
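To get a feel for how much space the layers account for, the layersize values can be added up. The sketch below assumes each layersize file contains one plain integer (taken here to be bytes, which is an assumption), and total_layer_size is a name chosen for this example.

```shell
#!/bin/sh
# total_layer_size GRAPH_DIR (hypothetical helper name)
# Sum the numbers stored in GRAPH_DIR/<id>/layersize and print the total.
# Entries without a layersize file (for example _tmp) are skipped.
total_layer_size() {
    total=0
    for size_file in "$1"/*/layersize; do
        [ -f "$size_file" ] || continue
        size=$(cat "$size_file")
        size=${size:-0}          # treat an empty file as 0
        total=$((total + size))
    done
    echo "$total"
}

# Example (run as root):
#   total_layer_size /var/lib/docker/graph
```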

ls /var/lib/docker/graph/
0570b3aa38fbdd1defba6929282656a39cfabcf70dc39e68848139d649d8921b  41459f052977938b824dd011e1f2bec2cb4d133dfc7e1aa0e90f7c5d337ca9c4  a82efea989f94b1d9fac76e26e37b0bbde11047a3afcaa47064949dfa3b3209b
 07f8e8c5e66084bef8f848877857537ffe1c47edd01a93af27e7161672ad0e95  9caa77989e25e788a5a75faff4e77b011e68d2fd5975a0bd20f5d14f61154bc0  fd44297e2ddb050ec4fa9752b7a4e3a8439061991886e2091e7c1f007c906d75
 3f8e22c413b1783145e785a4729c4d5f98f9baca025b74d73774ed438ac82ba2  a8219747be10611d65b7c693f48e7222c0bf54b5df8467d3f99003611afa1fd8  _tmp 
/var/lib/docker/graph/0570b3aa38fbdd1defba6929282656a39cfabcf70dc39e68848139d649d8921b# ls
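 json  layersize

Now look at the diff directory:

/var/lib/docker/aufs/diff

Every entry in this directory is named after an image ID and corresponds to an entry in the graph directory.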

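Following the image tree, the directory of a layer-0 image ID may contain no data at all. Moving up through the layers, you start to see filesystem directories. These are the files produced by the operations we performed with Docker.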
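The correspondence between graph and aufs/diff can be checked mechanically. The following is a minimal sketch, with ids_missing_from as an invented name, that prints every graph entry with no same-named entry in the diff directory. On a healthy host it would be expected to print little more than bookkeeping entries such as _tmp.

```shell
#!/bin/sh
# ids_missing_from GRAPH_DIR DIFF_DIR (hypothetical helper name)
# Print every directory name in GRAPH_DIR that has no same-named entry in DIFF_DIR.
ids_missing_from() {
    for entry in "$1"/*/; do
        [ -d "$entry" ] || continue
        id=$(basename "$entry")
        [ -e "$2/$id" ] || echo "$id"
    done
}

# Example (run as root):
#   ids_missing_from /var/lib/docker/graph /var/lib/docker/aufs/diff
```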

 

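A few simple tests show the following.

When we pull an image, new image-ID directories appear under /var/lib/docker/aufs/diff.

If we then create a container from that image, a directory named after the container ID appears, together with a second directory with the same ID and an -init suffix.

/var/lib/docker/aufs/mnt/ is the merged view of the stacked diff layers.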

 

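Data created while working inside a container is saved in the directory named after that container ID, and the directory is not deleted when the container exits. Every run of an image creates a new pair of container-ID directories. docker ps -a lists the exited containers, and once they are removed the two directories disappear. Committing a modified container creates yet another new directory.

In the structure shown by docker images -tree, everything from layer 0 up to the last layer or tag is an image-ID layer pulled directly from the hub.

When a container is created from an image that declares VOLUME, or is started with -v, Docker creates a directory with a random ID under /var/lib/docker/volumes and records that ID in the container's metadata.

When the container starts, /var/lib/docker/volumes/*/ is bound to the specified path. A volume is not tied to the running container. It is just a directory that gets mounted into it.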
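Because each container leaves both an ID directory and an ID-init directory, the -init suffix gives a simple way to tell container layers from image layers inside aufs/diff. This is a sketch under that assumption, and container_ids_in is a name made up for the example.

```shell
#!/bin/sh
# container_ids_in DIFF_DIR (hypothetical helper name)
# Print the ID of every container that has an <id>-init directory in DIFF_DIR.
container_ids_in() {
    for entry in "$1"/*-init; do
        [ -d "$entry" ] || continue
        name=$(basename "$entry")
        echo "${name%-init}"     # strip the -init suffix to get the container ID
    done
}

# Example (run as root); compare the result with `docker ps -a -q --no-trunc`:
#   container_ids_in /var/lib/docker/aufs/diff
```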

In newer Docker versions, volumes are stored with the vfs driver: /var/lib/docker/volumes/ holds only metadata, and the actual data lives in /var/lib/docker/vfs/dir/<volume id>.

 

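Whenever Docker creates a data volume, it creates a directory with a random ID under /var/lib/docker/vfs/dir/* to represent it. Unless the volume is shared with the host, data written to the volume ends up in this directory. The metadata is kept in /var/lib/docker/volumes/*.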
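Before deciding what to clean up, it helps to see how much space each volume directory occupies. The sketch below walks the vfs/dir layout described above and reports the du size per volume. volume_usage is an invented name, and the Docker root is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# volume_usage DOCKER_ROOT (hypothetical helper name)
# For each directory under DOCKER_ROOT/vfs/dir, print "<size-in-KiB> <volume-id>".
volume_usage() {
    for entry in "$1"/vfs/dir/*/; do
        [ -d "$entry" ] || continue
        kib=$(du -sk "$entry" | awk '{print $1}')
        echo "$kib $(basename "$entry")"
    done
}

# Example (run as root), largest volumes last:
#   volume_usage /var/lib/docker | sort -n
```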

 

 

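Summary:

Whenever a new image is pulled or a container is created, a directory with the corresponding ID is created under /var/lib/docker/graph/* to store the metadata, and another under /var/lib/docker/aufs/diff/* to store the data.

When the container is deleted or the image is removed, the corresponding directories are removed as well.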

 

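Whenever Docker creates a data volume, it creates a directory with a random ID under /var/lib/docker/vfs/dir/* to represent it. Unless the volume is shared with the host, data written to the volume ends up in this directory. The metadata is kept in /var/lib/docker/volumes/*.

 docker rm -v containerid: with -v, the data of the container's volumes is deleted too, provided no other container is associated with the volume.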

 

In some cases, an error or a potential bug in docker rm -v can leave data behind on disk, and it then has to be cleaned up by hand.

 

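1 For image and container data files: the IDs shown by docker images -a and docker ps -a are all of the valid IDs. Any ID under /var/lib/docker/aufs/diff/ that is not in that set is a stale file. This case is actually fairly rare.

2 For data volumes: running docker inspect on every container ID gives the set of all valid volumes. Any ID under /var/lib/docker/vfs/dir/ that is not in that set is a stale file. This case is more common, because the container may have been removed without -v, and older Docker versions seem to have a bug that prevents volume data from being deleted.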
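The comparison in point 1 can be sketched as a read-only check. The helper below only prints candidates and deletes nothing. list_orphans and the /tmp/valid_ids path are names invented for this example, and it assumes that docker images -a -q --no-trunc and docker ps -a -q --no-trunc print bare 64-character IDs, as they do on the Docker version described here. On other versions the output may need normalizing first.

```shell
#!/bin/sh
# list_orphans DIR VALID_IDS_FILE (hypothetical helper name)
# Print the entries of DIR whose name, with any -init suffix removed, is not
# listed in VALID_IDS_FILE (one full ID per line). Nothing is deleted.
list_orphans() {
    for entry in "$1"/*/; do
        [ -d "$entry" ] || continue
        name=$(basename "$entry")
        id=${name%-init}
        grep -qxF -- "$id" "$2" || echo "$name"
    done
}

# Example (run as root; /tmp/valid_ids is a placeholder path):
#   { docker images -a -q --no-trunc; docker ps -a -q --no-trunc; } > /tmp/valid_ids
#   list_orphans /var/lib/docker/aufs/diff /tmp/valid_ids
```

Reviewing the printed list by hand before removing anything keeps the check safe to run on a production host.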

Script 2 comes from: https://github.com/chadoe/docker-cleanup-volumes/blob/master/docker-cleanup-volumes.sh

 

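The corresponding scripts are listed below. Despite its file name, the first listing also cleans volume directories (volumes and vfs/dir), just like the second one. Neither script touches /var/lib/docker/aufs/diff.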

cat docker_images_clean.sh

#! /bin/bash

set -euo pipefail

#usage: sudo ./docker-cleanup-volumes.sh [--dry-run] [--verbose]

# "|| true" keeps "set -e" from aborting here, so the friendly error below can be shown
docker_bin="$(which docker.io 2> /dev/null || which docker 2> /dev/null || true)"

# Default dir
dockerdir=/var/lib/docker

# Look for an alternate docker directory with -g/--graph option
dockerpid=$(ps ax | grep "$docker_bin" | grep -v grep | awk '{print $1; exit}') || :
if [[ -n "$dockerpid" && $dockerpid -gt 0 ]]; then
    next_arg_is_dockerdir=false
    while read -d $'\0' arg
    do
        if [[ $arg =~ ^--graph=(.+) ]]; then
            dockerdir=${BASH_REMATCH[1]}
            break
        elif [ "$arg" = '-g' ]; then
            next_arg_is_dockerdir=true
        elif [ "$next_arg_is_dockerdir" = true ]; then
            dockerdir=$arg
            break
        fi
    done < /proc/$dockerpid/cmdline
fi

dockerdir=$(readlink -f "$dockerdir")

volumesdir=${dockerdir}/volumes
vfsdir=${dockerdir}/vfs/dir
allvolumes=()
dryrun=false
verbose=false

function log_verbose() {
    if [ "${verbose}" = true ]; then
        echo "$1"
    fi;
}

function delete_volumes() {
  local targetdir=$1
  echo
  if [[ ! -d "${targetdir}" || ! "$(ls -A "${targetdir}")" ]]; then
        echo "Directory ${targetdir} does not exist or is empty, skipping."
        return
  fi
  echo "Delete unused volume directories from $targetdir"
  local dir
  while read -d $'\0' dir
  do
        dir=$(basename "$dir")
        if [[ -d "${targetdir}/${dir}/_data" || "${dir}" =~ [0-9a-f]{64} ]]; then
                if [ ${#allvolumes[@]} -gt 0 ] && [[ ${allvolumes[@]} =~ "${dir}" ]]; then
                        echo "In use ${dir}"
                else
                        if [ "${dryrun}" = false ]; then
                                echo "Deleting ${dir}"
                                rm -rf "${targetdir}/${dir}"
                        else
                                echo "Would have deleted ${dir}"
                        fi
                fi
        else
                echo "Not a volume ${dir}"
        fi
  done < <(find "${targetdir}" -mindepth 1 -maxdepth 1 -type d -print0 2>/dev/null)
}

if [ $UID != 0 ]; then
    echo "You need to be root to use this script."
    exit 1
fi

if [ -z "$docker_bin" ] ; then
    echo "Please install docker. You can install docker by running \"wget -qO- https://get.docker.io/ | sh\"."
    exit 1
fi

while [[ $# -gt 0 ]]
do
    key="$1"

    case $key in
        -n|--dry-run)
            dryrun=true
        ;;
        -v|--verbose)
            verbose=true
        ;;
        *)
            echo "Cleanup docker volumes: remove unused volumes."
            echo "Usage: ${0##*/} [--dry-run] [--verbose]"
            echo "   -n, --dry-run: dry run: display what would get removed."
            echo "   -v, --verbose: verbose output."
            exit 1
        ;;
    esac
    shift
done

# Make sure that we can talk to docker daemon. If we cannot, we fail here.
${docker_bin} version >/dev/null

container_ids=$(${docker_bin} ps -a -q --no-trunc)

#All volumes from all containers
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for container in $container_ids; do
        #add the container id to the list of volumes; these probably never
        #exist in the volumesdir, but include them just to be safe
        allvolumes+=("${container}")
        #add all volumes from this container to the list of volumes
        log_verbose "Inspecting container ${container}"
        for volpath in $(
                ${docker_bin} inspect --format='{{range $key, $val := .}}{{if eq $key "Volumes"}}{{range $vol, $path := .}}{{$path}}{{"\n"}}{{end}}{{end}}{{if eq $key "Mounts"}}{{range $mount := $val}}{{$mount.Source}}{{"\n"}}{{end}}{{end}}{{end}}' ${container} \
        ); do
                log_verbose "Processing volumepath ${volpath}"
                #try to get volume id from the volume path
                vid=$(echo "${volpath}" | sed 's|.*/\(.*\)/_data$|\1|;s|.*/\([0-9a-f]\{64\}\)$|\1|')
                # check for either a 64 character vid or then end of a volumepath containing _data:
                if [[ "${vid}" =~ ^[0-9a-f]{64}$ || (${volpath} =~ .*/_data$ && ! "${vid}" =~ "/") ]]; then
                        log_verbose "Found volume ${vid}"
                        allvolumes+=("${vid}")
                else
                        #check if it's a bindmount, these have a config.json file in the ${volumesdir} but no files in ${vfsdir} (docker 1.6.2 and below)
                        for bmv in $(find "${volumesdir}" -name config.json -print | xargs grep -l "\"IsBindMount\":true" | xargs grep -l "\"Path\":\"${volpath}\""); do
                                bmv="$(basename "$(dirname "${bmv}")")"
                                log_verbose "Found bindmount ${bmv}"
                                allvolumes+=("${bmv}")
                                #there should be only one config for the bindmount, delete any duplicate for the same bindmount.
                                break
                        done
                fi
        done
done
IFS=$SAVEIFS

delete_volumes "${volumesdir}"
delete_volumes "${vfsdir}"

cat docker_volumes_clean.sh

#! /bin/bash

set -eo pipefail

#usage: sudo ./docker-cleanup-volumes.sh [--dry-run]

dockerdir=/var/lib/docker
volumesdir=${dockerdir}/volumes
vfsdir=${dockerdir}/vfs/dir
allvolumes=()
dryrun=false

function delete_volumes() {
  targetdir=$1
  echo
  if [[ ! -d ${targetdir} ]]; then
        echo "Directory ${targetdir} does not exist, skipping."
        return
  fi
  echo "Delete unused volume directories from $targetdir"
  for dir in $(ls -d ${targetdir}/* 2>/dev/null)
  do
        dir=$(basename $dir)
        if [[ "${dir}" =~ [0-9a-f]{64} ]]; then
                if [[ ${allvolumes[@]} =~ "${dir}" ]]; then
                        echo In use ${dir}
                else
                        if [ "${dryrun}" = false ]; then
                                echo Deleting ${dir}
                                rm -rf ${targetdir}/${dir}
                        else
                                echo Would have deleted ${dir}
                        fi
                fi
        else
                echo Not a volume ${dir}
        fi
  done
}

if [ $UID != 0 ]; then
    echo "You need to be root to use this script."
    exit 1
fi

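# "|| true" keeps "set -e" from aborting here, so the friendly error below can be shown
docker_bin=$(which docker.io 2>/dev/null || which docker 2>/dev/null || true)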
if [ -z "$docker_bin" ] ; then
    echo "Please install docker. You can install docker by running \"wget -qO- https://get.docker.io/ | sh\"."
    exit 1
fi

if [ "$1" = "--dry-run" ]; then
        dryrun=true
elif [ -n "$1" ]; then
        echo "Cleanup docker volumes: remove unused volumes."
        echo "Usage: ${0##*/} [--dry-run]"
        echo "   --dry-run: dry run: display what would get removed."
        exit 1
fi

# Make sure that we can talk to docker daemon. If we cannot, we fail here.
${docker_bin} info >/dev/null

#All volumes from all containers
for container in `${docker_bin} ps -a -q --no-trunc`; do
        #add the container id to the list of volumes; these probably never
        #exist in the volumesdir, but include them just to be safe
        allvolumes+=("${container}")
        #add all volumes from this container to the list of volumes
        for vid in `${docker_bin} inspect --format='{{range $vol, $path := .Volumes}}{{$path}}{{"\n"}}{{end}}' ${container}`; do
                if [[ ${vid} == ${vfsdir}* && "${vid##*/}" =~ [0-9a-f]{64} ]]; then
                        allvolumes+=("${vid##*/}")
                else
                        #check if it's a bindmount, these have a config.json file in the ${volumesdir} but no files in ${vfsdir}
                        for bmv in `grep --include config.json -Rl "\"IsBindMount\":true" ${volumesdir} | xargs grep -l "\"Path\":\"${vid}\""`; do
                                bmv="$(basename "$(dirname "${bmv}")")"
                                allvolumes+=("${bmv}")
                                #there should be only one config for the bindmount, delete any duplicate for the same bindmount.
                                break
                        done
                fi
        done
done

delete_volumes ${volumesdir}
delete_volumes ${vfsdir}

References

http://stackoverflow.com/questions/24353387/how-docker-container-volumes-work-even-when-they-arent-running


https://github.com/docker/docker/issues/3925

https://github.com/chadoe/docker-cleanup-volumes/blob/master/docker-cleanup-volumes.sh