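1. Introduction to Docker
Docker is built on Linux kernel features such as cgroups, namespaces, and union file systems. It wraps and isolates processes, which makes it a form of operating-system-level virtualization. Because an isolated process is independent of the host and of other isolated processes, it is called a container.
Advantages:
A container is just a process in the operating system, so it is lightweight, and all containers share one operating system kernel. With traditional KVM virtualization, every virtual machine runs its own operating system; without optimization, the virtual machine itself takes up 100-200 MB of memory.
Disadvantages:
1. Because containers are just processes that share one operating system kernel, a container with special kernel requirements needs a dedicated node, and containers of that type must be restricted to run only on that node.
2. In the Linux kernel, wall-clock time cannot be namespaced. Time namespaces (added in Linux 5.6) virtualize only the monotonic and boot-time clocks, so if a program in a container changes the time with the settimeofday(2) system call, the time of the whole host changes. (Docker drops the CAP_SYS_TIME capability by default, so the call fails unless that capability is granted.)
3. Namespaces isolate containers from one another, but the isolation is not complete. Traditional KVM virtualization does not have this problem.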
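2. A container is really a process
2.1 Environment setup
Before going further, we need an Ubuntu virtual machine. I am working on a Mac, so for convenience I use Vagrant to create the virtual machine.
2.1.1 Install VirtualBox
After installation, configure the "Host Network Manager".
2.1.2 Create the virtual machine with Vagrant
1. Install Vagrant
brew install vagrant
2. Create a working directory
mkdir ebpf && cd ebpf
(I learned Vagrant from an instructor who teaches eBPF, hence the directory name.)
3. Initialize an Ubuntu 21.10 virtual machine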
vagrant init ubuntu/impish64
Other Ubuntu versions are listed at https://app.vagrantup.com/ubuntu
A `Vagrantfile` has been placed in this directory. You are now
ready to `vagrant up` your first virtual environment! Please read
the comments in the Vagrantfile as well as documentation on
`vagrantup.com` for more information on using Vagrant.
4. Start the virtual machine with vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Box 'ubuntu/impish64' could not be found. Attempting to find and install...
default: Box Provider: virtualbox
default: Box Version: >= 0
==> default: Loading metadata for box 'ubuntu/impish64'
default: URL: https://vagrantcloud.com/ubuntu/impish64
==> default: Adding box 'ubuntu/impish64' (v20220121.0.0) for provider: virtualbox
default: Downloading: https://vagrantcloud.com/ubuntu/boxes/impish64/versions/20220121.0.0/providers/virtualbox.box
Download redirected to host: cloud-images.ubuntu.com
==> default: Successfully added box 'ubuntu/impish64' (v20220121.0.0) for 'virtualbox'!
==> default: Importing base box 'ubuntu/impish64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'ubuntu/impish64' version '20220121.0.0' is up to date...
==> default: Setting the name of the VM: ebpf_default_1642853654001_35017
Vagrant is currently configured to create VirtualBox synced folders with
the `SharedFoldersEnableSymlinksCreate` option enabled. If the Vagrant
guest is not trusted, you may want to disable this option. For more
information on this option, please refer to the VirtualBox manual:
https://www.virtualbox.org/manual/ch04.html#sharedfolders
This option can be disabled globally with an environment variable:
VAGRANT_DISABLE_VBOXSYMLINKCREATE=1
or on a per folder basis within the Vagrantfile:
config.vm.synced_folder '/host/path', '/guest/path', SharedFoldersEnableSymlinksCreate: false
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Warning: Connection reset. Retrying...
default: Warning: Remote connection disconnect. Retrying...
default:
default: Vagrant insecure key detected. Vagrant will automatically replace
default: this with a newly generated keypair for better security.
default:
default: Inserting generated public key within guest...
default: Removing insecure key from the guest if it's present...
default: Key inserted! Disconnecting and reconnecting using new SSH key...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
default: The guest additions on this VM do not match the installed version of
default: VirtualBox! In most cases this is fine, but in rare cases it can
default: prevent things such as shared folders from working properly. If you see
default: shared folder errors, please make sure the guest additions within the
default: virtual machine match the version of VirtualBox you have installed on
default: your host and reload your VM.
default:
default: Guest Additions Version: 6.0.0 r127566
default: VirtualBox Version: 6.1
==> default: Mounting shared folders...
default: /vagrant => /Users/dz0400819/Desktop/ebpf
dz0400819@MacBook-Pro ~/Desktop/ebpf vagrant ssh
Welcome to Ubuntu 21.10 (GNU/Linux 5.13.0-27-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Sat Jan 22 12:17:33 UTC 2022
System load: 0.09 Processes: 110
Usage of /: 3.2% of 38.71GB Users logged in: 0
Memory usage: 17% IPv4 address for enp0s3: 10.0.2.15
Swap usage: 0%
0 updates can be applied immediately.
The virtual machine now shows up in VirtualBox.
5. Enter the virtual machine with vagrant ssh
Welcome to Ubuntu 21.10 (GNU/Linux 5.13.0-27-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Sat Jan 22 13:52:50 UTC 2022
System load: 0.1 Processes: 109
Usage of /: 6.1% of 39.86GB Users logged in: 0
Memory usage: 16% IPv4 address for enp0s3: 10.0.2.15
Swap usage: 0%
0 updates can be applied immediately.
Last login: Sat Jan 22 12:18:17 2022 from 10.0.2.2
2.1.3 Install and start Docker in the virtual machine
1. Install Docker
vagrant@ubuntu-impish:~$ curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
# Executing docker install script, commit: 93d2499759296ac1f9c510605fef85052a2c32be
+ sudo -E sh -c 'apt-get update -qq >/dev/null'
+ sudo -E sh -c 'DEBIAN_FRONTEND=noninteractive apt-get install -y -qq apt-transport-https ca-certificates curl >/dev/null'
+ sudo -E sh -c 'curl -fsSL "https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg" | gpg --dearmor --yes -o /usr/share/keyrings/docker-archive-keyring.gpg'
+ sudo -E sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu impish stable" > /etc/apt/sources.list.d/docker.list'
+ sudo -E sh -c 'apt-get update -qq >/dev/null'
+ sudo -E sh -c 'DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends docker-ce-cli docker-scan-plugin docker-ce >/dev/null'
+ version_gte 20.10
+ '[' -z '' ']'
+ return 0
+ sudo -E sh -c 'DEBIAN_FRONTEND=noninteractive apt-get install -y -qq docker-ce-rootless-extras >/dev/null'
+ sudo -E sh -c 'docker version'
Client: Docker Engine - Community
Version: 20.10.12
API version: 1.41
Go version: go1.16.12
Git commit: e91ed57
Built: Mon Dec 13 11:45:33 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.12
API version: 1.41 (minimum version 1.12)
Go version: go1.16.12
Git commit: 459d0df
Built: Mon Dec 13 11:43:41 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.12
GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
================================================================================
To run Docker as a non-privileged user, consider setting up the
Docker daemon in rootless mode for your user:
dockerd-rootless-setuptool.sh install
Visit https://docs.docker.com/go/rootless/ to learn about rootless mode.
To run the Docker daemon as a fully privileged service, but granting non-root
users access, refer to https://docs.docker.com/go/daemon-access/
WARNING: Access to the remote API on a privileged Docker daemon is equivalent
to root access on the host. Refer to the 'Docker daemon attack surface'
documentation for details: https://docs.docker.com/go/attack-surface/
================================================================================
2. Start Docker and check its status
sudo service docker start
sudo service docker status
docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-01-22 13:56:00 UTC; 4min 54s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 2502 (dockerd)
Tasks: 8
Memory: 39.3M
CPU: 342ms
CGroup: /system.slice/docker.service
└─2502 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
Jan 22 13:55:59 ubuntu-impish dockerd[2502]: time="2022-01-22T13:55:59.821521689Z" level=info msg="scheme \"unix\" not registered, fallback>
Jan 22 13:55:59 ubuntu-impish dockerd[2502]: time="2022-01-22T13:55:59.821737767Z" level=info msg="ccResolverWrapper: sending update to cc:>
Jan 22 13:55:59 ubuntu-impish dockerd[2502]: time="2022-01-22T13:55:59.821904849Z" level=info msg="ClientConn switching balancer to \"pick_>
Jan 22 13:55:59 ubuntu-impish dockerd[2502]: time="2022-01-22T13:55:59.874804710Z" level=info msg="Loading containers: start."
Jan 22 13:56:00 ubuntu-impish dockerd[2502]: time="2022-01-22T13:56:00.060803788Z" level=info msg="Default bridge (docker0) is assigned wit>
Jan 22 13:56:00 ubuntu-impish dockerd[2502]: time="2022-01-22T13:56:00.140157834Z" level=info msg="Loading containers: done."
Jan 22 13:56:00 ubuntu-impish dockerd[2502]: time="2022-01-22T13:56:00.153742664Z" level=info msg="Docker daemon" commit=459d0df graphdrive>
Jan 22 13:56:00 ubuntu-impish dockerd[2502]: time="2022-01-22T13:56:00.153853677Z" level=info msg="Daemon has completed initialization"
Jan 22 13:56:00 ubuntu-impish systemd[1]: Started Docker Application Container Engine.
Jan 22 13:56:00 ubuntu-impish dockerd[2502]: time="2022-01-22T13:56:00.179098792Z" level=info msg="API listen on /run/docker.sock"
2.2 Create a container and inspect its process
2.2.1 Switch to root and create a container
Inside the container, the process with PID 1 is /bin/sh.
vagrant@ubuntu-impish:~$ sudo -i
root@ubuntu-impish:~# docker run -it busybox /bin/sh
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
5cc84ad355aa: Pull complete
Digest: sha256:5acba83a746c7608ed544dc1533b87c737a0b0fb730301639a0179f9344b1678
Status: Downloaded newer image for busybox:latest
/ # ps
PID USER TIME COMMAND
1 root 0:00 /bin/sh
7 root 0:00 ps
2.2.2 Find the container's process ID on the host
vagrant@ubuntu-impish:~$ sudo -i
Find the container ID
root@ubuntu-impish:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ecfc45bcf970 busybox "/bin/sh" 18 seconds ago Up 17 seconds brave_haibt
Find the container's process ID
root@ubuntu-impish:~# docker inspect ecfc45bcf970 --format "{{ .State.Pid }}"
5786
Inspect the process on the host: it is the same /bin/sh
root@ubuntu-impish:~# ps -ef|grep 5786
root 5786 5765 0 14:33 pts/0 00:00:00 /bin/sh
root 6304 5833 0 14:42 pts/1 00:00:00 grep --color=auto 5786
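2.3 A correction
Because a container is a process on the host, and Docker is also a process on the host, Docker Engine sits at the same level as the applications; it should not be drawn as a layer beneath them.
3. Process isolation with namespaces
Let's first see which namespaces a container process has.
Here are the namespaces of the container created above: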
root@ubuntu-impish:~# ls -la /proc/5786/ns
total 0
dr-x--x--x 2 root root 0 Jan 22 14:33 .
dr-xr-xr-x 9 root root 0 Jan 22 14:33 ..
lrwxrwxrwx 1 root root 0 Jan 22 14:53 cgroup -> 'cgroup:[4026532258]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 ipc -> 'ipc:[4026532198]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 mnt -> 'mnt:[4026532196]'
lrwxrwxrwx 1 root root 0 Jan 22 14:33 net -> 'net:[4026532201]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 pid -> 'pid:[4026532199]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 pid_for_children -> 'pid:[4026532199]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jan 22 14:53 uts -> 'uts:[4026532197]'
Now look at the namespaces of an ordinary host process, here the container's parent process, containerd-shim (PID 5765):
root 5765 1 0 14:33 ? 00:00:00 /usr/bin/containerd-shim-runc-v2 -namespace moby -id ecfc45bcf97085c1f566c9c2f5738b14bdf
root@ubuntu-impish:~# ls -la /proc/5765/ns
total 0
dr-x--x--x 2 root root 0 Jan 22 14:59 .
dr-xr-xr-x 9 root root 0 Jan 22 14:33 ..
lrwxrwxrwx 1 root root 0 Jan 22 14:59 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 net -> 'net:[4026531992]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jan 22 14:59 uts -> 'uts:[4026531838]'
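As the two listings show, the container process has the same kinds of namespaces as any other process on the system. What differs is the IDs: every entry except time and user points to a different namespace from the host's.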
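3.1 Introduction to namespaces
Linux namespaces are a resource isolation mechanism provided by the Linux kernel.
The system can assign different namespaces to different processes. Resources in different namespaces are allocated independently and the processes are isolated from one another, so processes in different namespaces do not interfere with each other.
The namespace types, matching the entries under /proc/<pid>/ns shown above, are: cgroup, ipc, mnt, net, pid, time, user, and uts.
My virtual machine runs Ubuntu 21.10 with a 5.13 kernel, so the listing also includes the time namespace (added in Linux 5.6). pid_for_children and time_for_children are not separate namespace types; they show the namespace that child processes will be created in.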
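3.2 The individual namespaces
3.2.1 pid namespace
A pid namespace isolates the process IDs of different groups of processes, so the same PID can exist in different namespaces.
For example, the container's /bin/sh above is PID 1 inside the container but PID 5786 on the host.
3.2.2 net namespace
Network isolation is implemented with net namespaces. Each net namespace has its own network devices, IP addresses, IP routing tables, and /proc/net directory.
3.2.3 ipc namespace
Processes in a container still communicate through the usual Linux inter-process communication (IPC) mechanisms, including semaphores, message queues, and shared memory.
IPC between container processes is really IPC between host processes that are in the same ipc namespace, so the namespace is taken into account whenever an IPC resource is requested. Each IPC resource has a unique 32-bit ID.
3.2.4 mnt namespace
A mnt namespace lets processes in different namespaces see different file system layouts, so the directory tree seen by the processes in each namespace is isolated.
3.2.5 uts namespace
The UTS ("UNIX Time-sharing System") namespace gives each container its own hostname and domain name, so that on the network it can be treated as an independent node rather than as a process on the host.
3.2.6 user namespace
A user namespace lets each container have its own user and group IDs, so a program inside the container can run as a container-local user rather than as a host user. Note that the container listed above still shares the host's user namespace (user:[4026531837] in both listings), because Docker does not enable user namespace remapping by default.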
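3.3 The namespace structure in the Linux kernel
3.4 Operating on namespaces in Linux
3.4.1 clone
When creating a new process with the clone() system call, the flags argument selects which new namespaces to create:
// CLONE_NEWCGROUP / CLONE_NEWIPC / CLONE_NEWNET / CLONE_NEWNS / CLONE_NEWPID / CLONE_NEWUSER / CLONE_NEWUTS
For example, we can pass CLONE_NEWPID when creating a new process with clone():
int pid = clone(main_function, stack_top, CLONE_NEWPID | SIGCHLD, NULL); /* second argument: top of the child's stack */
The new process then sees a brand-new process space in which its own PID is 1.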
3.4.2 setns
This system call lets the calling process join an existing namespace:
int setns(int fd, int nstype);
3.4.3 unshare
This system call creates new namespaces and moves the calling process into them:
int unshare(int flags);
3.5 Common namespace operations
3.5.1 List the namespaces on the system
lsns -t <type>
List the network namespaces
root@ubuntu-impish:~# lsns -t net
NS TYPE NPROCS PID USER NETNSID NSFS COMMAND
4026531992 net 113 1 root unassigned /sbin/init
4026532201 net 1 5786 root 0 /run/docker/netns/fdccb211a1f0 /bin/sh
List the pid namespaces
root@ubuntu-impish:~# lsns -t pid
NS TYPE NPROCS PID USER COMMAND
4026531836 pid 113 1 root /sbin/init
4026532199 pid 1 5786 root /bin/sh
List the mnt namespaces
root@ubuntu-impish:~# lsns -t mnt
NS TYPE NPROCS PID USER COMMAND
4026531840 mnt 106 1 root /sbin/init
4026531860 mnt 1 23 root kdevtmpfs
4026532176 mnt 1 382 root /lib/systemd/systemd-udevd
4026532179 mnt 1 524 systemd-timesync /lib/systemd/systemd-timesyncd
4026532181 mnt 1 572 systemd-network /lib/systemd/systemd-networkd
4026532191 mnt 1 574 systemd-resolve /lib/systemd/systemd-resolved
4026532196 mnt 1 5786 root /bin/sh
4026532248 mnt 1 614 root /usr/sbin/irqbalance --foreground
4026532249 mnt 1 624 root /lib/systemd/systemd-logind
List the time namespaces
root@ubuntu-impish:~# lsns -t time
NS TYPE NPROCS PID USER COMMAND
4026531834 time 114 1 root /sbin/init
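3.5.2 View the namespaces of a process
ls -la /proc/<pid>/ns/
This was already done above, so the output is not repeated here.
3.5.3 Run a command inside a namespace
nsenter -t <pid> -n ip addr
Run ip addr in the network namespace of the container above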
root@ubuntu-impish:~# nsenter -t 5786 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
root@ubuntu-impish:~#
Running the same command inside the container shows the same interfaces and addresses
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
/ #
3.5.4 Namespace exercise
Run sleep in a new network namespace:
root@ubuntu-impish:~# unshare -fn sleep 600
Look up the process; the unshare process has PID 9664
root@ubuntu-impish:~# ps -ef|grep sleep
root 9664 5833 0 15:51 pts/1 00:00:00 unshare -fn sleep 600
root 9665 9664 0 15:51 pts/1 00:00:00 sleep 600
root 9675 5690 0 15:52 pts/0 00:00:00 grep --color=auto sleep
List the network namespaces; PID 9664 appears with its own net namespace
root@ubuntu-impish:~# lsns -t net
NS TYPE NPROCS PID USER NETNSID NSFS COMMAND
4026531992 net 115 1 root unassigned /sbin/init
4026532195 net 2 9664 root unassigned unshare -fn sleep 600
Enter that process's namespace and check the network configuration; it differs from the host's
root@ubuntu-impish:~# nsenter -t 9664 -n ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Check the network on the host
root@ubuntu-impish:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 02:bb:60:9a:24:15 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic enp0s3
valid_lft 79139sec preferred_lft 79139sec
inet6 fe80::bb:60ff:fe9a:2415/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:30:d9:3d:6c brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:30ff:fed9:3d6c/64 scope link
valid_lft forever preferred_lft forever
4. How docker exec works
1. Create a container by hand and find its PID
First look at the host's network interfaces; we will compare against them later
root@ubuntu-focal:/opt# ifconfig
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
inet6 fe80::42:a7ff:fe6c:6699 prefixlen 64 scopeid 0x20<link>
ether 02:42:a7:6c:66:99 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5 bytes 526 (526.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.2.15 netmask 255.255.255.0 broadcast 10.0.2.255
inet6 fe80::cd:1cff:fe36:98 prefixlen 64 scopeid 0x20<link>
ether 02:cd:1c:36:00:98 txqueuelen 1000 (Ethernet)
RX packets 29308 bytes 41481001 (41.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3528 bytes 300795 (300.7 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 36 bytes 3616 (3.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 36 bytes 3616 (3.6 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vethd7a1b3b: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::d005:b7ff:fef8:3708 prefixlen 64 scopeid 0x20<link>
ether d2:05:b7:f8:37:08 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20 bytes 1672 (1.6 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Create a container
root@ubuntu-focal:~# docker run -it ubuntu:latest /bin/sh
#
In another terminal, look at the container just created
root@ubuntu-focal:~# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
69a4dbf98af5 ubuntu:latest "/bin/sh" 12 minutes ago Up 12 minutes eloquent_satoshi
Find the container's PID
root@ubuntu-focal:~# docker inspect --format '{{ .State.Pid }}' 69a4dbf98af5
1717
The container's PID is 1717.
2. Look at the namespace files of the container process
root@ubuntu-focal:~# ls -l /proc/1717/ns/
total 0
lrwxrwxrwx 1 root root 0 Jan 27 07:34 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 ipc -> 'ipc:[4026532180]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 mnt -> 'mnt:[4026532178]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 net -> 'net:[4026532183]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 pid -> 'pid:[4026532181]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 pid_for_children -> 'pid:[4026532181]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jan 27 07:34 uts -> 'uts:[4026532179]'
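A process can choose to join the existing namespaces of another process and thereby "enter" that process's container. This is how docker exec works.
The operation relies on a Linux system call named setns(). The following program demonstrates how it is called:
3. Prepare a C program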
root@ubuntu-focal:/opt# cat exec.c
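#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
int main(int argc, char *argv[]) {
    int fd;
    /* argv[1]: namespace file to join; argv[2] onward: command to run in it */
    fd = open(argv[1], O_RDONLY);
    if (fd == -1) errExit("open");
    if (setns(fd, 0) == -1) {
        errExit("setns");
    }
    execvp(argv[2], &argv[2]);
    errExit("execvp");
}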
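About the program:
It takes two arguments.
The first, argv[1], is the path of the namespace file the current process should join, for example /proc/1717/ns/net.
The second is the program to run inside that namespace, for example /bin/bash.
The core of the code is the open() system call, which opens the given namespace file, and setns(), which receives the resulting file descriptor fd. Once setns() returns, the current process has joined the Linux namespace that the file refers to.
4. Compile, run, and check the network interfaces
Enter the network namespace of container 1717 and list the interfaces.
There are far fewer interfaces than on the host, which shows that we are inside the container's network namespace.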
root@ubuntu-focal:/opt# gcc -o exec exec.c
root@ubuntu-focal:/opt# ./exec /proc/1717/ns/net /bin/bash
root@ubuntu-focal:/opt# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.2 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:ac:11:00:02 txqueuelen 0 (Ethernet)
RX packets 19 bytes 1602 (1.6 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
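The /bin/bash started in the other namespace is also visible as a process on the host.
Comparing the namespaces of this /bin/bash (PID 3482) with those of the container process (PID 1717) shows that both net entries point to the same file, so the two processes share one net namespace: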
lrwxrwxrwx 1 root root 0 Jan 27 07:43 net -> 'net:[4026532183]'
root@ubuntu-focal:/opt# ps -ef|grep /bin/bash
root 3482 1878 0 07:43 pts/1 00:00:00 /bin/bash
root 3552 3482 0 07:44 pts/1 00:00:00 grep --color=auto /bin/bash
root@ubuntu-focal:/opt# ll /proc/3482/ns/
total 0
dr-x--x--x 2 root root 0 Jan 27 07:44 ./
dr-xr-xr-x 9 root root 0 Jan 27 07:44 ../
lrwxrwxrwx 1 root root 0 Jan 27 07:44 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 net -> 'net:[4026532183]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jan 27 07:44 uts -> 'uts:[4026531838]'
root@ubuntu-focal:/opt# ll /proc/1717/ns/
total 0
dr-x--x--x 2 root root 0 Jan 27 07:43 ./
dr-xr-xr-x 9 root root 0 Jan 27 07:43 ../
lrwxrwxrwx 1 root root 0 Jan 27 07:43 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 ipc -> 'ipc:[4026532180]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 mnt -> 'mnt:[4026532178]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 net -> 'net:[4026532183]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 pid -> 'pid:[4026532181]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 pid_for_children -> 'pid:[4026532181]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Jan 27 07:43 uts -> 'uts:[4026532179]'
root@ubuntu-focal:/opt#
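5. Start a new container that shares the network of the first container
The interfaces and addresses that ifconfig reports here are the same as those reported from my small program above.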
root@ubuntu-focal:/opt# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
69a4dbf98af5 ubuntu:latest "/bin/sh" 36 minutes ago Up 36 minutes eloquent_satoshi
Use the network of the container above
root@ubuntu-focal:/opt# docker run -it --net container:69a4dbf98af5 busybox ifconfig
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
5cc84ad355aa: Pull complete
Digest: sha256:5acba83a746c7608ed544dc1533b87c737a0b0fb730301639a0179f9344b1678
Status: Downloaded newer image for busybox:latest
eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:02
inet addr:172.17.0.2 Bcast:172.17.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:20 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1672 (1.6 KiB) TX bytes:0 (0.0 B)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)