QEMU-KVM provides disk virtualization: from the virtual machine's point of view, the disks it owns are real physical disks, while in fact the data the guest reads and writes is stored on the host's physical disks.
QEMU-KVM mainly backs virtual disks in the following ways:
A virtual machine image file on local storage.
A physical disk or disk partition on the host.
An LVM (Logical Volume Management) logical volume.
NFS (Network File System), a network file system.
GFS (Gluster File System), a distributed file system.
This section introduces the commonly used approaches, namely a locally stored virtual machine image file and an LVM logical volume:
For a locally stored image file, first create the image file on the host, then point the virtual machine's disk at that file.
Images are created with the qemu-img command, a tool that is built and installed by default together with QEMU. Its most frequently used subcommands are create and info: create creates an image, and info shows information about an image:
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo.img -o preallocation=off 10G
Formatting 'lianhua_demo.img', fmt=raw size=10737418240 preallocation=off
real 0m0.040s
user 0m0.015s
sys 0m0.015s
[lianhua@host ~]$ qemu-img info lianhua_demo.img
image: lianhua_demo.img
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 0
As shown above, a raw-format image named lianhua_demo.img with a size of 10G was created. When the image is inspected with info, however, its actual size (disk size) is 0. This is because a raw image may be a sparse file, and for a sparse file space is only truly allocated when data is written to the image.
qemu-img's preallocation option controls whether space is preallocated. It accepts three values: off, full and falloc. off disables preallocation; full preallocates space for the image by writing zeros to it byte by byte; falloc preallocates the disk space for the image file without writing any data into it. The three allocation modes compare as follows:
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_full.img -o preallocation=full 10G
Formatting 'lianhua_demo_full.img', fmt=raw size=10737418240 preallocation=full
real 0m22.955s
user 0m0.013s
sys 0m8.930s
[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_falloc.img -o preallocation=falloc 10G
Formatting 'lianhua_demo_falloc.img', fmt=raw size=10737418240 preallocation=falloc
real 0m8.256s
user 0m0.008s
sys 0m8.114s
[lianhua@host ~]$ du -h lianhua_demo*.img
11G lianhua_demo_falloc.img
0 lianhua_demo.img
11G lianhua_demo_full.img
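A quick way to confirm that the off-preallocated image is sparse is to compare its apparent size with the blocks actually allocated (a minimal check, reusing the image created above):
[lianhua@host ~]$ du -h --apparent-size lianhua_demo.img   # apparent size: 10G
[lianhua@host ~]$ du -h lianhua_demo.img                   # allocated blocks: 0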
There are several image formats; besides raw, other commonly used disk formats include qcow2 and vdi.
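For instance, an existing raw image can be converted to qcow2 with qemu-img convert (a minimal sketch, reusing the image created above; the output file name is hypothetical):
[lianhua@host ~]$ qemu-img convert -f raw -O qcow2 lianhua_demo.img lianhua_demo.qcow2
[lianhua@host ~]$ qemu-img info lianhua_demo.qcow2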
Once the image has been allocated its disk space, create a virtual machine with qemu-kvm and hand the image file to it as its disk:
[lianhua@host ~]$ /usr/libexec/qemu-kvm -m 1024 -smp 2 -hda lianhua_demo_falloc.img -monitor stdio
WARNING: Image format was not specified for 'lianhua_demo_falloc.img' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) VNC server running on '::1;5900'
(qemu) info pci
Bus 0, device 0, function 0:
Host bridge: PCI device 8086:1237
id ""
Bus 0, device 1, function 1:
IDE controller: PCI device 8086:7010
BAR4: I/O at 0xc040 [0xc04f].
id ""
Bus 0, device 1, function 3:
Bridge: PCI device 8086:7113
IRQ 9.
id ""
Bus 0, device 3, function 0:
Ethernet controller: PCI device 8086:100e
IRQ 11.
BAR0: 32 bit memory at 0xfebc0000 [0xfebdffff].
BAR1: I/O at 0xc000 [0xc03f].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
id ""
As can be seen, the image shows up in the virtual machine at PCI address 00:01.1 as an IDE device. qemu-kvm's -hda option attaches the image file as the virtual machine's first IDE device, which appears in the guest as /dev/hda or /dev/sda (the name depends on the driver in use). See the qemu-kvm man page for more disk configuration options.
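As the warning above suggests, the image format should be stated explicitly. A sketch of an equivalent invocation that replaces -hda with -drive and declares the raw format (the other values match the command above):
[lianhua@host ~]$ /usr/libexec/qemu-kvm -m 1024 -smp 2 -drive file=lianhua_demo_falloc.img,format=raw,if=ide -monitor stdio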
For an LVM logical volume, first create a volume with LVM, then attach that volume to the virtual machine as its disk.
Create an LVM volume on an OpenStack platform, as follows:
[root@host ~]# pvdisplay
--- Physical volume ---
PV Name /dev/loop2
VG Name cinder-volumes
PV Size 602.34 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 154199
Free PE 146519
Allocated PE 7680
PV UUID pTkQ5Z-zNdc-LRrn-qWAX-13D6-bhbG-DdcGFD
[root@host ~]# vgdisplay
--- Volume group ---
VG Name cinder-volumes
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 1447
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 2
Max PV 0
Cur PV 1
Act PV 1
VG Size 602.34 GiB
PE Size 4.00 MiB
Total PE 154199
Alloc PE / Size 7680 / 30.00 GiB
Free PE / Size 146519 / 572.34 GiB
VG UUID Mrrh1r-qKQw-bCgW-0WXi-d5Bd-OiVV-cTBrg5
[root@host ~]# lvdisplay
--- Logical volume ---
LV Path /dev/cinder-volumes/volume-c34555f0-fd26-42fe-a3b2-86098b590be2
LV Name volume-c34555f0-fd26-42fe-a3b2-86098b590be2
VG Name cinder-volumes
LV UUID n81Af6-cWEe-LvAm-wjA3-vgKD-RxgV-qMtIq6
LV Write Access read/write
LV Creation host, time host.localdomain, 2020-08-02 00:47:30 +0800
LV Status available
# open 1
LV Size 26.00 GiB
Current LE 6656
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:0
(See here for a closer look at LVM.)
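Outside of OpenStack, an equivalent volume could be created by hand with the standard LVM tools (a sketch; the device /dev/loop2 and the volume group name come from the output above, while the volume name and the 26G size are illustrative):
[root@host ~]# pvcreate /dev/loop2
[root@host ~]# vgcreate cinder-volumes /dev/loop2
[root@host ~]# lvcreate -L 26G -n volume-demo cinder-volumes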
Once the volume has been created, attach it to the virtual machine as its disk:
[root@host ~]# openstack volume list
+--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+
| ID | Display Name | Status | Size | Attached to |
+--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+
| c34555f0-fd26-42fe-a3b2-86098b590be2 | lianhua-vm1-vol | in-use | 26 | Attached to lianhua-vm1-vol on /dev/vdb |
+--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+
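The attach step itself can be performed from the OpenStack CLI, for example (a sketch; the server name lianhua-vm1 is assumed):
[root@host ~]# openstack server add volume lianhua-vm1 c34555f0-fd26-42fe-a3b2-86098b590be2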
After the volume is attached, its disk device name inside the virtual machine is /dev/vdb. Log into the virtual machine and inspect the device in detail:
[root@lianhua-vm1:/home/robot]
# fdisk -l | grep vdb
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors
[root@lianhua-vm1:/home/robot]
# lspci
...
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
[root@lianhua-vm1:/home/robot]
# lspci -s 00:0b.0 -vvv
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
Subsystem: Red Hat, Inc. Device 0002
Physical Slot: 11
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 1000 [size=64]
Region 1: Memory at c0004000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at c0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [98] MSI-X: Enable+ Count=2 Masked-
Vector table: BAR=1 offset=00000000
PBA: BAR=1 offset=00000800
Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
BAR=0 offset=00000000 size=00000000
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
BAR=4 offset=00003000 size=00001000 multiplier=00000004
Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
BAR=4 offset=00002000 size=00001000
Capabilities: [50] Vendor Specific Information: VirtIO: ISR
BAR=4 offset=00001000 size=00001000
Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
BAR=4 offset=00000000 size=00001000
Kernel driver in use: virtio-pci
Unlike the disk device attached with -hda, the device names here begin with vd, because these disks are provided through virtio paravirtualization. The output above shows that the disk device vdb sits at PCI address 00:0b.0 and uses the virtio-pci driver.
Defining a disk under the devices tag of the libvirt XML file attaches the disk through virtio paravirtualization:
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
<target dev='vdb' bus='virtio'/>
<serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</disk>
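With this XML saved to a file, the disk can also be hot-plugged into a running domain through libvirt (a sketch; the domain name and the file name vdb-disk.xml are hypothetical):
[root@host ~]# virsh attach-device lianhua-vm1 vdb-disk.xml --live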
Likewise, the same effect can be achieved directly with qemu-kvm by specifying a virtio-blk-pci device together with its backing drive via the -device and -drive options:
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# /usr/libexec/qemu-kvm -m 1024 -smp 2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk.config,format=raw,if=none,id=drive-virtio-disk0,readonly=on,cache=none -monitor stdio
Drawing on the previous two sections, deploy a simple environment that provides disk virtualization plus file sharing. The deployment looks like this:
Use virtio paravirtualization with an image file to provide a virtual disk, which appears in the guest as device vda.
Use virtio paravirtualization with a volume to provide a virtual disk, which appears in the guest as device vdb.
Inside the virtual machine, carve the disk device vdb into LVM logical volumes and put filesystems on them.
Share the virtual machine's filesystems over NFS.
(Deployment diagram omitted.)
1) Inspect the libvirt XML configuration of the virtio disks:
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
<target dev='vdb' bus='virtio'/>
<serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</disk>
</devices>
The image file backing the vda disk device is /var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk; inside the virtual machine the device is /dev/vda and its PCI address is 00:06.0.
Inspect the disk image file with qemu-img info:
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# qemu-img info disk
image: disk
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 1.7G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/52968ae0bfbfeef835844ee0b97be5e45d382e4c
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
As can be seen, disk has a virtual capacity of 40G, while at the moment it occupies only 1.7G of disk space.
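This thin, copy-on-write layout can be reproduced with qemu-img by creating a qcow2 image on top of a backing file (a sketch; the backing file path is the one reported above and the overlay name is hypothetical):
[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# qemu-img create -f qcow2 -o backing_file=/var/lib/nova/instances/_base/52968ae0bfbfeef835844ee0b97be5e45d382e4c disk_overlay.qcow2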
vdb is the disk device backed by the volume; inside the virtual machine the device is /dev/vdb and its PCI address is 00:0b.0.
Log into the virtual machine and check that both disk devices have been attached:
[root@lianhua-vm1:/home/robot]
# fdisk -l | grep vd
Disk /dev/vda: 40 GiB, 42949672960 bytes, 83886080 sectors # note that vda's capacity here is 40G
/dev/vda1 * 2048 83886046 83883999 40G 83 Linux
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors
[root@lianhua-vm1:/home/robot]
# lspci | grep block
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
Both disk devices are present in the virtual machine, and partition vda1 has been carved out of vda for the operating system's filesystem.
2) Inspect the LVM logical volumes carved out of disk device vdb inside the virtual machine:
[root@lianhua-vm1:/home/robot]
# pvdisplay
--- Physical volume ---
PV Name /dev/vdb
VG Name lianhua-vm1-vol
PV Size 26.00 GiB / not usable 4.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 6655
Free PE 1405
Allocated PE 5250
PV UUID OqdKmO-PspN-0ZKe-M0l4-0vGD-cY7k-VjZvTJ
[root@lianhua-vm1:/home/robot]
# vgdisplay
--- Volume group ---
VG Name lianhua-vm1-vol
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 7
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 6
Open LV 6
Max PV 0
Cur PV 1
Act PV 1
VG Size <26.00 GiB
PE Size 4.00 MiB
Total PE 6655
Alloc PE / Size 5250 / <20.51 GiB
Free PE / Size 1405 / <5.49 GiB
VG UUID JcVrao-YnJ7-mRpK-8Rxc-i07i-WVH4-aVgAoD
[root@lianhua-vm1:/home/robot]
# lvdisplay
--- Logical volume ---
LV Path /dev/lianhua-vm1-vol/provider_sys
LV Name provider_sys
VG Name lianhua-vm1-vol
LV UUID C6byt7-5cby-h2RT-xcLg-OJU0-Qq1E-27G6jB
LV Write Access read/write
LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800
LV Status available
# open 1
LV Size <9.77 GiB
Current LE 2500
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:0
--- Logical volume ---
LV Path /dev/lianhua-vm1-vol/provider_lianhua
LV Name provider_lianhua
VG Name lianhua-vm1-vol
LV UUID vfeZg8-PKVR-kKxv-yidf-rQqp-A7De-CvXqws
LV Write Access read/write
LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800
LV Status available
# open 1
LV Size 4.88 GiB
Current LE 1250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:1
--- Logical volume ---
LV Path /dev/lianhua-vm1-vol/provider_log
LV Name provider_log
VG Name lianhua-vm1-vol
LV UUID mHdD60-QjSy-sRlz-GLmK-CFIM-l42c-QGthXa
LV Write Access read/write
LV Creation host, time lianhua-vm1, 2020-08-02 00:50:12 +0800
LV Status available
# open 1
LV Size 1000.00 MiB
Current LE 250
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 252:2
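The volume group and logical volumes shown above could have been created inside the virtual machine with the standard LVM commands (a sketch; the sizes mirror the lvdisplay output):
[root@lianhua-vm1 ~]# pvcreate /dev/vdb
[root@lianhua-vm1 ~]# vgcreate lianhua-vm1-vol /dev/vdb
[root@lianhua-vm1 ~]# lvcreate -L 10000M -n provider_sys lianhua-vm1-vol
[root@lianhua-vm1 ~]# lvcreate -L 5000M -n provider_lianhua lianhua-vm1-vol
[root@lianhua-vm1 ~]# lvcreate -L 1000M -n provider_log lianhua-vm1-vol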
3) Create filesystems on the logical volumes for log/sys/lianhua, and share the filesystems over NFS:
[root@lianhua-vm1:/home/robot]
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 40G 7.0G 31G 19% /
/dev/mapper/lianhua-vm1-vol-provider_sys 9.1G 37M 8.6G 1% /mnt/sys
/dev/mapper/lianhua-vm1-vol-provider_lianhua 4.6G 20M 4.3G 1% /mnt/lianhua
/dev/mapper/lianhua-vm1-vol-provider_log 922M 18M 838M 3% /mnt/log
(See here for a closer look at NFS.)
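A sketch of the sharing step for one of the mounts (ext4, the export options, and hostname-based addressing are assumptions, not taken from the original setup):
[root@lianhua-vm1 ~]# mkfs.ext4 /dev/lianhua-vm1-vol/provider_log
[root@lianhua-vm1 ~]# mkdir -p /mnt/log && mount /dev/lianhua-vm1-vol/provider_log /mnt/log
[root@lianhua-vm1 ~]# echo '/mnt/log *(rw,sync,no_root_squash)' >> /etc/exports
[root@lianhua-vm1 ~]# exportfs -ra            # the nfs-server service must already be running
[root@lianhua-vm2 ~]# mkdir -p /mnt/log && mount -t nfs lianhua-vm1:/mnt/log /mnt/log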
Log into VM2 to verify that the filesystem is shared successfully:
[root@lianhua-vm2:/mnt/log]
# ls
[root@lianhua-vm2:/mnt/log]
# mkdir lianhua
[root@lianhua-vm2:/mnt/log]
# ls
lianhua
[root@lianhua-vm1:/mnt/log]
# ls
lianhua
The files are shared as expected, and the environment is fully deployed.
An orchid growing in a secluded valley does not withhold its fragrance just because no one is there to see it.