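1 Introduction to Disk Virtualization

QEMU-KVM provides disk virtualization: from the guest's point of view, the disk it owns looks like a real physical disk. In reality, the data the guest reads and writes is stored on a physical disk of the host.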

 

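QEMU-KVM can back a virtual disk in several main ways:

A virtual machine image file in local storage.

A physical disk or disk partition on the host.

LVM (Logical Volume Manager) logical volumes.

NFS (Network File System), a network file system.

GlusterFS, a distributed file system.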

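2 Configuring Disk Virtualization

This section covers the two commonly used approaches: a local image file and an LVM logical volume.

2.1 Local Image File

To use a local image file, first create the image on the host, then specify it as the virtual machine's disk.

Image files are created with qemu-img, a tool that is installed together with QEMU when QEMU is built and installed. Its most commonly used subcommands are create, which creates an image, and info, which displays information about an image: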

[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo.img -o preallocation=off 10G
Formatting 'lianhua_demo.img', fmt=raw size=10737418240 preallocation=off
 
real    0m0.040s
user    0m0.015s
sys     0m0.015s
 
[lianhua@host ~]$ qemu-img info lianhua_demo.img
image: lianhua_demo.img
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 0

 

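As shown above, this creates a raw-format image named lianhua_demo.img with a size of 10G. However, qemu-img info reports an actual size (disk size) of 0. This is because a raw image can be created as a sparse file, in which case space is allocated only when data is actually written to the image.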
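The sparse-file behaviour can be reproduced without qemu-img. The following is a minimal sketch, assuming a Linux host and a file system that supports sparse files; the helper names and the 10 MiB size are illustrative and not part of the original walkthrough. It extends a file to a given length without writing data, then compares the apparent size with the space actually allocated.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # only the length is recorded, no data blocks


def sizes(path):
    """Return (apparent_size, allocated_bytes) for a file."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of the block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        img = os.path.join(tmp, "sparse_demo.img")
        create_sparse_file(img, 10 * 1024 * 1024)  # 10 MiB stand-in for 10G
        apparent, allocated = sizes(img)
        print(f"apparent size: {apparent} bytes, allocated: {allocated} bytes")
```

The apparent size corresponds to the "virtual size" line in the qemu-img info output, and the allocated size corresponds to "disk size".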

 

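The preallocation option of qemu-img controls whether space is preallocated. It accepts three values: off, falloc, and full. off disables preallocation; full preallocates space by writing zeros across the entire image; falloc reserves disk space for the image file without writing any data to it. The three modes compare as follows (the off case is the lianhua_demo.img created earlier):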

[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_full.img -o preallocation=full 10G
Formatting 'lianhua_demo_full.img', fmt=raw size=10737418240 preallocation=full
 
real    0m22.955s
user    0m0.013s
sys     0m8.930s

[lianhua@host ~]$ time qemu-img create -f raw lianhua_demo_falloc.img -o preallocation=falloc 10G
Formatting 'lianhua_demo_falloc.img', fmt=raw size=10737418240 preallocation=falloc
 
real    0m8.256s
user    0m0.008s
sys     0m8.114s
 
[lianhua@host ~]$ du -h lianhua_demo*.img
11G     lianhua_demo_falloc.img
0       lianhua_demo.img
11G     lianhua_demo_full.img
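The difference between the three modes can be illustrated with plain file operations. This is a rough sketch of analogous strategies, not qemu-img's actual implementation; the function names are hypothetical, and it assumes a Unix host where os.posix_fallocate is available.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # write zeros 1 MiB at a time


def allocate(path, size_bytes, mode):
    """Create a raw file using a strategy analogous to a preallocation mode."""
    with open(path, "wb") as f:
        if mode == "off":
            f.truncate(size_bytes)  # sparse: only the length is set
        elif mode == "falloc":
            # Reserve space for the whole range without writing any data.
            os.posix_fallocate(f.fileno(), 0, size_bytes)
        elif mode == "full":
            zeros = bytes(CHUNK)
            remaining = size_bytes
            while remaining > 0:
                n = min(CHUNK, remaining)
                f.write(zeros[:n])  # physically write zeros
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # make sure the data reaches the disk
        else:
            raise ValueError(f"unknown mode: {mode}")


def allocated_bytes(path):
    """Space actually allocated on disk (st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    size = 8 * 1024 * 1024  # 8 MiB stand-in for the 10G images
    with tempfile.TemporaryDirectory() as tmp:
        for mode in ("off", "falloc", "full"):
            path = os.path.join(tmp, f"demo_{mode}.img")
            allocate(path, size, mode)
            print(mode, os.path.getsize(path), allocated_bytes(path))
```

All three files report the same length; they differ in how much space is allocated up front and how much work creation requires, which matches the pattern in the timings and du output above.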

 

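Several image formats are available. Besides raw, commonly used disk formats include qcow2 and vdi.

Once the image has been created, start a virtual machine with qemu-kvm and attach the image file as its disk: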

[lianhua@host ~]$ /usr/libexec/qemu-kvm -m 1024 -smp 2 -hda lianhua_demo_falloc.img -monitor stdio
WARNING: Image format was not specified for 'lianhua_demo_falloc.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) VNC server running on '::1;5900'
 
(qemu) info pci
  Bus  0, device   0, function 0:
    Host bridge: PCI device 8086:1237
      id ""
  Bus  0, device   1, function 1:
    IDE controller: PCI device 8086:7010
      BAR4: I/O at 0xc040 [0xc04f].
      id ""
  Bus  0, device   1, function 3:
    Bridge: PCI device 8086:7113
      IRQ 9.
      id ""
  Bus  0, device   3, function 0:
    Ethernet controller: PCI device 8086:100e
      IRQ 11.
      BAR0: 32 bit memory at 0xfebc0000 [0xfebdffff].
      BAR1: I/O at 0xc000 [0xc03f].
      BAR6: 32 bit memory at 0xffffffffffffffff [0x0003fffe].
      id ""

 

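The output shows that the image is attached through the IDE controller at PCI address 00:01.1 in the guest. The -hda option of qemu-kvm attaches the image file as the first IDE disk of the virtual machine, which appears in the guest as /dev/hda or /dev/sda depending on the driver in use. For more disk options, see the qemu-kvm man page.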
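The startup warning in the transcript appears because -hda does not state the image format. One way to avoid it is to spell the disk out with -drive and an explicit format. The sketch below only builds the argument list and does not launch QEMU; the function names are hypothetical, and it assumes the image path contains no commas (which would need escaping in a -drive value).

```python
import shlex


def ide_disk_args(image_path, image_format="raw"):
    """Arguments that attach an image as the first IDE disk with an explicit format."""
    drive = f"file={image_path},format={image_format},if=ide,index=0,media=disk"
    return ["-drive", drive]


def build_command(image_path, memory_mb=1024, cpus=2,
                  qemu_binary="/usr/libexec/qemu-kvm"):
    """Full command line mirroring the one used in the transcript."""
    return [qemu_binary, "-m", str(memory_mb), "-smp", str(cpus),
            *ide_disk_args(image_path), "-monitor", "stdio"]


if __name__ == "__main__":
    print(shlex.join(build_command("lianhua_demo_falloc.img")))
```

Stating format=raw removes the need for format probing, which is what the warning asks for.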

2.2 LVM Logical Volume

To use an LVM logical volume, first create a volume with LVM, then attach the volume to the virtual machine as its disk.

On an OpenStack platform, the LVM volume is created as follows:

[root@host ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/loop2
  VG Name               cinder-volumes
  PV Size               602.34 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              154199
  Free PE               146519
  Allocated PE          7680
  PV UUID               pTkQ5Z-zNdc-LRrn-qWAX-13D6-bhbG-DdcGFD
 
[root@host ~]# vgdisplay
  --- Volume group ---
  VG Name               cinder-volumes
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1447
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               602.34 GiB
  PE Size               4.00 MiB
  Total PE              154199
  Alloc PE / Size       7680 / 30.00 GiB
  Free  PE / Size       146519 / 572.34 GiB
  VG UUID               Mrrh1r-qKQw-bCgW-0WXi-d5Bd-OiVV-cTBrg5
 
[root@host ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/cinder-volumes/volume-c34555f0-fd26-42fe-a3b2-86098b590be2
  LV Name                volume-c34555f0-fd26-42fe-a3b2-86098b590be2
  VG Name                cinder-volumes
  LV UUID                n81Af6-cWEe-LvAm-wjA3-vgKD-RxgV-qMtIq6
  LV Write Access        read/write
  LV Creation host, time host.localdomain, 2020-08-02 00:47:30 +0800
  LV Status              available
  # open                 1
  LV Size                26.00 GiB
  Current LE             6656
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:0

(See the linked article for a detailed introduction to LVM.)
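The sizes in the LVM output follow directly from the extent counts, since every physical extent (PE) is 4 MiB. A short worked check using the numbers reported above (the helper name is illustrative):

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def extents_to_gib(extent_count, pe_size_mib=4):
    """Convert a number of LVM extents to GiB for a given extent size."""
    return extent_count * pe_size_mib * MIB / GIB


total_pe, free_pe, alloc_pe = 154199, 146519, 7680  # from pvdisplay/vgdisplay
lv_le = 6656                                        # from lvdisplay

assert free_pe + alloc_pe == total_pe

print(f"VG size:   {extents_to_gib(total_pe):.2f} GiB")  # 602.34 GiB
print(f"Free:      {extents_to_gib(free_pe):.2f} GiB")   # 572.34 GiB
print(f"Allocated: {extents_to_gib(alloc_pe):.2f} GiB")  # 30.00 GiB
print(f"LV size:   {extents_to_gib(lv_le):.2f} GiB")     # 26.00 GiB
```

The 26 GiB logical volume therefore accounts for 6656 of the 7680 allocated extents in the cinder-volumes group.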

 

After the volume has been created, attach it to the virtual machine as a disk:

[root@host ~]# openstack volume list
+--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+
| ID                                   | Display Name                               | Status | Size | Attached to                                 |
+--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+
| c34555f0-fd26-42fe-a3b2-86098b590be2 | lianhua-vm1-vol                            | in-use |   26 | Attached to lianhua-vm1-vol on /dev/vdb     |
+--------------------------------------+--------------------------------------------+--------+------+---------------------------------------------+

 

The volume is attached to the virtual machine and appears in the guest as the disk device /dev/vdb. Log in to the guest to view the details of this device:

[root@lianhua-vm1:/home/robot]
# fdisk -l | grep vdb
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors
 
[root@lianhua-vm1:/home/robot]
# lspci
...
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
 
[root@lianhua-vm1:/home/robot]
# lspci -s 00:0b.0 -vvv
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device
        Subsystem: Red Hat, Inc. Device 0002
        Physical Slot: 11
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 10
        Region 0: I/O ports at 1000 [size=64]
        Region 1: Memory at c0004000 (32-bit, non-prefetchable) [size=4K]
        Region 4: Memory at c0000000 (64-bit, prefetchable) [size=16K]
        Capabilities: [98] MSI-X: Enable+ Count=2 Masked-
                Vector table: BAR=1 offset=00000000
                PBA: BAR=1 offset=00000800
        Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
                BAR=0 offset=00000000 size=00000000
        Capabilities: [70] Vendor Specific Information: VirtIO: Notify
                BAR=4 offset=00003000 size=00001000 multiplier=00000004
        Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
                BAR=4 offset=00002000 size=00001000
        Capabilities: [50] Vendor Specific Information: VirtIO: ISR
                BAR=4 offset=00001000 size=00001000
        Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
                BAR=4 offset=00000000 size=00001000
        Kernel driver in use: virtio-pci

 

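Unlike the disk attached with -hda, this device name starts with vd because the disk is provided through virtio paravirtualization. The output above shows that vdb sits at PCI address 00:0b.0 and is driven by virtio-pci.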

 

To attach a disk through virtio paravirtualization with libvirt, define a disk element under the devices element of the domain XML:

<disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
      <target dev='vdb' bus='virtio'/>
      <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</disk>
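The address element in this definition is what places the disk at 00:0b.0 in the guest's lspci output. As a small illustration, the sketch below parses a copy of the disk element with Python's standard library and derives the guest device name, bus type, and PCI address; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DISK_XML = """
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
  <target dev='vdb' bus='virtio'/>
  <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
</disk>
"""


def guest_disk_location(disk_xml):
    """Return (device name, bus type, PCI address as bus:slot.function)."""
    disk = ET.fromstring(disk_xml)
    target = disk.find("target")
    addr = disk.find("address")
    bus = int(addr.get("bus"), 16)
    slot = int(addr.get("slot"), 16)
    func = int(addr.get("function"), 16)
    return target.get("dev"), target.get("bus"), f"{bus:02x}:{slot:02x}.{func:x}"


print(guest_disk_location(DISK_XML))  # ('vdb', 'virtio', '00:0b.0')
```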

 

Likewise, a virtio disk can be set up directly with qemu-kvm by passing virtio-blk-pci to the -device option together with a matching -drive option. The drive= value of -device must equal the id= value of -drive:

[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# /usr/libexec/qemu-kvm -m 1024 -smp 2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk.config,format=raw,if=none,id=drive-virtio-disk0,readonly=on,cache=none -monitor stdio
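Because -drive defines the backend and -device defines the guest-visible device, the two are linked only by an identifier: the drive= property of the device has to name the id= of the drive, otherwise QEMU cannot find the backend. A minimal sketch that generates both options from a single identifier, so they cannot drift apart (the function name and defaults are illustrative, and the path is assumed to contain no commas):

```python
def virtio_blk_args(image_path, image_format,
                    drive_id="drive-virtio-disk0", cache="none"):
    """Build a -drive/-device pair for a virtio-blk disk sharing one drive id."""
    # if=none means the drive is not attached automatically; -device does that.
    drive = (f"file={image_path},format={image_format},"
             f"if=none,id={drive_id},cache={cache}")
    device = f"virtio-blk-pci,drive={drive_id},id=virtio-disk0"
    return ["-drive", drive, "-device", device]


for arg in virtio_blk_args("lianhua_demo_falloc.img", "raw"):
    print(arg)
```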

3 Deploying a Disk Virtualization Environment

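Building on the previous two sections, this section deploys a simple environment that implements disk virtualization and file sharing. The setup is as follows:

An image file attached through virtio provides the virtual disk vda.

A volume attached through virtio provides the virtual disk vdb.

Inside the virtual machine, LVM divides vdb into logical volumes, and a file system is created on each volume.

The virtual machine's file systems are shared over NFS.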

 

(Figure: diagram of the deployment described above.)


1) Check the libvirt XML configuration of the virtio disks:

<devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/disk/by-path/ip-172.18.0.22:3260-iscsi-iqn.2010-10.org.openstack:volume-c34555f0-fd26-42fe-a3b2-86098b590be2-lun-0'/>
      <target dev='vdb' bus='virtio'/>
      <serial>c34555f0-fd26-42fe-a3b2-86098b590be2</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </disk>
</devices>

 

The vda disk is backed by the image file /var/lib/nova/instances/2177d777-2a46-4e5b-ac92-ba7ad27e21a3/disk. It appears in the guest as /dev/vda at PCI address 00:06.0.

Use qemu-img info to inspect the disk image file:

[root@host 2177d777-2a46-4e5b-ac92-ba7ad27e21a3]# qemu-img info disk
image: disk
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 1.7G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/52968ae0bfbfeef835844ee0b97be5e45d382e4c
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

The output shows that disk has a virtual size of 40G, while it currently occupies only 1.7G on the host.
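The gap between the two sizes is expected for a qcow2 image with a backing file: the image allocates clusters on demand and stores only the data that differs from its backing file. As an illustration, the sketch below parses the human-readable fields from the output above; the parser is a simplification that only handles single-line "key: value" fields, and the function names are hypothetical.

```python
import re

INFO_TEXT = """\
image: disk
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 1.7G
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/52968ae0bfbfeef835844ee0b97be5e45d382e4c
"""


def parse_info(text):
    """Split 'key: value' lines of qemu-img info output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def virtual_size_bytes(fields):
    """Extract the exact byte count from the 'virtual size' field."""
    match = re.search(r"\((\d+) bytes\)", fields["virtual size"])
    return int(match.group(1))


info = parse_info(INFO_TEXT)
print(info["file format"])                   # qcow2
print(virtual_size_bytes(info) / 1024 ** 3)  # 40.0
print(info["backing file"])
```

For scripting, qemu-img info also accepts --output=json, which avoids parsing the text layout.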

 

vdb is the disk backed by the volume. It appears in the guest as /dev/vdb at PCI address 00:0b.0.

Log in to the guest to check that both disks are present:

[root@lianhua-vm1:/home/robot]
# fdisk -l | grep vd
Disk /dev/vda: 40 GiB, 42949672960 bytes, 83886080 sectors              # vda has a capacity of 40G here
/dev/vda1  *     2048 83886046 83883999  40G 83 Linux                   
Disk /dev/vdb: 26 GiB, 27917287424 bytes, 54525952 sectors
 
[root@lianhua-vm1:/home/robot]
# lspci | grep block
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:0b.0 SCSI storage controller: Red Hat, Inc. Virtio block device

 

Both disk devices are present in the guest, and the partition vda1 on vda is used for the operating system's file system.

 

2) Check the logical volumes created from vdb inside the guest:

[root@lianhua-vm1:/home/robot]
# pvdisplay
  --- Physical volume ---
  PV Name               /dev/vdb
  VG Name               lianhua-vm1-vol
  PV Size               26.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              6655
  Free PE               1405
  Allocated PE          5250
  PV UUID               OqdKmO-PspN-0ZKe-M0l4-0vGD-cY7k-VjZvTJ
 
[root@lianhua-vm1:/home/robot]
# vgdisplay
  --- Volume group ---
  VG Name               lianhua-vm1-vol
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  7
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                6
  Open LV               6
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <26.00 GiB
  PE Size               4.00 MiB
  Total PE              6655
  Alloc PE / Size       5250 / <20.51 GiB
  Free  PE / Size       1405 / <5.49 GiB
  VG UUID               JcVrao-YnJ7-mRpK-8Rxc-i07i-WVH4-aVgAoD
 
[root@lianhua-vm1:/home/robot]
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/lianhua-vm1-vol/provider_sys
  LV Name                provider_sys
  VG Name                lianhua-vm1-vol
  LV UUID                C6byt7-5cby-h2RT-xcLg-OJU0-Qq1E-27G6jB
  LV Write Access        read/write
  LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800
  LV Status              available
  # open                 1
  LV Size                <9.77 GiB
  Current LE             2500
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0
 
  --- Logical volume ---
  LV Path                /dev/lianhua-vm1-vol/provider_lianhua
  LV Name                provider_lianhua
  VG Name                lianhua-vm1-vol
  LV UUID                vfeZg8-PKVR-kKxv-yidf-rQqp-A7De-CvXqws
  LV Write Access        read/write
  LV Creation host, time lianhua-vm1, 2020-08-02 00:50:11 +0800
  LV Status              available
  # open                 1
  LV Size                4.88 GiB
  Current LE             1250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1
 
  --- Logical volume ---
  LV Path                /dev/lianhua-vm1-vol/provider_log
  LV Name                provider_log
  VG Name                lianhua-vm1-vol
  LV UUID                mHdD60-QjSy-sRlz-GLmK-CFIM-l42c-QGthXa
  LV Write Access        read/write
  LV Creation host, time lianhua-vm1, 2020-08-02 00:50:12 +0800
  LV Status              available
  # open                 1
  LV Size                1000.00 MiB
  Current LE             250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:2
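The output above shows the result but not the commands that produced it. The following sketch prints one plausible command sequence for recreating the three volumes shown; it does not execute anything. The extent counts come from the lvdisplay output and the mount points from the df output later in this section, while the ext4 file system type and the function name are assumptions, since the original does not state how the file systems were created.

```python
import shlex


def lvm_plan(device, vg_name, volumes, fs_type="ext4"):
    """Return the commands to turn a disk into mounted logical volumes.

    volumes is a list of (lv_name, extents, mount_point) tuples.
    fs_type is an assumption; the source does not name the file system.
    """
    cmds = [["pvcreate", device], ["vgcreate", vg_name, device]]
    for lv_name, extents, mount_point in volumes:
        lv_path = f"/dev/{vg_name}/{lv_name}"
        cmds.append(["lvcreate", "-l", str(extents), "-n", lv_name, vg_name])
        cmds.append([f"mkfs.{fs_type}", lv_path])
        cmds.append(["mkdir", "-p", mount_point])
        cmds.append(["mount", lv_path, mount_point])
    return cmds


VOLUMES = [
    ("provider_sys", 2500, "/mnt/sys"),
    ("provider_lianhua", 1250, "/mnt/lianhua"),
    ("provider_log", 250, "/mnt/log"),
]

for cmd in lvm_plan("/dev/vdb", "lianhua-vm1-vol", VOLUMES):
    print(shlex.join(cmd))
```

Each extent is 4 MiB, so 2500, 1250, and 250 extents give the 9.77 GiB, 4.88 GiB, and 1000 MiB sizes reported by lvdisplay.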

 

3) Mount the logical volumes at /mnt/log, /mnt/sys, and /mnt/lianhua, and share these file systems over NFS:

[root@lianhua-vm1:/home/robot]
# df -h
Filesystem                                                                                     Size  Used Avail Use% Mounted on
/dev/vda1                                                                                       40G  7.0G   31G  19% /
/dev/mapper/lianhua-vm1-vol-provider_sys                                                        9.1G   37M  8.6G   1% /mnt/sys
/dev/mapper/lianhua-vm1-vol-provider_lianhua                                                    4.6G   20M  4.3G   1% /mnt/lianhua
/dev/mapper/lianhua-vm1-vol-provider_log                                                        922M   18M  838M   3% /mnt/log

(See the linked article for a detailed introduction to NFS.)
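The NFS configuration itself is not shown in this walkthrough. As a rough sketch, assuming the Linux kernel NFS server configured through /etc/exports, the entries for the three mount points could be generated as below. The client subnet 172.18.0.0/24 and the export options are hypothetical choices for illustration, not values taken from the original environment.

```python
def exports_lines(paths, client, options=("rw", "sync", "no_subtree_check")):
    """Build /etc/exports entries: one line per exported directory."""
    opts = ",".join(options)
    return [f"{path} {client}({opts})" for path in paths]


MOUNT_POINTS = ["/mnt/sys", "/mnt/lianhua", "/mnt/log"]  # from the df output
CLIENT = "172.18.0.0/24"  # hypothetical subnet allowed to mount the exports

for line in exports_lines(MOUNT_POINTS, CLIENT):
    print(line)
```

After such entries are added on the server, running exportfs -ra reloads the export table, and a client can then mount an export at a local directory such as /mnt/log.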

 

Log in to VM2 to check that the file system is shared:

[root@lianhua-vm2:/mnt/log]
# ls
[root@lianhua-vm2:/mnt/log]
# mkdir lianhua
[root@lianhua-vm2:/mnt/log]
# ls
lianhua
 
[root@lianhua-vm1:/mnt/log]
# ls
lianhua

 

The directory created on VM2 is visible on VM1, so file sharing works and the environment is fully deployed.

 

 

 

 
