In OpenStack Stein, live-migrating a virtual machine produced the following error on the destination host:
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server [req-866c0bbc-8b80-4374-ac69-8caebba87b64 b5451d5a424d4de7a7b36a42e911b6d8 ddcb686a055047b4ab9ab9bb0bf66258 - default default] Exception during message handling: InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
0
Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 166, in _process_incoming
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 79, in wrapped
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server function_name, call_dict, binary, tb)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server self.force_reraise()
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/utils.py", line 1346, in decorated_function
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 215, in decorated_function
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server self.force_reraise()
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 203, in decorated_function
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6514, in check_can_live_migrate_destination
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server block_migration, disk_over_commit)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7326, in check_can_live_migrate_destination
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server self._compare_cpu(None, source_cpu_info, instance)
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 7588, in _compare_cpu
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server 0
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult
2021-03-25 23:07:16.729 17480 ERROR oslo_messaging.rpc.server
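Problem
Taken literally, the error means that the CPU has features the other host does not support. The 0 in the message is the result of libvirt's CPU comparison, which the linked virCPUCompareResult page lists as VIR_CPU_COMPARE_INCOMPATIBLE.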
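The following is a minimal sketch of how the numeric result in the message can be decoded. The enum values are the ones libvirt documents for virCPUCompareResult; the helper name describe_compare_result is made up for illustration and is not part of nova or libvirt.

```python
# Values of libvirt's virCPUCompareResult enum, as listed on the page the
# error message links to.
VIR_CPU_COMPARE_RESULT = {
    -1: "VIR_CPU_COMPARE_ERROR",
    0: "VIR_CPU_COMPARE_INCOMPATIBLE",
    1: "VIR_CPU_COMPARE_IDENTICAL",
    2: "VIR_CPU_COMPARE_SUPERSET",
}


def describe_compare_result(ret):
    """Hypothetical helper: name a comparison result and report whether
    a `ret <= 0` check (as used in nova's _compare_cpu) would reject it."""
    name = VIR_CPU_COMPARE_RESULT.get(ret, "UNKNOWN")
    rejected = ret <= 0
    return name, rejected


print(describe_compare_result(0))  # ('VIR_CPU_COMPARE_INCOMPATIBLE', True)
```

Only the results 1 and 2 pass the check, so a destination host has to be identical to, or a superset of, the CPU description it is compared against.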
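Solution
Since OpenStack provides a way to configure the CPU model, there should logically be two ways to solve the problem:
- Change nova's CPU mode
- Modify the source code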
In practice
Changing the CPU mode
By default, OpenStack uses "kvm" as virt_type. In this mode, cpu_mode supports the following three settings:
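- host-passthrough: libvirt tells KVM to pass the host's entire CPU instruction set through to the virtual machine. The virtual machine can therefore use as much of the host CPU instruction set as possible, so performance is the best. For live migration, however, it requires the destination node's CPU to match the source node's.
- host-model: libvirt picks the closest matching CPU model from the configuration file /usr/share/libvirt/cpu_map.xml, based on the current host's CPU instruction set. In this mode the virtual machine usually sees fewer instructions than the host, and performance is slightly lower than with host-passthrough, but live migration tolerates some differences between the destination and source CPUs.
- custom: in this mode the virtual machine gets the smallest instruction set, so performance is relatively the lowest, but it is the most capable of live-migrating across different CPU models. custom mode also lets the user add extra instruction-set features. It requires the cpu_model option to be set as well; for valid values, see /usr/share/libvirt/cpu_map.xml.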
Note: on our Stein deployment there was no /usr/share/libvirt/cpu_map.xml file. Instead, the supported CPU definitions are stored in the /usr/share/libvirt/cpu_map directory.
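To find candidate values for cpu_model on such a host, the directory can be listed. The sketch below assumes that each model lives in its own file named <arch>_<model>.xml (for example x86_Haswell.xml) and that <arch>_features.xml and <arch>_vendors.xml are not models; this layout is an assumption about the installed libvirt and should be checked against the actual directory. The function name list_cpu_models is hypothetical.

```python
import os


def list_cpu_models(cpu_map_dir="/usr/share/libvirt/cpu_map", arch="x86"):
    """Return model names derived from <arch>_<model>.xml file names.

    Assumption: one model per file; the features and vendors files
    are skipped because they do not describe a CPU model.
    """
    if not os.path.isdir(cpu_map_dir):
        return []
    prefix = arch + "_"
    suffix = ".xml"
    models = []
    for name in os.listdir(cpu_map_dir):
        if name.startswith(prefix) and name.endswith(suffix):
            model = name[len(prefix):-len(suffix)]
            if model not in ("features", "vendors"):
                models.append(model)
    return sorted(models)


if __name__ == "__main__":
    for model in list_cpu_models():
        print(model)
```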
Ranked by performance, the three modes are: host-passthrough > host-model > custom. Measurements with the HEPSpec06 benchmark gave the following performance figures:
host-passthrough | host-model | custom |
100% | 95.84% | 94.73% |
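Ranked by live-migration compatibility, the three modes are: custom > host-model > host-passthrough
The test configurations are listed below. Each change only needs to be made on the host where the virtual machine currently runs and on the destination host of the migration. After the change, restart the nova service and hard-reboot the existing virtual machines:
Option 1: host-passthrough mode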
[libvirt]
virt_type=kvm
cpu_mode=host-passthrough
Option 2: host-model mode
[libvirt]
virt_type=kvm
cpu_mode=host-model
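Option 3: custom mode. Our hosts use Intel Xeon Gold 6230 CPUs (the Cascade Lake generation, which libvirt on these hosts reports as a Skylake-Server variant), so we chose Skylake-Server as cpu_model for the test.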
[libvirt]
virt_type=kvm
cpu_mode=custom
cpu_model=Skylake-Server
Of the three options, only the third one migrated successfully in the end.
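Since custom mode only works when cpu_model is set as well, a quick consistency check of the [libvirt] section can catch a half-edited configuration before the nova service is restarted. This is a hedged sketch using Python's standard configparser; check_libvirt_cpu_config is a made-up helper, not a nova utility, and it only covers the rule described above.

```python
import configparser


def check_libvirt_cpu_config(conf_text):
    """Hypothetical sanity check for the [libvirt] CPU options.

    Returns a list of problems; an empty list means none were found.
    """
    # interpolation=None so that '%' characters in other options are
    # read literally instead of being treated as placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return ["missing [libvirt] section"]

    mode = parser.get("libvirt", "cpu_mode", fallback=None)
    model = parser.get("libvirt", "cpu_model", fallback=None)
    problems = []
    if mode == "custom" and not model:
        problems.append("cpu_mode=custom requires cpu_model")
    if mode != "custom" and model:
        problems.append("cpu_model is only used with cpu_mode=custom")
    return problems


sample = "[libvirt]\nvirt_type=kvm\ncpu_mode=custom\n"
print(check_libvirt_cpu_config(sample))
```

For a real deployment the text would be read from the nova configuration file on each of the two hosts involved in the migration.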
Modifying the source code
The traceback shows that the exception is raised at line 7588 of /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py. Let's review the code:
def _compare_cpu(self, guest_cpu, host_cpu_str, instance):
    """Check the host is compatible with the requested CPU

    :param guest_cpu: nova.objects.VirtCPUModel or None
    :param host_cpu_str: JSON from _get_cpu_info() method

    If the 'guest_cpu' parameter is not None, this will be
    validated for migration compatibility with the host.
    Otherwise the 'host_cpu_str' JSON string will be used for
    validation.

    :returns:
        None. if given cpu info is not compatible to this server,
        raise exception.
    """
    # NOTE(kchamart): Comparing host to guest CPU model for emulated
    # guests (<domain type='qemu'>) should not matter -- in this
    # mode (QEMU "TCG") the CPU is fully emulated in software and no
    # hardware acceleration, like KVM, is involved. So, skip the CPU
    # compatibility check for the QEMU domain type, and retain it for
    # KVM guests.
    if CONF.libvirt.virt_type not in ['kvm']:
        return

    if guest_cpu is None:
        info = jsonutils.loads(host_cpu_str)
        LOG.info('Instance launched has CPU info: %s', host_cpu_str)
        cpu = vconfig.LibvirtConfigCPU()
        cpu.arch = info['arch']
        cpu.model = info['model']
        cpu.vendor = info['vendor']
        cpu.sockets = info['topology']['sockets']
        cpu.cores = info['topology']['cores']
        cpu.threads = info['topology']['threads']
        for f in info['features']:
            cpu.add_feature(vconfig.LibvirtConfigCPUFeature(f))
    else:
        cpu = self._vcpu_model_to_cpu_config(guest_cpu)

    # s390x doesn't support cpu model in host info, so compare
    # cpu info will raise an error anyway, thus have to avoid check
    # see bug 1854126 for more info
    min_libvirt_version = (5, 9, 0)
    if (cpu.arch in (arch.S390X, arch.S390) and
            not self._host.has_min_version(min_libvirt_version)):
        LOG.debug("on s390x platform, the min libvirt version "
                  "support cpu model compare is %s",
                  min_libvirt_version)
        return

    u = ("http://libvirt.org/html/libvirt-libvirt-host.html#"
         "virCPUCompareResult")
    m = _("CPU doesn't have compatibility.\n\n%(ret)s\n\nRefer to %(u)s")
    # unknown character exists in xml, then libvirt complains
    try:
        cpu_xml = cpu.to_xml()
        LOG.debug("cpu compare xml: %s", cpu_xml, instance=instance)
        ret = self._host.compare_cpu(cpu_xml)
    except libvirt.libvirtError as e:
        error_code = e.get_error_code()
        if error_code == libvirt.VIR_ERR_NO_SUPPORT:
            LOG.debug("URI %(uri)s does not support cpu comparison. "
                      "It will be proceeded though. Error: %(error)s",
                      {'uri': self._uri(), 'error': e})
            return
        else:
            LOG.error(m, {'ret': e, 'u': u})
            raise exception.MigrationPreCheckError(
                reason=m % {'ret': e, 'u': u})

    if ret <= 0:
        LOG.error(m, {'ret': ret, 'u': u})
        raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})
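Tracing back from the code that raised the error:
- The exception is triggered by the ret <= 0 branch.
- ret is assigned by ret = self._host.compare_cpu(cpu_xml).
- self._host.compare_cpu(cpu_xml) calls the compare_cpu() function, passing in cpu_xml.
- cpu_xml is assigned by cpu_xml = cpu.to_xml().
- The cpu variable is built earlier in _compare_cpu().
The docstring of _compare_cpu() shows that when guest_cpu is None, the CPU attributes in host_cpu_str are used for the comparison. guest_cpu effectively corresponds to the cpu_model set in the nova configuration file. When it is not None, the specified cpu_model information is used directly; when it is None, the CPU description has to be rebuilt. The rebuilt description consists of arch, model, vendor, sockets, cores, threads and info['features']. Below is info as printed through debug output; it shows that info['features'] is essentially the list of CPU instruction-set features: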
{u'arch': u'x86_64', u'model': u'Skylake-Server-IBRS', u'vendor': u'Intel', u'features': [u'pku', u'rtm', u'tsc_adjust', u'vme', u'pge', u'xsaveopt', u'smep', u'fpu', u'monitor', u'lm', u'tsc', u'adx', u'fxsr', u'tm', u'pclmuldq', u'xgetbv1', u'tsc-deadline', u'arat', u'de', u'aes', u'pse', u'sse', u'f16c', u'ds', u'mpx', u'avx512f', u'avx2', u'pbe', u'cx16', u'ds_cpl', u'movbe', u'intel-pt', u'vmx', u'sep', u'avx512dq', u'stibp', u'xsave', u'erms', u'hle', u'est', u'smx', u'abm', u'sse4.1', u'sse4.2', u'ssbd', u'acpi', u'mmx', u'osxsave', u'clwb', u'dca', u'popcnt', u'invtsc', u'tm2', u'pcid', u'pdcm', u'avx512vl', u'x2apic', u'smap', u'clflush', u'dtes64', u'xtpr', u'avx512bw', u'msr', u'fma', u'cx8', u'mce', u'avx512cd', u'ht', u'lahf_lm', u'rdseed', u'apic', u'fsgsbase', u'rdtscp', u'ssse3', u'pse36', u'mtrr', u'avx', u'syscall', u'invpcid', u'cmov', u'spec-ctrl', u'clflushopt', u'pat', u'3dnowprefetch', u'nx', u'pae', u'avx512vnni', u'mca', u'pdpe1gb', u'rdrand', u'xsavec', u'pni', u'sse2', u'ss', u'bmi1', u'bmi2', u'xsaves', u'arch-facilities'], u'topology': {u'cores': 20, u'cells': 2, u'threads': 2, u'sockets': 1}}
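To make the role of info['features'] concrete, the sketch below builds a CPU description from such a dictionary with Python's standard library. It is a simplified stand-in for what cpu.to_xml() produces, written under the assumption that the XML carries arch, model, vendor, topology and one feature element per entry; the exact layout nova emits may differ, and build_cpu_xml is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_cpu_xml(info, include_features=True):
    """Approximate the <cpu> XML handed to the CPU comparison.

    Simplified illustration only, not nova's LibvirtConfigCPU.
    """
    cpu = ET.Element("cpu")
    for key in ("arch", "model", "vendor"):
        ET.SubElement(cpu, key).text = info[key]
    topo = info["topology"]
    ET.SubElement(cpu, "topology",
                  sockets=str(topo["sockets"]),
                  cores=str(topo["cores"]),
                  threads=str(topo["threads"]))
    if include_features:
        # Every feature listed here is something the other host must
        # also provide for the comparison to succeed.
        for name in sorted(info["features"]):
            ET.SubElement(cpu, "feature", name=name)
    return ET.tostring(cpu, encoding="unicode")


info = {
    "arch": "x86_64",
    "model": "Skylake-Server-IBRS",
    "vendor": "Intel",
    "features": ["pku", "avx512vnni"],  # shortened from the output above
    "topology": {"cores": 20, "cells": 2, "threads": 2, "sockets": 1},
}
print(build_cpu_xml(info))
print(build_cpu_xml(info, include_features=False))
```

With include_features=False the description is reduced to the named model and topology, which is the effect of leaving the feature loop out of the comparison input.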
We tried removing the features:
......
        cpu.arch = info['arch']
        cpu.model = info['model']
        cpu.vendor = info['vendor']
        cpu.sockets = info['topology']['sockets']
        cpu.cores = info['topology']['cores']
        cpu.threads = info['topology']['threads']
        #for f in info['features']:
        #    cpu.add_feature(vconfig.LibvirtConfigCPUFeature(f))
    else:
        cpu = self._vcpu_model_to_cpu_config(guest_cpu)
......
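Note: the change must be made on both the node where the virtual machine currently runs and the node it will migrate to.
Restart the nova service: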
systemctl restart openstack-nova-compute.service
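The test succeeded. However, there is a serious problem:
Because the CPU feature check is disabled, migration also succeeds between compute nodes with heterogeneous CPUs. After the migration, the CPU information of the virtual machine no longer matches that of the host.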
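With the check disabled, it may be worth comparing the feature lists of the two hosts by hand before migrating, since a guest that relies on a feature the destination lacks could misbehave afterwards. The sketch below is a plain set difference over two feature lists such as the features entry shown in the debug output; the function name and the sample lists are made up for illustration.

```python
def missing_on_destination(source_features, destination_features):
    """Return source CPU features the destination host does not report."""
    return sorted(set(source_features) - set(destination_features))


# Made-up feature lists for illustration only.
source = ["avx2", "avx512f", "avx512vnni", "sse4.2"]
destination = ["avx2", "sse4.2"]
print(missing_on_destination(source, destination))  # ['avx512f', 'avx512vnni']
```

An empty result means every feature reported by the source is also reported by the destination, which is the situation the original check was meant to guarantee.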