unable to open OSD superblock on /var/lib/ceph/osd/ceph-45
key: use_rdma, val: 1
default pool attr 0, nr_hugepages 0, no neet set hugepages
2021-03-28 21:46:07.548765 7f904d555ec0 2818102 20 ERROR bluestore(/var/lib/ceph/osd/ceph-45/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-45/block: (2) No such file or directory
2021-03-28 21:46:07.548788 7f904d555ec0 2818102 20 ERROR ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-45: (2) No such file or directory
Cause: the OSD id is wrong; osd.45 does not exist in this cluster, as ceph osd tree shows:
[root@rdma61 osd]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-10 0 root maintain
-9 123.71407 root hddpool
-11 123.71407 rack rack.hddpool
-15 36.38649 host rdma61.hddpool
3 hdd 7.27730 osd.3 down 1.00000 1.00000
6 hdd 7.27730 osd.6 down 1.00000 1.00000
10 hdd 7.27730 osd.10 down 1.00000 1.00000
13 hdd 7.27730 osd.13 down 1.00000 1.00000
16 hdd 7.27730 osd.16 down 1.00000 1.00000
-6 43.66379 host rdma63.hddpool
1 hdd 7.27730 osd.1 down 1.00000 1.00000
4 hdd 7.27730 osd.4 down 1.00000 1.00000
7 hdd 7.27730 osd.7 down 1.00000 1.00000
9 hdd 7.27730 osd.9 down 1.00000 1.00000
12 hdd 7.27730 osd.12 down 1.00000 1.00000
15 hdd 7.27730 osd.15 down 1.00000 1.00000
-3 43.66379 host rdma64.hddpool
2 hdd 7.27730 osd.2 down 1.00000 1.00000
5 hdd 7.27730 osd.5 down 1.00000 1.00000
8 hdd 7.27730 osd.8 down 1.00000 1.00000
11 hdd 7.27730 osd.11 down 1.00000 1.00000
14 hdd 7.27730 osd.14 down 1.00000 1.00000
17 hdd 7.27730 osd.17 down 1.00000 1.00000
-5 10.47839 root ssdpool
-25 10.47839 rack rack.ssdpool
-28 3.49280 host rdma61.ssdpool
18 ssd 0.87320 osd.18 down 1.00000 1.00000
21 ssd 0.87320 osd.21 down 1.00000 1.00000
24 ssd 0.87320 osd.24 down 1.00000 1.00000
28 ssd 0.87320 osd.28 down 1.00000 1.00000
-31 3.49280 host rdma63.ssdpool
20 ssd 0.87320 osd.20 down 1.00000 1.00000
22 ssd 0.87320 osd.22 down 1.00000 1.00000
25 ssd 0.87320 osd.25 down 1.00000 1.00000
27 ssd 0.87320 osd.27 down 1.00000 1.00000
-34 3.49280 host rdma64.ssdpool
19 ssd 0.87320 osd.19 down 1.00000 1.00000
23 ssd 0.87320 osd.23 down 1.00000 1.00000
26 ssd 0.87320 osd.26 down 1.00000 1.00000
29 ssd 0.87320 osd.29 down 0 1.00000
-1 0 root default
Another cause: the data disk is not mounted. Check with lsblk:
[root@ceph]# lsblk
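A minimal check sketch, assuming BlueStore OSDs deployed with ceph-volume (osd.45 is just the id taken from the error above):

# does the OSD data dir exist, and does its block symlink point at a real device?
ls -l /var/lib/ceph/osd/ceph-45/
ls -l /var/lib/ceph/osd/ceph-45/block

# which OSD ids does ceph-volume actually know about on this host?
ceph-volume lvm list

# re-mount the tmpfs data dirs and start the OSDs that ceph-volume knows about
ceph-volume lvm activate --all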
See also: 解决Ceph ERROR: unable to open OSD superblock (Jianshu article)
fio errors out
[root@localhost ~]# ./fio --ioengine=rbd --iodepth=4 --numjobs=8 --pool=.rbdpool.rbd --rbdname=lun0 --name=write5 --rw=randwrite --bs=1M --size=10G --group_reporting --direct=1
write5: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=rbd, iodepth=4
...
fio-2.1.10
Starting 8 processes
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
rbd engine: RBD version: 1.12.0
rados_ioctx_create failed.
fio_rbd_connect failed.
Cause: the pool name is wrong; the pool passed to fio with --pool does not exist in the cluster.
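rados_ioctx_create fails when the pool handed to fio does not exist (or the client cannot access it). A quick check, using the pool and image names from the command above:

# list the pools the cluster actually has
ceph osd pool ls

# confirm the image fio expects really is in that pool
rbd ls -p .rbdpool.rbd
rbd info .rbdpool.rbd/lun0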
fio hits a segmentation fault partway through a test
Core was generated by `fio fio_10.conf'.
Program terminated with signal 11, Segmentation fault.
#0 Mutex::lock (this=0xab1aaa000, no_lockdep=false) at /compiledir/zhangtao/ceph-L/src/common/Mutex.cc:97
97 /compiledir/zhangtao/ceph-L/src/common/Mutex.cc: No such file or directory.
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 glibc-2.17-324.el7_9.x86_64 nspr-4.25.0-2.el7_9.x86_64 nss-3.53.1-7.el7_9.x86_64 nss-softokn-3.53.1-6.el7_9.x86_64 nss-softokn-freebl-3.53.1-6.el7_9.x86_64 nss-util-3.53.1-1.el7_9.x86_64 xz-libs-5.2.2-1.el7.x86_64
(gdb) bt
#0 Mutex::lock (this=0xab1aaa000, no_lockdep=false) at /compiledir/zhangtao/ceph-L/src/common/Mutex.cc:97
#1 0x00007f90effe1710 in librbd::io::AioCompletion::get_return_value() () from /opt/h3c/lib/librbd.so.1
#2 0x000000000045d6f4 in _fio_rbd_finish_aiocb (comp=<optimized out>, data=<optimized out>) at engines/rbd.c:190
#3 0x00007f90effe0c35 in librbd::io::AioCompletion::complete() () from /opt/h3c/lib/librbd.so.1
#4 0x00007f90effe1ba3 in librbd::io::AioCompletion::complete_request(long, bool) () from /opt/h3c/lib/librbd.so.1
#5 0x00007f90f0000ee6 in librbd::io::ReadResult::C_SparseReadRequestBase::finish(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >&, std::vector<std::pair<unsigned long, unsigned long>, std::allocator<std::pair<unsigned long, unsigned long> > > const&, unsigned long, unsigned long, ceph::buffer::list&, int) () from /opt/h3c/lib/librbd.so.1
#6 0x00007f90effe43fd in librbd::io::ReadResult::C_SparseReadRequest<librbd::ImageCtx>::finish(int) () from /opt/h3c/lib/librbd.so.1
#7 0x00007f90eff06dc9 in Context::complete(int) () from /opt/h3c/lib/librbd.so.1
#8 0x00007f90efffbc8f in librbd::io::ObjectRequest<librbd::ImageCtx>::complete(int) () from /opt/h3c/lib/librbd.so.1
#9 0x00007f90efb292b2 in librados::C_AioComplete::finish(int) () from /opt/h3c/lib/librados.so.2
#10 0x00007f90efaff5b9 in Context::complete(int) () from /opt/h3c/lib/librados.so.2
#11 0x00007f90e5c92268 in Finisher::finisher_thread_entry (this=0x7f90ac12cd90) at /compiledir/zhangtao/ceph-L/src/common/Finisher.cc:83
#12 0x00007f90eef35ea5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f90eea5a9fd in clone () from /lib64/libc.so.6
The root cause was the fio version: replacing fio 2.20 with fio 3.20 fixed it.
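To avoid this, check the fio version before testing and, if needed, build a 3.x fio with the rbd engine compiled in. A rough sketch (the fio-3.20 tag is an assumption; configure only enables the rbd engine when it finds the librbd/librados development headers on the build host):

fio --version

git clone https://github.com/axboe/fio.git
cd fio
git checkout fio-3.20
./configure            # the summary should list the Rados Block Device engine as "yes"
make -j"$(nproc)"
./fio --enghelp | grep rbd    # verify the rbd ioengine is built in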
ceph -s fails with "authenticate timed out"
Example:
[root@rdma55 ~]# ceph -s
2021-08-24 19:02:27.925124 7f6a05f63700 548321 2 WARNING monclient(hunting): authenticate timed out after 75
2021-08-24 19:02:27.925168 7f6a05f63700 548321 2 WARNING librados: client.admin authentication error (110) Connection timed out
Cause:
Checking /var/log/ceph/ceph-mon.log shows:
monitor data filesystem reached concerning levels of available storage space
you may adjust mon data avail crit to a lower value to make this go away (default: 5%)
For the fix, see the next section: Mon node fails to start.
Mon node fails to start
Error description: monitor data filesystem reached concerning levels of available storage space
you may adjust mon data avail crit to a lower value to make this go away (default: 5%)
This involves the parameter mon-data-avail-crit, which watches free space on the mon's data store. By default a ceph mon keeps its data under /var/lib/ceph; once usage of that filesystem exceeds 95%, the mon process gets killed.
Solution:
The disk is simply running out of space. Check what is taking it up (stored data, core files, and so on) and delete whatever is no longer needed.
Clean up the data under /var/lib/, or lower the mon-data-avail-crit parameter, as sketched below.
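A sketch of both options; the 10%/2% thresholds below are example values, not recommendations:

# where is the space going on the mon's filesystem?
df -h /var/lib/ceph
du -sh /var/lib/ceph/mon/* /var/log/ceph

# option 1: delete core files, old logs, stray data, etc. until usage drops

# option 2: lower the thresholds in ceph.conf, [mon] section
#   mon data avail warn = 10
#   mon data avail crit = 2
# or inject them into mons that are still running:
ceph tell mon.* injectargs '--mon_data_avail_crit 2'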
See also: 《Ceph集群常见问题处理方法》 (common Ceph cluster problems and how to handle them)
See also: 《ceph搭建过程中遇到的问题汇总》 (a round-up of problems hit while setting up ceph)
Lessons learned:
A mismatched fio version can crash fio in the middle of a test.
Logging is enabled by default; leaving it on makes performance very poor.
After deploying ceph, the config file was missing the rdma and libgo settings, and /etc/ceph/ did not contain the libgo config file libgo_josn.
The libgo version in the build environment must match the libgo version in the runtime environment.
Performance is far too low
Cause: logging was not turned off; see the sketch below.
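A hedged sketch of turning the debug logging down. These are standard ceph debug subsystems using the 0/0 (log level / in-memory level) syntax, but which subsystems matter most depends on the workload:

# ceph.conf, [global] section
#   debug_ms = 0/0
#   debug_osd = 0/0
#   debug_bluestore = 0/0
#   debug_rocksdb = 0/0

# or change a running cluster without restarting:
ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0 --debug_bluestore 0/0'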
broken_new
engine.15.20 engine_create_state: broken_new
The metadata is probably corrupted; delete the pool and its volumes and recreate them, as sketched below.
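A rough sketch of the rebuild, assuming the .rbdpool.rbd pool and lun0 image from the fio example above. Pool deletion is destructive and blocked by default, so mon_allow_pool_delete has to be switched on first:

# allow pool deletion temporarily
ceph tell mon.* injectargs '--mon_allow_pool_delete=true'

# remove the image(s) and then the pool itself
rbd rm .rbdpool.rbd/lun0
ceph osd pool delete .rbdpool.rbd .rbdpool.rbd --yes-i-really-really-mean-it

# recreate the pool and the image (the PG count of 128 is a placeholder)
ceph osd pool create .rbdpool.rbd 128 128
ceph osd pool application enable .rbdpool.rbd rbd
rbd create .rbdpool.rbd/lun0 --size 10G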