尚雷, openGauss, 2023-07-29 17:58, posted from Sichuan
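Preface: A few days ago I test-deployed openGauss 5.0 and wrote [Installing and Deploying openGauss 5.0 Enterprise Edition on CentOS/RHEL 7: One Primary, Two Standbys, One Cascade Standby](http://mp.weixin.qq.com/s?__biz=MzIyMDE3ODk1Nw==&mid=2247510278&idx=1&sn=399a4a82472f5c30967e33a556c66420&chksm=97cd1664a0ba9f72b034fe055129128f74ecb337e8ec1badbbde6226ae650b02c08fdc797641&scene=21#wechat_redirect). More recently I tested upgrading openGauss from 3.1.1 to 5.0.0 and ran into some problems along the way. If you hit any issues while following this article, or disagree with anything in it, please get in touch; I would appreciate the feedback.
1. Environment Overview
This environment is an openGauss 3.1.1 Enterprise Edition cluster with one primary and one standby. Before openGauss 3.1.1 was installed, the environment was prepared as the official openGauss documentation requires: dependency packages were installed, the firewall and SELinux were disabled, and kernel parameters were tuned. The environment details are as follows:
If you are not familiar with deploying an openGauss 3 Enterprise Edition cluster, see my earlier article [CentOS 7: openGauss 3.1.0 One-Primary Two-Standby Cluster Installation and Deployment Guide]: https://www.modb.pro/db/551221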
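1.1 Hostnames
Hostname | Description |
opengauss-db1 | Primary node server name |
opengauss-db2 | Standby node server name |
1.2 Host addresses
IP address | Description |
10.110.3.155 | Primary node IP address |
10.110.3.156 | Standby node IP address |
1.3 Ports
Port | Parameter | Description |
15300 | cmServerPortBase | Primary CM Server port |
15300 | cmServerPortStandby | Standby CM Server port |
26000 | dataPortBase | Base port of the database node |
1.4 Users and groups
Item | Name | Type | Planning recommendation |
User | omm | Operating system | Use the same password and user ID on every cluster node |
Group | dbgrp | Operating system | Use the same group ID on every cluster node |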
1.5 Software directories
Directory | Parameter | Purpose |
/opt/software/openGauss | software | Installation package directory |
/opt/gaussdb/install/app | gaussdbAppPath | Database installation directory |
/var/log/omm | gaussdbLogPath | Log directory |
/opt/gaussdb/tmp | tmpMppdbPath | Temporary file directory |
/opt/gaussdb/install/om | gaussdbToolPath | Database tool directory |
/opt/gaussdb/corefile | corePath | Database core file directory |
/opt/gaussdb/install/cm | cmDir | CM data directory |
/opt/gaussdb/install/data/dn | dataNode | Data directory of the primary and standby database nodes |
1.6 XML configuration file
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<!-- Overall openGauss information -->
<CLUSTER>
<!-- Database (cluster) name -->
<PARAM name="clusterName" value="openGSDB" />
<!-- Database node names (hostname) -->
<PARAM name="nodeNames" value="opengauss-db1,opengauss-db2" />
<!-- Node IPs, in one-to-one correspondence with nodeNames -->
<PARAM name="backIp1s" value="10.110.3.155,10.110.3.156"/>
<!-- Database installation directory -->
<PARAM name="gaussdbAppPath" value="/opt/gaussdb/install/app" />
<!-- Log directory -->
<PARAM name="gaussdbLogPath" value="/var/log/omm" />
<!-- Temporary file directory -->
<PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp"/>
<!-- Database tool directory -->
<PARAM name="gaussdbToolPath" value="/opt/gaussdb/install/om" />
<!-- Database core file directory -->
<PARAM name="corePath" value="/opt/gaussdb/corefile"/>
<!-- openGauss type; "single-inst" means a single-instance deployment with one primary and multiple standbys -->
<PARAM name="clusterType" value="single-inst"/>
</CLUSTER>
<!-- Node deployment information for each server -->
<DEVICELIST>
<!-- Node deployment information on opengauss-db1 -->
<DEVICE sn="1000001">
<!-- hostname of opengauss-db1 -->
<PARAM name="name" value="opengauss-db1"/>
<!-- AZ of opengauss-db1 and its AZ priority -->
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
<PARAM name="backIp1" value="10.110.3.155"/>
<PARAM name="sshIp1" value="10.110.3.155"/>
<!-- CM -->
<!-- CM data directory -->
<PARAM name="cmDir" value="/opt/gaussdb/install/cm" />
<PARAM name="cmsNum" value="1" />
<!-- CM listening port -->
<PARAM name="cmServerPortBase" value="15300" />
<PARAM name="cmServerlevel" value="1" />
<!-- Node names hosting all CM instances, and their listening IPs -->
<PARAM name="cmServerListenIp1" value="10.110.3.155,10.110.3.156" />
<PARAM name="cmServerRelation" value="opengauss-db1,opengauss-db2" />
<!-- dbnode -->
<PARAM name="dataNum" value="1"/>
<!-- DBnode port -->
<PARAM name="dataPortBase" value="26000"/>
<!-- Data directory on the DBnode primary, and data directories on the standbys -->
<PARAM name="dataNode1" value="/opt/gaussdb/install/data/dn,opengauss-db2,/opt/gaussdb/install/data/dn"/>
<!-- Number of DBnode nodes configured in synchronous mode -->
<PARAM name="dataNode1_syncNum" value="0"/>
</DEVICE>
<!-- Node deployment information on opengauss-db2; "name" is set to the hostname -->
<DEVICE sn="1000002">
<PARAM name="name" value="opengauss-db2"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
<PARAM name="backIp1" value="10.110.3.156"/>
<PARAM name="sshIp1" value="10.110.3.156"/>
<PARAM name="cmDir" value="/opt/gaussdb/install/cm" />
</DEVICE>
</DEVICELIST>
</ROOT>
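2. Preparation
2.1 Download the 5.0.0 installation package
2.1.1 Download the package
Log in with a registered account to the download page on the openGauss website, https://www.opengauss.org/zh/download/, and download the openGauss 5.0.0 package that matches your operating system: choose the openGauss_5.0.0 Enterprise Edition download, then upload the package to /opt/software/openGauss on the server.
Note: if the database server has internet access, you can instead right-click the download button, choose "Copy link", and fetch the openGauss 5.0.0 Enterprise Edition package directly on the server with wget.
# Run as root [primary node]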
[root@opengauss-db1 ~]# cd /opt/software/openGauss
[root@opengauss-db1 openGauss]# wget https://opengauss.obs.cn-south-1.myhuaweicloud.com/5.0.0/x86/openGauss-5.0.0-CentOS-64bit-all.tar.gz
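2.1.2 Verify the package
Copy the SHA256 value shown for the package on the download page and paste it into a text file; here it reads aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5. Then check the downloaded file with sha256sum to confirm that the package is intact.
# Run as root [primary node]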
[root@opengauss-db1 openGauss]# sha256sum openGauss-5.0.0-CentOS-64bit-all.tar.gz
aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5 openGauss-5.0.0-CentOS-64bit-all.tar.gz
-- If the computed value matches the SHA256 value on the official site, the file is intact
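The comparison above is done by eye. As a minimal sketch, it can also be scripted so that a mismatch stops the procedure; `verify_sha256` is a hypothetical helper name introduced here for illustration, not an openGauss tool.

```shell
# Hypothetical helper (not part of openGauss): compare a file's SHA256 digest
# with the value copied from the download page.
verify_sha256() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example call with the values from this article:
# verify_sha256 openGauss-5.0.0-CentOS-64bit-all.tar.gz \
#     aa9fc724c5030f4cc79dad201675183029c8f36a07667028e681169a2f6482f5
```

The function returns non-zero on a mismatch, so it can be chained with `&&` in front of the extraction step.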
2.1.3 Extract the package
# Run as root [primary node]
[root@opengauss-db1 ~]# cd /opt/software/openGauss
[root@opengauss-db1 openGauss]# tar -zxvf openGauss-5.0.0-CentOS-64bit-all.tar.gz
[root@opengauss-db1 openGauss]# tar -zxvf openGauss-5.0.0-CentOS-64bit-om.tar.gz
[root@opengauss-db1 openGauss]# ll
total 261040
drwxr-xr-x 14 root root 302 Mar 29 03:22 lib
-rw-r--r-- 1 root root 133071038 Mar 29 20:11 openGauss-5.0.0-CentOS-64bit-all.tar.gz
-rw-r--r-- 1 root root 105 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit-cm.sha256
-rw-r--r-- 1 root root 22356000 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit-cm.tar.gz
-rw-r--r-- 1 root root 65 Mar 29 03:22 openGauss-5.0.0-CentOS-64bit-om.sha256
-rw-r--r-- 1 root root 11963876 Mar 29 03:22 openGauss-5.0.0-CentOS-64bit-om.tar.gz
-rw-r--r-- 1 root root 65 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit.sha256
-rw-r--r-- 1 root root 99384569 Mar 29 03:23 openGauss-5.0.0-CentOS-64bit.tar.bz2
drwxr-xr-x 10 root root 4096 Mar 29 03:22 script
-rw------- 1 root root 65 Mar 29 03:21 upgrade_sql.sha256
-rw------- 1 root root 493211 Mar 29 03:21 upgrade_sql.tar.gz
-rw-r--r-- 1 root root 32 Mar 29 03:22 version.cfg
2.2 Check system health
# Run as root [any node]
-- Run gs_checkos -i A
[root@opengauss-dbxxx ~]# /opt/software/openGauss/script/gs_checkos -i A --detail
Checking items:
A1. [ OS version status ] : Normal
[opengauss-db1]
centos_7.9.2009_64bit
A2. [ Kernel version status ] : Normal
The names about all kernel versions are same. The value is "3.10.0-1160.92.1.el7.x86_64".
A3. [ Unicode status ] : Normal
The values of all unicode are same. The value is "LANG=en_US.UTF-8".
A4. [ Time zone status ] : Normal
The informations about all timezones are same. The value is "+0800".
A5. [ Swap memory status ] : Normal
The value about swap memory is correct.
A6. [ System control parameters status ] : Normal
All values about system control parameters are correct.
A7. [ File system configuration status ] : Normal
Both soft nofile and hard nofile are correct.
A8. [ Disk configuration status ] : Normal
The value about XFS mount parameters is correct.
A9. [ Pre-read block size status ] : Normal
The value about Logical block size is correct.
A11.[ Network card configuration status ] : Normal
The configuration about network card is correct.
A12.[ Time consistency status ] : Normal
The ntpd service is started, local time is "2023-07-21 16:24:44".
A13.[ Firewall service status ] : Normal
The firewall service is stopped.
A14.[ THP service status ] : Normal
The THP service is stopped.
Total numbers:13. Abnormal numbers:0. Warning numbers:0.
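-- Fix any item whose status is not Normal
2.3 Check disk space
# Run as root [all nodes]
-- Use df -h and df -i to confirm that enough disk space and inodes are available
-- df -h shows disk space usage
-- df -i shows free inodes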
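The two df checks can be turned into a pass/fail gate. This is a minimal sketch: `check_fs_headroom` is a hypothetical helper, and the thresholds in the example call are illustrative assumptions, not values recommended by openGauss.

```shell
# Hypothetical helper: fail when a path's file system has less free space or
# fewer free inodes than required.
check_fs_headroom() {
    local path="$1" min_kb="$2" min_inodes="$3" free_kb free_inodes
    # -P forces the one-line-per-filesystem POSIX layout, so row 2 is the data row
    # and column 4 is "Available" (with -k) or "IFree" (with -i).
    free_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
    free_inodes=$(df -Pi "$path" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt "$min_kb" ]; then
        echo "$path: only ${free_kb} KB free, need ${min_kb}" >&2
        return 1
    fi
    if [ "$free_inodes" -lt "$min_inodes" ]; then
        echo "$path: only ${free_inodes} inodes free, need ${min_inodes}" >&2
        return 1
    fi
    echo "$path: ${free_kb} KB and ${free_inodes} inodes free"
}

# Example (thresholds are assumptions): require 2 GB and 100000 inodes under /opt
# check_fs_headroom /opt 2097152 100000
```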
2.4 Check version information
-- Run as omm [any node]
-- Query the version on all nodes
[root@opengauss-dbxxx ~]# su - omm
Last login: Fri Jul 21 16:07:06 CST 2023 on pts/1
[omm@opengauss-dbxxx ~]$ gs_ssh -c "gsql -V"
Successfully execute command on all nodes.
Output:
[SUCCESS] opengauss-db1:
gsql (openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr
[SUCCESS] opengauss-db2:
gsql (openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr
2.5 Check cluster status
-- Run as omm [any node]
[omm@opengauss-dbxxx ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Primary Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Standby Normal
2.6 Back up the database
Take a physical backup of the database.
-- Run as omm [primary node]
[root@opengauss-db1 ~]# su - omm
Last login: Fri Jul 21 16:51:53 CST 2023 on pts/1
-- Create the backup directory
[omm@opengauss-db1 ~]$ BACKUP_DIR=/opt/gaussdb/backup/`date '+%Y%m%d_%H%M%S'`
[omm@opengauss-db1 ~]$ mkdir -p $BACKUP_DIR
-- Run the physical backup
[omm@opengauss-db1 backup]$ gs_basebackup -D $BACKUP_DIR -p 26000 -P -l $BACKUP_DIR
INFO: The starting position of the xlog copy of the full build is: 0/400E8B0. The slot minimum LSN is: 0/400E8B0. The disaster slot minimum LSN is: 0/0. The logical slot minimum LSN is: 0/0.
[2023-07-21 17:11:55]:begin build tablespace list
[2023-07-21 17:11:55]:finish build tablespace list
[2023-07-21 17:11:55]:begin get xlog by xlogstream
check identify system successpace[2023-07-21 17:11:55]:
[2023-07-21 17:11:55]: send START_REPLICATION 0/4000000 success
[2023-07-21 17:11:55]: keepalive message is received
[2023-07-21 17:11:55]: keepalive message is received
97981/97981 kB (100%), 1/1 tablespace
[2023-07-21 17:12:00]:gs_basebackup: base backup successfully
-- Inspect the backup
[omm@opengauss-db1 ~]$ ls -l /opt/gaussdb/backup/20230721_171855
total 5084
-rw------- 1 omm dbgrp 216 Jul 21 17:19 backup_label
-rw------- 1 omm dbgrp 198 Jul 21 17:19 backup_label.old
drwx------ 5 omm dbgrp 4096 Jul 21 17:19 base
-rw------- 1 omm dbgrp 0 Jul 21 17:19 build_completed.done
-rw------- 1 omm dbgrp 4399 Jul 21 17:19 cacert.pem
drwx------ 4 omm dbgrp 4096 Jul 21 17:19 dbe_perf_standby
-rw------- 1 omm dbgrp 56 Jul 21 17:19 full_backup_label
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 global
-rw------- 1 omm dbgrp 4915200 Jul 21 17:19 gswlm_userinfo.cfg
-rw------- 1 omm dbgrp 21016 Jul 21 17:19 mot.conf
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_clog
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_csnlog
-rw------- 1 omm dbgrp 0 Jul 21 17:19 pg_ctl.lock
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_errorinfo
-rw------- 1 omm dbgrp 4676 Jul 21 17:19 pg_hba.conf
-rw------- 1 omm dbgrp 4676 Jul 21 17:19 pg_hba.conf.bak
-rw------- 1 omm dbgrp 1024 Jul 21 17:19 pg_hba.conf.lock
-rw------- 1 omm dbgrp 1636 Jul 21 17:19 pg_ident.conf
drwx------ 4 omm dbgrp 4096 Jul 21 17:19 pg_llog
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_logical
drwx------ 4 omm dbgrp 4096 Jul 21 17:19 pg_multixact
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_notify
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_replslot
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_serial
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_snapshots
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_stat_tmp
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_tblspc
drwx------ 2 omm dbgrp 4096 Jul 21 17:19 pg_twophase
-rw------- 1 omm dbgrp 4 Jul 21 17:19 PG_VERSION
drwx------ 3 omm dbgrp 4096 Jul 21 17:19 pg_xlog
-rw------- 1 omm dbgrp 35919 Jul 21 17:19 postgresql.conf
-rw------- 1 omm dbgrp 35919 Jul 21 17:19 postgresql.conf.guc.bak
-rw------- 1 omm dbgrp 1024 Jul 21 17:19 postgresql.conf.lock
-rw------- 1 omm dbgrp 35919 Jul 21 17:19 postgresql.conf.wal.bak
-rw------- 1 omm dbgrp 0 Jul 21 17:19 postmaster.pid.lock
-rw------- 1 omm dbgrp 10 Jul 21 17:19 rewind_lable
-rw------- 1 omm dbgrp 4402 Jul 21 17:19 server.crt
-rw------- 1 omm dbgrp 1766 Jul 21 17:19 server.key
-rw------- 1 omm dbgrp 56 Jul 21 17:19 server.key.cipher
-rw------- 1 omm dbgrp 24 Jul 21 17:19 server.key.rand
-rw------- 1 omm dbgrp 4 Jul 21 17:19 term_file
drwx------ 5 omm dbgrp 4096 Jul 21 17:19 undo
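2.7 Stop the cluster
For a gray upgrade this step is optional. The cluster is stopped here only to make rollback easier if the upgrade fails.
-- Stop the cluster; run as omm [primary node]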
[omm@opengauss-db1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Down
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Down
cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
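2.8 Back up directories and files
-- Run as root [all nodes]
-- Before upgrading, back up the directories and files listed in cluster_config.xml in case the upgrade fails
-- The directories for this test environment are shown below; adapt them to your production environment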
<PARAM name="gaussdbAppPath" value="/opt/gaussdb/install/app" />
<PARAM name="gaussdbLogPath" value="/var/log/omm" />
<PARAM name="tmpMppdbPath" value="/opt/gaussdb/tmp" />
<PARAM name="gaussdbToolPath" value="/opt/gaussdb/install/om" />
<PARAM name="corePath" value="/opt/gaussdb/corefile" />
<PARAM name="dataNode1" value="/opt/gaussdb/install/data/dn,opengauss-db2,/opt/gaussdb/install/data/dn"/>
-- Back up the directories
[root@opengauss-dbxxx ~]# cd /opt
[root@opengauss-dbxxx opt]# tar -czf gaussdb_3.1.1.tar.gz ./gaussdb/
2.9 Start the cluster
-- Start the cluster; run as omm [primary node]
[omm@opengauss-db1 ~]$ gs_om -t start
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state : Normal
redistributing : No
node_count : 2
Datanode State
primary : 1
standby : 1
secondary : 0
cascade_standby : 0
building : 0
abnormal : 0
down : 0
Successfully started cluster.
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Primary Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Standby Normal
3. Performing the Upgrade
The cluster is upgraded here using a gray upgrade.
3.1 Pre-upgrade check
# Run as root [primary node]
[root@opengauss-db1 ~]# python3 /opt/software/openGauss/script/gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?yes -- enter yes
Please enter password for root
Password:
Successfully created SSH trust for the root permission user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? no -- enter no
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/openGauss/script/gs_checkos -i A -h opengauss-db1,opengauss-db2 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
-- Run /opt/software/openGauss/script/gs_checkos -i A -h opengauss-db1,opengauss-db2 --detail to see the detailed pre-check results, and resolve any warnings
3.2 Run the upgrade
# Run as root [primary node]
[root@opengauss-db1 ~]# chmod -R 755 /opt/software/openGauss/script/
[root@opengauss-db1 ~]# chown -R omm:dbgrp /opt/software/openGauss/script/
-- Gray upgrade; run as omm
[omm@opengauss-db1 ~]$ /opt/software/openGauss/script/gs_upgradectl -t auto-upgrade --grey -X /opt/software/openGauss/cluster_config.xml
Static configuration matched with old static configuration files.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully set upgrade_mode to 0.
Checking upgrade environment.
Successfully checked upgrade environment.
Start to do health check.
Successfully checked cluster status.
Upgrade all nodes.
NOTICE: The directory /opt/gaussdb/install/app_70980198 will be deleted after commit-upgrade, please make sure there is no personal data.
Performing grey rollback.
No need to rollback.
The directory /opt/gaussdb/install/app_70980198 will be deleted after commit-upgrade, please make sure there is no personal data.
Installing new binary.
Wait for the cluster status normal or degrade.
copy certs from /opt/gaussdb/install/app_70980198 to /opt/gaussdb/install/app_a07d57c3.
Successfully copy certs from /opt/gaussdb/install/app_70980198 to /opt/gaussdb/install/app_a07d57c3.
Successfully backup hotpatch config file.
Sync cluster configuration.
Successfully synced cluster configuration.
Switch symbolic link to new binary directory.
Successfully switch symbolic link to new binary directory.
Start check CMS parameter.
Old cluster version number less than 92574.
Switching all db processes.
Check cluster state.
Cluster state: [ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
-----------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 26000 6001 P Primary Normal
2 opengauss-db2 10.110.3.156 26000 6002 S Standby Normal
Wait for the cluster status normal or degrade.
Wait for the cluster status normal or degrade.
Create checkpoint before switching.
Start to wait for om_monitor.
Switching DN processes.
Switch DN processes for rolling upgrade.
Ready to grey start cluster.
Grey start cluster successfully.
Wait for the cluster status normal or degrade.
Successfully switch all process version
The nodes ['opengauss-db1', 'opengauss-db2'] have been successfully upgraded to new version. Then do health check.
Start to do health check.
Successfully checked cluster status.
Waiting for the cluster status to become normal.
.
The cluster status is normal.
Upgrade main process has been finished, user can do some check now.
Once the check done, please execute following command to commit upgrade:
gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml
Successfully upgrade all nodes.
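The tool asks for a check before the upgrade is committed. A minimal sketch of such a gate follows; `cluster_is_normal` is a hypothetical helper that reads `gs_om -t status --detail` output on standard input and assumes the `cluster_state : Normal` line format shown in this article.

```shell
# Hypothetical helper: succeed only if the status text on stdin reports
# cluster_state as Normal.
cluster_is_normal() {
    awk -F: '
        /cluster_state/ { gsub(/[ \t]/, "", $2); state = $2 }
        END { exit (state == "Normal" ? 0 : 1) }
    '
}

# Example: commit only when the cluster is healthy (commands taken from this article)
# gs_om -t status --detail | cluster_is_normal \
#     && gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml
```

A state check alone does not replace application-level verification; it only guards against committing while the cluster is degraded.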
-- Commit the upgrade
[omm@opengauss-db1 ~]$ gs_upgradectl -t commit-upgrade -X /opt/software/openGauss/cluster_config.xml
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Start to do health check.
Successfully checked cluster status.
Wait for the cluster status normal or degrade.
Wait for the cluster status normal or degrade.
Start check CMS parameter.
Old cluster version number less than 92574.
Successfully cleaned old install path.
Commit upgrade succeeded.
Start check CMS parameter.
Old cluster version number less than 92574.
3.3 Verification
3.3.1 Check version information
# Run as omm [any node]
-- Check version information
-- The version is now 5.0.0
[omm@opengauss-db1 ~]$ gs_om -V
gs_om (openGauss OM 5.0.0 build 244a7e05) compiled at 2023-03-29 03:22:22 commit 0 last mr
-- Check the database version on both nodes; both have been upgraded to 5.0.0
[omm@opengauss-db1 ~]$ gs_ssh -c "gsql -V"
Successfully execute command on all nodes.
Output:
[SUCCESS] opengauss-db1:
gsql (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr
[SUCCESS] opengauss-db2:
gsql (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr
3.3.2 Check cluster status
# Run as omm [any node]
-- Cluster status
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Standby Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Primary Normal
-- The output shows that a primary/standby switchover took place after the upgrade
3.3.3 Check the database
# Run as omm [any node]
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Standby Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Primary Normal
[omm@opengauss-db1 ~]$
[omm@opengauss-db1 ~]$
[omm@opengauss-db1 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
ERROR: cannot execute CREATE DATABASE in a read-only transaction
-- Because of the primary/standby switchover, this session is connected to the standby, which cannot create a database
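To avoid connecting to the wrong node after a switchover, the current primary can be read from the status output first. A minimal sketch, assuming the `[ Datanode State ]` layout shown in this article; `primary_node` is a hypothetical helper name.

```shell
# Hypothetical helper: read `gs_om -t status --detail` output on stdin and
# print the hostname of the datanode whose current role is Primary.
primary_node() {
    awk '
        /^[ \t]*\[ *Datanode State *\]/ { in_dn = 1; next }   # enter datanode section
        /^[ \t]*\[/                     { in_dn = 0 }         # any other section header
        in_dn && /Primary/              { print $2; exit }    # field 2 is the hostname
    '
}

# Example:
# gs_om -t status --detail | primary_node
```

The section guard matters because the `[ CMServer State ]` block also contains the word Primary, and the CM primary is not necessarily the database primary.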
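4. Appendix
4.1 Change the owner and group of version.cfg
Before upgrading, change the owner and group of /opt/software/openGauss/version.cfg on both the primary and the standby node; otherwise the upgrade fails.
-- If the owner and group of version.cfg are not changed on both nodes, the upgrade reports the following error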
[omm@opengauss-db1 ~]$ /opt/software/openGauss/script/gs_upgradectl -t auto-upgrade --grey -X /opt/software/openGauss/cluster_config.xml
[Errno 13] Permission denied: '/opt/software/openGauss/version.cfg'
[Errno 13] Permission denied: '/opt/software/openGauss/version.cfg'
Start check CMS parameter.
float() argument must be a string or a number, not 'NoneType'
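A minimal sketch of checking and fixing the ownership follows. The target owner `omm:dbgrp` is an assumption taken from the chown applied to the script directory in section 3.2; `owned_by` is a hypothetical helper, and `stat -c` is the GNU coreutils form.

```shell
# Hypothetical helper: succeed if FILE is owned by the expected "user:group".
owned_by() {
    local file="$1" want="$2" have
    have=$(stat -c '%U:%G' "$file") || return 2
    [ "$have" = "$want" ]
}

# Sketch of the fix, to be run as root on every node (owner is an assumption):
# owned_by /opt/software/openGauss/version.cfg omm:dbgrp \
#     || chown omm:dbgrp /opt/software/openGauss/version.cfg
```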
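4.2 Changing the NIC MTU can break SSH between the primary and standby nodes
If the NIC MTU on the primary and standby nodes is changed during the pre-upgrade check, gs_upgradectl hangs and the upgrade eventually fails. At that point the two nodes can still ping each other but cannot connect to each other over SSH.
The fix is to set the MTU back to the default of 1500 and restart the SSH service.
-- The pre-upgrade check suggested raising the MTU on both nodes from 1500 to 8192. After the NIC MTU was changed, gs_upgradectl hung and finally failed; the upgrade log contained the following: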
[2023-07-21 22:45:39.414838][20984][gs_sshexkey][DEBUG]:Successfully to add id_rsa in ssh-agent
[2023-07-21 22:45:39.415698][20984][gs_sshexkey][DEBUG]:Ssh agent register successfully.
[2023-07-21 22:45:39.416461][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step5]:Successfully created the local key files.
[2023-07-21 22:45:39.417283][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Appending local ID to authorized_keys.
[2023-07-21 22:45:39.418192][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Successfully appended local ID to authorized_keys.
[2023-07-21 22:45:39.429370][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Updating the known_hosts file.
[2023-07-21 22:45:40.311033][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Successfully updated the known_hosts file.
[2023-07-21 22:45:40.311665][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Appending authorized_key on the remote node.
[2023-07-21 22:45:40.679766][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.156
Successfully appended authorized_key on remote node 10.110.3.156.
[2023-07-21 22:45:40.864480][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.155
Successfully appended authorized_key on remote node 10.110.3.155.
[2023-07-21 22:45:40.921407][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Successfully appended authorized_key on all remote node.
[2023-07-21 22:45:40.921956][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Checking common authentication file content.
[2023-07-21 22:45:40.927562][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Successfully checked common authentication content.
[2023-07-21 22:45:40.928391][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step10]:Distributing SSH trust file to all node.
[2023-07-21 22:47:41.046988][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 3, retry again.
[2023-07-21 22:47:41.047776][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:47:41.089878][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:49:51.205162][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 2, retry again.
[2023-07-21 22:49:51.206276][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:49:51.240173][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:52:01.367717][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 1, retry again.
[2023-07-21 22:52:01.368465][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:52:01.425251][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:54:11.538969][20984][gs_sshexkey][ERROR]:[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error:
1, lost connection
[2023-07-21 22:54:12.110072][20463][gs_preinstall][DEBUG]:The $GAUSSHOME/bin is exist.
[2023-07-21 22:54:12.111040][20463][gs_preinstall][DEBUG]:The $GAUSS_ENV is 2.
[2023-07-21 22:54:12.111678][20463][gs_preinstall][DEBUG]:There is the upgrade is in progress.
[2023-07-21 22:54:12.112467][20463][gs_preinstall][DEBUG]:In upgrade process, no need to delete /opt/gaussdb/install/om.
[2023-07-21 22:54:12.113237][20463][gs_preinstall][ERROR]:[GAUSS-51632] : Failed to do gs_sshexkey.Error: Please enter password for current user[root].
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error:
1, lost connection
-- At this point the SSH status on the primary and standby nodes is also abnormal
[root@opengauss-db2 ~]# systemctl status sshd.service
● sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-07-21 11:03:03 CST; 12h ago
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 2160 (sshd)
Tasks: 1
Memory: 4.2M
CGroup: /system.slice/sshd.service
└─2160 /usr/sbin/sshd -D
Jul 21 17:44:20 opengauss-db2 sshd[6374]: Accepted publickey for root from 10.110.3.155 port 63717 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 17:44:22 opengauss-db2 sshd[6417]: Accepted publickey for root from 10.110.3.155 port 63721 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 17:44:24 opengauss-db2 sshd[6463]: Accepted publickey for root from 10.110.3.155 port 63723 ssh2: ED25519 SHA256:hUo4iBgUOVXW5ONlVeD2QMdS+4snKsRs0K1K3jBLO8E
Jul 21 22:45:32 opengauss-db2 sshd[4829]: Accepted password for root from 10.110.3.155 port 30166 ssh2
Jul 21 22:45:37 opengauss-db2 sshd[4883]: Accepted password for root from 10.110.3.155 port 30172 ssh2
Jul 21 22:45:39 opengauss-db2 sshd[4922]: Connection closed by 10.110.3.155 port 30178 [preauth]
Jul 21 22:45:39 opengauss-db2 sshd[4928]: Connection closed by 10.110.3.155 port 30182 [preauth]
Jul 21 22:45:40 opengauss-db2 sshd[4930]: Accepted password for root from 10.110.3.155 port 30188 ssh2
Jul 21 23:06:46 opengauss-db2 sshd[13949]: Connection closed by 10.110.3.156 port 50810 [preauth]
Jul 21 23:27:22 opengauss-db2 sshd[22723]: Connection closed by 10.110.3.155 port 31050 [preauth]
-- Reset the MTU and restart the SSH service on both nodes
[root@opengauss-db2 ~]# systemctl restart sshd.service
[root@opengauss-db2 ~]# systemctl status sshd.service
● sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2023-07-21 23:33:29 CST; 1s ago
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 25303 (sshd)
Tasks: 1
Memory: 1.3M
CGroup: /system.slice/sshd.service
└─25303 /usr/sbin/sshd -D
Jul 21 23:33:28 opengauss-db2 systemd[1]: Starting OpenSSH server daemon...
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on 0.0.0.0 port 60002.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on :: port 60002.
Jul 21 23:33:29 opengauss-db2 systemd[1]: Started OpenSSH server daemon.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on 0.0.0.0 port 22.
Jul 21 23:33:29 opengauss-db2 sshd[25303]: Server listening on :: port 22.
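The log above shows short SSH exchanges succeeding while the scp of the key files ends in "lost connection", a pattern often seen when the path MTU is smaller than the interface MTU. The article does not say which MTU value was set, so the following is only a minimal sketch for checking whether packets of a given size reach the peer unfragmented. The function names `mtu_to_ping_payload` and `probe_path_mtu` and the interface name `eth0` are hypothetical; the `ping` options are those of Linux iputils.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
mtu_to_ping_payload() {
    local mtu="$1"
    echo $(( mtu - 28 ))
}

# Send three pings sized for the given MTU with the Don't Fragment bit set
# (-M do). If they fail while smaller sizes succeed, the path MTU is lower.
probe_path_mtu() {
    local peer="$1" mtu="$2"
    ping -c 3 -W 2 -M do -s "$(mtu_to_ping_payload "$mtu")" "$peer"
}

# Example usage (eth0 is an assumed interface name):
#   ip link show eth0 | grep -o 'mtu [0-9]*'
#   probe_path_mtu 10.110.3.156 1500
```

Running the probe from each node toward the other before rerunning gs_preinstall gives a quick indication of whether the MTU change took effect.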
4.3 A broken python3 installation prevents checking the cluster status
-- If the installed python3 is broken, gs_om cannot display the cluster status
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
-bash: /opt/gaussdb/install/om/script/gs_om: Permission denied
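A "Permission denied" from bash can come from a missing execute bit on the script or from an interpreter line that points at something unusable. The sketch below checks both for a given script. It is an illustration under assumptions, not part of the openGauss tooling: `check_script` is a hypothetical helper, and it only inspects the first line of the file.

```shell
# Report whether a script exists, is executable, and names an interpreter
# that can be found on this machine.
check_script() {
    local script="$1" first="" prog
    [ -e "$script" ] || { echo "missing: $script"; return 1; }
    [ -x "$script" ] || { echo "not executable: $script"; return 1; }
    # Read only the first line; `|| true` tolerates a missing trailing newline.
    IFS= read -r first < "$script" || true
    case "$first" in
        '#!'*) ;;
        *) echo "ok (no shebang line): $script"; return 0 ;;
    esac
    # Split the interpreter line into words: "/usr/bin/env python3" -> $1 $2.
    set -- ${first#'#!'}
    prog="${1:-}"
    if [ "${prog##*/}" = "env" ]; then prog="${2:-}"; fi
    if command -v "$prog" >/dev/null 2>&1; then
        echo "ok: interpreter $prog found"
    else
        echo "interpreter not found: $prog"
        return 1
    fi
}

# Example usage, run as the omm user:
#   check_script /opt/gaussdb/install/om/script/gs_om
```

If the interpreter is reported as found but the tool still fails, running `python3 --version` as the same user is a reasonable next check.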
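4.4 A primary/standby switchover occurs after the cluster upgrade
After the upgrade the primary and standby roles are swapped, so sessions connected to the original primary can no longer write.
-- Cluster status before the upgrade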
[omm@opengauss-db1 dn]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Standby Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Primary Normal
-- Cluster status after the upgrade
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Standby Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Primary Normal
-- Connecting to the original primary: the database cannot be created
[omm@opengauss-db1 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
ERROR: cannot execute CREATE DATABASE in a read-only transaction
-- Connecting to the new primary: the database is created successfully
[omm@opengauss-db2 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
openGauss=# CREATE DATABASE gaussdb WITH ENCODING 'UTF8' template = template0;
CREATE DATABASE
openGauss=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-------+-----------+---------+-------+-------------------
gaussdb | omm | UTF8 | C | C |
postgres | omm | SQL_ASCII | C | C |
template0 | omm | SQL_ASCII | C | C | =c/omm +
| | | | | omm=CTc/omm
template1 | omm | SQL_ASCII | C | C | =c/omm +
| | | | | omm=CTc/omm
(4 rows)
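Because the roles can change during an upgrade, it helps to confirm which node currently holds the Primary role before running any write statements. The following is a minimal sketch that parses status output in the layout shown in this article; `primary_datanodes` is a hypothetical helper name, and the column positions are an assumption based on the `[ Datanode State ]` rows above.

```shell
# Print the hostname of each datanode whose role is Primary.
# Reads `gs_om -t status --detail --all` style output on stdin.
primary_datanodes() {
    awk '
        # Start matching rows once the Datanode State section begins.
        /^[[:space:]]*\[ *Datanode State *\]/ { in_dn = 1; next }
        # Any other bracketed section header ends it.
        /^[[:space:]]*\[/                     { in_dn = 0 }
        # Row layout: id, host, ip, instance, path, P|S flag, role, state.
        in_dn && $7 == "Primary"              { print $2 }
    '
}

# Example usage:
#   gs_om -t status --detail --all | primary_datanodes
```

In the post-upgrade status above, the `P` flag marks the node that was configured as primary while the role column shows it running as Standby, which matches `balanced : No`. The `cm_ctl switchover -a` command that appears in this article's output is the CM command for returning instances to their configured roles.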
[root@opengauss-db1 ~]# python3 /opt/software/openGauss/script/gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?no
Setting host ip env
[GAUSS-51400] : Failed to execute the command: sed -i '/^export[ ]*HOST_IP=/d' /etc/profile. Result:{'opengauss-db1': 'Success', 'opengauss-db2': 'Failure'}.
Error:
[SUCCESS] opengauss-db1:
[FAILURE] opengauss-db2:
[2023-07-21 22:45:39.414838][20984][gs_sshexkey][DEBUG]:Successfully to add id_rsa in ssh-agent
[2023-07-21 22:45:39.415698][20984][gs_sshexkey][DEBUG]:Ssh agent register successfully.
[2023-07-21 22:45:39.416461][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step5]:Successfully created the local key files.
[2023-07-21 22:45:39.417283][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Appending local ID to authorized_keys.
[2023-07-21 22:45:39.418192][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step6]:Successfully appended local ID to authorized_keys.
[2023-07-21 22:45:39.429370][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Updating the known_hosts file.
[2023-07-21 22:45:40.311033][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step7]:Successfully updated the known_hosts file.
[2023-07-21 22:45:40.311665][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Appending authorized_key on the remote node.
[2023-07-21 22:45:40.679766][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.156
Successfully appended authorized_key on remote node 10.110.3.156.
[2023-07-21 22:45:40.864480][20984][gs_sshexkey][DEBUG]:Send to 10.110.3.155
Successfully appended authorized_key on remote node 10.110.3.155.
[2023-07-21 22:45:40.921407][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step8]:Successfully appended authorized_key on all remote node.
[2023-07-21 22:45:40.921956][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Checking common authentication file content.
[2023-07-21 22:45:40.927562][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step9]:Successfully checked common authentication content.
[2023-07-21 22:45:40.928391][20984][gs_sshexkey(_log:1396)][gs_sshexkey][LOG][Step10]:Distributing SSH trust file to all node.
[2023-07-21 22:47:41.046988][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 3, retry again.
[2023-07-21 22:47:41.047776][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:47:41.089878][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:49:51.205162][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 2, retry again.
[2023-07-21 22:49:51.206276][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:49:51.240173][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:52:01.367717][20984][gs_sshexkey][DEBUG]:send_trust_file failed, coutdown 1, retry again.
[2023-07-21 22:52:01.368465][20984][gs_sshexkey][DEBUG]:errorinfo: hostip: 10.110.3.156, status: 1, output: lost connection,
[2023-07-21 22:52:01.425251][20984][gs_sshexkey][DEBUG]:check os info: drwx------ 2 root root 4096 Jul 21 22:45 .ssh
-rwxr-xr-x 1 root root 885 Dec 12 2022 ssh_key.sh
-rw-r--r-- 1 root root 521 Jul 21 11:36 sshtrust.sh
total 32
drwx------ 2 root root 4096 Jul 21 22:45 .
dr-xr-x---. 11 root root 4096 Jul 21 22:45 ..
-rw------- 1 root root 504 Jul 21 22:45 authorized_keys
-rw------- 1 root root 464 Jul 21 22:45 id_om
-rw------- 1 root root 100 Jul 21 22:45 id_om.pub
-rw------- 1 root root 1679 Jul 21 11:35 id_rsa
-rw------- 1 root root 400 Jul 21 11:35 id_rsa.pub
-rw------- 1 root root 1012 Jul 21 22:45 known_hosts
[2023-07-21 22:54:11.538969][20984][gs_sshexkey][ERROR]:[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error:
1, lost connection
[2023-07-21 22:54:12.110072][20463][gs_preinstall][DEBUG]:The $GAUSSHOME/bin is exist.
[2023-07-21 22:54:12.111040][20463][gs_preinstall][DEBUG]:The $GAUSS_ENV is 2.
[2023-07-21 22:54:12.111678][20463][gs_preinstall][DEBUG]:There is the upgrade is in progress.
[2023-07-21 22:54:12.112467][20463][gs_preinstall][DEBUG]:In upgrade process, no need to delete /opt/gaussdb/install/om.
[2023-07-21 22:54:12.113237][20463][gs_preinstall][ERROR]:[GAUSS-51632] : Failed to do gs_sshexkey.Error: Please enter password for current user[root].
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
[GAUSS-50223] : Failed to update the authentication files.cmd is source /root/.bashrc;scp -q -o "BatchMode yes" -o "NumberOfPasswordPrompts 0" /root/.ssh/id_om /root/.ssh/id_om.pub 10.110.3.156:.ssh/ && temp_auth=$(grep '#OM' /root/.ssh/authorized_keys) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/authorized_keys; echo *** >> /root/.ssh/authorized_keys" && temp_auth=$(grep '#OM' /root/.ssh/known_hosts) && ssh 10.110.3.156 "sed -i '/#OM/d' /root/.ssh/known_hosts; echo *** >> /root/.ssh/known_hosts"; Node:10.110.3.156. Error:
1, lost connection
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Down
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Down
cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
[omm@opengauss-db1 ~]$ cm_ctl switchover -a
cm_ctl: send switchover msg to cm_server, connect fail node_id:0, data_path:.
[omm@opengauss-db1 ~]$ cm_ctl query -Cv
[ CMServer State ]
node instance state
---------------------------------
1 opengauss-db1 1 Primary
2 opengauss-db2 2 Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node instance state | node instance state
------------------------------------------------------------------------------------------
1 opengauss-db1 6001 P Primary Normal | 2 opengauss-db2 6002 S Standby Normal
[omm@opengauss-db1 ~]$ gs_om -t status --detail --all
[ CMServer State ]
node node_ip instance state
-------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 1 /opt/gaussdb/install/cm/cm_server Primary
2 opengauss-db2 10.110.3.156 2 /opt/gaussdb/install/cm/cm_server Standby
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
------------------------------------------------------------------------------------
1 opengauss-db1 10.110.3.155 6001 /opt/gaussdb/install/data/dn P Primary Normal
2 opengauss-db2 10.110.3.156 6002 /opt/gaussdb/install/data/dn S Standby Normal
```
![image20230721172148201.png](https://oss-emcsprod-public.modb.pro/image/editor/20230722-c4e42479-422f-4161-979d-2fba202f7337.png)