[My openGauss Story] Two-Node CM High Availability in Practice with openGauss 5.0.0 Enterprise Edition

Posted by 怕晒的太阳 on the openGauss account, 2023-08-07 18:00 (Hong Kong, China)

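Introduction

CM supports VIP management

1. Applications can connect to the database through a VIP. When the primary fails and a switchover occurs, application connections automatically reconnect to the new primary (at the millisecond level).

2. If the database ends up with two primaries, connecting through the VIP ensures that clients reach only one of them, which reduces the risk of data loss in a dual-primary situation.

CM supports two-node deployment

1. By introducing a third-party gateway IP, CM resolves the self-arbitration problem of a two-node CM cluster, for both the CMS and the DNs.

2. CM also supports dynamically configuring the cluster's failover policy and the recovery policy for database split-brain faults, to keep cluster data as complete and consistent as possible.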
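Why a third-party gateway helps: when the two nodes lose contact, neither can tell on its own whether the peer has failed or the network between them has split. A gateway that both nodes normally reach acts as the tie-breaker. The decision table below is a conceptual sketch of that idea only, not CM's implementation; `arbitrate` and `gateway_reachable` are hypothetical names. In CM itself the behaviour is configured through cm_server parameters (the documentation describes names such as third_party_gateway_ip and cms_enable_failover_on2nodes; verify them for your version).

```shell
# Conceptual illustration only: this is NOT CM's arbitration code, just the
# decision table behind using a third-party gateway in a two-node cluster.
#   $1 = yes|no  can this node still reach its peer?
#   $2 = yes|no  can this node reach the third-party gateway IP?
arbitrate() {
    if [ "$1" = yes ]; then
        echo keep-role        # heartbeat to the peer is fine: nothing to decide
    elif [ "$2" = yes ]; then
        echo may-be-primary   # peer lost but gateway reachable: peer is probably down
    else
        echo demote           # neither reachable: assume this node is the isolated one
    fi
}

# Probe the gateway with a single ping (Linux iputils: -c count, -W timeout in seconds).
gateway_reachable() {
    if ping -c 1 -W 1 "$1" >/dev/null 2>&1; then echo yes; else echo no; fi
}
```

For example, `arbitrate no "$(gateway_reachable 192.0.2.1)"` (192.0.2.1 is a placeholder address) prints `demote` on a node that has lost both its peer and the gateway, which is exactly the node that should not keep accepting writes.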

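Installation preparation

The preparation work was already described in my earlier article on single-node x86 installation of openGauss 5.0.0 Enterprise Edition (link: https://www.modb.pro/db/1683405047395344384), so I won't repeat it here. The main steps are:

1. The CPU architecture is x86 and the OS is CentOS 7.6. Download the database installation package that matches your OS.

2. Disable the firewall and SELinux

3. Disable RemoveIPC

4. Set the time zone and time

5. Set the NIC MTU

6. Allow remote root login

7. Create the database user and user group

8. Configure core_pattern

9. Install Python 3.6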
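Part of the checklist above can be scripted. A minimal sketch, assuming CentOS 7 default file locations, that checks three of the items (SELinux, RemoveIPC and the Python version). The function names are hypothetical, and the checks only read configuration files: they do not prove that a running system has picked up the change (for example, SELINUX=disabled only takes full effect after a reboot).

```shell
# Hypothetical preflight helpers; each file-based check takes an optional path
# so it can be pointed at a copy of the file.

# True if SELinux is configured as disabled.
selinux_disabled() {
    grep -Eq '^[[:space:]]*SELINUX[[:space:]]*=[[:space:]]*disabled[[:space:]]*$' \
        "${1:-/etc/selinux/config}"
}

# True if an uncommented RemoveIPC=no line is present in systemd-logind's config.
removeipc_disabled() {
    grep -Eq '^[[:space:]]*RemoveIPC[[:space:]]*=[[:space:]]*no[[:space:]]*$' \
        "${1:-/etc/systemd/logind.conf}"
}

# Print "major.minor" from a string such as "Python 3.6.8" (empty if no match).
python_minor_version() {
    echo "$1" | sed -n 's/^Python \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Run all checks against the live system and report every failure.
run_preflight() {
    rc=0
    selinux_disabled   || { echo "SELinux is not disabled"; rc=1; }
    removeipc_disabled || { echo "RemoveIPC is not set to no"; rc=1; }
    pyver=$(python_minor_version "$(python3 --version 2>&1)")
    [ "$pyver" = "3.6" ] || { echo "python3 is ${pyver:-missing}, expected 3.6"; rc=1; }
    return $rc
}
```

Running `run_preflight` on each node before `gs_preinstall` prints one line per failed item and returns non-zero if anything needs fixing.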

Installation XML file

[opengauss@test2 dn1]$ cat /opt/software/cm2.xml
<?xml version="1.0" encoding="utf-8"?>
<ROOT>
<CLUSTER>
<PARAM name="clusterName" value="Cluster_CM2" />
<PARAM name="nodeNames" value="test001,test002"/>
<PARAM name="gaussdbAppPath" value="/home/opengauss/app" />
<PARAM name="gaussdbLogPath" value="/var/log/gaussdb_log" />
<PARAM name="tmpMppdbPath" value="/home/opengauss/tmp"/>
<PARAM name="gaussdbToolPath" value="/home/opengauss/om" />
<PARAM name="corePath" value="/data/core"/>
<PARAM name="backIp1s" value="xx.x.xx.xx,xx.x.xx.xx"/>
</CLUSTER>
<DEVICELIST>
<DEVICE sn="test001">
<PARAM name="name" value="test001"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<PARAM name="backIp1" value="xx.x.xx.xx"/>
<PARAM name="sshIp1" value="xx.x.xx.xx"/>
<!-- CM primary -->
<PARAM name="cmsNum" value="1"/>
<PARAM name="cmDir" value="/data/openGauss/cm"/>
<PARAM name="cmServerPortBase" value="15300"/>
<PARAM name="cmServerListenIp1" value="xx.x.xx.xx,xx.x.xx.xx"/>
<PARAM name="cmServerHaIp1" value="xx.x.xx.xx,xx.x.xx.xx"/>
<!-- cmServerlevel currently only supports 1 -->
<PARAM name="cmServerlevel" value="1"/>
<!-- hostnames of the CMS primary and all standbys -->
<PARAM name="cmServerRelation" value="test001,test002"/>
<!-- dn -->
<PARAM name="dataNum" value="1"/>
<PARAM name="dataPortBase" value="15400"/>
<PARAM name="dataNode1" value="/data/openGauss/dn1,test002,/data/openGauss/dn2"/>
<PARAM name="dataNode1_syncNum" value="0"/>
</DEVICE>
<DEVICE sn="test002">
<PARAM name="name" value="test002"/>
<PARAM name="azName" value="AZ1"/>
<PARAM name="azPriority" value="1"/>
<PARAM name="backIp1" value="xx.x.xx.xx"/>
<PARAM name="sshIp1" value="xx.x.xx.xx"/>
<!-- cm -->
<PARAM name="cmDir" value="/data/openGauss/cm"/>
<PARAM name="cmServerPortStandby" value="15300"/>
</DEVICE>
</DEVICELIST>
</ROOT>
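Because gs_preinstall and gs_install both read this file, a quick cross-check of the host names before running them can save a failed install. A minimal sketch, assuming one `<PARAM .../>` element per line as in the file above; `xml_param` and `check_cm_xml` are hypothetical helpers built on sed/grep, and a real XML parser would be more robust.

```shell
# Print the value of the first <PARAM name="$2" value="..."/> in file $1.
xml_param() {
    sed -n "s/.*<PARAM name=\"$2\" value=\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Check that every node in nodeNames has a <DEVICE sn="..."> block and that
# every standby host listed in dataNode1 is one of the nodeNames.
check_cm_xml() {
    xml=$1
    rc=0
    nodes=$(xml_param "$xml" nodeNames | tr ',' ' ')
    for n in $nodes; do
        grep -q "<DEVICE sn=\"$n\">" "$xml" || { echo "no <DEVICE> block for $n"; rc=1; }
    done
    # dataNode1 = "primary_dir,standby_host,standby_dir[,standby_host,standby_dir...]"
    dn=$(xml_param "$xml" dataNode1)
    i=2
    while :; do
        sb_host=$(printf '%s\n' "$dn" | cut -s -d, -f"$i")
        [ -n "$sb_host" ] || break
        case " $nodes " in
            *" $sb_host "*) ;;
            *) echo "dataNode1 host '$sb_host' is not in nodeNames"; rc=1 ;;
        esac
        i=$((i + 2))
    done
    return $rc
}
```

`check_cm_xml /opt/software/cm2.xml` prints nothing and returns 0 when the names line up; otherwise it lists each mismatch.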

Installing openGauss

Run the pre-installation as root:

./gs_preinstall -U opengauss -G dbgrp -X /opt/software/cm2.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?yes
Please enter password for root
Password:
Password:
Successfully created SSH trust for the root permission user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Are you sure you want to create the user[opengauss] and create trust for it (yes/no)? yes
Please enter password for cluster user.
Password:
Please enter password for cluster user again.
Password:
Generate cluster user password files successfully.

Successfully created [opengauss] user on all nodes.
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Creating SSH trust for [opengauss] user.
Please enter password for current user[opengauss].
Password:
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Distributing trust keys file to all node successfully.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for [opengauss] user.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/openGauss/script/gs_checkos -i A -h ps-vbdb-test2,ps-vbdb-test3 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.

Switch to the database user (opengauss) and run the installation:

gs_install -X /opt/software/cm2.xml
Parsing the configuration file.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Successfully created the backup directory.
begin deploy..
Installing the cluster.
begin prepare Install Cluster..
Checking the installation environment on all nodes.
begin install Cluster..
Installing applications on all nodes.
Successfully installed APP.
begin init Instance..
encrypt cipher and rand files for database.
Please enter password for database:
Please repeat for database:
begin to create CA cert files
The sslcert will be generated in /home/opengauss/app/share/sslcert/om
Create CA files for cm beginning.
Create CA files on directory [/home/opengauss/app_a07d57c3/share/sslcert/cm]. file list: ['cacert.pem', 'server.key', 'server.crt', 'client.key', 'client.crt', 'server.key.cipher', 'server.key.rand', 'client.key.cipher', 'client.key.rand']
Non-dss_ssl_enable, no need to create CA for DSS
Cluster installation is completed.
Configuring.
Deleting instances from all nodes.
Successfully deleted instances from all nodes.
Checking node configuration on all nodes.
Initializing instances on all nodes.
Updating instance configuration on all nodes.
Check consistence of memCheck and coresCheck on database nodes.
Successful check consistence of memCheck and coresCheck on all nodes.
Configuring pg_hba on all nodes.
Configuration is completed.
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state      : Normal
redistributing     : No
node_count         : 2
Datanode State
    primary           : 1
    standby           : 1
    secondary         : 0
    cascade_standby   : 0
    building          : 0
    abnormal          : 0
    down              : 0

Successfully installed application.
end deploy.

Query the cluster status:

gs_om -t status --detail
[  CMServer State   ]

node             node_ip         instance                          state
--------------------------------------------------------------------------
1  test1 xx.x.xx.xx    1    /data/openGauss/cm/cm_server Primary
2  test2 xx.x.xx.xx    2    /data/openGauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                 state
---------------------------------------------------------------------------
1  test1 xx.x.xx.xx    6001 /data/openGauss/dn1 P Primary Normal
2  test2 xx.x.xx.xx    6002 /data/openGauss/dn2 S Standby Normal
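When scripting around this output, it helps to pull out the current primary and the overall state. A minimal sketch, assuming the column layout shown above (host in column 2 and current DN role in column 7 of the Datanode State rows); the function names are hypothetical, and the parsing will break if a release changes the layout.

```shell
# Read `gs_om -t status --detail` output on stdin and print the host whose
# DN currently reports the Primary role.
primary_dn_host() {
    awk '/Datanode State/ { in_dn = 1; next }
         in_dn && NF >= 8 && $7 == "Primary" { print $2 }'
}

# Read the same output on stdin and print the cluster_state value.
cluster_state() {
    awk -F: '/^cluster_state/ { gsub(/[ \t]+/, "", $2); print $2; exit }'
}
```

For the output above, `gs_om -t status --detail | primary_dn_host` prints `test1` and `gs_om -t status --detail | cluster_state` prints `Normal`.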

Starting and stopping the database:

[opengauss@test2 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[opengauss@test2 ~]$ gs_om -t start
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state      : Normal
redistributing     : No
node_count         : 2
Datanode State
    primary           : 1
    standby           : 1
    secondary         : 0
    cascade_standby   : 0
    building          : 0
    abnormal          : 0
    down              : 0

Successfully started cluster.
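After `gs_om -t start`, the standby can take a moment to come up (note the "Wait for standby instance" line). A minimal polling sketch, assuming the `cluster_state : ...` line format shown in the status output; `status_output` and `wait_cluster_normal` are hypothetical names and the timeout defaults are arbitrary.

```shell
# Separate function so the status source can be replaced (for example in a test).
status_output() {
    gs_om -t status --detail
}

# Poll until cluster_state is Normal.
# $1 = timeout in seconds (default 300), $2 = poll interval in seconds (default 5)
wait_cluster_normal() {
    limit=${1:-300}
    step=${2:-5}
    waited=0
    state=
    while [ "$waited" -lt "$limit" ]; do
        state=$(status_output 2>/dev/null |
            awk -F: '/^cluster_state/ { gsub(/[ \t]+/, "", $2); print $2; exit }')
        if [ "$state" = Normal ]; then
            echo "cluster_state is Normal (waited ${waited}s)"
            return 0
        fi
        sleep "$step"
        waited=$((waited + step))
    done
    echo "cluster_state is '${state:-unknown}' after ${limit}s" >&2
    return 1
}
```

A typical use is `gs_om -t start && wait_cluster_normal 600 10`, so that a follow-up step only runs once the cluster reports Normal.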

Processes on the primary node:

[Screenshot: process list on the primary node]

Processes on the standby node:

[Screenshot: process list on the standby node]

After the installation succeeds, log in to the database and run some operations:

Primary node:

[Screenshot: database session on the primary node]

Standby node:

[Screenshot: database session on the standby node]

Switchover

Cluster information on the original primary before the switchover:

[opengauss@test1 ~]$ ps ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
opengau+ 154168  0.0  0.0  21956   832 ?        Ss   Aug03   0:00 ssh-agent -a /home/opengauss/gaussdb_tmp/gauss_socket_tmp
opengau+ 166310  0.6  0.0  41724  8784 ?        S    00:00   8:48 /home/opengauss/app/bin/om_monitor -L /var/log/gaussdb_log/opengauss/cm/om_monitor
opengau+ 168867 13.5  0.1 1509496 26928 ?       Sl   00:00 174:54 /home/opengauss/app/bin/cm_agent
opengau+ 168885 12.5  2.8 6652180 471124 ?      Sl   00:00 162:34 /home/opengauss/app/bin/cm_server
opengau+ 168905  0.0  0.2 1409964 41324 ?       Sl   00:00   0:00 gaussdb fenced UDF master process
opengau+ 169254  5.1  7.6 7782296 1257508 ?     Ssl  00:00  66:18 /home/opengauss/app/bin/gaussdb -D /data/openGauss/dn1 -M standby
gs_om -t status --detail
[  CMServer State   ]

node             node_ip         instance                          state
--------------------------------------------------------------------------
1  test1 xx.x.xx.xx    1    /data/openGauss/cm/cm_server Primary
2  test2 xx.x.xx.xx    2    /data/openGauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                 state
---------------------------------------------------------------------------
1  test1 xx.x.xx.xx    6001 /data/openGauss/dn1 P Primary Normal                  ## P/S is the role assigned in the configuration; the next column is the current role
2  test2 xx.x.xx.xx    6002 /data/openGauss/dn2 S Standby Normal

After the switchover, the original primary has become a standby:

gs_om -t status --detail
[  CMServer State   ]

node             node_ip         instance                          state
--------------------------------------------------------------------------
1  test1 xx.x.xx.xx    1    /data/openGauss/cm/cm_server Primary
2  test2 xx.x.xx.xx    2    /data/openGauss/cm/cm_server Standby

[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
balanced        : No
current_az      : AZ_ALL

[  Datanode State   ]

node             node_ip         instance                 state
---------------------------------------------------------------------------
1  test1 xx.x.xx.xx    6001 /data/openGauss/dn1 P Standby Normal
2  test2 xx.x.xx.xx    6002 /data/openGauss/dn2 S Primary Normal
[opengauss@test1 ~]$ ps ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
opengau+ 154168  0.0  0.0  21956   832 ?        Ss   Aug03   0:00 ssh-agent -a /home/opengauss/gaussdb_tmp/gauss_socket_tmp
opengau+ 166310  0.6  0.0  41724  8784 ?        S    00:00   9:10 /home/opengauss/app/bin/om_monitor -L /var/log/gaussdb_log/opengauss/cm/om_monitor
opengau+ 181143  0.0  0.0 115544  2056 pts/1    S    21:33   0:00 -bash
opengau+ 212240 13.6  0.1 1443956 26628 ?       Sl   22:21   0:47 /home/opengauss/app/bin/cm_agent
opengau+ 212259 12.9  2.5 6391332 416212 ?      Sl   22:21   0:44 /home/opengauss/app/bin/cm_server
opengau+ 212271  7.4  7.6 7730032 1251812 ?     Sl   22:21   0:25 /home/opengauss/app/bin/gaussdb -D /data/openGauss/dn1 -M pending
opengau+ 212278  0.0  0.2 1409968 41272 ?       Sl   22:21   0:00 gaussdb fenced UDF master process
opengau+ 216922  0.0  0.0 155460  1864 pts/1    R+   22:27   0:00 ps ux
[opengauss@test1 ~]$ gsql -d postgres  -p 15400 -r
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# insert into test values(1);
ERROR:  cannot execute INSERT in a read-only transaction
openGauss=# select * from test;
 a
---
 1
 1
(2 rows)

openGauss=# \q
[opengauss@ps-vbdb-test2 ~]$

The original standby was successfully promoted to primary:

gs_ctl switchover -D /data/openGauss/dn2
[2023-08-04 22:26:43.517][171430][][gs_ctl]: gs_ctl switchover ,datadir is /data/openGauss/dn2
[2023-08-04 22:26:43.517][171430][][gs_ctl]: switchover term (1)
[2023-08-04 22:26:43.525][171430][][gs_ctl]: waiting for server to switchover........
[2023-08-04 22:26:48.567][171430][][gs_ctl]: done
[2023-08-04 22:26:48.567][171430][][gs_ctl]: switchover completed (/data/openGauss/dn2)
[opengauss@test2 dn2]$ gs_ctl status --detail
gs_ctl: unrecognized option '--detail'
Try "gs_ctl --help" for more information.
[opengauss@test2 dn2]$ ps ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
opengau+  46514  0.0  0.0  72472   964 ?        Ss   Aug03   0:00 ssh-agent -a /home/opengauss/gaussdb_tmp/gauss_socket_tmp
opengau+  46590  0.0  0.0  72472   776 ?        Ss   Aug03   0:00 ssh-agent -s
opengau+  52674  0.0  0.0 115544  2084 pts/0    S    Aug03   0:00 -bash
opengau+  54665  0.7  0.0  41728  8796 ?        S    00:00  10:31 /home/opengauss/app/bin/om_monitor -L /var/log/gaussdb_log/opengauss/cm/om_monitor
opengau+ 167866 13.8  0.1 1443960 26636 ?       Sl   22:21   0:48 /home/opengauss/app/bin/cm_agent
opengau+ 167884 11.8  2.5 6260128 415892 ?      Sl   22:21   0:41 /home/opengauss/app/bin/cm_server
opengau+ 167897 11.6  7.7 7869340 1265916 ?     Sl   22:21   0:41 /home/opengauss/app/bin/gaussdb -D /data/openGauss/dn2 -M pending
opengau+ 167904  0.0  0.2 1409968 41244 ?       Sl   22:21   0:00 gaussdb fenced UDF master process
opengau+ 171967  0.0  0.0 155460  1860 pts/0    R+   22:27   0:00 ps ux
[opengauss@test2 dn2]$ gsql -d postgres  -p 15400 -r
gsql ((openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# \d+
                                        List of relations
 Schema | Name | Type  |   Owner   |    Size    |             Storage              | Description
--------+------+-------+-----------+------------+----------------------------------+-------------
 public | test | table | opengauss | 8192 bytes | {orientation=row,compression=no} |
(1 row)

openGauss=# insert into test values(1);
INSERT 0 1
openGauss=#
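The `gs_ctl status --detail` attempt in the log above fails because gs_ctl has no `--detail` option. To check a single DN's role, `gs_ctl query -D <datadir>` prints an "HA state" section that includes a `local_role : ...` line. A minimal sketch for extracting it, assuming that line format; `dn_local_role` and `dn_role_of` are hypothetical names.

```shell
# Print the first local_role value (e.g. Primary, Standby) from `gs_ctl query`
# output on stdin. The first match belongs to the instance's own HA state.
dn_local_role() {
    awk -F: '/local_role/ { gsub(/[ \t]+/, "", $2); print $2; exit }'
}

# Print the current role of the DN in data directory $1.
dn_role_of() {
    gs_ctl query -D "$1" 2>/dev/null | dn_local_role
}
```

After the switchover, `dn_role_of /data/openGauss/dn2` on test2 is expected to print `Primary`, matching the successful INSERT above.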

After the switchover, run gs_om -t refreshconf to record the current primary/standby information:

gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
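The switchover in this section comes down to two commands on the standby host: gs_ctl switchover against the standby's data directory, then gs_om -t refreshconf. A minimal wrapper sketch; `run`, `switchover_and_refresh` and the DRY_RUN convention are hypothetical, added so the commands can be previewed before touching the cluster.

```shell
# Print each command; execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Run on the standby host as the database user.
# $1 = data directory of the local standby DN, e.g. /data/openGauss/dn2
switchover_and_refresh() {
    if [ -z "$1" ]; then
        echo "usage: switchover_and_refresh DATADIR" >&2
        return 2
    fi
    run gs_ctl switchover -D "$1" || return 1   # promote this DN to primary
    run gs_om -t refreshconf                   # record the new primary/standby layout
}
```

`DRY_RUN=1` only prints the two commands; without it, refreshconf runs only if the switchover command succeeded.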

Force-stop the standby node's DN; CM starts it again automatically:

gs_ctl stop -D /data/openGauss/dn2
[2023-08-04 22:57:36.610][197800][][gs_ctl]: gs_ctl stopped ,datadir is /data/openGauss/dn2
waiting for server to shut down.... done
server stopped
[opengauss@test2 dn2]$ ps ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
opengau+  46514  0.0  0.0  72472   964 ?        Ss   Aug03   0:00 ssh-agent -a /home/opengauss/gaussdb_tmp/gauss_socket_tmp
opengau+  46590  0.0  0.0  72472   776 ?        Ss   Aug03   0:00 ssh-agent -s
opengau+  52674  0.0  0.0 115544  2116 pts/0    S    Aug03   0:00 -bash
opengau+  54665  0.7  0.0  41728  8796 ?        S    00:00  10:45 /home/opengauss/app/bin/om_monitor -L /var/log/gaussdb_log/opengauss/cm/om_monitor
opengau+ 193209 13.8  0.1 1443956 26632 ?       Sl   22:54   0:23 /home/opengauss/app/bin/cm_agent
opengau+ 193227 11.9  2.5 6325664 415728 ?      Sl   22:54   0:20 /home/opengauss/app/bin/cm_server
opengau+ 193247  0.0  0.2 1409968 41264 ?       Sl   22:54   0:00 gaussdb fenced UDF master process
opengau+ 197815  0.0  0.2 1344972 33648 ?       Sl   22:57   0:00 /home/opengauss/app/bin/gaussdb -D /data/openGauss/dn2 -M pending
opengau+ 197826  0.0  0.0 1196260 15560 ?       R    22:57   0:00 /home/opengauss/app/bin/gaussdb -V
opengau+ 197827  0.0  0.0 155460  1860 pts/0    R+   22:57   0:00 ps ux
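In the ps output above, a new gaussdb process for dn2 (started at 22:57, in `-M pending` mode) appears shortly after the stop, which is CM bringing the instance back. To time this from a script, a minimal sketch that polls with pgrep until a gaussdb process for the data directory shows up. The function names are hypothetical, and a running process alone does not mean the DN has finished starting, so checking `gs_om -t status --detail` afterwards is still worthwhile.

```shell
# True if a gaussdb process for data directory $1 is running.
# The pattern is an extended regex, so "( |$)" prevents dn2 from matching dn20.
dn_process_running() {
    pgrep -f "bin/gaussdb -D $1( |\$)" >/dev/null
}

# Wait for CM to restart the DN.
# $1 = data directory, $2 = timeout in seconds (default 120), $3 = interval (default 2)
wait_dn_restart() {
    dir=$1
    limit=${2:-120}
    step=${3:-2}
    waited=0
    until dn_process_running "$dir"; do
        if [ "$waited" -ge "$limit" ]; then
            echo "no gaussdb process for $dir after ${limit}s" >&2
            return 1
        fi
        sleep "$step"
        waited=$((waited + step))
    done
    echo "gaussdb for $dir is running (after ~${waited}s)"
}
```

For example, `gs_ctl stop -D /data/openGauss/dn2 && wait_dn_restart /data/openGauss/dn2` reports roughly how long CM took to bring the DN back.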

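Summary

This exercise walked through installing a two-node openGauss 5.0.0 CM cluster, performing a primary/standby switchover, and confirming that CM automatically restarts a stopped DN, which made its high-availability features more familiar in practice.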