For the primary/standby configuration, see the detailed walkthrough in the previous post.

Primary/standby switchover drill

Cluster start and stop order
Start: DM01/DM02 database→DM01/DM02 watcher (daemon process)→DM03 monitor
DM01:[dmdba@dm01 ~]$ DmServiceDW01 start
DM02:[dmdba@dm02 ~]$ DmServiceDW02 start
DM01:[dmdba@dm01 ~]$ DmWatcherServiceWatcher start
DM02:[dmdba@dm02 ~]$ DmWatcherServiceWatcher start
DM03:[dmdba@dm03 ~]$ DmMonitorServiceMonitor start

Stop: DM03 monitor→DM01/DM02 watcher→DM01 primary database→DM02 standby database
DM03:[dmdba@dm03 ~]$ DmMonitorServiceMonitor stop
DM02:[dmdba@dm02 ~]$ DmWatcherServiceWatcher stop
DM01:[dmdba@dm01 ~]$ DmWatcherServiceWatcher stop
DM01:[dmdba@dm01 ~]$ DmServiceDW01 stop
DM02:[dmdba@dm02 ~]$ DmServiceDW02 stop
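
The start and stop orders are mirror images of each other, so it can help to keep them in one place. The following is a minimal sketch that only prints the plan and runs nothing on the hosts. The function name `cluster_plan` is hypothetical, and the database service names follow the start commands (DmServiceDW01, DmServiceDW02).

```shell
# cluster_plan ACTION: print "host command" lines in the order described above.
# This only prints the plan; it does not connect to any host.
cluster_plan() {
  case "$1" in
    start)
      # databases first, then watchers, then the monitor
      printf '%s\n' \
        'dm01 DmServiceDW01 start' \
        'dm02 DmServiceDW02 start' \
        'dm01 DmWatcherServiceWatcher start' \
        'dm02 DmWatcherServiceWatcher start' \
        'dm03 DmMonitorServiceMonitor start'
      ;;
    stop)
      # monitor first, then watchers, then primary, then standby
      printf '%s\n' \
        'dm03 DmMonitorServiceMonitor stop' \
        'dm02 DmWatcherServiceWatcher stop' \
        'dm01 DmWatcherServiceWatcher stop' \
        'dm01 DmServiceDW01 stop' \
        'dm02 DmServiceDW02 stop'
      ;;
    *)
      echo "usage: cluster_plan start|stop" >&2
      return 2
      ;;
  esac
}
```

The output could be fed to a loop such as `cluster_plan stop | while read -r host cmd; do ssh -n "dmdba@$host" "$cmd"; done`. That assumes passwordless SSH between the hosts and the service scripts being on the remote PATH, neither of which the setup above states.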

Manual primary/standby switchover

1. In the monitor on DM03, enter the following command to list the standby databases that are eligible for switchover

choose switchover
Can choose one of the following instances to do switchover:
1: DW02
The output shows that DW02 can be switched over to become the primary.

2. Enter the switchover command

[dmdba@dm03 bin]$ ./dmmonitor /home/dmdba/dmmonitor/dmmonitor.ini

login
用户名:
密码:
[monitor]         2022-10-14 16:37:15: 登录监视器成功!

switchover GDW1.DW02
[monitor]         2022-10-14 16:37:26: 开始切换实例DW02
[monitor]         2022-10-14 16:37:26: 通知守护进程DW01切换SWITCHOVER状态
[monitor]         2022-10-14 16:37:26: 守护进程(DW01)状态切换 [OPEN-->SWITCHOVER]
[monitor]         2022-10-14 16:37:27: 切换守护进程DW01为SWITCHOVER状态成功
[monitor]         2022-10-14 16:37:27: 通知守护进程DW02切换SWITCHOVER状态
[monitor]         2022-10-14 16:37:27: 守护进程(DW02)状态切换 [OPEN-->SWITCHOVER]
[monitor]         2022-10-14 16:37:28: 切换守护进程DW02为SWITCHOVER状态成功
[monitor]         2022-10-14 16:37:28: 实例DW01开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor]         2022-10-14 16:37:28: 实例DW01执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor]         2022-10-14 16:37:28: 实例DW02开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor]         2022-10-14 16:37:29: 实例DW02执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor]         2022-10-14 16:37:29: 实例DW01开始执行ALTER DATABASE MOUNT语句
[monitor]         2022-10-14 16:37:29: 实例DW01执行ALTER DATABASE MOUNT语句成功
[monitor]         2022-10-14 16:37:29: 实例DW02开始执行SP_APPLY_KEEP_PKG()语句
[monitor]         2022-10-14 16:37:30: 实例DW02执行SP_APPLY_KEEP_PKG()语句成功
[monitor]         2022-10-14 16:37:30: 实例DW02开始执行ALTER DATABASE MOUNT语句
[monitor]         2022-10-14 16:37:30: 实例DW02执行ALTER DATABASE MOUNT语句成功
[monitor]         2022-10-14 16:37:30: 实例DW01开始执行ALTER DATABASE STANDBY语句
[monitor]         2022-10-14 16:37:30: 实例DW01执行ALTER DATABASE STANDBY语句成功
[monitor]         2022-10-14 16:37:30: 实例DW02开始执行ALTER DATABASE PRIMARY语句
[monitor]         2022-10-14 16:37:31: 实例DW02执行ALTER DATABASE PRIMARY语句成功
[monitor]         2022-10-14 16:37:31: 通知实例DW02修改所有归档状态无效
[monitor]         2022-10-14 16:37:31: 修改所有实例归档为无效状态成功
[monitor]         2022-10-14 16:37:31: 实例DW01开始执行ALTER DATABASE OPEN FORCE语句
[monitor]         2022-10-14 16:37:31: 实例DW01执行ALTER DATABASE OPEN FORCE语句成功
[monitor]         2022-10-14 16:37:31: 实例DW02开始执行ALTER DATABASE OPEN FORCE语句
[monitor]         2022-10-14 16:37:32: 实例DW02执行ALTER DATABASE OPEN FORCE语句成功
[monitor]         2022-10-14 16:37:32: 实例DW01开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor]         2022-10-14 16:37:33: 实例DW01执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor]         2022-10-14 16:37:33: 实例DW02开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor]         2022-10-14 16:37:33: 实例DW02执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor]         2022-10-14 16:37:33: 通知守护进程DW01切换OPEN状态
[monitor]         2022-10-14 16:37:33: 守护进程(DW01)状态切换 [SWITCHOVER-->OPEN]
[monitor]         2022-10-14 16:37:34: 切换守护进程DW01为OPEN状态成功
[monitor]         2022-10-14 16:37:34: 通知守护进程DW02切换OPEN状态
[monitor]         2022-10-14 16:37:35: 守护进程(DW02)状态切换 [SWITCHOVER-->OPEN]
[monitor]         2022-10-14 16:37:36: 切换守护进程DW02为OPEN状态成功
[monitor]         2022-10-14 16:37:36: 通知组(GDW1)的守护进程执行清理操作
[monitor]         2022-10-14 16:37:36: 清理守护进程(DW01)请求成功
2022-10-14 16:37:36
#================================================================================#
GROUP            OGUID       MON_CONFIRM     MODE            MPP_FLAG
GDW1             45331       FALSE           MANUAL          FALSE


<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT
10.0.0.32           5436         2022-10-14 16:37:36  GLOBAL    VALID     OPEN           DW02             OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG
10.0.0.32           5236       OK        DW02             OPEN        PRIMARY   0          0            REALTIME  VALID    6726            48672           6726            48673           NONE  

<<DATABASE GLOBAL INFO:>>
DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT
10.0.0.31           5436         2022-10-14 16:37:36  GLOBAL    VALID     OPEN           DW01             OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  INVALID

EP INFO:
INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG
10.0.0.31           5236       OK        DW01             OPEN        STANDBY   0          0            REALTIME  INVALID  6723            46225           6723            46225           NONE  

DATABASE(DW01) APPLY INFO FROM (DW02), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[6723, 6723, 6723], (RLSN, SLSN, KLSN)[46225, 46225, 46225], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (46225)


#================================================================================#

[monitor]         2022-10-14 16:37:36: 清理守护进程(DW02)请求成功
[monitor]         2022-10-14 16:37:36: 实例DW02切换成功

[monitor]         2022-10-14 16:37:37: 守护进程(DW02)状态切换 [OPEN-->RECOVERY]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 16:37:37  RECOVERY       OK        DW02             OPEN        PRIMARY   VALID    4        48673           48673

[monitor]         2022-10-14 16:37:45: 守护进程(DW02)状态切换 [RECOVERY-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 16:37:45  OPEN           OK        DW02             OPEN        PRIMARY   VALID    4        48675           48676
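
The watcher state changes are the quickest way to follow a switchover in a long log. Below is a small sketch that extracts them from saved monitor output. The function name `watcher_transitions` is hypothetical, and the pattern relies only on the `(INSTANCE)` and `[FROM-->TO]` parts of the log lines shown above.

```shell
# watcher_transitions: read dmmonitor output on stdin and print one
# "INSTANCE FROM TO" line per watcher state change, in log order.
# Lines without both "(NAME)" and "[A-->B]" are ignored.
watcher_transitions() {
  sed -n -E 's/.*\(([A-Za-z0-9_]+)\).*\[([A-Z]+)-->([A-Z]+)\].*/\1 \2 \3/p'
}
```

Applied to the switchover log above, this yields six lines: DW01 and DW02 each move OPEN to SWITCHOVER and back to OPEN, after which DW02 passes through RECOVERY before returning to OPEN.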

Automatic primary/standby failover

1. Edit the dmwatcher.ini configuration

##DM01## ##DM02## (apply the change on both)
[dmdba@dm01 ~]$ cat -n /data/DAMENG/dmwatcher.ini |grep DW_MODE
     3    DW_MODE                  = MANUAL  # manual failover mode (AUTO = automatic)
[dmdba@dm01 ~]$ sed -i '3s/MANUAL/AUTO/' /data/DAMENG/dmwatcher.ini

Edit the dmmonitor.ini configuration

##DM03##
[dmdba@dm03 bin]$ cat -n /home/dmdba/dmmonitor/dmmonitor.ini |grep MON_DW_CONFIRM
     1  MON_DW_CONFIRM             = 0  # 0 = non-confirm monitor, 1 = confirm monitor
[dmdba@dm03 bin]$ sed -i '1s/0/1/' /home/dmdba/dmmonitor/dmmonitor.ini
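
The `sed` edits above address each setting by line number, which does nothing useful if the file layout differs from the one shown. Here is a sketch of a key-based alternative, assuming GNU sed as found on Linux. `set_ini_value` is a hypothetical helper name, not part of the DM tooling.

```shell
# set_ini_value KEY VALUE FILE
# Replace the value of "KEY = ..." in FILE, leaving indentation,
# alignment, and any trailing "#" comment untouched.
# Assumes GNU sed (-i without a suffix, -E for extended regex).
set_ini_value() {
  key=$1
  value=$2
  file=$3
  sed -i -E "s/^([[:space:]]*${key}[[:space:]]*=[[:space:]]*)[^[:space:]#]+/\1${value}/" "$file"
}
```

With this helper the two edits would read `set_ini_value DW_MODE AUTO /data/DAMENG/dmwatcher.ini` on DM01 and DM02, and `set_ini_value MON_DW_CONFIRM 1 /home/dmdba/dmmonitor/dmmonitor.ini` on DM03.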

2. Shut down the cluster in order, then restart it
Shutdown

##DM03## 
[dmdba@dm03 bin]$ ./DmMonitorServiceMonitor stop
Stopping DmMonitorServiceMonitor:                          [ OK ]
##DM02## 
[dmdba@dm02 bin]$ ./DmWatcherServiceWatcher stop
[dmdba@dm02 bin]$ ./DmServiceDW02 stop
##DM01## 
[dmdba@dm01 bin]$ ./DmWatcherServiceWatcher stop
[dmdba@dm01 bin]$ ./DmServiceDW01 stop

Restart

##DM01## 
[dmdba@dm01 bin]$ ./DmServiceDW01 start
Starting DmServiceDW01:                                    [ OK ]
[dmdba@dm01 bin]$ ./DmWatcherServiceWatcher start
Starting DmWatcherServiceWatcher:                          [ OK ]
##DM02## 
[dmdba@dm02 bin]$ ./DmServiceDW02 start
Starting DmServiceDW02:                                    [ OK ]
[dmdba@dm02 bin]$ ./DmWatcherServiceWatcher start
Starting DmWatcherServiceWatcher:                          [ OK ]
##DM03## 
[dmdba@dm03 bin]$ ./DmMonitorServiceMonitor start
Starting DmMonitorServiceMonitor:                          [ OK ]
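
Configure dmmonitor_normal.ini

The status cannot be viewed from the dmmonitor process that is already running as a service. To view it, a separate monitor must be started with its own configuration file.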

[dmdba@dm03 ~]$ cat>/home/dmdba/dmmonitor/dmmonitor_normal.ini<<EOF
> MON_DW_CONFIRM             = 0  # 0 = non-confirm monitor, 1 = confirm monitor
> MON_LOG_PATH               = ../log1 # directory for monitor log files
> MON_LOG_INTERVAL           = 60  # write system info to the log file every 60 s
> MON_LOG_FILE_SIZE          = 512  # maximum size of a single log file, in MB
> MON_LOG_SPACE_LIMIT        = 2048  # total log space limit, in MB
>
> [GDW1]
>   MON_INST_OGUID           = 45331  # unique OGUID of group GDW1
>   MON_DW_IP                = 10.0.0.31:5436  # IP matches MAL_HOST, port matches MAL_DW_PORT
>   MON_DW_IP                = 10.0.0.32:5436
> EOF
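
Before starting a monitor on this file, it may be worth confirming that it really is a non-confirm configuration and lists one MON_DW_IP per watcher. This is a minimal sketch; the function name `check_monitor_ini` is hypothetical, and the expected count of two watchers is taken from this two-node setup.

```shell
# check_monitor_ini FILE EXPECTED_WATCHERS
# Succeeds only if FILE sets MON_DW_CONFIRM = 0 (non-confirm monitor)
# and contains exactly EXPECTED_WATCHERS "MON_DW_IP = ..." entries.
check_monitor_ini() {
  file=$1
  expected=$2
  grep -Eq '^[[:space:]]*MON_DW_CONFIRM[[:space:]]*=[[:space:]]*0([[:space:]]|#|$)' "$file" || return 1
  # grep -c exits non-zero when the count is 0, hence the "|| true"
  count=$(grep -Ec '^[[:space:]]*MON_DW_IP[[:space:]]*=' "$file" || true)
  [ "$count" -eq "$expected" ]
}
```

For the file created above, `check_monitor_ini /home/dmdba/dmmonitor/dmmonitor_normal.ini 2` would be the matching call.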

Start a separate monitor

[dmdba@dm03 bin]$ ./dmmonitor /home/dmdba/dmmonitor/dmmonitor_normal.ini
[monitor]         2022-10-14 17:05:20: DMMONITOR[4.0] V8
[monitor]         2022-10-14 17:05:20: DMMONITOR[4.0] IS READY.

[monitor]         2022-10-14 17:05:21: 收到守护进程(DW01)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:05:21  OPEN           OK        DW01             OPEN        PRIMARY   VALID    6        53987           53988

[monitor]         2022-10-14 17:05:21: 收到守护进程(DW02)消息
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:05:21  OPEN           OK        DW02             OPEN        STANDBY   VALID    6        53986           53986

Tip
[monitor]         2022-10-14 17:05:25: [!!! 提示:本监视器不是确认监视器,在故障自动切换模式下如果发生主库故障,本监视器无法执行自动接管 !!!]

Simulate a primary failure
[dmdba@dm01 bin]$ ps -ef|grep dms
dmdba     19302      1  1 17:02 pts/0    00:00:04 /home/dmdba/dmdbms/bin/dmserver path=/data/DAMENG/dm.ini -noconsole mount
[dmdba@dm01 bin]$ kill -9 19302
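
Copying the PID by hand is easy to get wrong during a drill. Below is a sketch that pulls it out of `ps -ef` output instead. The function name `dmserver_pids` is hypothetical, and it assumes the standard `ps -ef` column layout in which the PID is column 2 and the command is column 8.

```shell
# dmserver_pids: read `ps -ef` output on stdin and print the PID of every
# process whose executable (column 8) is dmserver. The grep process itself
# never matches because its executable is "grep".
dmserver_pids() {
  awk '$8 ~ /(^|\/)dmserver$/ { print $2 }'
}
```

A typical call would be `ps -ef | dmserver_pids`. The result should be reviewed before it is passed to `kill -9`, since a host running several instances would return several PIDs.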

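The monitor prints the failure details. The warning that no automatic takeover is possible comes from this non-confirm monitor; DW02 still takes over, which suggests that the confirm monitor running as a service (MON_DW_CONFIRM = 1) handled it.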

[monitor]         2022-10-14 17:11:34: 实例DW01[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:34  STARTUP        ERROR     DW01             OPEN        PRIMARY   VALID    7        56546           56547

[monitor]         2022-10-14 17:11:34: 守护进程(DW01)状态切换 [OPEN-->STARTUP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:34  STARTUP        ERROR     DW01             OPEN        PRIMARY   VALID    7        56546           56547

[monitor]         2022-10-14 17:11:34: [!!! 实例DW01的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例DW01执行自动接管 !!!]

[monitor]         2022-10-14 17:11:35: 守护进程(DW02)状态切换 [OPEN-->TAKEOVER]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:35  TAKEOVER       OK        DW02             OPEN        STANDBY   VALID    7        56546           56546

[monitor]         2022-10-14 17:11:38: 守护进程(DW02)状态切换 [TAKEOVER-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:38  OPEN           OK        DW02             OPEN        PRIMARY   VALID    8        58993           58993

[monitor]         2022-10-14 17:11:56: 实例DW01[STANDBY, MOUNT, ISTAT_SAME:TRUE]恢复正常
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:56  STARTUP        OK        DW01             MOUNT       STANDBY   INVALID  7        56547           56547

[monitor]         2022-10-14 17:11:57: 守护进程(DW01)状态切换 [STARTUP-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:57  OPEN           OK        DW01             OPEN        STANDBY   INVALID  7        56547           56547

[monitor]         2022-10-14 17:11:57: 守护进程(DW02)状态切换 [OPEN-->RECOVERY]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:57  RECOVERY       OK        DW02             OPEN        PRIMARY   VALID    8        58999           58999

[monitor]         2022-10-14 17:11:59: 守护进程(DW02)状态切换 [RECOVERY-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN
                  2022-10-14 17:11:59  OPEN           OK        DW02             OPEN        PRIMARY   VALID    8        59000           59000

Automatic failover is complete: DW02 is now the primary and DW01 has rejoined as the standby.
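
To confirm the outcome from a saved log rather than by eye, the watcher status rows can be scanned for the most recent PRIMARY entry. This is a sketch with a hypothetical function name, `last_primary`. It relies on the column order of the status rows shown above (WTIME as date and time, WSTATUS, INST_OK, INAME, ISTATUS, IMODE) and does not read the wider DATABASE GLOBAL INFO tables.

```shell
# last_primary: read dmmonitor output on stdin and print the instance name
# (field 5) from the last status row whose IMODE column (field 7) is PRIMARY.
# Prints nothing if no such row is found.
last_primary() {
  awk '$7 == "PRIMARY" { name = $5 } END { if (name != "") print name }'
}
```

Run against the failover log above, the last matching row is the 17:11:59 entry, so the function reports DW02.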

Tip
[monitor]         2022-10-14 17:13:01: [!!! 提示:本监视器不是确认监视器,在故障自动切换模式下如果发生主库故障,本监视器无法执行自动接管 !!!]

[monitor]         2022-10-14 17:13:01: 实例DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE]不可加入其他实例,守护进程状态:OPEN,Open记录状态:VALID
[monitor]         2022-10-14 17:13:01: 实例DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE]当前没有命令正在执行
[monitor]         2022-10-14 17:13:01: 实例DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE]运行正常, 守护进程是OPEN状态,守护类型是GLOBAL

[monitor]         2022-10-14 17:13:01: 实例DW01[STANDBY, OPEN, ISTAT_SAME:TRUE]可加入实例DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE]
[monitor]         2022-10-14 17:13:01: 实例DW01[STANDBY, OPEN, ISTAT_SAME:TRUE]当前没有命令正在执行
[monitor]         2022-10-14 17:13:01: 实例DW01[STANDBY, OPEN, ISTAT_SAME:TRUE]运行正常, 守护进程是OPEN状态,守护类型是GLOBAL

[monitor]         2022-10-14 17:13:01: 组(GDW1)当前活动实例运行正常

[monitor]         2022-10-14 17:13:01: 所有组中的活动实例运行正常!