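For the primary/standby configuration, see the detailed walkthrough in the previous post.
Primary/standby switchover drill
Cluster start and stop order
Start: DM01/DM02 databases → DM01/DM02 watchers (dmwatcher daemons) → DM03 monitor
DM01: [dmdba@dm01 ~]$ DmServiceDW01 start
DM02: [dmdba@dm02 ~]$ DmServiceDW02 start
DM01: [dmdba@dm01 ~]$ DmWatcherServiceWatcher start
DM02: [dmdba@dm02 ~]$ DmWatcherServiceWatcher start
DM03: [dmdba@dm03 ~]$ DmMonitorServiceMonitor start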
Stop: DM03 monitor → DM01/DM02 watchers → DM01 primary database → DM02 standby database
DM03: [dmdba@dm03 ~]$ DmMonitorServiceMonitor stop
DM02: [dmdba@dm02 ~]$ DmWatcherServiceWatcher stop
DM01: [dmdba@dm01 ~]$ DmWatcherServiceWatcher stop
DM01: [dmdba@dm01 ~]$ DmServiceDW01 stop
DM02: [dmdba@dm02 ~]$ DmServiceDW02 stop
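The start and stop orders above are easy to mix up under pressure, so it can help to keep them in one place. The following is a minimal dry-run sketch: the function name cluster_plan is hypothetical, the host and service names are the ones used in this post, and nothing is executed. It only prints "host command" pairs in the required order.

```shell
#!/bin/sh
# cluster_plan (hypothetical helper): print "host service action" lines in the
# order used in this post. Dry run only; feed the lines to your own runner.
cluster_plan() {
    case "$1" in
        start)
            # databases first, then watchers, then the monitor
            printf '%s\n' \
                'dm01 DmServiceDW01 start' \
                'dm02 DmServiceDW02 start' \
                'dm01 DmWatcherServiceWatcher start' \
                'dm02 DmWatcherServiceWatcher start' \
                'dm03 DmMonitorServiceMonitor start'
            ;;
        stop)
            # monitor first, then watchers, then the primary, then the standby
            printf '%s\n' \
                'dm03 DmMonitorServiceMonitor stop' \
                'dm02 DmWatcherServiceWatcher stop' \
                'dm01 DmWatcherServiceWatcher stop' \
                'dm01 DmServiceDW01 stop' \
                'dm02 DmServiceDW02 stop'
            ;;
        *)
            echo "usage: cluster_plan start|stop" >&2
            return 2
            ;;
    esac
}
```

Running cluster_plan stop before a maintenance window gives a checklist to follow host by host. The stop order assumes DW01 is still the primary; after a switchover the two database lines would need to be swapped.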
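Manual primary/standby switchover
1. In the DM03 monitor, run the following command to list the standbys that are eligible for switchover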
choose switchover
Can choose one of the following instances to do switchover:
1: DW02
The output shows that DW02 can be switched over to become the primary.
2. Run the switchover command
[dmdba@dm03 bin]$ ./dmmonitor /home/dmdba/dmmonitor/dmmonitor.ini
login
Username:
Password:
[monitor] 2022-10-14 16:37:15: Logged in to the monitor successfully!
switchover GDW1.DW02
[monitor] 2022-10-14 16:37:26: Starting switchover of instance DW02
[monitor] 2022-10-14 16:37:26: Notifying watcher DW01 to change to SWITCHOVER state
[monitor] 2022-10-14 16:37:26: Watcher (DW01) state change [OPEN-->SWITCHOVER]
[monitor] 2022-10-14 16:37:27: Changed watcher DW01 to SWITCHOVER state successfully
[monitor] 2022-10-14 16:37:27: Notifying watcher DW02 to change to SWITCHOVER state
[monitor] 2022-10-14 16:37:27: Watcher (DW02) state change [OPEN-->SWITCHOVER]
[monitor] 2022-10-14 16:37:28: Changed watcher DW02 to SWITCHOVER state successfully
[monitor] 2022-10-14 16:37:28: Instance DW01 starts executing statement SP_SET_GLOBAL_DW_STATUS(0, 6)
[monitor] 2022-10-14 16:37:28: Instance DW01 executed statement SP_SET_GLOBAL_DW_STATUS(0, 6) successfully
[monitor] 2022-10-14 16:37:28: Instance DW02 starts executing statement SP_SET_GLOBAL_DW_STATUS(0, 6)
[monitor] 2022-10-14 16:37:29: Instance DW02 executed statement SP_SET_GLOBAL_DW_STATUS(0, 6) successfully
[monitor] 2022-10-14 16:37:29: Instance DW01 starts executing statement ALTER DATABASE MOUNT
[monitor] 2022-10-14 16:37:29: Instance DW01 executed statement ALTER DATABASE MOUNT successfully
[monitor] 2022-10-14 16:37:29: Instance DW02 starts executing statement SP_APPLY_KEEP_PKG()
[monitor] 2022-10-14 16:37:30: Instance DW02 executed statement SP_APPLY_KEEP_PKG() successfully
[monitor] 2022-10-14 16:37:30: Instance DW02 starts executing statement ALTER DATABASE MOUNT
[monitor] 2022-10-14 16:37:30: Instance DW02 executed statement ALTER DATABASE MOUNT successfully
[monitor] 2022-10-14 16:37:30: Instance DW01 starts executing statement ALTER DATABASE STANDBY
[monitor] 2022-10-14 16:37:30: Instance DW01 executed statement ALTER DATABASE STANDBY successfully
[monitor] 2022-10-14 16:37:30: Instance DW02 starts executing statement ALTER DATABASE PRIMARY
[monitor] 2022-10-14 16:37:31: Instance DW02 executed statement ALTER DATABASE PRIMARY successfully
[monitor] 2022-10-14 16:37:31: Notifying instance DW02 to mark all archives as invalid
[monitor] 2022-10-14 16:37:31: Marked the archives of all instances as invalid successfully
[monitor] 2022-10-14 16:37:31: Instance DW01 starts executing statement ALTER DATABASE OPEN FORCE
[monitor] 2022-10-14 16:37:31: Instance DW01 executed statement ALTER DATABASE OPEN FORCE successfully
[monitor] 2022-10-14 16:37:31: Instance DW02 starts executing statement ALTER DATABASE OPEN FORCE
[monitor] 2022-10-14 16:37:32: Instance DW02 executed statement ALTER DATABASE OPEN FORCE successfully
[monitor] 2022-10-14 16:37:32: Instance DW01 starts executing statement SP_SET_GLOBAL_DW_STATUS(6, 0)
[monitor] 2022-10-14 16:37:33: Instance DW01 executed statement SP_SET_GLOBAL_DW_STATUS(6, 0) successfully
[monitor] 2022-10-14 16:37:33: Instance DW02 starts executing statement SP_SET_GLOBAL_DW_STATUS(6, 0)
[monitor] 2022-10-14 16:37:33: Instance DW02 executed statement SP_SET_GLOBAL_DW_STATUS(6, 0) successfully
[monitor] 2022-10-14 16:37:33: Notifying watcher DW01 to change to OPEN state
[monitor] 2022-10-14 16:37:33: Watcher (DW01) state change [SWITCHOVER-->OPEN]
[monitor] 2022-10-14 16:37:34: Changed watcher DW01 to OPEN state successfully
[monitor] 2022-10-14 16:37:34: Notifying watcher DW02 to change to OPEN state
[monitor] 2022-10-14 16:37:35: Watcher (DW02) state change [SWITCHOVER-->OPEN]
[monitor] 2022-10-14 16:37:36: Changed watcher DW02 to OPEN state successfully
[monitor] 2022-10-14 16:37:36: Notifying the watchers of group (GDW1) to perform cleanup
[monitor] 2022-10-14 16:37:36: Cleanup request for watcher (DW01) succeeded
2022-10-14 16:37:36
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 45331 FALSE MANUAL FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
10.0.0.32 5436 2022-10-14 16:37:36 GLOBAL VALID OPEN DW02 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
10.0.0.32 5236 OK DW02 OPEN PRIMARY 0 0 REALTIME VALID 6726 48672 6726 48673 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
10.0.0.31 5436 2022-10-14 16:37:36 GLOBAL VALID OPEN DW01 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
10.0.0.31 5236 OK DW01 OPEN STANDBY 0 0 REALTIME INVALID 6723 46225 6723 46225 NONE
DATABASE(DW01) APPLY INFO FROM (DW02), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[6723, 6723, 6723], (RLSN, SLSN, KLSN)[46225, 46225, 46225], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (46225)
#================================================================================#
[monitor] 2022-10-14 16:37:36: Cleanup request for watcher (DW02) succeeded
[monitor] 2022-10-14 16:37:36: Instance DW02 switched over successfully
[monitor] 2022-10-14 16:37:37: Watcher (DW02) state change [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 16:37:37 RECOVERY OK DW02 OPEN PRIMARY VALID 4 48673 48673
[monitor] 2022-10-14 16:37:45: Watcher (DW02) state change [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 16:37:45 OPEN OK DW02 OPEN PRIMARY VALID 4 48675 48676
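To confirm the result of a switchover without reading the whole log, one option is to pull the latest role of each instance from the short status rows (the ones under the WTIME WSTATUS INST_OK INAME ISTATUS IMODE ... header). The sketch below is an assumption-laden helper, not a dmmonitor feature: the function name current_primary is hypothetical, and it relies on those rows having exactly 11 whitespace-separated fields with INAME in field 5 and IMODE in field 7, as in the output shown in this post.

```shell
#!/bin/sh
# current_primary (hypothetical helper): read saved monitor output on stdin and
# print every instance whose most recent short status row reports IMODE=PRIMARY.
# Assumes rows look like:
#   2022-10-14 16:37:45 OPEN OK DW02 OPEN PRIMARY VALID 4 48675 48676
current_primary() {
    awk '
        # keep only short status rows: 11 fields, first field is a date
        NF == 11 && $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
            mode[$5] = $7      # later rows overwrite earlier ones per instance
        }
        END {
            for (name in mode)
                if (mode[name] == "PRIMARY") print name
        }'
}
```

For example, saving the monitor console output to a file and running current_primary < monitor_output.txt should print DW02 after the switchover above. More than one name in the output would suggest that the roles need a closer look.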
Automatic primary/standby failover
1. Edit dmwatcher.ini
##DM01## ##DM02## (the change is required on both nodes)
[dmdba@dm01 ~]$ cat -n /data/DAMENG/dmwatcher.ini |grep DW_MODE
3 DW_MODE = MANUAL #manual failover mode (AUTO = automatic)
[dmdba@dm01 ~]$ sed -i '3s/MANUAL/AUTO/' /data/DAMENG/dmwatcher.ini
Edit dmmonitor.ini
##DM03##
[dmdba@dm03 bin]$ cat -n /home/dmdba/dmmonitor/dmmonitor.ini |grep MON_DW_CONFIRM
1 MON_DW_CONFIRM = 0 #0 = non-confirming monitor, 1 = confirming monitor
[dmdba@dm03 bin]$ sed -i '1s/0/1/' /home/dmdba/dmmonitor/dmmonitor.ini
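The sed commands above address the parameter by line number, which silently edits the wrong line if the file layout differs between nodes. A variation that matches on the parameter name is sketched below. The function name set_ini_param is hypothetical, and the sketch assumes simple "KEY = VALUE #comment" lines like the ones shown here, with values that contain no whitespace, "#", "|" or "&".

```shell
#!/bin/sh
# set_ini_param FILE KEY VALUE (hypothetical helper)
# Replace the value of "KEY = ..." in FILE, keeping any trailing comment.
# Fails without touching the file if KEY is not present.
set_ini_param() {
    file=$1 key=$2 value=$3
    grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" || {
        echo "set_ini_param: ${key} not found in ${file}" >&2
        return 1
    }
    tmp=$(mktemp) || return 1
    # rewrite only the value token that follows "KEY ="
    sed "s|^\([[:space:]]*${key}[[:space:]]*=[[:space:]]*\)[^[:space:]#]*|\1${value}|" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

With this in place, the two edits in this step could be written as set_ini_param /data/DAMENG/dmwatcher.ini DW_MODE AUTO on DM01 and DM02, and set_ini_param /home/dmdba/dmmonitor/dmmonitor.ini MON_DW_CONFIRM 1 on DM03. Taking a backup copy of each file first is still a sensible precaution.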
2. Stop the cluster in order, then restart it
Stop
##DM03##
[dmdba@dm03 bin]$ ./DmMonitorServiceMonitor stop
Stopping DmMonitorServiceMonitor: [ OK ]
##DM02##
[dmdba@dm02 bin]$ ./DmWatcherServiceWatcher stop
[dmdba@dm02 bin]$ ./DmServiceDW02 stop
##DM01##
[dmdba@dm01 bin]$ ./DmWatcherServiceWatcher stop
[dmdba@dm01 bin]$ ./DmServiceDW01 stop
Restart
##DM01##
[dmdba@dm01 bin]$ ./DmServiceDW01 start
Starting DmServiceDW01: [ OK ]
[dmdba@dm01 bin]$ ./DmWatcherServiceWatcher start
Starting DmWatcherServiceWatcher: [ OK ]
##DM02##
[dmdba@dm02 bin]$ ./DmServiceDW02 start
Starting DmServiceDW02: [ OK ]
[dmdba@dm02 bin]$ ./DmWatcherServiceWatcher start
Starting DmWatcherServiceWatcher: [ OK ]
##DM03##
[dmdba@dm03 bin]$ ./DmMonitorServiceMonitor start
Starting DmMonitorServiceMonitor: [ OK ]
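Configure dmmonitor_normal.ini
The dmmonitor process that runs as a service has no console for viewing status, so checking status requires starting a separate monitor.
[dmdba@dm03 ~]$ cat>/home/dmdba/dmmonitor/dmmonitor_normal.ini<<EOF
> MON_DW_CONFIRM = 0 #0 = non-confirming monitor, 1 = confirming monitor
> MON_LOG_PATH = ../log1 #directory for the monitor log files
> MON_LOG_INTERVAL = 60 #write system information to the log file every 60 s
> MON_LOG_FILE_SIZE = 512 #size of a single log file, in MB
> MON_LOG_SPACE_LIMIT = 2048 #total log space limit, in MB
>
> [GDW1]
> MON_INST_OGUID = 45331 #unique OGUID of group GDW1
> MON_DW_IP = 10.0.0.31:5436 #IP is the MAL_HOST, PORT is the MAL_DW_PORT
> MON_DW_IP = 10.0.0.32:5436
> EOF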
Start a separate monitor
[dmdba@dm03 bin]$ ./dmmonitor /home/dmdba/dmmonitor/dmmonitor_normal.ini
[monitor] 2022-10-14 17:05:20: DMMONITOR[4.0] V8
[monitor] 2022-10-14 17:05:20: DMMONITOR[4.0] IS READY.
[monitor] 2022-10-14 17:05:21: Received message from watcher (DW01)
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:05:21 OPEN OK DW01 OPEN PRIMARY VALID 6 53987 53988
[monitor] 2022-10-14 17:05:21: Received message from watcher (DW02)
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:05:21 OPEN OK DW02 OPEN STANDBY VALID 6 53986 53986
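Tip
[monitor] 2022-10-14 17:05:25: [!!! Note: this monitor is not a confirming monitor; in automatic failover mode it cannot perform automatic takeover if the primary fails !!!]
Simulate a primary failure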
[dmdba@dm01 bin]$ ps -ef|grep dms
dmdba 19302 1 1 17:02 pts/0 00:00:04 /home/dmdba/dmdbms/bin/dmserver path=/data/DAMENG/dm.ini -noconsole mount
[dmdba@dm01 bin]$ kill -9 19302
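The ps -ef|grep dms pipeline above can also list the grep process itself, and on a host with several instances it would list every dmserver. A slightly more selective way to find the PID is sketched below. The function name dmserver_pids is hypothetical, and the sketch assumes the standard ps -ef column layout (PID in column 2, command starting at column 8) and a server started with path=<dm.ini> as shown above.

```shell
#!/bin/sh
# dmserver_pids INI_PATH (hypothetical helper)
# Read `ps -ef` output on stdin and print the PID of each dmserver process
# that was started with path=INI_PATH.
dmserver_pids() {
    awk -v ini="$1" '
        # column 8 is the executable; it must be dmserver (bare or with a path)
        $8 ~ /(^|\/)dmserver$/ {
            for (i = 9; i <= NF; i++)
                if ($i == "path=" ini) { print $2; break }
        }'
}
```

For this drill the failure could then be simulated with kill -9 "$(ps -ef | dmserver_pids /data/DAMENG/dm.ini)", after checking that the function printed exactly one PID.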
The monitor prints the failure details
[monitor] 2022-10-14 17:11:34: Instance DW01[PRIMARY, OPEN, ISTAT_SAME:TRUE] failed
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:34 STARTUP ERROR DW01 OPEN PRIMARY VALID 7 56546 56547
[monitor] 2022-10-14 17:11:34: Watcher (DW01) state change [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:34 STARTUP ERROR DW01 OPEN PRIMARY VALID 7 56546 56547
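[monitor] 2022-10-14 17:11:34: [!!! The watcher of instance DW01 is configured for automatic failover mode, but this monitor is not a confirming monitor and cannot perform automatic takeover for instance DW01 !!!]
[monitor] 2022-10-14 17:11:35: Watcher (DW02) state change [OPEN-->TAKEOVER]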
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:35 TAKEOVER OK DW02 OPEN STANDBY VALID 7 56546 56546
[monitor] 2022-10-14 17:11:38: Watcher (DW02) state change [TAKEOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:38 OPEN OK DW02 OPEN PRIMARY VALID 8 58993 58993
[monitor] 2022-10-14 17:11:56: Instance DW01[STANDBY, MOUNT, ISTAT_SAME:TRUE] is back to normal
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:56 STARTUP OK DW01 MOUNT STANDBY INVALID 7 56547 56547
[monitor] 2022-10-14 17:11:57: Watcher (DW01) state change [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:57 OPEN OK DW01 OPEN STANDBY INVALID 7 56547 56547
[monitor] 2022-10-14 17:11:57: Watcher (DW02) state change [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:57 RECOVERY OK DW02 OPEN PRIMARY VALID 8 58999 58999
[monitor] 2022-10-14 17:11:59: Watcher (DW02) state change [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-10-14 17:11:59 OPEN OK DW02 OPEN PRIMARY VALID 8 59000 59000
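Automatic failover complete
Tip
[monitor] 2022-10-14 17:13:01: [!!! Note: this monitor is not a confirming monitor; in automatic failover mode it cannot perform automatic takeover if the primary fails !!!]
[monitor] 2022-10-14 17:13:01: Instance DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE] cannot join any other instance; watcher state: OPEN, Open record state: VALID
[monitor] 2022-10-14 17:13:01: Instance DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE] has no command currently executing
[monitor] 2022-10-14 17:13:01: Instance DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE] is running normally; the watcher is in OPEN state, watcher type is GLOBAL
[monitor] 2022-10-14 17:13:01: Instance DW01[STANDBY, OPEN, ISTAT_SAME:TRUE] can join instance DW02[PRIMARY, OPEN, ISTAT_SAME:TRUE]
[monitor] 2022-10-14 17:13:01: Instance DW01[STANDBY, OPEN, ISTAT_SAME:TRUE] has no command currently executing
[monitor] 2022-10-14 17:13:01: Instance DW01[STANDBY, OPEN, ISTAT_SAME:TRUE] is running normally; the watcher is in OPEN state, watcher type is GLOBAL
[monitor] 2022-10-14 17:13:01: The active instances of group (GDW1) are running normally
[monitor] 2022-10-14 17:13:01: The active instances in all groups are running normally!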