问题概述
在Goldengate运行中,目标端的TAIL文件可能出现因为IO的问题、迁移的操作问题等导致损坏或丢失,这个时候,goldendate目标端应用进程将无法正常运行。
这是模拟目标端的TAIL文件丢失故障问题的处理CASE!
解决方案
1. 整个CASE处理思路:
1)执行ETROLLOVER,抽取、传送进程前滚一个文件;
2)调整传送、应用进程读文件的位置Seqno;
2. 环境:
GoldenGate为11+版本;
抽取进程:ext_rwq
传送进程:pm_rwq
应用进程:rep_rwq
3. 模拟问题:应用端TAIL文件损坏(丢失)
GGSCI (serv-node2) 8> stop rep_rwq
Sending STOP request to REPLICAT REP_RWQ ...
Request processed.
[oracle@serv-node2 dirdat]$ ls -lt
total 92
-rw-r----- 1 oracle oinstall 2664 Feb 15 23:24 RP000000017
-rw-r----- 1 oracle oinstall 1539 Feb 15 23:24 RP000000016
-rw-r----- 1 oracle oinstall 1539 Feb 15 01:47 RP000000015
-rw-r----- 1 oracle oinstall 2428 Feb 15 01:14 RP000000016.DEL
-rw-r----- 1 oracle oinstall 38128 Feb 14 00:51 RP000000014
-rw-r----- 1 oracle oinstall 36604 Feb 6 16:59 RP000000012.DEL
[oracle@serv-node2 dirdat]$ mv RP000000017 RP000000017.DEL
[oracle@serv-node2 dirdat]$ ls -lt
total 92
-rw-r----- 1 oracle oinstall 2664 Feb 15 23:24 RP000000017.DEL --》改名,模拟丢失
-rw-r----- 1 oracle oinstall 1539 Feb 15 23:24 RP000000016
-rw-r----- 1 oracle oinstall 1539 Feb 15 01:47 RP000000015
-rw-r----- 1 oracle oinstall 2428 Feb 15 01:14 RP000000016.DEL
-rw-r----- 1 oracle oinstall 38128 Feb 14 00:51 RP000000014
-rw-r----- 1 oracle oinstall 36604 Feb 6 16:59 RP000000012.DEL
4. 启动进程,确认问题:
GGSCI (serv-node2) 9> start rep_rwq
Sending START request to MANAGER ...
REPLICAT REP_RWQ starting
GGSCI (serv-node2) 12> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT ABENDED REP_RWQ 00:00:00 00:00:22 ---》状态异常,为ABENDED
GGSCI (serv-node2) 13> view report rep_rwq
… …
2023-02-16 00:42:21 ERROR OGG-01091 Unable to open file "/ogg_19.1/dirdat/RP000000017" (error 2, No such file or directory). ---》ERROR:找不到问题,进程异常ABENDED。
***********************************************************************
* ** Run Time Statistics ** *
***********************************************************************
2023-02-16 00:42:21 INFO OGG-02333 Reading /ogg_19.1/dirdat/RP000000017, current RBA 0, 0 records, m_file_seqno = 17, m_file_rba = 2,664.
Report at 2023-02-16 00:42:21 (activity since 2023-02-16 00:42:19)
No records were replicated.
5. 源端操作:重启一下传送进程
GGSCI (serv-node1) 17> stop pm_rwq
Sending STOP request to EXTRACT PM_RWQ ...
Request processed.
GGSCI (serv-node1) 18> start pm_rwq
Sending START request to MANAGER ...
EXTRACT PM_RWQ starting
GGSCI (serv-node1) 25> info pm_rwq detail
EXTRACT PM_RWQ Last Started 2023-02-16 00:44 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:01 ago)
Process ID 10138
Log Read Checkpoint File /ogg_19.1/dirdat/ET000000035
First Record RBA 2271
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
./dirdat/RP 17 1539 500 RMTTRAIL ---》写的文件为RP000000017
6. 目标端操作:重启一下应用进程
GGSCI (serv-node2) 16> start rep_rwq
Sending START request to MANAGER ...
REPLICAT REP_RWQ starting
GGSCI (serv-node2) 17> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REP_RWQ 00:00:00 00:00:00
GGSCI (serv-node2) 26> info rep_rwq detail
REPLICAT REP_RWQ Last Started 2023-02-16 00:47 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:07 ago)
Process ID 18500
Log Read Checkpoint File /ogg_19.1/dirdat/RP000000018 ----》读的文件为RP000000018,和传送进程写的文件为RP000000017,不一致,无法进行同步。
First Record RBA 0
Current Log BSN value: (requires database login)
Last Committed Transaction CSN value: (requires database login)
Extract Source Begin End
7. 开始处理问题:
大原则:
1)抽取、传送进程前滚一个文件;
2)调整传送、应用进程读文件的位置;
源端调整:
GGSCI (serv-node1) 26> info ext_rwq detail
EXTRACT EXT_RWQ Last Started 2023-02-14 00:25 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Process ID 1612
Log Read Checkpoint Oracle Redo Logs
2023-02-16 00:51:36 Seqno 306, RBA 14950912
SCN 0.41097717 (41097717)
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
./dirdat/ET 39 2563 50 EXTTRAIL
GGSCI (serv-node1) 28> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT_RWQ 00:00:00 00:00:06
EXTRACT RUNNING PM_RWQ 00:00:00 00:00:05
GGSCI (serv-node1) 29> stop PM_RWQ
Sending STOP request to EXTRACT PM_RWQ ...
Request processed.
GGSCI (serv-node1) 30> stop EXT_RWQ
Sending STOP request to EXTRACT EXT_RWQ ...
Request processed.
--抽取进程前滚一个文件
GGSCI (serv-node1) 31> alter EXTRACT EXT_RWQ etrollover
2023-02-16 00:53:21 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
EXTRACT altered.
--传送进程前滚一个文件
GGSCI (serv-node1) 34> alter extract pm_rwq etrollover
2023-02-16 00:56:42 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
EXTRACT altered.
GGSCI (serv-node1) 36> info ext_rwq detail
EXTRACT EXT_RWQ Initialized 2023-02-14 00:25 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:05:36 ago)
Log Read Checkpoint Oracle Redo Logs
2023-02-16 00:51:36 Seqno 306, RBA 14950912
SCN 0.41097717 (41097717)
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
./dirdat/ET 40 0 50 EXTTRAIL ---》当前抽取进程写文件的位置:Seqno =40
调整传送进程读文件的位置,和抽取进程写文件的位置:Seqno =40一致。
GGSCI (serv-node1) 37> alter pm_rwq extseqno 40,extrba 0
EXTRACT altered.
GGSCI (serv-node1) 38> info pm_rwq detail
EXTRACT PM_RWQ Initialized 2023-02-16 00:58 Status STOPPED
Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
Log Read Checkpoint File /ogg_19.1/dirdat/ET000000040
First Record RBA 0
Target Extract Trails:
Trail Name Seqno RBA Max MB Trail Type
./dirdat/RP 18 0 500 RMTTRAIL ---》当前传送进程写文件的位置为:Seqno=18
目标端调整:
调整应用进程读文件的位置,和传送进程写文件的位置:Seqno =18一致。
GGSCI (serv-node2) 32> alter replicat rep_rwq,extseqno 18,extrba 0
2023-02-16 01:01:57 INFO OGG-06594 Replicat REP_RWQ has been altered. Even the start up position might be updated, duplicate suppression remains active in next startup. To override duplicate suppression, start REP_RWQ with NOFILTERDUPTRANSACTIONS option.
REPLICAT altered.
数据库层面,INSERT一行数据,
SQL> insert into jobs values('EEE-1','EEE-1, test 20230215',3000,9000);
1 row created.
然后,启动各个OGG进程:
GGSCI (serv-node1) 39> start ext_rwq
Sending START request to MANAGER ...
EXTRACT EXT_RWQ starting
GGSCI (serv-node1) 40> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT_RWQ 00:00:00 00:00:00
EXTRACT STOPPED PM_RWQ 00:00:00 00:04:32
GGSCI (serv-node1) 41> start pm_rwq
Sending START request to MANAGER ...
EXTRACT PM_RWQ starting
GGSCI (serv-node1) 42> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING EXT_RWQ 00:00:00 00:00:00
EXTRACT RUNNING PM_RWQ 00:00:00 00:04:43
GGSCI (serv-node1) 43> stats pm_rwq,total
Sending STATS request to EXTRACT PM_RWQ ...
Start of Statistics at 2023-02-16 01:03:52.
Output to ./dirdat/RP:
Extracting from RUANWQ.JOBS to RUANWQ.JOBS:
*** Total statistics since 2023-02-16 01:03:37 ***
Total inserts 1.00 ---》确认完成传送
Total updates 0.00
Total deletes 0.00
Total upserts 0.00
Total discards 0.00
Total operations 1.00
End of Statistics.
目标端:
GGSCI (serv-node2) 33> start rep_rwq
Sending START request to MANAGER ...
REPLICAT REP_RWQ starting
GGSCI (serv-node2) 34> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REP_RWQ 00:00:00 00:02:12
GGSCI (serv-node2) 35> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT RUNNING REP_RWQ 00:00:00 00:02:13
GGSCI (serv-node2) 36> stats rep_rwq total
Sending STATS request to REPLICAT REP_RWQ ...
Start of Statistics at 2023-02-16 01:05:26.
Replicating from RUANWQ.JOBS to STOCK_PDB.RUANWQ.JOBS:
*** Total statistics since 2023-02-16 01:04:13 ***
Total inserts 1.00 -----》确认完成应用
Total updates 0.00
Total deletes 0.00
Total upserts 0.00
Total discards 0.00
Total operations 1.00
End of Statistics.
8. 确认数据恢复同步:
最终确认数据库层面,对应的一行数据库应用到目的库!
SQL> select * from jobs;
JOB_ID JOB_TITLE MIN_SALARY MAX_SALARY
---------- ----------------------------------- ---------- ----------
……
EEE-1 EEE-1, test 20230215 3000 9000
……