1. Goal

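This post explains how the ports used by the OGG pump (delivery) process relate to one another. Once that is clear, troubleshooting and locating problems becomes much easier.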

 

2. Ports used by OGG processes

2.1 Which port does the OGG MGR use?



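In the MGR parameter file:
PORT 7809
This is the port the MGR process listens on.
1. If the OGG MGR process on the target is stopped, the source pump process abends with an error saying the TCP transfer cannot be made.
2. If OGG is deployed on RAC with a shared file system such as ACFS or NFS, MGR may be running on a node whose IP differs from the IP the source pump is pointed at. In that case delivery fails as well.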


 


2.2 What are the MGR dynamic ports for?



Reference article:
How does GoldenGate allocate ports between the source Extract pump and the target Server/Collector? (Doc ID 965270.1)

Configuration:
An Extract pump will have the RMTHOST parameter which is generally configured simply as:
RMTHOST <target IP>, MGRPORT <target manager port number>
Example: RMTHOST REMOTESERVER, MGRPORT 7809
To control port usage on a machine (the target in this case), most sites set the Manager DYNAMICPORTLIST parameter, which restricts OGG to a range of ports.
Please refer to the reference manual for the RMTHOST, PORT and DYNAMICPORTLIST parameters for more details.
Example of manager parameters:
PORT 7809
DYNAMICPORTLIST 8000-8010
In other words, OGG allocates ports from the configured dynamic port list, and this parameter limits both the range and the number of ports OGG can use.
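To make that limit concrete, here is a minimal Python sketch that expands a DYNAMICPORTLIST value into the individual ports it allows. The function name `expand_dynamic_port_list` is hypothetical, and the sketch assumes the value is a comma-separated list of single ports and `low-high` ranges. It illustrates the arithmetic only and is not an OGG tool.

```python
def expand_dynamic_port_list(spec):
    """Expand a value such as '8000-8010' or '7830-7835, 7839' into a list of ports."""
    ports = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        if "-" in item:
            low, high = (int(part) for part in item.split("-", 1))
            if low > high:
                raise ValueError(f"invalid range: {item}")
            ports.extend(range(low, high + 1))  # ranges are inclusive
        else:
            ports.append(int(item))
    return ports


ports = expand_dynamic_port_list("8000-8010")
print(len(ports), ports[0], ports[-1])  # 11 8000 8010
```

With the example configuration above, the target therefore has 11 dynamic ports to hand out.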


 

2.3 How does OGG allocate these dynamic ports?



How does GoldenGate allocate ports between the source Extract pump and the target Server/Collector?
When an Extract pump starts it requests a link to the target node using the port number as specified in RMTHOST.
RMTHOST REMOTESERVER, MGRPORT 7809
The manager process would be running in node REMOTESERVER and listening on port 7809.
The target manager will start a collector and, if configured, pass the DYNAMICPORTLIST range of ports to the collector. If this is not configured, random ports will be used.
The target manager then goes back to listening on port 7809.
The collector will try to use (calling TCP/IP BIND) each port in sequence until it finds one that works. The collector will communicate back to the Extract pump and instruct the Extract pump to use this port for communication.
Notes:
You can check which ports are used on the target using the GGSCI > SEND MGR GETPORTINFO.
Prior to Version 11, the manager searched for a port for the server instead of the server locating a useable port.
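For OGG, ports matter because of the pump process: the pump sends data across the network to another host, so the two sides need ports to talk to each other.
Which target port goes into the pump parameters? Normally the MGR PORT of the target OGG environment, which acts as the listening port.
Dynamic ports are handed out essentially in sequence, so when the range is too small and every port is taken, the pump reports that the TCP transfer cannot be made.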
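The "try each port in sequence until bind works" behaviour described above can be sketched in Python. This only illustrates the idea and is not the collector's actual code: `first_bindable_port` is a hypothetical name, and the loopback address and the 8000-8010 range are assumptions borrowed from the earlier example.

```python
import socket


def first_bindable_port(ports, host="127.0.0.1"):
    """Return (port, listening socket) for the first port bind() accepts, else None."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port busy or not permitted: move on to the next one
            continue
        sock.listen(1)
        return port, sock
    return None  # every port in the list is taken


if __name__ == "__main__":
    result = first_bindable_port(range(8000, 8011))
    if result is None:
        print("no free port in the range")
    else:
        port, sock = result
        print("the pump would be told to use port", port)
        sock.close()
```

When the function returns `None`, the situation corresponds to the exhausted port list discussed in the cases below.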


 

2.4 How to diagnose OGG port problems?



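1. Case 1: the target OGG MGR is not running, so the target IP and MGR PORT that the source pump sends to are unreachable on that host.
The first precondition for a working pump is that the target MGR port is actually open on the target host. If the MGR process is not started, nothing is listening on that port.
Reference: OGG Troubleshooting TCP/IP Errors In Open Systems (Doc ID 966227.1)
Confirm at the host level that the OGG MGR port is open.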

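2. Case 2: the dynamic port list is too small and every port is taken, so the pump fails.
A friend once raised this problem with me, but I did not keep a screenshot. How can it be investigated?
Reference: OGG GGS Error 150: No Dynamic Ports Available Orphan Ports Server Collector (Doc ID 965356.1)
1) Check port usage as follows: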


# netstat -tunlp|grep 784

If it is available, the "lsof" command may be easier to use. For example, given "DYNAMICPORTLIST 7841-7899", use:

lsof -ni:7841-7899

SEND MGR GETPORTINFO detail

send mgr childstatus debug

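These commands are only aids; they show port usage indirectly.

  2) Verification

Stop a pump that is currently delivering normally to OGG on the target host, so that the port it holds is released.

Then start the source pump that was failing. If it now starts normally, the cause was almost certainly a shortage of ports.

A related question: does each OGG pump occupy exactly one port? No.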

 

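3. Case 3 (from MOS): the port is unreachable. If the target MGR port cannot be reached from the source, the same problem occurs.

Either a firewall or a port that was never opened can cause this.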


$ telnet remote_system_name 7890
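Where telnet is not installed, the same reachability test can be approximated with a few lines of Python. This is a minimal sketch: `can_connect` is a hypothetical helper, and the host name and port are the placeholders from the telnet example. It only tells you whether a TCP connection can be opened, not whether MGR is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print(can_connect("remote_system_name", 7890))
```

A `False` result does not say whether a firewall or a missing listener is to blame; it only confirms that the source cannot open the port.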


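 MOS suggests that when the network problem is hard to track down, you simply open the target-side ports in both directions, which saves effort.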

 

4. Case 4 (from MOS): the MGR port misbehaves

If running, check that MGR is responding to connection requests and commands:

GGSCI (remote_system) 4> send mgr getportinfo detail

Sending GETPORTINFO, request to MANAGER ...

Dynamic Port List
Starting Index 0
Reassign Delay 3 seconds

Entry Port  Error Process    Assigned            Program
----- ----- ----- ---------- ------------------- -------
    0  7891     0
    1  7892     0
    2  7893     0
    3  7894     0
    4  7895     0
    5  7896     0
    6  7897     0
    7  7898     0
    8  7899     0

If the command times out, kill and restart MGR:

In other words, if the port list query, which normally returns at once, times out, MOS recommends killing MGR and starting it again.
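When the port list is long, it can help to summarise the GETPORTINFO output with a script. The sketch below is a rough Python parser. It assumes the row layout of the sample above (Entry, Port, Error, then optional Process/Assigned/Program columns) and that extra columns on a row mean the port is assigned. `parse_port_info` is a hypothetical name, and real output may differ between OGG versions.

```python
def parse_port_info(text):
    """Parse 'Entry Port Error [Process Assigned Program]' rows into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not all(f.isdigit() for f in fields[:3]):
            continue  # header, separator, blank or other non-data line
        rows.append({
            "entry": int(fields[0]),
            "port": int(fields[1]),
            "error": int(fields[2]),
            "in_use": len(fields) > 3,  # assumption: extra columns mean assigned
        })
    return rows


sample = """\
Starting Index 0
Reassign Delay 3 seconds
Entry Port Error Process Assigned Program
----- ----- ----- ---------- ------------------- -------
0 7891 0
1 7892 0
2 7893 0
"""
rows = parse_port_info(sample)
free = [r["port"] for r in rows if not r["in_use"] and r["error"] == 0]
print(free)  # [7891, 7892, 7893]
```

If the list of free ports comes back empty, the port-exhaustion scenario from Case 2 is the likely cause.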


 


5. Other possibilities


If none of the above help identify the problem, contact your network administrator to check firewall settings and make sure the ports are open both ways.

An often overlooked issue is that any error that kills the server collector process appears as a TCP error to the sending extract. 
If a server does not have write privileges to the trail or if a disk is full, the server dies. A dying server looks like a lost connection to TCP.
The user should always verify the ability to write trails as part of the troubleshooting process. This is particularly applicable for the case:


"The "dynamic" Server/Collector process terminated immediately after starting"

A full disk, missing write permissions and similar problems on the target will also make the source pump report errors.
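The two conditions named here, write permission and free space, are easy to check before digging into the network. The following Python sketch is one way to do it. `check_trail_dir` and the 100 MB threshold are assumptions for illustration, and the script should be run as the OS user that owns the OGG processes so that the permission check is meaningful.

```python
import os
import shutil
import tempfile


def check_trail_dir(path, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems that would stop a process writing trails in path."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the file system of {path}")
    return problems


if __name__ == "__main__":
    # A temporary directory stands in for the real trail directory here.
    with tempfile.TemporaryDirectory() as demo_dir:
        print(check_trail_dir(demo_dir, min_free_bytes=1))
```

An empty list means neither condition was detected; it does not rule out other reasons for the collector to terminate.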