Alert Log Errors:  12170 TNS-12535/TNS-00505: Operation Timed Out (文档 ID 1628949.1)

Applies to:

Oracle Net Services - Version 11.2.0.1 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.

Symptoms

The following error is reported in the database alert log. 

***Note the "Client address" is posted within the error stack in this case.

Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
Oracle Bequeath NT Protocol Adapter for 64-bit Windows: Version 11.2.0.3.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 11.2.0.3.0 - Production
Time: 22-FEB-2014 12:45:09
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
***Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=XXX.XX.XXX.XXX)(PORT=45679))

  

The PORT field here is the ephemeral port assigned to the client for this connection. 
This does not correspond to the listener port.

 

Changes

No changes are necessary, but may have recently upgraded the database  to 11g release 1 or higher, or installed a new Oracle11g database and  they are now visible in the alert log.
Note: Prior to 11gR1 these  same 'Fatal NI connect error 12170' are written to the sqlnet.log.    This document describes a problem that arises when a firewall exists  between the client and the database server.

Cause

We can search the listener log covering the same time period using this search criteria.

 (HOST=XXX.XX.XXX.XXX)(PORT=45679)

 

The 11g listener log in text format is located here:

$ORACLE_BASE/diag/tnslsnr/<your_host>/<listener_name>/trace/<listener_name>.log

 

Again, this is the client's IP address and the unique ephemeral port assigned to the client for this connection.

In this case, we find that this connection was established at the listener at this timestamp:  


22-FEB-2014 10:42:10 * (CONNECT_DATA=(SID=test)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=XXX.XX.XXX.XXX)(PORT=45679)) * establish * test* 0 .

Compare this to the event in the alert.log with special attention to the timestamp. 
The connection was dropped by the instance at 22-FEB-2014 12:45:09 or roughly 2 hours later.

Time: 22-FEB-2014 12:45:09
Tracing not turned on.
Tns error struct:
ns main err code: 12535

TNS-12535: TNS:operation timed out
ns secondary err code: 12560
nt main err code: 505

TNS-00505: Operation timed out
nt secondary err code: 60
nt OS err code: 0
***Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=XXX.XX.XXX.XXX)(PORT=45679))

The 'nt secondary err code' identifies the underlying network  transport, such as (TCP/IP) timeout limit. In the current case 60  identifies Windows underlying transport layer.

The "nt secondary err code" will be different based on the operating system:

Linux x86 or Linux x86-64: "nt secondary err code: 110"
HP-UX : "nt secondary err code: 238"
AIX: "nt secondary err code: 78"
Solaris: "nt secondary err code: 145"


The alert.log message indicates that a connection was terminated  AFTER it was established to the instance.  In this case, it was  terminated 2 hours and 3 minutes after the listener handed the  connection to the database. 

 This would indicate an issue with a firewall where a maximum idle time setting is in place. 

The connection would not necessarily be "idle".  This issue can arise  during a long running query or when using JDBC Thin connection  pooling.  If there is no data 'on the wire' for lengthy

periods of time for any reason, the firewall might terminate the connection.

 

Solution

The solution would be to discuss increasing the firewall setting for  maximum idle time.  This is a security setting, so it should be done  with the Network or System Admin.

In cases where this is not feasible, Oracle offers the following POTENTIAL suggestion (with an important note following):

The following parameter, set at the **RDBMS_HOME/network/admin/sqlnet.ora, can resolve this kind of problem.
DCD or SQLNET.EXPIRE_TIME can mimic data transmission between the server and the client during long periods of idle time. 

SQLNET.EXPIRE_TIME=n  Where <n> is a non-zero value set in minutes.  

Once this change is in place, there is NO need to restart the  listener or the database.  The change will be in place for all newly  spawned server processes following the change.

Be aware that connections that were established prior to this setting  would not be affected by this change.  Therefore, you may continue to  experience some timeouts until all remote connection are established  with this setting in place.

See the following : Note 257650.1 Resolving Problems with Connection Idle Timeout With Firewall

 

**In an installation that includes GRID, this parameter  should be set in the RDBMS_HOME/network/admin/sqlnet.ora file.  This  would be the default location for sqlnet.ora file parameters referenced  by the instance. 

PLEASE NOTE:
DCD was never designed to be  used as a "virtual traffic generator" as we are wanting to use it for  here. This is merely a useful side-effect of the feature.
In fact,  some later firewalls and updated firewall firmware may not see DCD  packets as a valid traffic possibly because the packets that DCD sends  are actually empty packets. Therefore, DCD may not work as expected and  the firewall / switch may still terminate TCP sockets that are idle for  the period monitored, even when DCD is enabled and working.
In such  cases, the firewall timeout should be increased or users should not  leave the application idle for longer than the idle time out configured  on the firewall.

So please consider your business requirement for allowing  connections to remain or appear 'idle' before implementing these  suggestions and note that this is a workaround which, on some occasions,  may not workaround all the network timeouts.