Problem
When NetBackup servers are separated by a firewall and a TCP socket stays idle for longer than the firewall idle timeout (inactivity timer), the firewall drops the TCP connection because no data has crossed the socket within the timeout period.
Error Message
From a Cisco PIX firewall, the following entry could be seen in the firewall syslog:
Jun 2 16:54:07 gwrouter Jun 02 21:54:05 host.domain.example.com: %FWSM-6-302014: Teardown TCP connection 219048017 faddr 192.168.1.89/13724 gaddr 192.168.1.247/42962 laddr 192.168.1.247/42962 duration 1:01:27 bytes 1546 (Conn-timeout)
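To confirm that the firewall is dropping idle connections, its syslog can be searched for teardown entries whose reason is (Conn-timeout). The following is a minimal sketch that assumes the message format shown in the entry above. The function name idle_teardowns is illustrative and is not part of any NetBackup or Cisco tooling.

```shell
#!/bin/sh
# Sketch: list firewall teardowns caused by the idle timer.
# Assumes the teardown message format shown in this article; adjust the
# patterns if your firewall logs teardowns differently.
idle_teardowns() {
    # Keep only TCP teardown lines whose reason is Conn-timeout, then
    # print the foreign address, local address, and connection duration.
    grep 'Teardown TCP connection' | grep '(Conn-timeout)' |
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "faddr")    f = $(i + 1)
            if ($i == "laddr")    l = $(i + 1)
            if ($i == "duration") d = $(i + 1)
        }
        print f, l, d
    }'
}
```

The function reads log lines on standard input, for example idle_teardowns < /var/log/messages, where the log path is an assumption and depends on where the firewall syslog is collected. Entries with port 13724 or other NetBackup ports in the output point to NetBackup connections being timed out.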
Cause
TCP sockets are left idle for longer than the firewall idle timeout value.
Solution
Once the problem is determined to be the firewall dropping idle connections, there are several ways to deal with it:
• TCP Keepalives: By default, at least on RHEL/CentOS systems, TCP keepalive probes start only after 2 hours of idle time. Reducing this time at the OS level to a value shorter than the firewall timeout resolves the problem. The downside is extra network traffic, because every other connection that sits idle for the reduced time also starts sending keepalives. It's not much, but with a lot of open connections, a lot of equipment, and a lot of services, those little packets add up to something that has to be accounted for.
Examples of default TCP keepalive settings (command, then its default output):
ndd -get /dev/tcp tcp_keepalive_interval # Solaris: 7200000 ms (2 hours)
ndd -get /dev/tcp tcp_keepalive_interval # HP-UX: 7200000 ms (2 hours)
no -a | grep keepintvl # AIX: 150 half-seconds (75 seconds)
Windows Registry Information:
Value Name: KeepAliveTime
Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
Value Type: REG_DWORD (time in milliseconds)
Valid Range: 1-0xFFFFFFFF
Default: 7,200,000 (two hours)
Recommendation: 300,000 (five minutes)
Description: The parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission. Keep-alive packets are not sent by default. This feature may be enabled on a connection by an application.
• Increase Firewall Timeouts: A possible resolution is to raise the firewall timeouts above the 2-hour keepalive timer that the operating system has in place by default. This saves the trouble of reconfiguring existing equipment, but it also means that the firewall's count of live sessions could grow substantially, depending on how many idle sessions the firewall is currently killing on a day-to-day basis.
• Arbitrary Traffic Generation: This is the "duct tape and hammer" solution. It works, but it's not pretty, and it's an abuse of network resources. Generating traffic on a connection for the sole purpose of keeping the firewall from killing it is effective, but that traffic serves no other purpose.
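For the TCP keepalive option on Linux (the RHEL/CentOS case mentioned above), the idle time before the first keepalive probe is exposed as net.ipv4.tcp_keepalive_time, in seconds. The sketch below compares that value with a firewall idle timeout. The function name keepalive_within and the 3600-second firewall timeout are assumptions for illustration, so substitute the timeout actually configured on your firewall.

```shell
#!/bin/sh
# Sketch: check that the keepalive idle time is shorter than the
# firewall idle timeout. Both values are in seconds.
keepalive_within() {
    keepalive=$1    # idle seconds before the first keepalive probe
    fw_timeout=$2   # firewall idle timeout (assumed; check your firewall)
    [ "$keepalive" -lt "$fw_timeout" ]
}

# Hypothetical firewall idle timeout of one hour.
FW_TIMEOUT=3600

# Read the current Linux setting, if this host exposes it.
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    current=$(cat /proc/sys/net/ipv4/tcp_keepalive_time)
    if keepalive_within "$current" "$FW_TIMEOUT"; then
        echo "keepalive ${current}s is inside the ${FW_TIMEOUT}s firewall window"
    else
        echo "keepalive ${current}s exceeds the ${FW_TIMEOUT}s firewall window"
        # As root, lower it, for example to 300 seconds (the same value as
        # the 300,000 ms suggested for Windows in this article):
        #   sysctl -w net.ipv4.tcp_keepalive_time=300
        # Add the setting to /etc/sysctl.conf to keep it across reboots.
    fi
fi
```

A keepalive time equal to the firewall timeout is treated as too long, since the probe could arrive just after the firewall has already dropped the connection. As with the Windows parameter, the setting only affects sockets on which the application has enabled keepalives.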
Applies To
NetBackup servers separated by firewalls