Best practices for minimizing bind conflicts for well-known service ports including ephemeral port range.

Article: 100040677
Last Published: 2021-02-26
Product: NetBackup & Alta Data Protection

Problem

NetBackup processes are not able to connect to the legacy services, such as bpcd, bprd, bpdbm, bpjobd, etc. Typically the connection fails with status 25.

A check of the destination host shows that the service is running, but some other process is using the IANA registered well-known TCP port for the service, preventing the service from listening on the port and completing startup processing.

This problem can also affect legacy services that are similarly accessible via PBX, but that still bind to a port number that is no longer registered with IANA for local connections; e.g. acssel, acsssi, tldcd, vmd.

 

Error Message

The connecting process will typically fail with status 25, e.g.

$ bplist -l -R /
EXIT STATUS 25: cannot connect on socket

The debug logs for the associated process will show entries similar to this.

08:57:26.129 [31036] <16> do_request: Can't connect to host nbmaster: cannot connect on socket (25)

The legacy service debug log shows that, upon startup, it cannot bind to its IANA-registered well-known port, in this case bprd port 13720. In this example, it was able to bind after several minutes of retrying.

19:18:47.946 [28221.28221] <2> legacy_listen: bind(13720) failed: 98
19:18:48.946 [28221.28221] <2> legacy_listen: bind(13720) failed: 98

...snipped repeated bind failures...
20:00:34.524 [28221.28221] <2> legacy_listen: bind(13720) failed: 98
20:00:35.524 [28221.28221] <2> legacy_listen: bind(13720) failed: 98
20:00:36.524 [28221.28221] <2> daemon_select_and_accept_ex: 4 re-listening for legacy service bprd
20:00:38.763 [28221.28221] <2> bprd: socket fd from accept() is 9

The output from netstat -na captured at the same time shows that another process is using the well-known port as the source port for a connection to some other service.

In this example, the host is Linux so netstat -naop shows the process PID and name. In addition...

  • The connection is local to this host so the service at the other end of the connection is also known.
  • Both client and service are other NetBackup processes, but they could have been other applications.
  • This was a transient CORBA connection, which closed after a few minutes, allowing bprd to then bind port 13720 and finish startup. If this connection had stayed ESTABLISHED indefinitely, bprd could not have started.

$ netstat -naop | grep 13720
tcp        0      0 192.168.0.111:13720 192.168.0.111:1556 ESTABLISHED 28507/nbrmms
tcp        0      0 192.168.0.111:1556 192.168.0.111:13720 ESTABLISHED 28064/nbevtmgr
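The check above can be automated. The following is a minimal sketch, assuming Linux-style netstat -naop output with the columns in the order shown in the example. The function name and the NBU_PORT_RANGE constant are illustrative only and are not part of NetBackup.

```python
# Legacy NetBackup service ports fall in 13700-13786 according to this article.
NBU_PORT_RANGE = range(13700, 13787)


def find_source_port_conflicts(netstat_text):
    """Return (local_port, remote_endpoint, owner) for each ESTABLISHED TCP
    connection whose local port lies in the NetBackup service port range."""
    conflicts = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local remote state [pid/name ...]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        local_port = int(local.rsplit(":", 1)[1])
        if local_port in NBU_PORT_RANGE:
            owner = fields[6] if len(fields) > 6 else "-"
            conflicts.append((local_port, remote, owner))
    return conflicts


# The two lines from the example above.
SAMPLE = """\
tcp        0      0 192.168.0.111:13720 192.168.0.111:1556 ESTABLISHED 28507/nbrmms
tcp        0      0 192.168.0.111:1556 192.168.0.111:13720 ESTABLISHED 28064/nbevtmgr
"""

for port, remote, owner in find_source_port_conflicts(SAMPLE):
    print(f"port {port} is held by {owner} for a connection to {remote}")
```

Connections that the legitimate service has accepted also show the service port as the local port, so compare the reported owner against the service expected on that port before treating a hit as a conflict.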

 

Cause

TCP ports are used both by services (to listen upon) and by clients (to connect from). The service ports are generally well-known ports registered with IANA; NetBackup has registered several. The client, or source, ports are selected by the connecting process and are typically provided randomly by the operating system from the ephemeral port range.
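The conflict can be reproduced on a single host. The sketch below is illustrative only: it uses loopback sockets and operating-system-assigned ports rather than any NetBackup port, and the names other_service, client and late_service are hypothetical. On Linux, the value 98 in the bind failures shown in the bprd log is EADDRINUSE.

```python
import errno
import socket


def demo_bind_conflict():
    """Show that a port in use as a client's source port cannot be bound by a
    service that starts later. Returns the errno of the failed bind, or None
    if the bind unexpectedly succeeds."""
    # Stand-in for some other service that the client connects to.
    other_service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    other_service.bind(("127.0.0.1", 0))
    other_service.listen(1)

    # Client: port 0 lets the operating system choose the source port.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.bind(("127.0.0.1", 0))
    client.connect(other_service.getsockname())
    source_port = client.getsockname()[1]

    # A service that needs the same port number in order to listen.
    late_service = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        late_service.bind(("127.0.0.1", source_port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        late_service.close()
        client.close()
        other_service.close()


if __name__ == "__main__":
    print("bind failed with errno", demo_bind_conflict())
```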

Historically, the ephemeral port range was 1024 to 5000 which did not conflict with the NetBackup service ports which are all in the range 13700 to 13786. This includes Linux 2.2 and prior kernels, and Windows versions prior to 2008.

As network applications have grown in use, hundreds or thousands of concurrent TCP connections may exist simultaneously, and the range of 3,976 port numbers could be exhausted. To prevent service port conflicts, newer operating system versions typically use either 32768 or 49152 as the lower bound of the range and 61000 or 65535 as the upper bound. Again, these do not conflict with the NetBackup service ports.
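As a quick check of the arithmetic, the sizes of these ranges can be computed directly. The bounds below are the default values listed in Table 1; the variable names are illustrative.

```python
def port_count(low, high):
    """Number of ports in the inclusive range low..high."""
    return high - low + 1


legacy = port_count(1024, 4999)          # historical default range
linux_default = port_count(32768, 60999)
from_49152 = port_count(49152, 65535)
widest = port_count(32768, 65535)

print(legacy, linux_default, from_49152, widest)
```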

But a range of up to 32,768 ports is sometimes not enough for busy, modern web and application servers, so operating systems provide tunables to adjust the ephemeral port range. Some sites may have inadvertently changed the ephemeral port range so that it overlaps the NetBackup service ports, which allows conflicts to occur and leads to failures such as those noted above. This is sometimes done on the guidance of other vendors; e.g. Oracle sometimes recommends decreasing the lower bound of the range to 9000.
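Whether a given ephemeral range exposes the NetBackup service ports is a simple interval-overlap test. A minimal sketch follows; the function name is hypothetical, and the upper bound of 65500 in the usage example is assumed purely for illustration.

```python
# NetBackup legacy service port range as stated in this article.
NBU_SERVICE_PORTS = (13700, 13786)


def overlaps_nbu_ports(ephemeral_low, ephemeral_high,
                       service_ports=NBU_SERVICE_PORTS):
    """True if the inclusive ephemeral range shares at least one port number
    with the inclusive service port range."""
    svc_low, svc_high = service_ports
    return ephemeral_low <= svc_high and ephemeral_high >= svc_low


print(overlaps_nbu_ports(9000, 65500))    # lower bound reduced to 9000
print(overlaps_nbu_ports(32768, 60999))   # Linux default from Table 1
```

A lower bound of 9000 places every NetBackup service port inside the ephemeral range, while the default ranges keep them outside it.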

Note: The NetBackup Resilient Network service (nbrntd) and Secure Comm Proxy services (vnetd -proxy), especially when used in combination, can also consume many ephemeral ports if supporting thousands of concurrent connections. Be sure to only configure resilient network for hosts where TCP connections cannot be made reliable by updating driver versions and kernel tuning.

 

Solution

Option A: Process startup sequence

A connecting application is allowed to bind any port in the ephemeral range, including an unused well-known service port, so the best solution is to start all services before other processes start and begin establishing connections. If a connecting application was started first, it can be terminated to release the port and then, after TCP TIME_WAIT has expired for the socket, the service can be started.

But because some services must connect to other services, this may not completely eliminate conflicts at all times. Consider the case where service-A starts and binds port-A. Then service-B starts and connects to service-A, using port-C as the source port. Service-C is then unable to bind port-C at startup.

Option B: Exclude the NetBackup service ports from the ephemeral port range

Most processes take their source port from the ephemeral port range, so keeping the NetBackup service ports outside that range minimizes the need to carefully sequence the startup of processes.

Review the operating system/platform tunable settings in Table 1 below. If they have been configured lower than the default value, and other applications are regularly binding NetBackup service ports and preventing NetBackup services from starting, adjust the values upward so that the bottom of the ephemeral port range is at least 13800.

Table 1

 Platform           Command                                                                            Default Value
 AIX                /usr/sbin/no -a | grep tcp_ephemeral_low                                           32768
 HP-UX              /usr/bin/ndd /dev/tcp tcp_smallest_anon_port                                       49152
 Linux 2.2 kernel   sysctl net.ipv4.ip_local_port_range                                                1024 4999
 Linux 2.4 kernel   sysctl net.ipv4.ip_local_port_range                                                32768 60999
 Solaris            /usr/sbin/ndd /dev/tcp tcp_smallest_anon_port                                      32768
                    /usr/sbin/ndd /dev/tcp tcp_largest_anon_port                                       65535
 Windows XP*        HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort  5000
 Windows 2008-2016  netsh int ipv4 show dynamicport tcp                                                49152 65535
                    netsh int ipv6 show dynamicport tcp                                                49152 65535

* For Windows XP and prior versions, only the upper bound of the ephemeral port range is configurable. It defaults to 5000 but may have been raised as high as 65534. Lower it to 13700 to prevent conflicts with most NetBackup service ports, and ensure PBX is started first. PBX listens on port 1556 and also on another ephemeral port, but does not make other connections. Starting it first avoids having to restrict the ephemeral ports to a range of 1024 - 1555, which is too narrow for most application servers.
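For the Linux case, the review can be sketched as follows. This is a hedged illustration, not a NetBackup tool: it assumes the input is either the output of sysctl net.ipv4.ip_local_port_range or the raw contents of /proc/sys/net/ipv4/ip_local_port_range, and the function names and constants are hypothetical. It only suggests a value; applying it is left to the administrator.

```python
NBU_SERVICE_LOW = 13700   # lowest NetBackup service port per this article
MIN_SAFE_LOW = 13800      # lower bound recommended in this article


def parse_port_range(text):
    """Parse 'net.ipv4.ip_local_port_range = LOW HIGH' or plain 'LOW HIGH'."""
    values = text.split("=")[-1].split()
    return int(values[0]), int(values[1])


def suggest_range(text, min_low=MIN_SAFE_LOW):
    """Return a (low, high) range that clears the NetBackup service ports,
    or None if the current range already avoids them."""
    low, high = parse_port_range(text)
    if low >= min_low or high < NBU_SERVICE_LOW:
        return None
    if high <= min_low:
        # Raising the lower bound would leave an empty range.
        raise ValueError("range ends inside the service ports; review manually")
    return min_low, high


print(suggest_range("net.ipv4.ip_local_port_range = 9000 65500"))
print(suggest_range("32768\t60999\n"))
```

The upper bound is left unchanged so that the number of ports lost is limited to those below 13800.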

Note: If an ephemeral port range above 13800 does not provide enough ports for connection-hungry applications, also review the number of ports that are in a transient state and see whether they can be released for reuse in a more timely fashion.

  • Connections that linger in TCP TIME_WAIT longer than 60 seconds can often be minimized by kernel tuning.
  • Connections in TCP CLOSE_WAIT can often be minimized by correcting application behavior.
  • The latter will also minimize the connections in TCP FIN_WAIT state, although kernel tuning may also be of use.
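To see how many ports are tied up in each of these states, the netstat -na output can be tallied by state. The sketch below assumes Linux-style output with the state in the sixth column; the sample lines are made up for illustration, and the state names follow Linux netstat (FIN_WAIT1, FIN_WAIT2).

```python
from collections import Counter

# States discussed in the list above, as Linux netstat spells them.
TRANSIENT_STATES = {"TIME_WAIT", "CLOSE_WAIT", "FIN_WAIT1", "FIN_WAIT2"}


def count_tcp_states(netstat_text):
    """Count TCP connections per state in netstat -na style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample, not taken from a real host.
SAMPLE_STATES = """\
tcp        0      0 192.168.0.111:40112 192.168.0.111:1556 TIME_WAIT
tcp        0      0 192.168.0.111:40113 192.168.0.111:1556 TIME_WAIT
tcp        0      0 192.168.0.111:40114 192.168.0.111:1556 CLOSE_WAIT
tcp        0      0 192.168.0.111:1556 192.168.0.111:40115 ESTABLISHED
"""

state_counts = count_tcp_states(SAMPLE_STATES)
transient = sum(n for state, n in state_counts.items() if state in TRANSIENT_STATES)
print(dict(state_counts), "transient:", transient)
```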

 
