Ten second connection delays cause status 25 in NetBackup

Article: 100024894
Last Published: 2014-04-09
Ratings: 0 0
Product(s): NetBackup & Alta Data Protection

Problem

NetBackup uses the PBX port for communication between unified processes. To minimize the number of ports required to be open through the firewall, connections to legacy processes also attempt to use PBX before attempting to use the vnetd or daemon ports.  

Certain firewall configurations may lead to a 10 second delay before the PBX connection attempt is aborted and the vnetd or daemon port is attempted.   Operations that require multiple sequential connections may incur a cumulative delay that causes the backup to fail, most notably multiplexed and snapshot client backups.

See the related articles for another common cause of these delays.

 

Error Message

The bpbrm debug log shows each connection that is created.  This multiplex backup begins by receiving the ipc and hash strings for the mpx_ipc socket from bptm.  This information is then inherited by the bpbrm.child when it starts.

 

17:46:37.262 [1660.4296] <2> bpbrm read_media_msg: read from media manager: NCB_DATASOCKET myclient_1307115858 57276;c432cfc40d6d02d80f592f6c03015c49;10;600

17:46:37.262 [1660.4296] <2> bpbrm multiplexed_backup: bpbrm.c.9562: media_port: 0 0 0x00000000

17:46:37.262 [1660.4296] <2> bpbrm multiplexed_backup: bpbrm.c.9563: mpx_ipc: 57276;c432cfc40d6d02d80f592f6c03015c49;10;600

…snip…

17:46:37.262 [1660.4296] <2> bpbrm spawn_brm_child: bpbrm child pid = 5592.5372

 

As the media server continues to get ready to perform the backup, the bpbrm process will query the client several times via bpcd.  Two connections to the client host are required for each bpcd.  Each connection first tries to create a socket to the PBX port.  When not successful, a retry to the vnetd port is attempted and successful.  This is the final pair of connections from bpbrm to bpcd prior to starting bpbkar.  Notice the 10 second delay between the attempts.

 

17:49:54.006 [5592.5372] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:04.006 [5592.5372] <2> vnet_async_connect: ... 0: connect in progress: 2 0x00000002

17:50:04.021 [5592.5372] <2> vnet_vnetd_pbx_c_supported: ... 0: VN_REQUEST_PBX_C_SUPPORTED: 13 0x0000000d

17:50:04.381 [5592.5372] <2> do_vnetd_service: ... 0: remote host supports PBX, but PBX is not running: 0 0x00000000

17:50:04.381 [5592.5372] <2> vnet_vnetd_service_socket: ... 0: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006

17:50:04.381 [5592.5372] <2> vnet_vnetd_service_socket: ... 0: service: bpcd

17:50:04.599 [5592.5372] <2> do_vnetd_service: ... 0: via VNETD: bpcd CONNECT FROM 2.2.2.2.3583 TO 3.3.3.3.13724 fd = 1240

17:50:04.599 [5592.5372] <2> vnet_async_connect: ... 0: connect: async CONNECT FROM 2.2.2.2.3583 TO 3.3.3.3.13724 fd = 1240

17:50:04.599 [5592.5372] <2> logconnections: BPCD CONNECT FROM 2.2.2.2.3583 TO 3.3.3.3.13724 fd = 1240

...snip...

17:50:04.599 [5592.5372] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:14.599 [5592.5372] <2> vnet_async_connect: ... 0: connect in progress: 2 0x00000002

17:50:14.615 [5592.5372] <2> vnet_vnetd_pbx_c_supported: ... 0: VN_REQUEST_PBX_C_SUPPORTED: 13 0x0000000d

17:50:15.099 [5592.5372] <2> do_vnetd_service: ... 0: remote host supports PBX, but PBX is not running: 0 0x00000000

17:50:15.099 [5592.5372] <2> do_vnetd_service: ... 0: connect: VNETD CONNECT FROM 2.2.2.2.3586 TO 3.3.3.3.13724 fd = 1208

17:50:15.099 [5592.5372] <2> vnet_async_connect: ... 0: connect: async CONNECT FROM 2.2.2.2.3586 TO 3.3.3.3.13724 fd = 1208

17:50:15.099 [5592.5372] <2> vnet_vnetd_connect_forward_socket_begin: ... 0: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a

17:50:15.099 [5592.5372] <2> vnet_vnetd_connect_forward_socket_begin: ... 0: ipc_string: 57287

17:50:15.334 [5592.5372] <2> vnet_vnetd_connect_forward_socket_begin: ... 0: hash_str1: f017547cf7f80e43f64be3005adbfebd

 

 

The vnetd debug log on the client host does not (and should not) show any activity at the time of the PBX connection attempt.  It does show the vnetd connection which was then used to start bpcd.  It should show vnetd being started by PBX when the first attempt to create the forwarding socket is made.  Instead there is another gap, followed by the successful connection direct to the vnetd port.

 

17:49:55.486 [3264.3732] <2> vnet_cached_getaddrinfo_and_update: ... 0: found in cache name: ::1

17:49:55.486 [3264.3732] <2> vnet_cached_getaddrinfo_and_update: ... 0: found in cache service: NULL

...delay while the PBX port is attempted...

17:50:03.423 [3940.2948] <2> ProcessRequests: ... 0: msg: VNETD ACCEPT FROM 2.2.2.2.3583 TO 3.3.3.3.13724 fd = 392

...snip...

17:50:03.642 [3940.2948] <2> ProcessRequests: ... 0: msg: Request VN_REQUEST_PBX_C_SUPPORTED(13)

17:50:03.642 [3940.2948] <2> ProcessRequests: ... 0: status: 0 0x00000000

17:50:03.767 [3940.2948] <2> ProcessRequests: ... 0: msg: Request VN_REQUEST_SERVICE_SOCKET(6)

17:50:03.985 [3940.2948] <2> process_service_socket: ... 0: service_name: bpcd

...snip...

17:50:04.001 [3940.2948] <2> ProcessRequests: ... 0: status: 0 0x00000000

...missing expected inbound connection via PBX port...

17:50:14.001 [4544.564] <2> ProcessRequests: ... 0: msg: VNETD ACCEPT FROM 2.2.2.2.3586 TO 3.3.3.3.13724 fd = 392

...snip...

17:50:14.485 [4544.564] <2> ProcessRequests: ... 0: msg: Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)

...snip...

17:50:14.485 [4544.564] <2> vnet_generate_hashes: ... 0: hash_str1: f017547cf7f80e43f64be3005adbfebd

17:50:14.485 [4544.564] <2> vnet_generate_hashes: ... 0: hash_str2: acb68c8d00d3dc190a7f64b3865c25b6

17:50:14.485 [4544.564] <2> process_connect_forward_socket: ... 0: ipc_string: 57287

17:50:14.485 [4544.564] <2> process_connect_forward_socket: ... 0: hash_str1: f017547cf7f80e43f64be3005adbfebd

17:50:14.939 [4544.564] <2> vnet_verify_local_accept: ... 0: localhost IPv4 peer port: 57288 0x0000dfc8

17:50:14.939 [4544.564] <2> vnet_send_network_socket: ... 0: hash_str2: acb68c8d00d3dc190a7f64b3865c25b6

17:50:14.970 [4544.564] <2> ProcessRequests: ... 0: status: 0 0x00000000

 

 

At this point, bpbrm and bpcd are connected via the DATA and NAME sockets.  After confirming the client host name, bpbrm passes the mpx_ipc information to bpcd and bpcd returns an error indication.

 

17:50:15.974 [5592.5372] <2> bpcr_get_hostname_rqst: Server hostname length = 23

17:50:15.974 [5592.5372] <2> bpbrm start_bpcd_stat: DATA_SOCK from bpcr = 1240

17:50:15.974 [5592.5372] <2> bpbrm start_bpcd_stat: NAME_SOCK from bpcr = 1208

17:50:15.974 [5592.5372] <2> bpbrm handle_backup: calling bpcr_get_socket_rqst3.

17:50:15.974 [5592.5372] <2> bpbrm handle_backup: bpbrm.c.14898: media_port: 0 0 0x00000000

17:50:15.974 [5592.5372] <2> bpbrm handle_backup: bpbrm.c.14899: mpx_ipc: 57276;c432cfc40d6d02d80f592f6c03015c49;10;600

17:50:36.114 [5592.5372] <2> bpcr_put_vnetd_forward_socket: bpcd returned failure: 25

17:50:36.114 [5592.5372] <2> bpcr_ipc_vnetd_socket_ipi_rqst: bpcr_put_vnetd_forward_socket failed: 25

17:50:36.114 [5592.5372] <16> bpbrm handle_backup: cannot create data socket, The operation completed successfully.  (0)

 

 

The bpcd debug log on the client confirms that bpcd was started, waited 10 seconds while bpbrm was delayed creating the forward socket, and then connected successfully to the forwarding socket.  Notice that the mpx_ipc socket information was then received without delay because it was already created, but that the socket was not available at the time of the connection attempt.

 

17:50:04.017 [2172.4736] <2> logconnections: BPCD ACCEPT FROM 2.2.2.2.3583 TO 3.3.3.3.13724 fd = 304

...snip...

17:50:04.204 [2172.4736] <2> process_requests: output socket port number = 1

17:50:14.939 [2172.4736] <2> is_pbxable_server: ... 0: service: 57287

17:50:14.939 [2172.4736] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:14.939 [2172.4736] <2> vnet_async_connect: ... 0: connect: async CONNECT FROM 127.0.0.1.57288 TO 127.0.0.1.57287 fd = 396

17:50:14.939 [2172.4736] <2> vnet_receive_network_socket: ... 0: hash_str1: f017547cf7f80e43f64be3005adbfebd

17:50:14.939 [2172.4736] <2> vnet_receive_network_socket: ... 0: hash_str2: acb68c8d00d3dc190a7f64b3865c25b6

17:50:14.939 [2172.4736] <2> verify_hashes: ... 0: hash_str1: f017547cf7f80e43f64be3005adbfebd

17:50:14.939 [2172.4736] <2> verify_hashes: ... 0: hash_str2: acb68c8d00d3dc190a7f64b3865c25b6

17:50:14.970 [2172.4736] <2> process_requests: Duplicated vnetd socket on stderr

...snip...

17:50:15.157 [2172.4736] <2> process_requests: BPCD_HOSTNAME_RQST

17:50:15.454 [2172.4736] <2> process_requests: BPCD_GET_STDOUT_HOST_SOCKET_RQST

17:50:15.454 [2172.4736] <2> process_requests: socket port number = 1

17:50:15.454 [2172.4736] <2> is_pbxable_server: ... 0: service: 57276

...snip...

17:50:15.454 [2172.4736] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:16.470 [2172.4736] <2> vnet_async_connect: ... 0: getsockopt SO_ERROR returned: 10061 0x0000274d

17:50:17.470 [2172.4736] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:18.485 [2172.4736] <2> vnet_async_connect: ... 0: getsockopt SO_ERROR returned: 10061 0x0000274d

17:50:20.485 [2172.4736] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:21.485 [2172.4736] <2> vnet_async_connect: ... 0: getsockopt SO_ERROR returned: 10061 0x0000274d

17:50:25.485 [2172.4736] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:26.501 [2172.4736] <2> vnet_async_connect: ... 0: getsockopt SO_ERROR returned: 10061 0x0000274d

17:50:34.501 [2172.4736] <2> vnet_async_connect: ... 0: connect in progress: 1 0x00000001

17:50:35.517 [2172.4736] <2> vnet_async_connect: ... 0: getsockopt SO_ERROR returned: 10061 0x0000274d

17:50:35.517 [2172.4736] <2> connect_to_service: ... 0: vnet_async_connect() failed: 18 0x00000012

17:50:35.517 [2172.4736] <2> vnet_connect_to_reserved_service: ... 0: connect_to_service() failed: 18 0x00000012

17:50:35.517 [2172.4736] <2> vnet_connect_to_third_party_service: ... 0: vnet_connect_to_reserved_service() failed: 18 0x00000012

17:50:35.517 [2172.4736] <2> vnet_connect_by_vnetd: ... 0: vnet_connect_to_third_party_service() failed: 18 0x00000012

17:50:35.517 [2172.4736] <2> get_vnetd_forward_socket: vnet_connect_by_vnetd failed: 18

17:50:35.517 [2172.4736] <16> process_requests: get_vnetd_forward_socket failed: 25

 

 

The vnetd debug log on the client host confirms that the mpx_ipc socket timed out during all the delays in creating the other sockets.  It was closed more than 60 seconds before bpcd was ready to use it. 

 

17:46:35.940 [2368.4288] <2> ProcessRequests: ... 0: msg: VNETD ACCEPT FROM 2.2.2.2.3471 TO 3.3.3.3.13724 fd = 392

...snip...

17:46:36.299 [2368.4288] <2> ProcessRequests: ... 0: msg: Request VN_REQUEST_CONNECT_FORWARD_SOCKET(10)

17:46:36.299 [2368.4288] <2> vnet_generate_hashes: ... 0: hash_str1: c432cfc40d6d02d80f592f6c03015c49

17:46:36.299 [2368.4288] <2> vnet_generate_hashes: ... 0: hash_str2: dfa2ad7685b3a5d3706300a8c1b956fe

17:46:36.299 [2368.4288] <2> process_connect_forward_socket: ... 0: ipc_string: 57276

17:48:36.611 [2368.4288] <2> vnet_sock_ready: ... 0: max_time: 120 0x00000078

17:48:36.611 [2368.4288] <2> vnet_sock_ready: ... 0: sock: 356 0x00000164

17:48:36.611 [2368.4288] <2> vnet_sock_ready: ... 0: Function failed: 11 0x0000000b

17:48:36.611 [2368.4288] <2> vnet_accept_from_vnetd: ... 0: status: 0 0x00000000

17:48:36.611 [2368.4288] <2> process_connect_forward_socket: ... 0: status: 11 0x0000000b

17:48:36.611 [2368.4288] <2> ProcessRequests: ... 0: status: 11 0x0000000b

17:48:36.611 [2368.4288] <2> ProcessRequests: ... 0: msg: Request VN_REQUEST_DISCONNECT(1)

Running 'netstat -na' at one second intervals on the mediaserver confirmed that the outbound socket to the PBX port was in a SYN_SENT state for ten seconds.  Running 'netstat -na' on the client host did not show the socket reaching that host so a firewall or other network device was preventing the connection.  A network trace confirmed the TCP SYN was sent from the mediaserver host and retransmitted twice by the TCP stack before bpbrm canceled the connection.
 

Cause

NetBackup server processes will attempt to communicate with client processes using the PBX port (1556) before attempting to utilize the vnetd port (13724) or the bpcd daemon port (13782).  If a firewall is present between the hosts and discards the connection request to PBX without responding, then the server process will wait 10 seconds before attempting to connect to vnetd or the daemon port.

 

Note that this delay will not occur;

·                             If a firewall is not present and PBX is not running on the remote client, because the TCP stack on the remote host will return a TCP RST immediately upon receipt of the TCP SYN for port 1556.

·                             If a firewall is present, and does not permit connection requests to port 1556, and returns a TCP RST to the originator when it blocks connection to the port.

 

Solution

Opening the PBX port (1556) through the firewall - bidirectional - between all NetBackup hosts will avoid the 10 second delay waiting for the initial connection to PBX to timeout.   Having the PBX port open is the long term direction of NetBackup and is required to use many of the new features such as SAN Client, Oracle Cloning, etc.

 

Other possible workarounds include:

 

A) Disabling multiplexing in the backup policy.   Because bptm is started much later in the process flow, there are fewer cumulative delays and the vnetd process is less likely to timeout.

 

B) Extending the vnetd timeout.   While this will prevent this particular failure, it will allow these delays to accumulate, which may have undesirable effects on concurrent processes or later events in this process.   This should only be used if other options are not available.

 

 

Was this content helpful?