
Veritas NetBackup™ CloudPoint Install and Upgrade Guide

Last Published: 2021-09-17

Product(s): NetBackup & Alta Data Protection (9.1.0.1)

Troubleshooting CloudPoint

Refer to the following troubleshooting scenarios:

CloudPoint agent fails to connect to the CloudPoint server if the agent host is restarted abruptly.
This issue may occur if the host where the CloudPoint agent is installed is shut down abruptly. Even after the host restarts successfully, the agent fails to establish a connection with the CloudPoint server and goes into an offline state.
The agent log file contains the following error:
```
flexsnap-agent-onhost[4972] MainThread flexsnap.connectors.rabbitmq:
ERROR - Channel 1 closed unexpectedly: 
(405) RESOURCE_LOCKED - cannot obtain exclusive access to locked queue '
flexsnap-agent.a1f2ac945cd844e393c9876f347bd817' in vhost '/'
```
This issue occurs because the RabbitMQ connection between the agent and the CloudPoint server is not closed when the agent host shuts down abruptly. The CloudPoint server cannot detect that the agent is unavailable until the agent host misses a heartbeat poll, so the RabbitMQ connection remains open until the next heartbeat cycle. If the agent host reboots before the next heartbeat poll is triggered, the agent tries to establish a new connection with the CloudPoint server. Because the earlier RabbitMQ connection still exists, the new connection attempt fails with a resource locked error.
As a result of this connection failure, the agent goes offline, and all snapshot and restore operations on the host fail.
Workaround:
Restart the Veritas CloudPoint Agent service on the agent host.
- On Linux hosts, run the following command:
  # sudo systemctl restart flexsnap-agent.service
- On Windows hosts:
  Restart the Veritas CloudPoint™ Agent service from the Windows Services console.
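On Linux hosts, the check and the restart can be combined in a small script. The sketch below is illustrative and not a Veritas-supplied tool. The function names are hypothetical, and because the agent log location is not given here, the log path is passed in as an argument. It inspects only the last 200 lines of the log. This is an arbitrary choice so that an old, already resolved lock error does not trigger another restart.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper names): restart the CloudPoint agent only
# when its log shows the RabbitMQ RESOURCE_LOCKED error described above.
# Assumption: the caller supplies the agent log path; no default is assumed.

# Return 0 if the recent part of the log contains RESOURCE_LOCKED.
agent_queue_locked() {
    local log_file="$1" lines="${2:-200}"
    [ -r "$log_file" ] || return 1
    tail -n "$lines" "$log_file" | grep -q 'RESOURCE_LOCKED'
}

# Apply the documented workaround (service restart) only when needed.
restart_agent_if_locked() {
    local log_file="$1"
    if agent_queue_locked "$log_file"; then
        echo "RESOURCE_LOCKED found in $log_file; restarting flexsnap-agent.service"
        sudo systemctl restart flexsnap-agent.service
    else
        echo "No recent RESOURCE_LOCKED error in $log_file; not restarting"
    fi
}
```

For example, run `restart_agent_if_locked /path/to/agent.log`, where the path is a placeholder for your agent log file. On Windows hosts, restart the service from the Services console as described above.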
CloudPoint agent registration on Windows hosts may time out or fail.
To protect applications on Windows, you need to install and then register the CloudPoint agent on the Windows host. The agent registration may sometimes take longer than usual and may either time out or fail.
Workaround:
To resolve this issue, try the following steps:
- Re-register the agent on the Windows host using a fresh token.
- If the registration process fails again, restart the CloudPoint services on the CloudPoint server and then try registering the agent again.
Refer to the following for more information:
See Registering the Windows-based agent.
See Restarting CloudPoint.
Disaster recovery when the DR package or passphrase is lost.
This issue may occur if the DR package or the passphrase is lost.
A catalog backup creates two backup packages:
- DR package, which contains all the certificates
- Catalog package, which contains the database
The DR package contains the NetBackup UUID certificates, and the catalog database also contains the UUID. When you perform disaster recovery using the DR package followed by catalog recovery, both the UUID certificate and the UUID are restored. Because the UUID does not change, NetBackup can continue to communicate with CloudPoint.
However, if the DR package or the passphrase is lost, the DR operation cannot be completed. You can only recover the catalog without the DR package after you reinstall NetBackup. In this case, NetBackup gets a new UUID that CloudPoint does not recognize, and the one-to-one mapping between NetBackup and CloudPoint is lost.
Workaround:
To resolve this issue, you must update CloudPoint with the new NBU UUID and Version Number after the NetBackup primary server is reinstalled.
- The NetBackup administrator must be logged on to the NetBackup Web Management Service to perform this task. Use the following command to log on:
  /usr/openv/netbackup/bin/bpnbat -login -loginType WEB
- Execute the following command on the primary server to get the NBU UUID:
  /usr/openv/netbackup/bin/admincmd/nbhostmgmt -list -host <primary server host name> | grep "Host ID"
- Execute the following command to get the Version Number:
  /usr/openv/netbackup/bin/admincmd/bpgetconfig -g <primary server host name> -L
After you get the NBU UUID and Version Number, execute the following command on the CloudPoint host to update the mapping:
/cloudpoint/scripts/cp_update_nbuuid.sh -i <NBU UUID> -v <Version Number>
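The lookups above can also be scripted on the primary server. This is a minimal sketch under stated assumptions. It assumes that `nbhostmgmt` prints the UUID on a line of the form `Host ID : <value>` and that `bpgetconfig -L` prints a line of the form `Version Number = <value>`, so compare it against your actual output first. The helper names are hypothetical. The script only prints the `cp_update_nbuuid.sh` command, because that command must be run on the CloudPoint host rather than on the primary server.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper names): read the NBU UUID and Version Number
# on the primary server and print the command for the CloudPoint host.
# Assumption: the outputs contain lines such as "Host ID : <uuid>" and
# "Version Number = <number>"; adjust the parsing if yours differ.

ADMINCMD=/usr/openv/netbackup/bin/admincmd

# Print the value of the first "key : value" or "key = value" line on
# stdin whose key equals $1 (surrounding whitespace is ignored).
field_value() {
    awk -v key="$1" '
        {
            pos = match($0, /[:=]/)      # position of the first separator
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }'
}

# Look up both values for the given primary server and print the command.
print_update_command() {
    local primary="$1" uuid version
    uuid=$("$ADMINCMD/nbhostmgmt" -list -host "$primary" | field_value "Host ID")
    version=$("$ADMINCMD/bpgetconfig" -g "$primary" -L | field_value "Version Number")
    if [ -z "$uuid" ] || [ -z "$version" ]; then
        echo "Could not read Host ID or Version Number for $primary" >&2
        return 1
    fi
    echo "/cloudpoint/scripts/cp_update_nbuuid.sh -i $uuid -v $version"
}
```

First log on with `bpnbat -login -loginType WEB` as described above. Then run, for example, `print_update_command <primary server host name>` and run the printed command on the CloudPoint host.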
The snapshot job is successful but the backup from snapshot job fails with the error "Certificate verification failed" if the CloudPoint server's certificate is revoked
In a backup from snapshot operation, NetBackup communicates with the CloudPoint server while taking the snapshot.
In the backup operation, communication happens between the datamover container on the CloudPoint server and the NetBackup media or primary server. Use the following flags to enforce the revocation status check of the certificates of the respective servers:
- ECA_CRL_CHECK: Enabled by default and validated during backup operations.
- VIRTUALIZATION_CRL_CHECK: Disabled by default and validated during snapshot and cloud vendor operations. If this flag is enabled and the CloudPoint server's certificate is revoked, the snapshot job fails.
Because VIRTUALIZATION_CRL_CHECK is disabled by default, the snapshot job succeeds even when the certificate is revoked, while the backup job, which is subject to ECA_CRL_CHECK, fails.
See Configuring security for Azure and Azure Stack.
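To check how the two flags are currently set, you can read them with `bpgetconfig` on the NetBackup server. The sketch below assumes that the entries appear as `NAME = value` lines and that an unset entry produces no line. When a flag is absent, it reports the default stated above. The function name is hypothetical, and the sketch only reads the configuration without changing it.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical function name): report a CRL flag's value from
# bpgetconfig-style "NAME = value" lines on stdin, falling back to the
# defaults stated in this section when the flag is not present.

crl_flag_status() {
    local flag="$1" value default
    case "$flag" in
        ECA_CRL_CHECK)            default="enabled by default" ;;
        VIRTUALIZATION_CRL_CHECK) default="disabled by default" ;;
        *) echo "unknown flag: $flag" >&2; return 2 ;;
    esac
    # Take the first "FLAG = value" line; strip the name and "=" tokens.
    value=$(awk -v f="$flag" '$1 == f && $2 == "=" { $1 = ""; $2 = ""; sub(/^ +/, ""); print; exit }')
    if [ -n "$value" ]; then
        echo "$flag = $value"
    else
        echo "$flag not set ($default)"
    fi
}
```

For example, `/usr/openv/netbackup/bin/admincmd/bpgetconfig | crl_flag_status VIRTUALIZATION_CRL_CHECK` shows whether the snapshot-side check has been turned on.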
CloudPoint fails to establish an agentless connection to the Windows cloud instance
Error 1: <Instance_name>: network connection timed out.
Case 1: CloudPoint server log message:
```
WARNING - Cannot connect to the remote host. SMB Connection timeout
 <IP address> <user>

…

flexsnap.OperationFailed: Could not connect to the remote server 
<IP address>
```
Workaround:
To resolve this issue, try the following steps:
- Verify that SMB port 445 is added to the network security group and is accessible from the CloudPoint server.
- Verify that SMB port 445 is allowed through the cloud instance firewall.
Case 2: CloudPoint Server log message:
```
WARNING - Cannot connect to the remote host. WMI Connection 
timeout <IP address> <user>

…

flexsnap.OperationFailed: Could not connect to the remote 
server <IP address>
```
Workaround:
To resolve this issue, try the following steps:
- Verify that DCOM port 135 is added to the network security group and is accessible from the CloudPoint server.
- Verify that port 135 is allowed through the cloud instance firewall.
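The port checks in Case 1 and Case 2 can be run from the CloudPoint server with a short script. This sketch uses bash's `/dev/tcp` redirection and the coreutils `timeout` command to test whether a TCP connection to SMB port 445 and DCOM port 135 succeeds. The function names are hypothetical. A successful connection only shows that the port is reachable through the network security group and the instance firewall. It does not show that credentials or WMI will work.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical function names): check TCP reachability of the
# ports that agentless connections to a Windows instance depend on.
# Relies on bash's /dev/tcp redirection and the coreutils "timeout" command.

# Return 0 if a TCP connection to host:port succeeds within the timeout.
port_reachable() {
    local host="$1" port="$2" secs="${3:-5}"
    # Inside bash -c, $0 and $1 are the host and port passed after the script.
    timeout "$secs" bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Check SMB (445) and DCOM (135); return 1 if either is unreachable.
check_windows_instance_ports() {
    local host="$1" port rc=0
    for port in 445 135; do
        if port_reachable "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable (check the network security group and instance firewall)"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, run `check_windows_instance_ports <instance IP address>`. For Case 4, `port_reachable <instance IP address> <port>` can be pointed at the fixed WMI-IN port if one is configured.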
Case 3: CloudPoint Server log message:
```
Exception while opening SMB connection, [Errno Connection error 
(<IP address>:445)] [Errno 113] No route to host.
```
Workaround: Verify that the cloud instance is up and running and is not in an inconsistent state.
Case 4: CloudPoint Server log message:
```
Error when closing dcom connection: 'Thread-xxxx'"
```
Here, xxxx is the thread number.
Workaround:
To resolve this issue, try the following steps:
- Verify that the WMI-IN dynamic port range, or the fixed port if one is configured, is added to the network security group.
- Verify that the WMI-IN port is enabled in the cloud instance firewall.
Error 2: <Instance_name>: Could not connect to the virtual machine.
CloudPoint server log message:
```
Error: Cannot connect to the remote host. <IP address> Access denied. 
```
Workaround:
To resolve this issue, try the following steps:
- Verify that the user has administrative rights.
- Verify that UAC is disabled for the user.
CloudPoint cloud operations fail on a RHEL system if a firewall is disabled
The CloudPoint operations fail for all the supported cloud plugins on a RHEL system if a firewall is disabled on that system while the CloudPoint services are running. This is a network configuration issue that prevents CloudPoint from accessing the cloud provider REST API endpoints.
Workaround:
- Stop CloudPoint
  # docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /cloudpoint:/cloudpoint veritas/flexsnap-cloudpoint:<version> stop
- Restart Docker
  # systemctl restart docker
- Restart CloudPoint
  # docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /cloudpoint:/cloudpoint veritas/flexsnap-cloudpoint:<version> start
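The three workaround steps can be wrapped in one script so that they always run in order and stop at the first failure. This is a sketch, not a Veritas-supplied tool. The function names and the `DRY_RUN` switch are hypothetical. The version argument stands in for the `<version>` placeholder in the commands above, so it must match your installed image tag.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical names): run the documented stop / Docker restart /
# start sequence in order, stopping at the first failure.
# Assumption: the version argument matches the installed image tag.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start or stop CloudPoint through the flexsnap-cloudpoint image.
cloudpoint() {
    local version="$1" action="$2"
    run docker run --rm -it \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /cloudpoint:/cloudpoint \
        "veritas/flexsnap-cloudpoint:${version}" "$action"
}

restart_cloudpoint_stack() {
    local version="$1"
    [ -n "$version" ] || { echo "usage: restart_cloudpoint_stack <version>" >&2; return 2; }
    cloudpoint "$version" stop || return 1
    run systemctl restart docker || return 1
    cloudpoint "$version" start
}
```

Run it as root from an interactive terminal, because the documented `docker run` command uses `-it`. For example, `DRY_RUN=1 restart_cloudpoint_stack <version>` prints the three commands, and dropping `DRY_RUN=1` runs them.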