Veritas™ Resiliency Platform 2.2 Solutions for VMware
Predefined risks in Resiliency Platform
Table: Predefined risks lists the predefined risks available in Resiliency Platform. These risks are reflected in the current risk report and the historical risk report.
Table: Predefined risks
Risks | Description | Risk detection time | Risk type | Affected operation | Fix if violated |
---|---|---|---|---|---|
Veritas InfoScale Operations Manager disconnected | Checks the connection state between Veritas InfoScale Operations Manager and Resiliency Manager | 1 minute | Error | All operations | Check Veritas InfoScale Operations Manager reachability. Try to reconnect Veritas InfoScale Operations Manager. |
vCenter Password Incorrect | Checks if vCenter password is incorrect | 5 minutes | Error | | In case of a password change, resolve the password issue and refresh the vCenter configuration |
VM tools not installed | Checks if VM Tools are not installed. It may affect IP Customization and VM Shutdown. | Real time, when resiliency group is created | Error | | |
Snapshot removed from Virtual Machine | Checks if a snapshot has been removed from the virtual machine. | 5 minutes | Error | Resiliency Platform Data Mover replication | Edit the resiliency group to refresh configuration |
Snapshot reverted on Virtual Machine | Checks if a snapshot has been reverted on the virtual machine. | 5 minutes | Error | Resiliency Platform Data Mover replication | Edit the resiliency group to remove the virtual machine and then add it back |
Data Mover Daemon Crash | Checks if VM Data Mover filter is not able to connect to its counterpart in ESX. | 5 minutes | Error | Resiliency Platform Data Mover replication | To continue the replication, move (vMotion) the VM to a different ESX node in the cluster, and then either troubleshoot the issue with the original ESX node or raise a support case with Veritas |
Snapshot created on Virtual Machine | Checks if a snapshot has been created on the virtual machine. | 5 minutes | Error | Resiliency Platform Data Mover replication | Edit the resiliency group to refresh configuration |
DataMover virtual machine in noop mode | Checks if VM Data Mover filter is not able to connect to its counterpart in ESX. | 5 minutes | Error | Resiliency Platform Data Mover replication | To continue the replication, move (vMotion) the VM to a different ESX node in the cluster, and then either troubleshoot the issue with the original ESX node or raise a support case with Veritas |
Resiliency group configuration drift | Checks if disk configuration of any of the assets in the resiliency group has changed. | 30 minutes | Error | | Edit the resiliency group to first remove the impacted virtual machine from the resiliency group and then add it back to the resiliency group. |
Global user deleted | Checks if there are no global users. In this case, the user will not be able to customize the IP for Windows machines in VMware environment. | Real time | Warning | | Edit the resiliency group or add a Global user |
Missing heartbeat from Resiliency Manager | Checks for heartbeat failure from a Resiliency Manager. | 5 minutes | Error | All | Fix the Resiliency Manager connectivity issue |
Infrastructure Management Server disconnected | Checks for Infrastructure Management Server (IMS) to Resiliency Manager (RM) connection state. | 1 minute | Error | All | Check IMS reachability. Try to reconnect IMS. |
Storage Discovery Host down | Checks if the discovery daemon is down on the storage discovery host | 15 minutes | Error | Migrate | Resolve the discovery daemon issue |
DNS removed | Checks if DNS is removed from the resiliency group where DNS customization is enabled | Real time | Warning | | Edit the resiliency group and disable DNS customization |
IOTap driver not configured | Checks if the IOTap driver is not configured | 2 hours | Error | None | Configure the IOTap driver. This risk is removed when the workload is configured for disaster recovery. |
VMware Discovery Host Down | Checks if the discovery daemon is down on the VMware Discovery Host | 15 minutes | Error | Migrate | Resolve the discovery daemon issue |
VM restart is pending | Checks if the VM has not been restarted after add host operation | 2 hours | Error | Configure DR | Restart the VM after add host operation |
New VM added to replication storage | Checks if a virtual machine that is added to a Veritas Replication Set on a primary site is not a part of the resiliency group. | 5 minutes | Error | | Add the virtual machine to the resiliency group. |
Replication lag exceeding RPO | Checks if the replication lag exceeds the thresholds defined for the resiliency group. This risk affects the SLA for the services running on your production data center. | 5 minutes | Warning | | Check if the replication lag exceeds the RPO that is defined in the Service Objective |
Replication state broken/critical | Checks if the replication is not working or is in a critical condition for each resiliency group. | 5 minutes | Error | | Contact the enclosure vendor. |
Remote mount point already mounted | Checks if the mount point is not available for mounting on target site for any of the following reasons: | | Warning | | Unmount the mount point that is already mounted or is being used by other assets. |
Disk utilization critical | Checks if at least 80% of the disk capacity is being utilized. The risk is generated for all the resiliency groups associated with that particular file system. | | Warning | | Delete or move some files or uninstall some non-critical applications to free up some disk space. |
ESX not reachable | Checks if the ESX server is in a disconnected state. | 5 minutes | Error | | Resolve the ESX server connection issue. |
vCenter Server not reachable | Checks if the virtualization server is unreachable or if the password for the virtualization server has changed. | 5 minutes | Error | | Resolve the virtualization server connection issue. In case of a password change, resolve the password issue. |
Insufficient compute resources on failover target | Checks if there are insufficient CPU resources on failover target in a virtual environment. | 6 hours | Warning | | Reduce the number of CPUs assigned to the virtual machines on the primary site to match the available CPU resources on failover target. |
Host not added on recovery data center | Checks if the host is not added to the IMS on the recovery data center. | 30 minutes | Error | Migrate | Check the following and fix: |
NetBackup Notification channel disconnected | Checks for NetBackup Notification channel connection state | 5 minutes | Error | Restore | Check if the NetBackup Notification channel is added to the NetBackup master server. |
Backup image violates the defined RPO | Checks if the backup image violates the defined RPO | 30 minutes | Warning | No operation | |
NetBackup master server disconnected | Checks if NetBackup master server is disconnected or not reachable | 5 minutes | Error | Restore | Check if IMS is added as an additional server to the NetBackup master server |
Assets do not have copy policy | Checks if the assets do not have a copy policy | 3 hours | Warning | No operation | Set up copy policy and then refresh the NetBackup master server |
Target replication is not configured | Checks if the target replication is not configured | 3 hours | Warning | No operation | Configure target replication and then refresh the NetBackup master server |
Disabled NetBackup Policy | Checks if the NetBackup policy associated with the virtual machine is disabled | 3 hours | Warning | No operation | Fix the disabled policy |
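
Two rows in the table are plain threshold checks: "Disk utilization critical" is raised when at least 80% of the disk capacity is used, and "Replication lag exceeding RPO" is raised when the lag exceeds the defined RPO. The following is a minimal Python sketch of those two conditions only. It is not Resiliency Platform code and does not call any Resiliency Platform API; the function name, parameter names, and units are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Threshold from the "Disk utilization critical" row ("at least 80%").
DISK_UTILIZATION_PERCENT = 80


@dataclass
class Risk:
    name: str        # risk name as listed in the table
    risk_type: str   # "Error" or "Warning", as listed in the table


def evaluate_risks(used_bytes: int, capacity_bytes: int,
                   replication_lag_s: float, rpo_s: float) -> List[Risk]:
    """Hypothetical helper: return the threshold-based risks these values would raise."""
    risks: List[Risk] = []

    # "At least 80%" includes exactly 80%. Integer arithmetic avoids
    # floating-point rounding at the boundary.
    if capacity_bytes > 0 and used_bytes * 100 >= capacity_bytes * DISK_UTILIZATION_PERCENT:
        risks.append(Risk("Disk utilization critical", "Warning"))

    # The lag must exceed the RPO; a lag equal to the RPO raises nothing.
    if replication_lag_s > rpo_s:
        risks.append(Risk("Replication lag exceeding RPO", "Warning"))

    return risks


if __name__ == "__main__":
    # Example values (assumed): 850 GB used of 1000 GB, 20-minute lag, 15-minute RPO.
    for risk in evaluate_risks(850 * 10**9, 1000 * 10**9, 20 * 60, 15 * 60):
        print(f"{risk.risk_type}: {risk.name}")
```

The boundary handling is the point of the sketch: the disk check is inclusive ("at least"), while the replication check is strict ("exceeds").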