InfoScale™ 9.0 Cluster Server Agent for Hitachi TrueCopy/HP-XP Continuous Access Configuration Guide - Windows

Last Published: 2025-04-13

Product(s): InfoScale & Storage Foundation (9.0)

Platform: Windows

Failure scenarios in global clusters

The following table lists the failure scenarios in a global cluster configuration and describes how VCS and the agent respond to each failure.

Table: Failure scenarios in a global cluster configuration with the VCS agent for Hitachi TrueCopy/Hewlett-Packard XP Continuous Access

Failure	Description, VCS response, and agent response
Application failure	Application cannot start successfully on any host at the primary site.
	VCS response at the secondary site:
	  Causes the global service group at the primary site to fault and displays an alert to indicate the fault.
	  Does the following based on the ClusterFailOverPolicy global service group attribute:
	    Auto or Connected - VCS automatically brings the faulted global group online at the secondary site.
	    Manual - No action. You must bring the global group online at the secondary site.
	Agent response:
	  Write-enables the devices at the secondary site, except when the link is manually suspended with the read-only option.
	  Swaps the P-VOL/S-VOL role of each device in the device group.
	  Restarts replication from the P-VOL devices at the secondary site to the S-VOL devices at the primary site.
	See Performing failback after a node failure or an application failure.
	See Replication link / Application failure scenarios.
Host failure	All hosts at the primary site fail.
	VCS response at the secondary site:
	  Displays an alert to indicate the primary cluster fault.
	  Does the following based on the ClusterFailOverPolicy global service group attribute:
	    Auto - VCS automatically brings the faulted global group online at the secondary site.
	    Manual or Connected - No action. You must bring the global group online at the secondary site.
	Agent response:
	  Write-enables the devices at the secondary site, except when the link is manually suspended with the read-only option.
	  Swaps the P-VOL/S-VOL role of each device in the device group.
	  Restarts replication from the P-VOL devices at the secondary site to the S-VOL devices at the primary site.
	See Performing failback after a node failure or an application failure.
Site failure	All hosts and the storage at the primary site fail.
	VCS response at the secondary site:
	  Displays an alert to indicate the cluster fault.
	  Does the following based on the ClusterFailOverPolicy global service group attribute:
	    Auto - VCS automatically brings the faulted global group online at the secondary site.
	    Manual or Connected - No action. You must bring the global group online at the secondary site.
	Agent response:
	  In case of a manual failover, the agent does the following on the secondary site based on the value of the SplitTakeover attribute of the HTC resource:
	    1 - The agent issues the horctakeover command to make the HTC devices write-enabled. The HTC devices go into the SSWS (Suspend for Swapping with S-VOL side only) state. If the original primary site is restored, you must execute the pairresync-swaps action on the secondary site to establish reverse replication.
	    0 - The agent does not perform failover to the secondary site.
	See Performing failback after a site failure.
Replication link failure	Replication link between the arrays at the two sites fails. The volume state on the primary site becomes PSUE.
	VCS response: No action.
	Agent response:
	  When the replication link is disconnected, the agent does the following based on the value of the LinkMonitor attribute of the HTC resource:
	    0 - No action.
	    1 - The agent periodically attempts to resynchronize the S-VOL side using the pairresync command.
	    2 - The agent periodically attempts to resynchronize the S-VOL side and also sends notifications about the disconnected link. Notifications are sent in the form of either SNMP traps or emails. For information about the VCS NotifierMngr agent, refer to the Cluster Server Bundled Agents Reference Guide.
	  If the value of the LinkMonitor attribute is not set to 1 or 2, you must manually resynchronize the HTC devices after the link is restored.
	  To manually resynchronize the HTC devices after the link is restored:
	    Before you resync the S-VOL device, split off the ShadowImage device from the S-VOL device at the secondary site.
	    Initiate resync of the S-VOL device using the agent's pairresync action.
	    After the P-VOL and S-VOL devices are in sync, re-establish the mirror relationship between the ShadowImage and the S-VOL devices.
	  If you initiate a failover to the secondary site while resync is in progress, the online function of the Hitachi TrueCopy/Hewlett-Packard XP Continuous Access agent waits for the resync to complete and then initiates a takeover of the S-VOL devices.
	  Note: If you did not configure ShadowImage devices and a disaster occurs while resync is in progress, the data at the secondary site becomes inconsistent. Arctera recommends configuring ShadowImage devices at both sites.
	See Replication link / Application failure scenarios.
Network failure	The network connectivity and the replication link between the sites fail.
	VCS response at the secondary site:
	  VCS at each site concludes that the remote cluster has faulted.
	  Does the following based on the ClusterFailOverPolicy global service group attribute:
	    Manual or Connected - No action. You must confirm the cause of the network failure with the cluster administrator at the remote site and fix the issue.
	    Auto - VCS brings the global group online at the secondary site, which may lead to a site-wide split brain. This causes data divergence between the devices on the primary and the secondary arrays. When the network (WAC and replication) connectivity is restored, you must manually resync the data.
	  Note: Arctera recommends that the value of the ClusterFailOverPolicy attribute be set to Manual for all global groups to prevent unintended failovers due to transient network failures.
	  To resynchronize the data after the network link is restored:
	    Take the global service group offline at both sites.
	    Manually resynchronize the data. Use the pairresync-swaps action to resynchronize from the secondary.
	    Bring the global service group online on the secondary site.
	Agent response: Similar to the site failure.
Storage failure	The array at the primary site fails.
	VCS response at the secondary site:
	  Causes the global service group at the primary site to fault and displays an alert to indicate the fault.
	  Does the following based on the ClusterFailOverPolicy global service group attribute:
	    Auto or Connected - VCS automatically brings the faulted global service group online at the secondary site.
	    Manual - No action. You must bring the global group online at the secondary site.
	Agent response:
	  The agent does the following based on the SplitTakeover attribute of the HTC resource:
	    1 - The agent issues the horctakeover command to make the HTC devices write-enabled. The S-VOL devices go into the SSWS state.
	    0 - The agent faults the HTC resource.
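
The decision rules in the table can be condensed into a small lookup, which may help when reviewing how a given attribute combination is expected to behave. The following Python sketch is illustrative only: it is not part of VCS or the agent, and the names Failure, AUTO_FAILOVER_POLICIES, vcs_fails_over_automatically, and split_takeover_response are hypothetical. It encodes only what the table states, and it treats the network failure row like the site failure row for the agent response, as the table indicates.

```python
from enum import Enum


class Failure(Enum):
    APPLICATION = "Application failure"
    HOST = "Host failure"
    SITE = "Site failure"
    REPLICATION_LINK = "Replication link failure"
    NETWORK = "Network failure"
    STORAGE = "Storage failure"


# Valid values of the ClusterFailOverPolicy attribute named in the table.
POLICIES = ("Auto", "Connected", "Manual")

# Policy values under which, per the table, VCS brings the global group
# online at the secondary site without operator action.
AUTO_FAILOVER_POLICIES = {
    Failure.APPLICATION: {"Auto", "Connected"},
    Failure.HOST: {"Auto"},
    Failure.SITE: {"Auto"},
    Failure.REPLICATION_LINK: set(),  # VCS response: no action
    Failure.NETWORK: {"Auto"},        # may lead to a site-wide split brain
    Failure.STORAGE: {"Auto", "Connected"},
}


def vcs_fails_over_automatically(failure: Failure, policy: str) -> bool:
    """Return True if the table says VCS fails over without operator action."""
    if policy not in POLICIES:
        raise ValueError(f"unknown ClusterFailOverPolicy value: {policy!r}")
    return policy in AUTO_FAILOVER_POLICIES[failure]


def split_takeover_response(failure: Failure, split_takeover: int) -> str:
    """Summarize the agent response for a given SplitTakeover value."""
    if failure not in (Failure.SITE, Failure.NETWORK, Failure.STORAGE):
        raise ValueError("the table does not describe SplitTakeover for this failure")
    if split_takeover == 1:
        return "agent issues horctakeover; devices become write-enabled (SSWS state)"
    if split_takeover == 0:
        if failure is Failure.STORAGE:
            return "agent faults the HTC resource"
        return "agent does not perform failover to the secondary site"
    raise ValueError(f"unexpected SplitTakeover value: {split_takeover!r}")


if __name__ == "__main__":
    # Print the failover matrix so it can be compared against the table.
    for failure in Failure:
        row = ", ".join(
            f"{p}={'auto' if vcs_fails_over_automatically(failure, p) else 'manual/none'}"
            for p in POLICIES
        )
        print(f"{failure.value}: {row}")
```

Keeping the rules in a dictionary rather than nested conditionals makes it easy to check each entry against the corresponding table row.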