InfoScale™ 9.0 Cluster Server Agent for Hitachi TrueCopy/HP-XP Continuous Access Configuration Guide - Windows

Last Published:
Product(s): InfoScale & Storage Foundation (9.0)
Platform: Windows

Failure scenarios in replicated data clusters

The following table lists the failure scenarios in a replicated data cluster configuration, and describes the behavior of VCS and the agent in response to the failure.

Table: Failure scenarios in a replicated data cluster configuration with VCS agent for Hitachi TrueCopy/Hewlett-Packard XP Continuous Access

Failure

Description and VCS response

Application failure

The application cannot start successfully on any host at the primary site.

VCS response:

  • Causes the service group at the primary site to fault.

  • Does the following based on the AutoFailOver attribute for the faulted service group:

    • 1 - VCS automatically brings the faulted service group online at the secondary site.

    • 2 - You must bring the service group online at the secondary site.

The agent does the following:

  • Write-enables the devices at the secondary site, except when the link is manually suspended with the read-only option.

  • Swaps the P-VOL/S-VOL role of each device in the device group.

  • Restarts replication from P-VOL devices on the secondary site to the S-VOL devices at the primary site.

See Performing failback after a node failure or an application failure.

See Replication link / Application failure scenarios.
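The VCS response described above can be sketched as a small decision function. This is only an illustrative model of the table row, not VCS or agent code: the names `AutoFailOver`, `AGENT_ONLINE_STEPS`, and `vcs_response` are hypothetical, and only the two AutoFailOver values that the table mentions are handled.

```python
from enum import IntEnum


class AutoFailOver(IntEnum):
    """The two AutoFailOver values that the table describes (hypothetical names)."""
    AUTOMATIC = 1  # VCS brings the faulted group online at the secondary site
    MANUAL = 2     # the administrator must bring the group online


# Agent steps from the table. The write-enable step is skipped when the link
# was manually suspended with the read-only option; that case is not modeled.
AGENT_ONLINE_STEPS = (
    "write-enable the devices at the secondary site",
    "swap the P-VOL/S-VOL role of each device in the device group",
    "restart replication from the secondary site (P-VOL) to the primary site (S-VOL)",
)


def vcs_response(auto_failover):
    """Return the VCS actions for a faulted service group, in order."""
    actions = ["fault the service group at the primary site"]
    if auto_failover == AutoFailOver.AUTOMATIC:
        actions.append("VCS brings the service group online at the secondary site")
    elif auto_failover == AutoFailOver.MANUAL:
        actions.append("administrator brings the service group online at the secondary site")
    else:
        # Other values are outside the scope of this table.
        raise ValueError(f"AutoFailOver value not covered here: {auto_failover}")
    return actions
```

For example, `vcs_response(2)` returns the fault action followed by the manual step, which reflects that VCS itself takes no further action in that case.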

Host failure

All hosts at the primary site fail.

VCS response:

  • Causes the service group at the primary site to fault.

  • Does the following based on the AutoFailOver attribute for the faulted service group:

    • 1 - VCS automatically brings the faulted service group online at the secondary site.

    • 2 - You must bring the service group online at the secondary site.

The agent does the following:

  • Write-enables the devices at the secondary site, except when the link is manually suspended with the read-only option.

  • Swaps the P-VOL/S-VOL role of each device in the device group.

  • Restarts replication from P-VOL devices on the secondary site to the S-VOL devices at the primary site.

See Performing failback after a node failure or an application failure.

Site failure

All hosts and the storage at the primary site fail.

VCS response:

  • Causes the service group at the primary site to fault.

  • Does the following based on the AutoFailOver attribute for the faulted service group:

    • 1 - VCS automatically brings the faulted service group online at the secondary site.

    • 2 - You must bring the service group online at the secondary site.

Agent response: The agent does the following based on the SplitTakeover attribute of the HTC resource:

  • 1 - The agent issues the horctakeover command to make the HTC devices write-enabled. The HTC devices go into the SSWS (Suspend for Swapping with S-VOL side only) state. If the original primary site is restored, you must execute the pairresync-swaps action on the secondary site to establish reverse replication.

  • 0 - The agent does not perform failover to the secondary site.

See Performing failback after a site failure.
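The agent's behavior for a site failure depends only on the SplitTakeover value, which the following minimal sketch models. The function name `agent_site_failure_response` and the dictionary keys are hypothetical, chosen for illustration; the command, state, and action names are the ones given in the table.

```python
def agent_site_failure_response(split_takeover):
    """Model the agent response to a site failure for a given SplitTakeover value."""
    if split_takeover == 1:
        return {
            "takeover": True,
            "command": "horctakeover",        # makes the HTC devices write-enabled
            "device_state": "SSWS",           # Suspend for Swapping with S-VOL side only
            # Needed only after the original primary site is restored:
            "follow_up_action": "pairresync-swaps",
        }
    if split_takeover == 0:
        # The agent does not fail over to the secondary site.
        return {
            "takeover": False,
            "command": None,
            "device_state": None,
            "follow_up_action": None,
        }
    raise ValueError(f"SplitTakeover must be 0 or 1, got {split_takeover}")
```

The sketch rejects any other value rather than guessing a default, because the table defines behavior only for 0 and 1.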

Replication link failure

Replication link between the arrays at the two sites fails.

VCS response: No action.

Agent response: When the replication link is disconnected, the agent does the following based on the LinkMonitor attribute of the HTC resource:

  • 0 - No action.

  • 1 - The agent periodically attempts to resynchronize the S-VOL side using the pairresync command.

  • 2 - The agent periodically attempts to resynchronize the S-VOL side and also sends notifications about the disconnected link. Notifications are sent in the form of either SNMP traps or emails. For information about the VCS NotifierMngr agent, refer to the Cluster Server Bundled Agents Reference Guide.

If the value of the LinkMonitor attribute is not set to 1 or 2, you must manually resynchronize the HTC devices after the link is restored.

To manually resynchronize the HTC devices after the link is restored:

  1. Before you resync the S-VOL device, split off the Shadow Image device from the S-VOL device at the secondary site.

  2. Initiate a resync of the S-VOL device using the agent's pairresync action.

  3. After the P-VOL and S-VOL devices are in sync, reestablish the mirror relationship between the Shadow Image and the S-VOL devices.

If you initiate a failover to the secondary site when resync is in progress, the online function of the Hitachi TrueCopy/Hewlett-Packard XP Continuous Access agent waits for the resync to complete and then initiates a takeover of the S-VOL devices.

Note:

If you did not configure Shadow Image devices and a disaster occurs when resync is in progress, the data at the secondary site becomes inconsistent. Veritas recommends configuring Shadow Image devices at both sites.

See Replication link / Application failure scenarios.
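The LinkMonitor behavior for a replication link failure can be summarized in a short sketch. It is an illustrative model under the assumptions stated in the table, not agent code; `link_failure_response` and `MANUAL_RESYNC_STEPS` are hypothetical names.

```python
# Manual procedure from the table, needed when LinkMonitor is not 1 or 2.
MANUAL_RESYNC_STEPS = (
    "split off the Shadow Image device from the S-VOL device at the secondary site",
    "resync the S-VOL device using the agent's pairresync action",
    "reestablish the mirror with the S-VOL devices after P-VOL and S-VOL are in sync",
)


def link_failure_response(link_monitor):
    """Model the agent response to a disconnected replication link."""
    periodic_resync = link_monitor in (1, 2)  # agent retries the resync periodically
    return {
        "periodic_resync": periodic_resync,
        "notify": link_monitor == 2,          # SNMP traps or emails
        # Any value other than 1 or 2 leaves the resync to the administrator.
        "manual_steps": () if periodic_resync else MANUAL_RESYNC_STEPS,
    }
```

With `link_monitor` set to 0, the result carries all three manual steps; with 1 or 2, the tuple is empty because the agent attempts the resync itself.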

Network failure

The LLT and the replication links between the sites fail.

VCS response:

  • VCS at each site concludes that the nodes at the other site have faulted.

  • Does the following based on the AutoFailOver attribute for the faulted service group:

    • 2 - No action. You must confirm the cause of the network failure from the cluster administrator at the remote site and fix the issue.

    • 1 - VCS brings the service group online at the secondary site which leads to a cluster-wide split brain. This causes data divergence between the devices on the arrays at the two sites.

      When the network (LLT and replication) connectivity is restored, VCS takes all the service groups offline on one of the sites and restarts itself. This action eliminates the concurrency violation in which the same group is online at both sites.

      After taking the service group offline, you must manually resynchronize the data.

      Note:

      Veritas recommends that you set the value of the AutoFailOver attribute to 2 for all service groups to prevent unintended failovers due to transient network failures.

To resynchronize the data after the network link is restored:

  1. Take the service groups offline at both sites.

  2. Manually resynchronize the data.

    Depending on the site whose data you want to retain, run the pairresync or the pairresync-swap command.

  3. Bring the service group online on one of the sites.

Agent response: Similar to the agent response for a site failure.
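The network failure row is the one case where the AutoFailOver value determines whether a split brain occurs. A minimal sketch of that outcome follows; `network_failure_outcome` and `RECOVERY_STEPS` are hypothetical names, and the sketch deliberately does not map either resync command to a site, because the table does not state that mapping.

```python
# Recovery procedure from the table, run after the network link is restored.
RECOVERY_STEPS = (
    "take the service groups offline at both sites",
    "manually resynchronize the data, keeping the data of the chosen site",
    "bring the service group online on one of the sites",
)


def network_failure_outcome(auto_failover):
    """Model the result of losing both the LLT and the replication links."""
    if auto_failover == 2:
        # VCS takes no action; the administrator investigates with the remote site.
        return {"split_brain": False, "data_divergence": False, "recovery": ()}
    if auto_failover == 1:
        # The group comes online at the secondary site while still online at the primary.
        return {"split_brain": True, "data_divergence": True, "recovery": RECOVERY_STEPS}
    raise ValueError(f"AutoFailOver value not covered here: {auto_failover}")
```

The model makes the recommendation in the note concrete: with a value of 2, no divergence arises and no recovery steps are needed.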

Storage failure

The array at the primary site fails.

VCS response:

  • Causes the service group at the primary site to fault and displays an alert to indicate the fault.

  • Does the following based on the AutoFailOver attribute for the faulted service group:

    • 1 - VCS automatically brings the faulted service group online at the secondary site.

    • 2 - You must bring the service group online at the secondary site.

Agent response: The agent does the following based on the SplitTakeover attribute of the HTC resource:

  • 1 - The agent issues the horctakeover command to make the HTC devices write-enabled. The S-VOL devices go into the SSWS state.

  • 0 - The agent does not perform failover to the secondary site.