InfoScale™ 9.0 Disaster Recovery Implementation Guide - Linux

Product(s): InfoScale & Storage Foundation (9.0)
Platform: Linux
  1. Section I. Introducing Storage Foundation and High Availability Solutions for disaster recovery
    1. About supported disaster recovery scenarios
      1. About disaster recovery scenarios
      2. About campus cluster configuration
        1. VCS campus cluster requirements
        2. How VCS campus clusters work
        3. Typical VCS campus cluster setup
      3. About replicated data clusters
        1. How VCS replicated data clusters work
      4. About global clusters
        1. How VCS global clusters work
        2. User privileges for cross-cluster operations
        3. VCS global clusters: The building blocks
          1. Visualization of remote cluster objects
          2. About global service groups
          3. About global cluster management
            1. About the wide-area connector process
            2. About the wide-area heartbeat agent
            3. Sample configuration for the wide-area heartbeat agent
          4. About serialization - The Authority attribute
            1. About the Authority and AutoStart attributes
          5. About resiliency and "Right of way"
          6. VCS agents to manage wide-area failover
          7. About the Steward process: Split-brain in two-cluster global clusters
          8. Secure communication in global clusters
      5. Disaster recovery feature support for components in the Veritas InfoScale product suite
      6. Virtualization support for InfoScale 9.0 products in replicated environments
    2. Planning for disaster recovery
      1. Planning for cluster configurations
        1. Planning a campus cluster setup
        2. Planning a replicated data cluster setup
        3. Planning a global cluster setup
      2. Planning for data replication
        1. Data replication options
        2. Data replication considerations
  2. Section II. Implementing campus clusters
    1. Setting up campus clusters for VCS and SFHA
      1. About setting up a campus cluster configuration
        1. Preparing to set up a campus cluster configuration
        2. Configuring I/O fencing to prevent data corruption
        3. Configuring VxVM disk groups for campus cluster configuration
        4. Configuring VCS service group for campus clusters
        5. Setting up campus clusters for VxVM and VCS using Veritas InfoScale Operations Manager
      2. Fire drill in campus clusters
      3. About the DiskGroupSnap agent
      4. About running a fire drill in a campus cluster
        1. Configuring the fire drill service group
        2. Running a successful fire drill in a campus cluster
    2. Setting up campus clusters for SFCFSHA, SFRAC
      1. About setting up a campus cluster for disaster recovery for SFCFSHA or SF Oracle RAC
      2. Preparing to set up a campus cluster in a parallel cluster database environment
      3. Configuring I/O fencing to prevent data corruption
      4. Configuring VxVM disk groups for a campus cluster in a parallel cluster database environment
      5. Configuring VCS service groups for a campus cluster for SFCFSHA and SF Oracle RAC
      6. Tuning guidelines for parallel campus clusters
      7. Best practices for a parallel campus cluster
  3. Section III. Implementing replicated data clusters
    1. Configuring a replicated data cluster using VVR
      1. About setting up a replicated data cluster configuration
        1. About typical replicated data cluster configuration
        2. About setting up replication
        3. Configuring the service groups
        4. Configuring the service group dependencies
      2. About migrating a service group
        1. Switching the service group
      3. Fire drill in replicated data clusters
    2. Configuring a replicated data cluster using third-party replication
      1. About setting up a replicated data cluster configuration using third-party replication
      2. About typical replicated data cluster configuration using third-party replication
      3. About setting up third-party replication
      4. Configuring the service groups for third-party replication
      5. Fire drill in replicated data clusters using third-party replication
  4. Section IV. Implementing global clusters
    1. Configuring global clusters for VCS and SFHA
      1. Installing and Configuring Cluster Server
      2. Setting up VVR replication
        1. About configuring VVR replication
        2. Best practices for setting up replication
        3. Creating a Replicated Data Set
          1. Creating a Primary RVG of an RDS
            1. Prerequisites for creating a Primary RVG of an RDS
            2. Example - Creating a Primary RVG containing a data volume
            3. Example - Creating a Primary RVG containing a volume set
          2. Adding a Secondary to an RDS
            1. Best practices for adding a Secondary to an RDS
            2. Prerequisites for adding a Secondary to an RDS
          3. Changing the replication settings for a Secondary
            1. Setting the mode of replication for a Secondary
              1. Example - Setting the mode of replication to asynchronous for an RDS
              2. Example - Setting the mode of replication to synchronous for an RDS
            2. Setting the latency protection for a Secondary
            3. Setting the SRL overflow protection for a Secondary
            4. Setting the network transport protocol for a Secondary
            5. Setting the packet size for a Secondary
              1. Example - Setting the packet size between the Primary and Secondary
            6. Setting the bandwidth limit for a Secondary
              1. Example: Limiting network bandwidth between the Primary and the Secondary
              2. Example: Disabling Bandwidth Throttling between the Primary and the Secondary
              3. Example: Limiting network bandwidth used by VVR when using full synchronization
        4. Synchronizing the Secondary and starting replication
          1. Methods to synchronize the Secondary
            1. Using the network to synchronize the Secondary
            2. Using block-level tape backup to synchronize the Secondary
            3. Moving disks physically to synchronize the Secondary
          2. Using the automatic synchronization feature
            1. Notes on using automatic synchronization
          3. Example for setting up replication using automatic synchronization
          4. About SmartMove for VVR
          5. About thin storage reclamation and VVR
          6. Determining if a thin reclamation array needs reclamation
        5. Starting replication when the data volumes are zero initialized
          1. Example: Starting replication when the data volumes are zero initialized
      3. Setting up third-party replication
      4. Configuring clusters for global cluster setup
        1. Configuring global cluster components at the primary site
        2. Installing and configuring VCS at the secondary site
        3. Securing communication between the wide-area connectors
        4. Configuring remote cluster objects
        5. Configuring additional heartbeat links (optional)
        6. Configuring the Steward process (optional)
      5. Configuring service groups for global cluster setup
        1. Configuring VCS service group for VVR-based replication
        2. Configuring a service group as a global service group
      6. Fire drill in global clusters
    2. Configuring a global cluster with Storage Foundation Cluster File System High Availability, Storage Foundation for Oracle RAC, or Storage Foundation for Sybase CE
      1. About global clusters
      2. About replication for parallel global clusters using Storage Foundation and High Availability (SFHA) Solutions
      3. About setting up a global cluster environment for parallel clusters
      4. Configuring the primary site
      5. Configuring the secondary site
        1. Configuring the Sybase ASE CE cluster on the secondary site
      6. Setting up replication between parallel global cluster sites
      7. Testing a parallel global cluster configuration
    3. Configuring global clusters with VVR and Storage Foundation Cluster File System High Availability, Storage Foundation for Oracle RAC, or Storage Foundation for Sybase CE
      1. About configuring a parallel global cluster using Volume Replicator (VVR) for replication
      2. Setting up replication on the primary site using VVR
        1. Creating the data and SRL volumes on the primary site
        2. Setting up the Replicated Volume Group on the primary site
      3. Setting up replication on the secondary site using VVR
        1. Creating the data and SRL volumes on the secondary site
        2. Editing the /etc/vx/vras/.rdg files
        3. Setting up IP addresses for RLINKs on each cluster
        4. Setting up the disk group on secondary site for replication
      4. Starting replication of the primary site database volume to the secondary site using VVR
      5. Configuring Cluster Server to replicate the database volume using VVR
        1. Modifying the Cluster Server (VCS) configuration on the primary site
        2. Modifying the VCS configuration on the secondary site
        3. Configuring the Sybase ASE CE cluster on the secondary site
      6. Replication use cases for global parallel clusters
  5. Section V. Reference
    1. Appendix A. Sample configuration files
      1. Sample Storage Foundation for Oracle RAC configuration files
        1. sfrac02_main.cf file
        2. sfrac07_main.cf and sfrac08_main.cf files
        3. sfrac09_main.cf and sfrac10_main.cf files
        4. sfrac11_main.cf file
        5. sfrac12_main.cf and sfrac13_main.cf files
        6. Sample fire drill service group configuration
      2. About sample main.cf files for Storage Foundation (SF) for Oracle RAC
        1. Sample main.cf for Oracle 10g for CVM/VVR primary site
        2. Sample main.cf for Oracle 10g for CVM/VVR secondary site
      3. About sample main.cf files for Storage Foundation (SF) for Sybase ASE CE
        1. Sample main.cf for a basic Sybase ASE CE cluster configuration under VCS control with shared mount point on CFS for Sybase binary installation
        2. Sample main.cf for a basic Sybase ASE CE cluster configuration with local mount point on VxFS for Sybase binary installation
        3. Sample main.cf for a primary CVM VVR site
        4. Sample main.cf for a secondary CVM VVR site

Replication use cases for global parallel clusters

For information on the VCS commands for global clusters, see the Cluster Server Administrator's Guide.

If you have two clusters configured to use VVR for replication, the following replication use cases are supported:

Table: Replication use cases for global parallel clusters

Migration of the role of the primary site to the remote site
  Migration is a planned transfer of the role of primary replication host from one cluster to a remote cluster. This transfer enables the application on the remote cluster to actively use the replicated data. The former primary cluster becomes free for maintenance or other activity.

Takeover of the primary site role by the secondary site
  Takeover occurs when an unplanned event (such as a disaster) causes a failure, making it necessary for the applications using the replicated data to be brought online on the remote cluster.

Migrate the role of primary site to the secondary site
  See "To migrate the role of primary site to the remote site".

Migrate the role of new primary site back to the original primary site
  See "To migrate the role of new primary site back to the original primary site".

Take over after an outage
  See "To take over after an outage".

Resynchronize after an outage
  See "To resynchronize after an outage".

Update the rlink
  See "To update the rlink".

After configuring the replication objects within VCS, you can use VCS commands to migrate the role of the cluster on the primary site to the remote cluster. In the procedure below, VCS takes the replicated database service group, database_grp, offline on the primary site and brings it online on the secondary site; the secondary site now assumes the role of the primary site.

Note: The hagrp -switch command cannot migrate a parallel group within a cluster or between clusters in a global cluster environment.
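By contrast, a failover (non-parallel) global service group can be switched between clusters in a single operation. A minimal sketch, assuming a hypothetical failover group named app_grp and a remote cluster named clus2:

    # hagrp -switch app_grp -clus clus2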

To migrate the role of primary site to the remote site

  1. From the primary site, use the following command to take the database service group offline on all nodes.
    # hagrp -offline database_grp -any

    Wait for VCS to take all database service groups offline on the primary site.

  2. Verify that the RLINK between the primary and secondary is up to date. Use the vxrlink command with the status option, specifying the RLINK for the primary cluster. You can use the command from any node on the primary cluster.

    For example:

    # vxrlink -g data_disk_group status rlk_clus2_dbdata_rvg

    where rlk_clus2_dbdata_rvg is the RLINK to the secondary cluster.

  3. On the secondary site, which is now the new primary site, bring the database service group online on all nodes:
    # hagrp -online database_grp -any
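The three steps above can also be scripted end to end. The following is a minimal sketch rather than part of the documented procedure: it assumes the names used above (database_grp, data_disk_group, rlk_clus2_dbdata_rvg), appropriate VCS and VxVM privileges, and that the final command is run on the secondary cluster; the polling loop and its grep pattern are illustrative assumptions.

    #!/bin/sh
    # Illustrative sketch only; names and output patterns are assumptions.
    # Run on the primary cluster:
    hagrp -offline database_grp -any

    # Wait until no system still reports the group as ONLINE.
    while hagrp -state database_grp | grep -q ONLINE; do
        sleep 10
    done

    # Confirm the RLINK to the secondary is up to date before handover.
    vxrlink -g data_disk_group status rlk_clus2_dbdata_rvg

    # Then, on the secondary cluster:
    #   hagrp -online database_grp -any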

After migrating the role of the primary site to the secondary site, you can use VCS commands to migrate the role of the cluster on the new primary site to the original primary site. In the procedure below, VCS takes the replicated database service group, database_grp, offline on the new primary (former secondary) site and brings it online on the original primary site; the original primary site now resumes the role of the primary site.

Note: The hagrp -switch command cannot migrate a parallel group within a cluster or between clusters in a global cluster environment.

To migrate the role of new primary site back to the original primary site

  1. Make sure that all database resources are online, and switch the database_grp group back to the original primary site.

    Issue the following command on the remote (current primary) site:

    # hagrp -offline database_grp -any
  2. Verify that the RLINK between the primary and secondary is up to date. Use the vxrlink command with the status option, specifying the RLINK for the current primary cluster. You can use the command from any node on the current primary cluster.

    For example:

    # vxrlink -g data_disk_group status rlk_clus1_dbdata_rvg

    where rlk_clus1_dbdata_rvg is the RLINK to the original primary cluster.

  3. Make sure that database_grp is offline on the new primary site. Then, execute the following command on the original primary site to bring the database_grp online:
    # hagrp -online database_grp -any
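To confirm where the group is active after switching back, you can query the group state from either site. A sketch, assuming the clusters are named clus1 (the original primary) and clus2 (the former secondary), and that database_grp is configured as a global service group so that the -clus option applies:

    # hagrp -state database_grp -clus clus1
    # hagrp -state database_grp -clus clus2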

Takeover occurs when the remote cluster on the secondary site starts the application that uses replicated data. This situation may occur if the secondary site perceives the primary site as dead, or when the primary site becomes inaccessible (perhaps for a known reason). For a detailed description of the concepts of taking over the primary role, see the Veritas InfoScale™ Replication Administrator's Guide.

Before enabling the secondary site to take over the primary role, the administrator on the secondary site must "declare" the type of failure at the remote (primary, in this case) site and designate the failure type using one of the options for the haclus command.

Takeover options are:

Table: Takeover options on global parallel clusters

Disaster
  When the cluster on the primary site is inaccessible and appears dead, the administrator declares the failure type as "disaster." For example, a fire may destroy a data center, including the primary site and all data in the volumes. After making this declaration, the administrator can bring the service group online on the secondary site, which then assumes the role of the primary site.

Outage
  When the administrator of the secondary site knows that the primary site is inaccessible for a known reason, such as a temporary power outage, the administrator may declare the failure as an "outage." Typically, an administrator expects the primary site to return to its original state.

  After the outage is declared, the RVGSharedPri agent enables DCM logging while the secondary site maintains the primary replication role. After the original primary site comes back online and returns to its original state, DCM logging makes it possible to use fast failback resynchronization when data is resynchronized to the original cluster.

  Before attempting to resynchronize the data from the current primary site to the original primary site using the fast failback option, take a snapshot of the original data at the original primary site. The snapshot provides a valid copy of the data at the original primary site in case the current primary site fails before the resynchronization is complete.

Disconnect
  When both clusters are functioning properly and the heartbeat link between the clusters fails, a split-brain condition exists. In this case, the administrator can declare the failure as "disconnect," which means that no attempt is made to take over the role of the primary site at the secondary site. This declaration is merely advisory; it generates a message in the VCS log indicating that the failure results from a network outage rather than a server outage.

Replica
  In the rare case where the current primary site becomes inaccessible while data is being resynchronized from that site to the original primary site using the fast failback method, the administrator at the original primary site may resort to using a data snapshot (if one exists) that was taken before the start of the fast failback operation. In this case, the failure type is declared as "replica."
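Each failure type in the table corresponds to an option of the haclus -declare command, which the administrator runs on the secondary cluster. A sketch of the general form, assuming the remote (primary) cluster is named clus1 as in the examples that follow:

    # haclus -declare disaster|outage|disconnect|replica -clus clus1

For example, to declare a disaster:

    # haclus -declare disaster -clus clus1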

The following examples illustrate the steps required for an outage takeover and resynchronization.

To take over after an outage

  1. From any node of the secondary site, issue the haclus command:
    # haclus -declare outage -clus clus1
  2. After declaring the state of the remote cluster, bring the database_grp service group online on the secondary site. For example:
    # hagrp -online -force database_grp -any
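After the outage is declared and the group is forced online, the RVGSharedPri agent enables DCM logging, as described in the takeover options table. One way to verify this is to list the RLINK in long format; a sketch, assuming the disk group and RLINK names used elsewhere in this section, and assuming that the dcm_logging flag appears in the output while DCM is in use:

    # vxprint -g data_disk_group -Pl rlk_clus1_dbdata_rvg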

To resynchronize after an outage

  1. On the original primary site, create a snapshot of the Replicated Volume Group (RVG) before resynchronizing it, in case the current primary site fails during the resynchronization. Assuming that the disk group is data_disk_group and the RVG is dbdata_rvg1, type:
    # vxrvg -g data_disk_group -F snapshot dbdata_rvg1

    See the Veritas InfoScale™ Replication Administrator's Guide for details on RVG snapshots.

  2. Resynchronize the RVG. From any node of the current primary site, issue the hares command with the -action option and the fbsync action token to resynchronize the RVGSharedPri resource. For example:
    # hares -action dbdata_vvr_shpri fbsync -sys sys3

    where sys3 is the CVM master node. To determine which node is the CVM master node, type:

    # vxdctl -c mode
  3. Run one of the following commands, depending on whether the resynchronization of data from the current primary site to the original primary site succeeds:

    • If the resynchronization of data is successful, use the vxrvg command with the snapback option to reattach the snapshot volumes on the original primary site to the original volumes in the specified RVG:

      # vxrvg -g data_disk_group snapback dbdata_rvg1
    • If the resynchronization of data fails (for example, if a disaster hits the primary RVG while resynchronization is in progress), the data could be left inconsistent.

      In that case, you can restore the contents of the RVG data volumes from the snapshot taken in step 1:

      # vxrvg -g data_disk_group snaprestore dbdata_rvg1
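The choice in step 3 can be expressed as a simple conditional. A minimal sketch, assuming the outcome of the resynchronization is determined by the administrator and recorded in the RESYNC_OK variable; the variable and the wrapper itself are illustrative, not part of the documented procedure:

    #!/bin/sh
    # Illustrative sketch; set RESYNC_OK based on the actual outcome
    # of the fbsync resynchronization.
    if [ "$RESYNC_OK" = "yes" ]; then
        # Success: reattach the snapshot volumes to the RVG volumes.
        vxrvg -g data_disk_group snapback dbdata_rvg1
    else
        # Failure: restore the data volumes from the snapshot.
        vxrvg -g data_disk_group snaprestore dbdata_rvg1
    fi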

If the RLINK is not up to date, use the hares -action command with the resync action token to synchronize the RVG.

To update the rlink

  • The following command example is issued on any node (sys1, in this case) in the primary cluster, specifying the RVGSharedPri resource, dbdata_vvr_shpri:
    # hares -action dbdata_vvr_shpri resync -sys sys1
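To run the resync action only when it is needed, the status check can gate the command. A sketch, assuming the primary-cluster names used above and assuming that vxrlink status reports the phrase "up to date" when no synchronization is pending:

    #!/bin/sh
    # Illustrative sketch; the "up to date" pattern is an assumption
    # about the vxrlink status output.
    if vxrlink -g data_disk_group status rlk_clus2_dbdata_rvg 2>&1 |
        grep -q "up to date"
    then
        echo "RLINK is up to date; no resync needed."
    else
        hares -action dbdata_vvr_shpri resync -sys sys1
    fi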