Cluster Server 7.4.1 Administrator's Guide - Linux
How VCS campus clusters work
This topic describes how VCS works with VxVM to provide high availability in a campus cluster environment.
In a campus cluster setup, VxVM automatically mirrors volumes across sites. To enhance read performance, VxVM reads from the plexes at the local site where the application is running. VxVM writes to the plexes at both sites.
In the event of a storage failure at a site, VxVM detaches all the disks at the failed site from the disk group to maintain data consistency. When the failed storage comes back online, VxVM automatically reattaches the site to the disk group and recovers the plexes.
See the Storage Foundation Cluster File System High Availability Administrator's Guide for more information.
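The following is a minimal sketch of how an administrator might check and recover the site state of a mirrored disk group. The disk group name (datadg) and site name (SFO) are placeholders, and the reattach syntax should be verified against the Storage Foundation documentation for your release; it is shown here only to illustrate the detach/reattach behavior described above.

```
# Placeholder names: disk group "datadg", site "SFO".
# Inspect the disk group and its plexes; a detached site is reflected in the
# disk group state, and plexes at that site are no longer in the ACTIVE state.
vxdg list datadg
vxprint -g datadg

# If the site is not reattached automatically after the storage returns,
# it can typically be reattached manually (verify the syntax for your release):
vxdg -g datadg reattachsite SFO
```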
When service group or system faults occur, VCS fails over service groups based on the values you set for the cluster attribute SiteAware and the service group attribute AutoFailOver.
See Cluster attributes.
For a campus cluster setup, you must define sites and add systems to the sites that you defined. A system can belong to only one site. Site definitions are uniform across VCS, Veritas InfoScale Operations Manager, and VxVM. You can define site dependencies to restrict connected applications to fail over within the same site.
You can define sites by using Veritas InfoScale Operations Manager. For more information on configuring sites, see the latest version of the Veritas InfoScale Operations Manager User Guide.
Depending on the value of the AutoFailOver attribute, VCS failover behavior is as follows:
AutoFailOver value | VCS failover behavior |
---|---|
0 | VCS does not fail over the service group. |
1 | VCS fails over the service group to another suitable node. By default, the AutoFailOver attribute value is set to 1. |
2 | VCS fails over the service group if another suitable node exists in the same site. Otherwise, VCS waits for administrator intervention to initiate the service group failover to a suitable node in the other site. This configuration requires the HA/DR license to be enabled. Veritas recommends that you set the value of the AutoFailOver attribute to 2. |
A sample definition of these attributes in the VCS main.cf is as follows:
```
cluster VCS_CLUS (
    PreferredFencingPolicy = Site
    SiteAware = 1
    )

site MTV (
    SystemList = { sys1, sys2 }
    )

site SFO (
    Preference = 2
    SystemList = { sys3, sys4 }
    )
```
The sample configuration for hybrid_group with AutoFailOver = 1 and failover_group with AutoFailOver = 2 is as follows:
```
group hybrid_group (
    Parallel = 2
    SystemList = { sys1 = 0, sys2 = 1, sys3 = 2, sys4 = 3 }
    )

group failover_group (
    AutoFailOver = 2
    SystemList = { sys1 = 0, sys2 = 1, sys3 = 2, sys4 = 3 }
    )
```
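The same attributes can also be set on a running cluster with the standard VCS command-line tools. The following is a minimal sketch; the service group name appsg is a placeholder, and the values should be adjusted to match your configuration:

```
# Open the cluster configuration for writes
haconf -makerw

# Cluster-level site settings discussed above
haclus -modify SiteAware 1
haclus -modify PreferredFencingPolicy Site

# Prefer failover within the same site for the service group (placeholder name "appsg")
hagrp -modify appsg AutoFailOver 2

# Save the configuration and make it read-only again
haconf -dump -makero
```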
Table: Failure scenarios in campus cluster lists the possible failure scenarios and how a VCS campus cluster recovers from these failures.

Table: Failure scenarios in campus cluster

Failure | Description and recovery |
---|---|
Node failure | One node at a site fails, or all nodes at a site fail. If the value of the AutoFailOver attribute is set to 0, VCS requires administrator intervention to initiate a failover in both cases of node failure. |
Application failure | The behavior is similar to that for node failure. |
Storage failure - one or more disks at a site fail | VCS does not fail over the service group when such a storage failure occurs. VxVM detaches the site from the disk group if any volume in that disk group does not have at least one valid plex at the site where the disks failed. In certain cases, VxVM does not detach the site from the disk group. If only some of the failed disks come online and the vxrelocd daemon is running, VxVM relocates the remaining failed disks to any available disks, then automatically reattaches the site to the disk group and resynchronizes the plexes to recover the volumes. If all the failed disks come online, VxVM automatically reattaches the site to the disk group and resynchronizes the plexes to recover the volumes. |
Storage failure - all disks at both sites fail | VCS acts based on the value of the DiskGroup agent's PanicSystemOnDGLoss attribute. See the Cluster Server Bundled Agents Reference Guide for more information. |
Site failure | All nodes and storage at a site fail. Depending on the value of the AutoFailOver attribute, VCS fails over the service group as described in the table above. Because the storage at the failed site is inaccessible, VCS imports the disk group in the application service group with all devices at the failed site marked as NODEVICE. When the storage at the failed site comes online, VxVM automatically reattaches the site to the disk group and resynchronizes the plexes to recover the volumes. |
Network failure (LLT interconnect failure) | Nodes at each site lose connectivity to the nodes at the other site. The failure of all private interconnects between the nodes can result in a split-brain scenario and cause data corruption. Review the details on other possible causes of split brain and how I/O fencing protects shared data from corruption. Veritas recommends that you configure I/O fencing to prevent data corruption in campus clusters. When the cluster attribute PreferredFencingPolicy is set to Site, the fencing driver gives preference to the node with the higher site priority during the race for coordination points. VCS uses the site-level attribute Preference to determine the node weight. |
Network failure (LLT and storage interconnect failure) | Nodes at each site lose connectivity to the storage and to the nodes at the other site. Veritas recommends that you configure I/O fencing to prevent split-brain and serial split-brain conditions. |
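When validating a fencing configuration such as the one described in these scenarios, a few read-only checks can confirm the relevant settings. These are standard VCS and fencing utilities, although the exact output format varies by release:

```
# Show the I/O fencing mode and the current fencing membership
vxfenadm -d

# Confirm the cluster-level settings that drive site-based behavior
haclus -value PreferredFencingPolicy
haclus -value SiteAware
```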