Cluster Server 7.4.1 Administrator's Guide - Linux
How I/O fencing works in different event scenarios
Table: I/O fencing scenarios describes how I/O fencing works to prevent data corruption in different failure event scenarios. For each event, review the corrective operator actions.
Table: I/O fencing scenarios
| Event | Node A: What happens? | Node B: What happens? | Operator action |
|---|---|---|---|
| Both private networks fail. | Node A races for a majority of the coordination points. If Node A wins the race for the coordination points, Node A ejects Node B from the shared disks and continues. | Node B races for a majority of the coordination points. If Node B loses the race for the coordination points, Node B panics and removes itself from the cluster. | When Node B is ejected from the cluster, repair the private networks before attempting to bring Node B back. |
| Both private networks function again after the event above. | Node A continues to work. | Node B has crashed. It cannot start the database because it is unable to write to the data disks. | Restart Node B after the private networks are restored. |
| One private network fails. | Node A prints a message about an IOFENCE on the console but continues. | Node B prints a message about an IOFENCE on the console but continues. | Repair the private network. After the network is repaired, both nodes automatically use it. See the verification sketch after this table. |
| Node A hangs. | Node A is extremely busy for some reason or is in the kernel debugger. When Node A is no longer hung or in the kernel debugger, any queued writes to the data disks fail because Node A is ejected. When Node A receives a message from GAB about being ejected, it panics and removes itself from the cluster. | Node B loses heartbeats with Node A and races for a majority of the coordination points. Node B wins the race for the coordination points and ejects Node A from the shared data disks. | Repair or debug the node that hangs and reboot the node to rejoin the cluster. |
| Nodes A and B and the private networks lose power. The coordination points and data disks retain power. Power returns to the nodes and they restart, but the private networks still have no power. | Node A restarts and the I/O fencing driver (vxfen) detects that Node B is registered with the coordination points. The driver does not see Node B listed as a member of the cluster because the private networks are down. This causes the I/O fencing device driver to prevent Node A from joining the cluster. The Node A console displays: Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain. | Node B restarts and the I/O fencing driver (vxfen) detects that Node A is registered with the coordination points. The driver does not see Node A listed as a member of the cluster because the private networks are down. This causes the I/O fencing device driver to prevent Node B from joining the cluster. The Node B console displays: Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain. | Resolve the preexisting split-brain condition. See the split-brain recovery sketch after this table. |
| Node A crashes while Node B is down. Node B comes up and Node A is still down. | Node A remains down. | Node B restarts and detects that Node A is registered with the coordination points. The driver does not see Node A listed as a member of the cluster. The I/O fencing device driver prints a message on the console: Potentially a preexisting split brain. Dropping out of the cluster. Refer to the user documentation for steps required to clear preexisting split brain. | Resolve the preexisting split-brain condition. See the split-brain recovery sketch after this table. |
| The disk array containing two of the three coordination points is powered off. No node leaves the cluster membership. | Node A continues to operate as long as no nodes leave the cluster. | Node B continues to operate as long as no nodes leave the cluster. | Power on the failed disk array so that a subsequent network partition does not cause a cluster shutdown, or replace the coordination points. See Replacing I/O fencing coordinator disks when the cluster is online. |
| The disk array containing two of the three coordination points is powered off. Node B gracefully leaves the cluster and the disk array is still powered off. Leaving gracefully implies a clean shutdown so that vxfen is properly unconfigured. | Node A continues to operate in the cluster. | Node B has left the cluster. | Power on the failed disk array so that a subsequent network partition does not cause a cluster shutdown, or replace the coordination points. See Replacing I/O fencing coordinator disks when the cluster is online. |
| The disk array containing two of the three coordination points is powered off. Node B abruptly crashes, or a network partition occurs between Node A and Node B, and the disk array is still powered off. | Node A races for a majority of the coordination points. Node A fails because only one of the three coordination points is available. Node A panics and removes itself from the cluster. | Node B has left the cluster due to the crash or network partition. | Power on the failed disk array and restart the I/O fencing driver to enable Node A to register with all coordination points, or replace the coordination points. |
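
Several of the scenarios above end with a private network or a coordination-point disk array being repaired. After such a repair, it is worth confirming that LLT, GAB, and the fencing driver all report a healthy state before the next membership change occurs. The following is a minimal verification sketch using the standard VCS utilities; it assumes disk-based fencing with the coordinator disks listed in /etc/vxfentab, and the exact output format varies by release.

```
# Check LLT link status on each node; every configured private link should be UP.
lltstat -nvv | more

# Confirm GAB port membership (port a = GAB, port b = I/O fencing, port h = HAD).
gabconfig -a

# Display the fencing mode, the coordination points, and the cluster members
# as seen by the vxfen driver.
vxfenadm -d

# For disk-based fencing, optionally read the registration keys on the
# coordinator disks listed in /etc/vxfentab.
vxfenadm -s all -f /etc/vxfentab
```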
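
The scenarios that report "Potentially a preexisting split brain" require the stale registration keys to be cleared before the affected node can rejoin the cluster. The outline below is a hedged sketch of that recovery, assuming a Linux cluster that uses the systemd vxfen unit; the authoritative procedure, including the safety checks to run before clearing any keys, is described in About the vxfenclearpre utility and Fencing startup reports preexisting split-brain.

```
# 1. Confirm that no application is still writing to the shared data disks,
#    then stop VCS on all nodes.
hastop -all

# 2. Stop the fencing driver on each node (the vxfen systemd unit on
#    supported Linux distributions).
systemctl stop vxfen

# 3. On one node, run the vxfenclearpre utility to remove the stale
#    SCSI-3 registration and reservation keys left behind by the failed nodes.
vxfenclearpre

# 4. Restart fencing and then VCS on all nodes.
systemctl start vxfen
hastart
```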