
Cluster Server 7.4.1 Administrator's Guide - Linux

Last Published: 2019-10-17

Product(s): InfoScale & Storage Foundation (7.4.1)

Platform: Linux


Ongoing cluster membership

Once the cluster is up and running, a system remains an active member of the cluster as long as peer systems receive a heartbeat signal from that system over the cluster interconnect. A change in cluster membership is determined as follows:

When LLT on a system no longer receives heartbeat messages from a system on any of the configured LLT interfaces for a predefined time (peerinact), LLT informs GAB of the heartbeat loss from that specific system.
This predefined time is 16 seconds by default, but can be configured.
You can set this time with the set-timer peerinact directive in the /etc/llttab file. See the llttab manual page.
Note:
When you configure an InfoScale cluster in an Azure environment and enable I/O fencing, you must increase the LLT peerinact time to 120 seconds. Each time a cluster node reboots, the fencing module checks whether a network partition has occurred and performs I/O fencing if required. When you reboot a cluster node from the Azure portal, the reboot takes longer than the fencing module takes to complete its operations. As a result, the rebooting node or another node in the cluster may panic. Increasing the peerinact time avoids this situation.
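The detection rule in this step can be modeled in a few lines. The following is a minimal Python sketch, not VCS code: the class HeartbeatTracker, the helper llttab_peerinact_line, and the system and interface names are all hypothetical. It shows that a peer is reported as lost only when heartbeats have stopped on every interface for the peerinact interval. The helper assumes that LLT timer values are expressed in hundredths of a second, so that the 16-second default corresponds to 1600 and the 120-second Azure value to 12000; confirm this against the llttab manual page for your release before relying on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Default peerinact interval stated in the text: 16 seconds.
DEFAULT_PEERINACT_S = 16.0


@dataclass
class HeartbeatTracker:
    """Hypothetical model of LLT-style peer-loss detection (not VCS code).

    A peer counts as lost only when no heartbeat has arrived on ANY of its
    configured interfaces for at least `peerinact_s` seconds.
    """
    peerinact_s: float = DEFAULT_PEERINACT_S
    # last_seen[peer][interface] -> time (seconds) of the last heartbeat seen
    last_seen: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def heartbeat(self, peer: str, interface: str, now: float) -> None:
        """Record a heartbeat from `peer` received on `interface` at time `now`."""
        self.last_seen.setdefault(peer, {})[interface] = now

    def lost_peers(self, now: float) -> List[str]:
        """Return peers whose newest heartbeat, on any interface, is too old."""
        lost = []
        for peer, by_interface in self.last_seen.items():
            newest = max(by_interface.values())
            if now - newest >= self.peerinact_s:
                lost.append(peer)
        return sorted(lost)


def llttab_peerinact_line(seconds: float) -> str:
    """Build a set-timer peerinact line for /etc/llttab.

    ASSUMPTION: LLT timer values are in hundredths of a second
    (16 s -> 1600). Verify against the llttab manual page for your release.
    """
    return f"set-timer peerinact:{round(seconds * 100)}"


if __name__ == "__main__":
    tracker = HeartbeatTracker()
    tracker.heartbeat("sys2", "eth1", now=0.0)   # eth1 falls silent after t=0
    tracker.heartbeat("sys2", "eth2", now=10.0)  # eth2 still delivered at t=10
    print(tracker.lost_peers(now=20.0))  # [] (only 10 s since the newest heartbeat)
    print(tracker.lost_peers(now=26.0))  # ['sys2']
    print(llttab_peerinact_line(120))    # set-timer peerinact:12000
```

Keeping timestamps per interface mirrors the per-link view, but the peer-level decision uses only the newest of them, so a single healthy link is enough to keep the peer in the membership.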
When LLT informs GAB of a heartbeat loss, the remaining systems in the cluster coordinate to agree on which systems are still actively participating in the cluster and which are not. This happens during a time period known as the GAB Stable Timeout (5 seconds).
VCS applies specific error handling if the systems do not agree.
GAB marks the system as DOWN, excludes the system from the cluster membership, and delivers the membership change to the fencing module.
The fencing module performs membership arbitration to prevent a split-brain situation, ensuring that only one functional, cohesive cluster continues to run.
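The following minimal Python sketch models the membership steps above: GAB dropping the lost systems from the membership, then fencing arbitration choosing a single surviving partition. It is an illustration, not the fencing module's implementation. It assumes, following the usual description of VCS I/O fencing, that after a split each partition races for an odd number of coordination points (coordinator disks or CP servers) and that the partition that wins a majority survives while the other systems must leave the cluster. The functions gab_view, arbitrate, and resolve_split and the first_to_reach mapping are hypothetical, and timing, including the GAB Stable Timeout, is not modeled.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple


def gab_view(members: FrozenSet[str], lost: FrozenSet[str]) -> FrozenSet[str]:
    """GAB step (simplified): mark `lost` systems DOWN and drop them from membership."""
    return members - lost


def arbitrate(first_to_reach: Dict[str, int]) -> Optional[int]:
    """Fencing step (simplified): return the index of the partition that won a
    strict majority of coordination points, or None if no partition did.

    `first_to_reach` maps each coordination point to the index of the
    partition whose racer claimed it first (hypothetical input).
    """
    needed = len(first_to_reach) // 2 + 1
    for partition_idx, won in Counter(first_to_reach.values()).items():
        if won >= needed:
            return partition_idx
    return None


def resolve_split(partitions: List[FrozenSet[str]],
                  first_to_reach: Dict[str, int]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (systems that keep running, systems that must leave the cluster)."""
    everyone = frozenset().union(*partitions)
    winner = arbitrate(first_to_reach)
    # In this model, if no partition wins a majority, no partition survives.
    survivors = partitions[winner] if winner is not None else frozenset()
    return survivors, everyone - survivors


if __name__ == "__main__":
    members = frozenset({"sys1", "sys2", "sys3"})
    # A network split isolates sys3: from the sys1/sys2 side, GAB drops sys3.
    print(sorted(gab_view(members, frozenset({"sys3"}))))  # ['sys1', 'sys2']
    partitions = [frozenset({"sys1", "sys2"}), frozenset({"sys3"})]
    # Partition 0 claims cp1 and cp3 first; partition 1 claims only cp2.
    survivors, evicted = resolve_split(partitions, {"cp1": 0, "cp2": 1, "cp3": 0})
    print(sorted(survivors), sorted(evicted))  # ['sys1', 'sys2'] ['sys3']
```

Requiring a strict majority is what guarantees at most one winner: two disjoint partitions cannot both hold more than half of the same set of coordination points.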

The fencing module is turned on by default.

For details on the actions that occur if the fencing module has been deactivated, see "About cluster membership and data protection without I/O fencing".