Cluster Server 7.4.1 Administrator's Guide - Linux

Last Published: 2019-10-17

Product(s): InfoScale & Storage Foundation (7.4.1)

Platform: Linux

VCS behavior on loss of storage connectivity

When a node loses connectivity to shared storage, I/O operations to volumes return errors and the disk group is disabled. In this situation, VCS must fail the service groups over to another node so that applications regain access to shared storage. The failover involves deporting the disk groups from the affected node and importing them on another node. However, pending I/Os must complete before a disabled disk group can be deported.

Pending I/Os cannot complete without storage connectivity. When VCS is not configured with I/O fencing and the PanicSystemOnDGLoss attribute of the DiskGroup resource is not configured to panic the system, VCS assumes that data is still being read from or written to the disks and does not declare the DiskGroup resource as offline. This behavior prevents the data corruption that could occur if the disk group were imported on two hosts at the same time. However, it also means that service groups remain online on a node that has no storage connectivity, and they cannot be failed over unless an administrator intervenes. Application availability is affected until then.
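The condition described above can be sketched as a small decision function. This is only an illustrative model of the paragraph, not VCS code: the class and field names are hypothetical, and the sketch says nothing about what VCS does when the condition is false, because the text above does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    # Hypothetical field names for this sketch; they are not VCS identifiers.
    io_fencing_configured: bool
    # Stands in for "PanicSystemOnDGLoss is configured to panic the system".
    panic_on_dg_loss: bool


def needs_manual_intervention(cfg: NodeConfig) -> bool:
    """Return True for the case described in the text: no I/O fencing and
    no panic on disk group loss, so the DiskGroup resource is not declared
    offline and the service group stays on the disconnected node."""
    return not cfg.io_fencing_configured and not cfg.panic_on_dg_loss


def describe(cfg: NodeConfig) -> str:
    """Summarize the outcome for a node that has lost storage connectivity."""
    if needs_manual_intervention(cfg):
        return ("DiskGroup resource is not declared offline; "
                "service group stays online until an administrator intervenes")
    return "the blocking case described in the text does not apply"


if __name__ == "__main__":
    # Enumerate all four combinations of the two settings.
    for fencing in (False, True):
        for panic in (False, True):
            cfg = NodeConfig(fencing, panic)
            print(cfg, "->", describe(cfg))
```

Only the combination with both settings absent matches the scenario in which failover waits on an administrator.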

Some Fibre Channel (FC) drivers have a configurable parameter called failover, which defines the number of seconds for which the driver retries I/O commands before returning an error. If you set the failover parameter to 0, the FC driver retries I/O indefinitely and does not return an error even when storage connectivity is lost. As a result, the Monitor function for the DiskGroup resource times out, and the service group cannot fail over unless an administrator intervenes.
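As a rough illustration of the failover parameter semantics described above, the following sketch models a value of 0 as "never return an error". The function names and the monitor timeout argument are hypothetical. Comparing the retry window with a monitor timeout is an assumption of this sketch, not something the text states beyond the 0 case.

```python
from typing import Optional


def seconds_until_io_error(failover_seconds: int) -> Optional[int]:
    """Model of the FC driver retry window described in the text.

    Returns the number of seconds the driver retries before returning an
    error, or None when failover is 0 (the driver retries indefinitely).
    """
    if failover_seconds < 0:
        raise ValueError("failover_seconds must be non-negative")
    if failover_seconds == 0:
        return None  # retries forever, no error is ever returned
    return failover_seconds


def error_arrives_before_timeout(failover_seconds: int,
                                 monitor_timeout_seconds: int) -> bool:
    """Assumption of this sketch: an I/O error is only useful to a monitor
    if it is returned before the monitor's own timeout expires."""
    delay = seconds_until_io_error(failover_seconds)
    return delay is not None and delay < monitor_timeout_seconds


if __name__ == "__main__":
    # The values below are arbitrary examples, not recommended settings.
    for failover in (0, 30, 120):
        print(failover, error_arrives_before_timeout(failover, 60))
```

With failover set to 0 the first function returns None, which corresponds to the case in which the Monitor function times out.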