How does InfoScale (and Storage Foundation) for Windows maintain SCSI reservations with clustered disks?
Problem
On Windows hosts, clustered disks are managed using SCSI reservations. This is key to preventing the same data from being accessed by multiple nodes at once. Windows file systems do not currently handle shared access, so importing the same data on multiple nodes could lead to overlapping access to data blocks. This could result in loss of data consistency or more extensive corruption. SFW will not allow a clustered disk group to be imported if its disks cannot be reserved.
The monitor cycles of Veritas Cluster Server (VCS), Microsoft Cluster Server (MSCS) and Windows Server Failover Cluster (WSFC) enforce mechanisms to check whether these reservations are still in place.
Solution
SFW implements this in two ways:
1. The vxio.sys driver maintains a worker thread for each Disk Group that is imported by cluster software activity. This thread checks the reservation status of every disk in the Disk Group at three-second intervals. The reservation requests are serialized: only one disk is queried at a time, and each request must return (with either success or failure) before the next request is sent. If a disk returns a failure that is not sufficient to cause the Disk Group to fail, the thread tests that disk again in the next round of reservation checks.
The worker thread is interrogated by the clustering software to determine whether the disks are reserved rather than the cluster having to check the devices itself.
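The behaviour described above can be illustrated with a minimal sketch. This is not the vxio.sys implementation: the class name, the `check_disk` probe and the `query` method are hypothetical, and only the three-second interval and the one-disk-at-a-time ordering come from this article.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List

POLL_INTERVAL_SECONDS = 3  # interval stated in this article


@dataclass
class ReservationWorker:
    """Illustrative model of one per-Disk-Group reservation thread."""
    disks: List[str]
    check_disk: Callable[[str], bool]  # hypothetical probe: True = reservation held
    status: Dict[str, bool] = field(default_factory=dict)

    def run_cycle(self) -> None:
        # Serialized: each probe must return before the next disk is queried.
        for disk in self.disks:
            self.status[disk] = self.check_disk(disk)

    def query(self) -> Dict[str, bool]:
        # What the cluster software reads instead of probing the devices itself.
        return dict(self.status)


def run(worker: ReservationWorker, stop: threading.Event) -> None:
    # Repeat the cycle every POLL_INTERVAL_SECONDS until asked to stop.
    # A disk that failed in one cycle is simply tested again in the next.
    while not stop.is_set():
        worker.run_cycle()
        stop.wait(POLL_INTERVAL_SECONDS)
```

In this model a cluster monitor would call `query()` and never touch the disks, which mirrors the division of work described above.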
The mechanism for checking reservations varies depending on whether SCSI-2 or SCSI-3 is being used, and whether SFW itself is in SCSI-3 mode.
SCSI-2:
- SFW in SCSI-2 mode: The worker thread in vxio will initiate a SCSI-2 reservation request to every disk in the Disk Group.
- SFW in SCSI-2 mode, with multipathing software in SCSI-3 mode: The worker thread in vxio will initiate a SCSI-2 reservation request to every disk in the Disk Group. The multipathing layer translates this into a SCSI-3 READ KEY request. The SCSI-3 host key is maintained by the multipathing solution.
Note: SCSI-2 is rarely seen with currently supported SFW/InfoScale products. It is still encountered when VMware disks are presented to guest hosts where the SFW/InfoScale software is installed and clustering is in use, because these disks only support SCSI-2.
SCSI-3:
- SFW in SCSI-3 mode: When SFW is in SCSI-3 mode, the vxio reservation thread will initiate a SCSI-3 READ KEY request to every disk in the Disk Group.
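The three mode combinations listed above can be summarized as a small lookup. This is an illustrative sketch only: the enum, the function name and the returned strings are hypothetical labels. Combinations not listed in this article (for example SFW in SCSI-3 mode with multipathing in SCSI-2 mode) are assumed here to pass through unchanged.

```python
from enum import Enum
from typing import Optional, Tuple


class ScsiMode(Enum):
    SCSI2 = "SCSI-2"
    SCSI3 = "SCSI-3"


SCSI2_REQUEST = "SCSI-2 reservation request"
SCSI3_REQUEST = "SCSI-3 READ KEY request"


def reservation_check(sfw_mode: ScsiMode,
                      multipath_mode: Optional[ScsiMode] = None) -> Tuple[str, str]:
    """Return (request issued by vxio, request that reaches the disk)."""
    issued = SCSI2_REQUEST if sfw_mode is ScsiMode.SCSI2 else SCSI3_REQUEST
    # Translation only applies when SFW is in SCSI-2 mode and the
    # multipathing layer is in SCSI-3 mode (assumption for other cases: none).
    translated = sfw_mode is ScsiMode.SCSI2 and multipath_mode is ScsiMode.SCSI3
    delivered = SCSI3_REQUEST if translated else issued
    return issued, delivered
```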
2. On the cluster software side, SFW provides a mechanism whereby the cluster software can monitor the reservation status.
With VCS: The VMDg agent checks the status every 60 seconds (by default). This monitor does not need to query the disks directly to check the reservations; instead, it queries the vxio worker thread for the current status.
With Microsoft Cluster (MSCS/WSFC): The actions taken by the IsAlive and the LooksAlive requests are the same at the SFW level (i.e. the IsAlive doesn't perform any additional checking over the LooksAlive). The LooksAlive request will query the relevant vxio worker thread for the reservation status.
Failure states:
Loss of reservation:
A reservation to a disk is deemed to be lost if:
- A request to query/renew the reservation fails.
- The target disk is removed from the system.
This event is reflected in the MSCS or WSFC cluster logs (%windir%\Cluster\Reports\Cluster.log) by:
ERR Volume Manager Disk Group <DISKGROUPNAME>: LDM_RESLooksAlive: *** FAILED for DISKGROUPNAME, status = 0, res = 0, dg_state = 34
And in the VCS VMDg agent logs (%VCS_HOME%\logs\VMDg_a.log) for VCS:
VMDg:VMDGRESOURCENAME:monitor:Diskgroup SCSI reservation status. Diskgroup name = DISKGROUPNAME, Reservation = 00000022
where 0x22 = 34 = DG_RES_MAJORITY_LOST.
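When scanning logs for this condition, it helps to remember that the VCS line reports the status in hexadecimal while the WSFC line reports it in decimal. The following is a minimal sketch for decoding both sample lines shown above. The function name and the regular expressions are assumptions based only on those two samples, and the lookup table holds only the one code documented in this article.

```python
import re
from typing import Optional, Tuple

# Only the code documented in this article; others are reported as "unknown".
DG_RES_CODES = {0x22: "DG_RES_MAJORITY_LOST"}

# Patterns derived from the two sample log lines above (assumed format).
_VCS = re.compile(
    r"Diskgroup name = (?P<dg>[^,\s]+), Reservation = (?P<res>[0-9A-Fa-f]+)")
_WSFC = re.compile(
    r"LDM_RESLooksAlive: \*\*\* FAILED for (?P<dg>[^,\s]+),.*?dg_state = (?P<state>\d+)")


def decode_reservation_line(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (disk group, numeric code, symbolic name) or None if no match."""
    m = _VCS.search(line)
    if m:
        code = int(m["res"], 16)   # VCS logs the value in hexadecimal
    else:
        m = _WSFC.search(line)
        if not m:
            return None
        code = int(m["state"])     # WSFC logs dg_state in decimal
    return m["dg"], code, DG_RES_CODES.get(code, "unknown")
```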
This will additionally show up in the System Event Log. Please see this article for more information on the errors logged by this event: V-203-57349-41 "vxio: cluster or private disk group %2 has lost access to a majority of its disks. Its reservation thread has been stopped" appears in Veritas Storage Foundation for Windows
Loss of cluster software (reservation suspension):
In an MSCS/WSFC configuration, there is an additional check to deal with the situation where the cluster software itself has failed. WSFC and MSCS use a challenge/defence mechanism whereby the owning node needs to continuously renew the reservation or another node may be able to take over the disks. Because of this, the vxio worker thread checks whether the cluster is still sending LooksAlive/IsAlive requests. If the worker thread does not receive a monitoring request within a specified time, it assumes that the cluster software has failed. In this circumstance, the reservations are released so that other nodes can reserve the disks. The alternative would be for the node to continue maintaining the reservations, which could prolong potential outages.
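The suspension logic amounts to a watchdog timer that is reset by each monitoring request. The sketch below is illustrative only: the class and method names are hypothetical, and the timeout is a caller-supplied parameter because this article does not state the actual value.

```python
from dataclasses import dataclass


@dataclass
class MonitorWatchdog:
    """Illustrative model: release reservations if the cluster goes quiet."""
    timeout_seconds: float      # hypothetical; the real value is not given here
    last_request_at: float      # time of the most recent LooksAlive/IsAlive
    reservations_held: bool = True

    def on_monitor_request(self, now: float) -> None:
        # Each LooksAlive/IsAlive request resets the timer.
        self.last_request_at = now

    def tick(self, now: float) -> bool:
        # Called periodically by the worker thread; returns whether the
        # reservations are still being maintained.
        if self.reservations_held and now - self.last_request_at > self.timeout_seconds:
            # Cluster software is assumed to have failed: stop defending the
            # disks so that another node can reserve them.
            self.reservations_held = False
        return self.reservations_held
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to reason about.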
Majority Rule:
There are two applications of the Majority Rule:
1. If a Disk Group is deported it can only be imported if more than 50% of the disks are available.
Please see this article for more information on the errors logged by this event: V-76-58645-585 - Failed to reserve a majority of disks in cluster dynamic disk group.
2. If an imported Disk Group loses access to a majority of its disks, its reservation thread is stopped and the Disk Group cannot remain online.
Note: If a disk containing a Subdisk from a non-Fault Tolerant Volume is removed then it is likely that the clustered application will fault, regardless of whether the SFW Majority Rule allows the Disk Group to remain online or not.
Please see this article for more information on the errors logged by this event: V-40-49157-41 - vxio: cluster or private disk group %2 has lost access to a majority of its disks. Its reservation thread has been stopped.
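The import-time form of the Majority Rule ("more than 50% of the disks") is a strict inequality, so exactly half of the disks is not enough. A minimal sketch of that arithmetic follows. The function name is hypothetical and this is not SFW code.

```python
def has_majority(available_disks: int, total_disks: int) -> bool:
    """True if strictly more than 50% of the Disk Group's disks are available."""
    if total_disks <= 0:
        raise ValueError("total_disks must be positive")
    if not 0 <= available_disks <= total_disks:
        raise ValueError("available_disks must be between 0 and total_disks")
    # Integer comparison avoids floating-point rounding: a > t/2  <=>  2a > t.
    return 2 * available_disks > total_disks
```

For example, a four-disk Disk Group needs three disks available, while a three-disk Disk Group needs two.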