Data corruption in Linux VxFS file system with sequential asynchronous (buffered) write workload under high memory pressure condition

Article: 100031525
Last Published: 2015-11-12
Product(s): InfoScale & Storage Foundation

Problem

A file data corruption issue, tracked as Etrack incident 3853338, can occur in the VxFS file system on Linux when a sequential asynchronous (buffered) write workload runs under high memory pressure.

The issue applies only to configurations that meet ALL of the following conditions:

1. The Veritas File System (VxFS) version is 6.0, 6.1, 6.2, or 7.0.

2. The user application issues a sequential asynchronous (buffered) write workload. Synchronous writes, where files are opened with the O_SYNC flag, are not affected. Non-sequential (random) I/O is not affected.

3. The writes are appending writes smaller than the page size, and the file system block size is smaller than the OS page size (4 KB). As a result, the issue affects only file systems with block sizes of 1024 or 2048 bytes.

4. The system is under high memory pressure and is heavily loaded with a sequential write workload.

5. VxFS is running on a Linux platform. Other platforms are not affected.
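
As a rough aid for checking condition 3, the size comparisons can be sketched in Python as follows. The function names are hypothetical and are not part of any Veritas tool. The file system block size is passed in as a parameter, because how it is obtained from VxFS is outside the scope of this sketch. Only the OS page size is read from the system.

```python
import os


def block_size_is_affected(fs_block_size: int, page_size: int) -> bool:
    """Condition 3, first half: the file system block size is smaller than the page size."""
    return fs_block_size < page_size


def write_is_sub_page(write_size: int, page_size: int) -> bool:
    """Condition 3, second half: the appending write is smaller than the page size."""
    return 0 < write_size < page_size


if __name__ == "__main__":
    # SC_PAGE_SIZE is the OS page size; the article assumes 4096 bytes.
    page_size = os.sysconf("SC_PAGE_SIZE")
    for bsize in (1024, 2048, 4096, 8192):
        state = "affected" if block_size_is_affected(bsize, page_size) else "not affected"
        print(f"block size {bsize}: {state}")
```

A configuration is exposed only if the remaining conditions (VxFS version, Linux platform, workload type, and memory pressure) also hold. This check covers the size comparison alone.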

When the problem occurs, some of the data written by sequential asynchronous writes is silently lost and never reaches permanent storage. The corruption is detected only when the file data is later read and verified.


 

Error Message

The data corruption is silent; VxFS reports no error message. The corruption is detected only later, when the data is read or verified after it is written. The data originally written by the application is replaced by stale data on permanent storage, so the corruption pattern cannot be determined.
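
Because VxFS reports no error, an application can detect the corruption only by verifying its own data. A minimal Python sketch of that idea follows. It records a SHA-256 digest while issuing sequential sub-page writes, then recomputes the digest on a later read. The function names are hypothetical, and this is an illustration rather than a Veritas-supplied tool. Note the assumption that verification happens later: a read issued immediately after the write is likely to be served from the page cache and may not reflect what is on disk.

```python
import hashlib
import os


def write_records(path, records):
    """Write records sequentially without O_SYNC and return the SHA-256 of the data.

    buffering=0 disables Python's user-space buffer so that each record is
    issued as its own write; the kernel page cache still buffers the data.
    """
    digest = hashlib.sha256()
    with open(path, "wb", buffering=0) as f:
        for record in records:
            view = memoryview(record)
            while len(view) > 0:
                written = f.write(view)  # raw write returns the byte count
                view = view[written:]
            digest.update(record)
    return digest.hexdigest()


def verify_file(path, expected_digest, chunk_size=65536):
    """Re-read the file and compare its SHA-256 with the recorded digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest
```

In use, the digest returned by write_records would be stored separately from the file and passed to verify_file at a later point, once the data is being read back from storage.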

 

Cause

A race condition between a flusher thread and a VxFS writer thread can result in stale page contents being flushed to disk. The race can occur only on appending writes that are smaller than the page size, on file systems whose block size is smaller than the OS page size (4 KB). VxFS incorrectly releases a page lock, which allows a flusher thread to read a stale VxFS inode wsize value and, as a result, flush too little data.
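
To show why a stale size value matters only when the block size is smaller than the page size, the arithmetic can be sketched with a toy model in Python. This is not VxFS code. It assumes, purely for illustration, that wsize is the number of valid bytes within one page and that the flusher writes the page up to wsize rounded up to a block boundary. The function names are hypothetical.

```python
def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


def bytes_left_unflushed(stale_wsize, current_wsize, block_size, page_size=4096):
    """Toy model: bytes in one page that a flusher using a stale wsize would miss.

    Assumption (not taken from VxFS source): the flusher writes whole blocks
    covering the first `stale_wsize` bytes of the page and nothing beyond.
    """
    if not 0 < stale_wsize <= current_wsize <= page_size:
        raise ValueError("expected 0 < stale_wsize <= current_wsize <= page_size")
    flushed = min(round_up(stale_wsize, block_size), page_size)
    return max(0, current_wsize - flushed)


if __name__ == "__main__":
    # An append grew the valid data in the page from 1000 to 3000 bytes,
    # but the flusher still sees the old value of 1000.
    for bsize in (1024, 2048, 4096):
        print(bsize, bytes_left_unflushed(1000, 3000, bsize))
```

Under these assumptions, a block size equal to the page size always covers the whole page, so no bytes are missed, while 1024-byte and 2048-byte blocks can leave the tail of the append unwritten.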

 

Solution

Veritas engineering has fixed the issue in the following VxFS patches.

VxFS Patch 6.0.5.400 for RHEL platforms (due for release February 2016) will include the fix.

Veritas Storage Foundation HA Patch 6.0.5.100 for SLES includes the fix:
sfha-sles11sp4_x86_64-Patch-6.0.5.100 (which contains fs-sles11_x86_64-Patch-6.0.5.300).

The above patch can be downloaded from the Veritas Services and Operations Readiness Tools (SORT) website.

https://sort.veritas.com/patch/detail/10972


Hotfixes:

A supported hotfix has been made available for this issue. Please contact Veritas Technical Support to obtain this fix. This hotfix has not yet gone through extensive QA testing. Consequently, if you are not adversely affected by this problem and have a satisfactory temporary workaround in place, we recommend that you wait for the public release of this hotfix.

Veritas Technologies LLC has acknowledged that the above-mentioned issue is present in the current version(s) listed under the Product(s) Section of this article. Veritas Technologies LLC is committed to product quality and satisfied customers.

Veritas Technologies LLC currently plans to address this issue by way of a patch or hotfix to the current version of the software. Please be sure to refer back to this document periodically as any changes to the status of the issue will be reflected here. A link to the patch or hotfix download will be added to this document when it becomes available. Please note that Veritas Technologies LLC reserves the right to remove any fix from the targeted release if it does not pass quality assurance tests.  Veritas’ plans are subject to change and any action taken by you based on the above information or your reliance upon the above information is made at your own risk.

Please contact your Veritas Sales representative or the Veritas Sales group for upgrade information including upgrade eligibility to the release containing the resolution for this issue.  For information on how to contact Veritas Sales, please see   https://www.veritas.com

The issue is also fixed in the following hotfix patches. Please contact Veritas Technical Support to obtain them.

VxFS 6.1.1 branch:
fs-rhel5_x86_64-HotFix-6.1.1.107
fs-rhel6_x86_64-HotFix-6.1.1.107
fs-sles11_x86_64-HotFix-6.1.1.107

VxFS 6.2 branch:
fs-rhel6_x86_64-HotFix-6.2.0.004
fs-rhel7_x86_64-HotFix-6.2.0.004
fs-sles11_x86_64-HotFix-6.2.0.004

VxFS 6.2.1 branch:
fs-rhel6_x86_64-HotFix-6.2.1.101
fs-rhel6_x86_64-HotFix-6.2.1.102
fs-rhel7_x86_64-HotFix-6.2.1.101
fs-sles11_x86_64-HotFix-6.2.1.101

Official fixes for the above versions will be provided in the next corresponding public patches.

References

Etrack : 3853338
