"vxdisk resize" can cause user data corruption in a volume on AIX or HPUX if device block 0 values are invalid on a CDS format disk

Article: 100007980
Last Published: 2012-04-24
Product(s): InfoScale & Storage Foundation

Problem

This problem occurs when vxdisk resize is run on a CDS format disk device on the AIX or HPUX platform.

On a CDS format disk, the data stored in block 0 (the disk label) is used to determine the block offset at which the backup labels are written. If the data in block 0 is invalid and records a disk size smaller than the VxVM disk media size, silent user data corruption can occur when user data is stored near the end of the disk media.
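The exposure condition described above reduces to a single comparison. The sketch below is a minimal Python illustration, not the logic of any Veritas tool; it assumes both sizes are counted in 512-byte blocks and have already been obtained from block 0 and from VxVM. `label_size_at_risk` is a hypothetical name.

```python
def label_size_at_risk(label_blocks: int, dm_blocks: int) -> bool:
    """Return True when the disk size recorded in block 0 is smaller than
    the VxVM disk media (DM) size.

    Hypothetical helper; both sizes are assumed to be 512-byte block counts.
    Backup label offsets are derived from label_blocks, so a label that
    understates the disk size places those labels inside space that VxVM
    treats as usable disk media.
    """
    if label_blocks <= 0 or dm_blocks <= 0:
        raise ValueError("sizes must be positive block counts")
    return label_blocks < dm_blocks


# Sizes taken from the emc0_3c12 report in the Solution section.
print(label_size_at_risk(label_blocks=3276800, dm_blocks=20867136))  # True
```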

This problem does not affect the Solaris or Linux versions of the product. It occurs only after vxdisk resize has been run on a CDS format disk on AIX or HPUX; if vxdisk resize has never been invoked on a CDS format device on these platforms, the problem does not occur.

Error Message

Tools such as Oracle DBVERIFY can be used to identify corruption in user data.

The lbl_rawgeo_chk_v5.sh script described in the Solution section can also identify whether a disk has block 0 (label) data in its user data region.

Corrupted data blocks may contain copies of block 0 (disk label) data.

Example of corruption:

EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]VxVMDISK g-,; EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]_LVM EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256
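The geometry fields in the label text above are enough to recover the disk size that block 0 advertises. The Python sketch below assumes the ASCII label layout shown in the example (`cyl N alt N hd N sec N`) and computes the advertised size as data cylinders x heads x sectors, excluding the alternate cylinders. That exclusion is an assumption, but it is consistent with the script output later in this article: 800 x 16 x 256 = 3,276,800 blocks, the Block 0 Label Size reported for emc0_3c12 and emc0_3c15. `parse_label_geometry` and `label_size_blocks` are hypothetical helpers, not part of VxVM.

```python
import re

# Matches the geometry fields as they appear in the ASCII label text,
# e.g. "cyl 800 alt 2 hd 16 sec 256".
GEOMETRY_RE = re.compile(r"cyl (\d+) alt (\d+) hd (\d+) sec (\d+)")


def parse_label_geometry(label_text: str) -> dict[str, int]:
    """Extract cylinder/alternate/head/sector counts from label text."""
    match = GEOMETRY_RE.search(label_text)
    if match is None:
        raise ValueError("no geometry fields found in label text")
    ncyl, acyl, nhead, nsect = (int(g) for g in match.groups())
    return {"ncyl": ncyl, "acyl": acyl, "nhead": nhead, "nsect": nsect}


def label_size_blocks(geom: dict[str, int]) -> int:
    """Disk size advertised by the label: data cylinders x heads x sectors.

    Alternate cylinders are excluded (assumption, see lead-in).
    """
    return geom["ncyl"] * geom["nhead"] * geom["nsect"]


text = "EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256"
print(label_size_blocks(parse_label_geometry(text)))  # 3276800
```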

Cause

Due to a defect in vxdisk resize in the AIX and HPUX versions of VxVM, the disk label (block 0) is changed to record a smaller disk size (capacity) than the VxVM disk size. Because the locations of the backup labels are calculated from the block 0 data, the backup labels overlap the end of the public region, which corrupts the last blocks of the public region.
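The following Python sketch shows how an understated label moves the backup labels. It assumes that block 0 on a CDS disk is an SMI (Sun VTOC) style label whose backup copies sit on the last track of the last cylinder, alternates included, in sectors 1, 3, 5, 7 and 9; this convention is an assumption, not something this article states. With the example geometry from the Error Message section (cyl 800, alt 2, hd 16, sec 256), it predicts a first backup label at block 3284737. That is the same offset the check script reports for emc0_3c15 in the Solution section, assuming that disk carries the same label geometry (its Block 0 Label Size of 3276800 is consistent with it). `backup_label_offsets`, `overlaps`, and the public region start value are hypothetical.

```python
def backup_label_offsets(ncyl: int, acyl: int, nhead: int, nsect: int) -> list[int]:
    """Block offsets of the backup labels implied by a block 0 label.

    Assumption: SMI-style placement on the last track of the last cylinder
    (data plus alternate cylinders), in sectors 1, 3, 5, 7 and 9.
    """
    last_cyl = ncyl + acyl - 1
    last_track_start = (last_cyl * nhead + (nhead - 1)) * nsect
    return [last_track_start + s for s in (1, 3, 5, 7, 9)]


def overlaps(offsets: list[int], start: int, length: int) -> list[int]:
    """Return the offsets that fall inside the range [start, start + length)."""
    return [off for off in offsets if start <= off < start + length]


offsets = backup_label_offsets(ncyl=800, acyl=2, nhead=16, nsect=256)
print(offsets[0])  # 3284737

# PUB_START is an assumed value for illustration only; the real start of the
# public region must be read from the disk itself. 20867136 is the DM size
# reported for emc0_3c15.
PUB_START = 65536
print(overlaps(offsets, PUB_START, 20867136) == offsets)  # True
```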

Running vxdisksetup -if will NOT cause corruption, even if vxdisk resize was previously invoked on that CDS format device to increase its capacity. After vxdisksetup -if is run on a CDS disk whose capacity was increased earlier, the CDS format uses the original disk size, but no corruption occurs, because the disk size in the label and the VxVM disk size are identical.

Running vxdisksetup -if may fail after the capacity of a CDS format device has been decreased. In this case, running vxdisk resize before vxdisksetup -if does not resolve the issue.

Solution

Run the script lbl_rawgeo_chk_v5.sh against each DM (disk media) name to determine whether the disk device is exposed to this issue. When a disk fails the check, the script reports the total LUN size from three views:

a) The VxVM discovered size
b) The raw geometry size, obtained through SCSI inquiry and mode pages 3 and 4
c) The LUN size stored in block 0

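The three views and the pass/fail output shown in cases 1 and 2 can be sketched as follows. This Python function is a hypothetical illustration that only mimics the report layout printed by the script; it assumes a disk passes when the block 0 label size is at least the VxVM DM size, and the real script may apply further checks.

```python
def size_report(dm_name: str, dm_size: int, raw_size: int, label_size: int) -> str:
    """Build a pass/fail report in the layout the check script prints.

    Assumed pass rule (hypothetical): the block 0 label size must be at
    least the VxVM DM size.
    """
    if label_size >= dm_size:
        return f"Disk {dm_name} Passed"
    return "\n".join([
        f"--------{dm_name} Lun Size's ---------",
        f"VxVM DM(disk media) Size = {dm_size}",
        f"Raw Geometry Size = {raw_size}",
        f"Block 0 Label Size = {label_size}",
        "-----------------------------------",
    ])


# Reproduces the size block of the emc0_3c12 example in case 2.
print(size_report("emc0_3c12", 20867136, 20966400, 3276800))
```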
The script reports one of four cases for each VxVM DM (disk media) it checks:

1) The disk passes:
Disk hitachi_usp0_24 Passed

2) The disk fails, but no corruption was detected in the user data region:

--------emc0_3c12 Lun Size's ---------
VxVM DM(disk media) Size = 20867136
Raw Geometry Size = 20966400
Block 0 Label Size = 3276800
-----------------------------------
Please upgrade to 5.1SP1RP2P2
No Block 0 label data was detected in user data blocks
Please provide the following to Veritas:
/var/tmp/emc0_3c12_blk0.out /var/tmp/emc0_3c12_vxsci.out

3) The disk fails and label data is present in the user data region. The script reports the disk offset at which block 0 (label) data was detected:

--------emc0_3c15 Lun Size's ---------
VxVM DM(disk media) Size = 20867136
Raw Geometry Size = 20966400
Block 0 Label Size = 3276800
-----------------------------------
Script has detected that Disk emc0_3c15 has Block 0 data in user data region.
Please upgrade to 5.1SP1RP2P2
Block 0 data was located on device at block offset 3284737
 
Please provide the following to Veritas:
/var/tmp/emc0_3c15_blk0.out /var/tmp/emc0_3c15_bklbl.out /var/tmp/emc0_3c15_vxsci.out

4) The disk is not under VxVM control, but there is a discrepancy to review:

hitachi_usp0_28 is not under VxVM Control
Block 0 did not contain any disk label data
vxdisksetup will place correct label using scsi mode sense values
Please make sure no one else is using disk "hitachi_usp0_28"
Raw Geometry Size = 3288960

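Case 3 depends on locating copies of block 0 inside user data. The script's own method is not described in this article; the following is a minimal Python sketch of one way to do it, assuming 512-byte blocks and using a leading byte string from block 0 as the signature to search for. `find_label_copies` is a hypothetical helper, and the disk image below is synthetic.

```python
BLOCK_SIZE = 512  # assumed sector size in bytes


def find_label_copies(image: bytes, signature: bytes,
                      block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the block offsets whose contents begin with `signature`."""
    if not signature:
        raise ValueError("signature must not be empty")
    hits = []
    for block in range(len(image) // block_size):
        start = block * block_size
        if image[start:start + len(signature)] == signature:
            hits.append(block)
    return hits


# Synthetic 8-block "disk": the real label in block 0 and a stray copy in block 5.
label = b"EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256"
image = bytearray(8 * BLOCK_SIZE)
image[0:len(label)] = label
image[5 * BLOCK_SIZE:5 * BLOCK_SIZE + len(label)] = label

# Block 0 is the genuine label; any other hit is label data in user blocks.
copies = [b for b in find_label_copies(bytes(image), label) if b != 0]
print(copies)  # [5]
```

On a real system the blocks would be read read-only from the raw device node (for example with `open(path, "rb")`, `seek`, and `read`), which typically requires root privileges.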

If the script determines that a disk is exposed to this issue, upgrade to VxVM 5.1SP1RP2P2 or later.

Upgrade path (example for a two-node 5.1SP1 cluster):

1) Review the release notes for VxVM 5.1SP1RP2 and VxVM 5.1SP1RP2P2.
2) Stop VCS forcefully.
3) Stop the applications manually.
4) Unmount all file systems.
5) Deport the disk groups.
6) Install VxVM RP2 on both nodes with installp.
7) Install VxVM P2 on both nodes with installp.
8) Install all RP2 products using installrp. NOTE: This also starts all VCS/VxVM services.

NOTE: This issue has been encountered on AIX and HPUX with VxVM versions 5.1SP1, 5.1SP1RP1, and 5.1SP1RP2. Solaris and Linux versions are unaffected. The disk label (block 0) becomes incorrect after running "vxdisk resize" on a CDS format device. If "vxdisk resize" has not been executed, the disk label is correct.

Running any of the following commands can cause corruption if vxdisk resize was previously run on the device:

vxdg flush

vxdisk online

vxdisk -o alldgs list

vxdisk flush

vxdisk resize

References

Etrack: 2675538
