"vxdisk resize" can cause user data corruption in a volume on AIX or HPUX if device block 0 values are invalid on a CDS format disk
Problem
This problem is encountered when vxdisk resize is run on a CDS format disk device on the AIX or HP-UX platform.
On a CDS format disk, the data stored in block 0 is used to determine the block offset at which the backup labels are written. If the disk size recorded in block 0 is invalid and smaller than the VxVM disk media size, silent user data corruption can occur when there is user data near the end of the disk media.
This problem does not affect the Solaris or Linux versions of the product. It occurs only after vxdisk resize has been run on a CDS format disk on the AIX or HP-UX platform.
In other words, if vxdisk resize is never invoked on a CDS format device on AIX or HP-UX, this problem will not occur.
Error Message
Tools like Oracle DB Verify can be used to identify corruption in the user data location.
The script provided in this article can also identify if a disk has corruption in the user data location.
Corruption in data blocks may look like block 0 data:
Example Of Corruption:
EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]VxVMDISK g-,; EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]_LVM EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 >]EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256 EMC-SYMMETRIX-5874 cyl 800 alt 2 hd 16 sec 256
Cause
Due to a defect in vxdisk resize in the AIX and HP-UX versions of the VxVM product, the disk label (block 0) is changed to reflect a smaller disk size (capacity) than the VxVM disk size. The location of the backup labels is calculated from the block 0 data, so the backup labels end up overlapping the end of the public region. Writing the backup labels then corrupts the last blocks of the public region.
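The condition described above can be illustrated with a minimal Python sketch. It assumes only what this article states: a disk is exposed when the size recorded in the block 0 label is smaller than the VxVM disk media size. The function name `label_shortfall` is hypothetical and is not part of VxVM or of the lbl_rawgeo_chk_v5.sh script; the numbers are the emc0_3c12 values from the sample script output in the Solution section.

```python
def label_shortfall(block0_label_size: int, vxvm_dm_size: int) -> int:
    """Return by how many blocks the block 0 label size falls short of
    the VxVM disk media size, or 0 if the label is not smaller.

    Hypothetical helper for illustration only; it mirrors the condition
    stated in this article, not the internal VxVM calculation.
    """
    return max(0, vxvm_dm_size - block0_label_size)


# Sizes reported for emc0_3c12 in the sample script output.
shortfall = label_shortfall(block0_label_size=3276800, vxvm_dm_size=20867136)

if shortfall > 0:
    print(f"Block 0 label is {shortfall} blocks smaller than the VxVM DM size")
else:
    print("Block 0 label size is not smaller than the VxVM DM size")
```

A non-zero result only indicates that the label and the VxVM size disagree in the direction described here; whether user data has actually been overwritten still has to be confirmed with the script or a tool such as Oracle DB Verify.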
Running vxdisksetup -if will NOT cause corruption, even if vxdisk resize was invoked on that CDS format device in the past to increase its capacity. After vxdisksetup -if is run on a CDS disk whose capacity was increased earlier, the CDS format will use the original disk size, but there will be no corruption, because the disk size in the label and the VxVM disk size will be identical.
Running vxdisksetup -if may fail after the capacity of a CDS format device has been decreased. In this case, executing vxdisk resize before invoking vxdisksetup -if will not solve the issue.
Solution
Run the script lbl_rawgeo_chk_v5.sh against the DM (disk media) name to determine whether the disk device is exposed to this issue. When a disk fails the check, the script reports the total LUN size from all three views (VxVM disk media size, raw geometry size, and block 0 label size). Sample output:
Disk hitachi_usp0_24 Passed
--------emc0_3c12 Lun Size's ---------
VxVM DM(disk media) Size = 20867136
Raw Geometry Size = 20966400
Block 0 Label Size = 3276800
-----------------------------------
Please upgrade to 5.1SP1RP2P2
No Block 0 label data was detected in user data blocks
Please provide the following to Veritas:
/var/tmp/emc0_3c12_blk0.out /var/tmp/emc0_3c12_vxsci.out
--------emc0_3c15 Lun Size's ---------
VxVM DM(disk media) Size = 20867136
Raw Geometry Size = 20966400
Block 0 Label Size = 3276800
-----------------------------------
Script has detected that Disk emc0_3c15 has Block 0 data in user data region.
Please upgrade to 5.1SP1RP2P2
Block 0 data was located on device at block offset 3284737
Please provide the following to Veritas:
/var/tmp/emc0_3c15_blk0.out /var/tmp/emc0_3c15_bklbl.out /var/tmp/emc0_3c15_vxsci.out
hitachi_usp0_28 is not under VxVM Control
Block 0 did not contain any disk label data
vxdisksetup will place correct label using scsi mode sense values
Please make sure no one else is using disk "hitachi_usp0_28"
Raw Geometry Size = 3288960
If the script has determined that you can run into this issue, upgrade to VxVM 5.1SP1RP2P2 or later.
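When the script is run against many disks, it can help to collect the three reported sizes per disk. The following Python sketch parses output in the layout shown in the sample above. It is an assumption that the real script output always follows this exact layout; the function `parse_lun_sizes` and the variable names are hypothetical and are not shipped with the script.

```python
import re

# Header line such as: --------emc0_3c15 Lun Size's ---------
HEADER = re.compile(r"^-+\s*(\S+) Lun Size's\s*-+$")
# Size line such as: Block 0 Label Size = 3276800
SIZE = re.compile(
    r"^(VxVM DM\(disk media\)|Raw Geometry|Block 0 Label) Size = (\d+)$"
)


def parse_lun_sizes(report: str) -> dict:
    """Map each disk name to the sizes listed under its header line."""
    disks = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            disks[current] = {}
        elif line and set(line) == {"-"}:
            # A line made only of dashes closes the current size block.
            current = None
        elif current is not None:
            size = SIZE.match(line)
            if size:
                disks[current][size.group(1)] = int(size.group(2))
    return disks


# Sample text copied from the script output shown in this article.
sample = """\
--------emc0_3c15 Lun Size's ---------
VxVM DM(disk media) Size = 20867136
Raw Geometry Size = 20966400
Block 0 Label Size = 3276800
-----------------------------------
Raw Geometry Size = 3288960
"""

sizes = parse_lun_sizes(sample)
print(sizes)
```

The trailing "Raw Geometry Size" line in the sample belongs to a disk that is not under VxVM control and has no header, so the sketch deliberately ignores it rather than attaching it to the previous disk.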
Upgrade Path (5.1SP1 Two node cluster example below)
NOTE: This has been encountered on the AIX and HP-UX platforms with VxVM product versions 5.1SP1, 5.1SP1RP1 and 5.1SP1RP2. Solaris and Linux versions are unaffected. The disk label (block 0) becomes incorrect after running "vxdisk resize" on a CDS format device. If "vxdisk resize" has not been executed, then the disk label is correct.
Running the following commands could cause corruption if vxdisk resize had been previously run:
vxdg flush
vxdisk online
vxdisk -o alldgs list
vxdisk flush
vxdisk resize