NetBackup (NBU) Realtime Server with Continuous Data Protection (CDP) is unable to complete snapshots
Problem
NetBackup (NBU) Realtime Server with Continuous Data Protection (CDP) is unable to complete snapshots
Error Message
The error in the failed snapshot will be displayed with a log entry similar to:
22:24:09.088 [20755] <2> onlfi_vfms_logf: INF - VxVM vxdisk ERROR V-5-1-16007 Data Corruption Protection Activated - User Corrective Action Needed
or
12:26:39.066 [19911] <4> bpfis Exit: INF - EXIT STATUS 156: snapshot error encountered
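To check whether a given log contains either signature, a minimal sketch along the following lines may help. The function name has_snapshot_error is a hypothetical helper, and the log file to inspect is left to the caller because its location is not stated here. Plain grep patterns are used so the sketch does not depend on extended-regex support.

```shell
#!/bin/sh
# Sketch only: has_snapshot_error is an illustrative helper, not a product tool.
# Returns 0 if the file named in $1 contains either failure signature
# quoted above, and non-zero otherwise.
has_snapshot_error() {
    # V-5-1-16007 is the VxVM message ID from the first log entry;
    # "EXIT STATUS 156" is the bpfis exit status from the second.
    grep 'V-5-1-16007' "$1" >/dev/null 2>&1 ||
    grep 'EXIT STATUS 156' "$1" >/dev/null 2>&1
}
```

The function prints nothing and reports its result through the exit status, so it can be used directly in an if statement, for example: if has_snapshot_error <logfile>; then echo "snapshot error found"; fi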
Cause
When a snapshot image expires, NBU CDP cleans up the older snapshot and prepares to take a newer one. As part of the cleanup procedure, disks under VxVM are removed using 'vxdisk rm <disk>'. Before the actual device discovery is initiated, however, vxattachd triggers a partial discovery for another LUN. This partial discovery clears all information about the older LUNs from the Dynamic Multipathing (DMP) database. When the newer snapshot is taken, the OS assigns the new LUNs the same device numbers as the older ones. DMP interprets this as a corruption scenario and terminates DMP reconfiguration, which causes subsequent snapshots to fail.
Solution
The current workaround is to ensure that no device discovery takes place between the NBU operations. This can be done by disabling the 'vxesd' and 'vxattachd' processes. The preferred method to disable 'vxesd' is to issue:
# /usr/sbin/vxddladm stop eventsource
If this fails, it may be necessary to send a SIGTERM or SIGKILL to the 'vxesd' process:
# kill <pid>
or
# kill -9 <pid>
The 'vxattachd' processes must also be terminated, using a kill command similar to the one above.
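The kill sequence above (SIGTERM first, SIGKILL only if the process survives) can be sketched as a small script. This is an illustration under stated assumptions, not a supported procedure: terminate_pid and stop_daemon are hypothetical helper names, the five-second grace period is arbitrary, and pgrep is assumed to be available on the host.

```shell
#!/bin/sh
# Sketch only: helper names and the grace period are illustrative assumptions.

# Send SIGTERM to one PID; escalate to SIGKILL if it is still alive
# after roughly five one-second checks.
terminate_pid() {
    pid=$1
    # If the signal cannot be delivered (typically because the process
    # has already exited), there is nothing more to do.
    kill "$pid" 2>/dev/null || return 0
    tries=0
    while kill -0 "$pid" 2>/dev/null; do
        tries=`expr $tries + 1`
        if [ "$tries" -gt 5 ]; then
            kill -9 "$pid" 2>/dev/null
            break
        fi
        sleep 1
    done
    return 0
}

# Terminate every process whose name exactly matches $1.
# Assumes pgrep is installed and that the name matches what ps reports.
stop_daemon() {
    for pid in `pgrep -x "$1"`; do
        terminate_pid "$pid"
    done
}
```

After attempting '/usr/sbin/vxddladm stop eventsource', the helpers could be called as 'stop_daemon vxesd' and 'stop_daemon vxattachd'. Whether 'pgrep -x' matches depends on how each process name appears on the system, so it is worth confirming the names with ps first.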
Note: When the error 'Data Corruption Protection Activated' occurs, the disks are in an error/failing/unusable/disabled state. The steps to recover from this state can be found in the related article 000040746.
Applies To
Solaris 9, 10. VxVM 5.1SP1.