NetBackup 8.2 (or later) multiplexed tape backups may not be readable by media servers running versions earlier than 8.2. Duplication by media servers running a pre-8.2 version could lead to data loss.
Problem
NetBackup 8.2 (or later) multiplexed tape backups may not be readable by media servers running versions earlier than 8.2. Duplication by media servers running a pre-8.2 version could lead to data loss.
Error Message
When duplicating multiplexed tape images on an alternate read server with preserve multiplexing set to false, no error message is reported and the duplication appears to succeed. The duplicated images may nevertheless be corrupt and not restorable.
Typically, verifying the copied images (copy 2 in this case) results in errors as seen below. The failures are usually found in the first few files of the tar archive.
NetBackup Catalog Results tab:
Verify started 06/15/2021 11:39:38
11:39:38 INF - Verifying policy Openv-Tape, schedule Full (doris09_1623259466), media id RP0100, created 06/09/2021 12:24:26.
11:39:43 INF - If Media id RP0100 is not in a robotic library administrative interaction may be required to satisfy this mount request.
11:39:44 INF - Waiting for mount of media id RP0100 on server prince114vm01 for reading.
11:41:26 INF - Waiting for positioning of media id RP0100 on server prince114vm01 for reading.
11:41:27 INF - Beginning verify on server prince114vm01 of client doris09.
11:41:27 /usr/openv/
11:41:27 /usr/openv/var/
11:41:27 /usr/openv/var/startup_time.txt
11:41:27 /usr/openv/var/clientlogdays.txt
11:41:27 /usr/openv/var/hostdbcache.belton.lock
11:41:27 INF - Filename length does not match for file /usr/openv/var/hostdbcache.belton.lock, in image is 38, in database is 35.
11:41:27 INF - Filename from image (/usr/openv/var/hostdbcache.belton.lock) does not match filename in database (/usr/openv/var/clear_cache_time.txt).
11:41:27 /usr/openv/var/hostdb.cache.belton
11:41:27 INF - Filename length does not match for file /usr/openv/var/hostdb.cache.belton, in image is 34, in database is 38.
11:41:27 INF - Filename from image (/usr/openv/var/hostdb.cache.belton) does not match filename in database (/usr/openv/var/hostdbcache.belton.lock).
11:41:27 /usr/openv/var/hostdbcache.sclacslnxd06.engba.veritas.com.lock
11:41:27 INF - Filename length does not match for file /usr/openv/var/hostdbcache.sclacslnxd06.engba.veritas.com.lock, in image is 62, in database is 34.
11:41:27 INF - Filename from image (/usr/openv/var/hostdbcache.sclacslnxd06.engba.veritas.com.lock) does not match filename in database (/usr/openv/var/hostdb.cache.belton).
11:41:27 INF - Block number does not match for file /usr/openv/var/hostdbcache.sclacslnxd06.engba.veritas.com.lock, in image is 17, in database is 16.
11:41:27 /usr/openv/var/hostdb.cache.sclacslnxd06.engba.veritas.com
11:41:27 INF - Filename length does not match for file /usr/openv/var/hostdb.cache.sclacslnxd06.engba.veritas.com, in image is 58, in database is 62.
11:41:27 INF - Filename from image (/usr/openv/var/hostdb.cache.sclacslnxd06.engba.veritas.com) does not match filename in database (/usr/openv/var/hostdbcache.sclacslnxd06.engba.veritas.com.lock).
11:41:27 /usr/openv/var/vxss/
11:41:27 INF - Filename length does not match for file /usr/openv/var/vxss/, in image is 20, in database is 58.
11:41:27 INF - At least 10 database compare errors occurred, not logging anymore.
...
11:41:27 /usr/openv/
11:41:27 /usr/openv/tmp
11:41:27 /usr/openv/db
11:41:28 /usr/openv/nbwmc.tar.gz
11:41:28 INF - from host prince114vm01, FTL - tar received an invalid archive
11:41:28 INF - Verify of policy Openv-Tape, schedule Full (doris09_1623259466) failed, tar had an unexpected error.
11:41:28 INF - Status = no images were successfully processed.
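A verify log like the one above can be screened automatically. The following is a minimal sketch, assuming the message wording matches the sample output shown here; the function name `summarize_verify_log` and the patterns are illustrative and are not part of NetBackup.

```python
import re

# Patterns copied from the verify output shown above. Other NetBackup
# releases may word these messages differently (assumption).
COMPARE_ERROR = re.compile(
    r"INF - (Filename length|Filename from image|Block number) "
)
FATAL_ARCHIVE = "tar received an invalid archive"


def summarize_verify_log(text):
    """Count database compare errors and detect the fatal tar message."""
    compare_errors = 0
    invalid_archive = False
    for line in text.splitlines():
        if COMPARE_ERROR.search(line) and "does not match" in line:
            compare_errors += 1
        if FATAL_ARCHIVE in line:
            invalid_archive = True
    return {"compare_errors": compare_errors, "invalid_archive": invalid_archive}
```

Any nonzero `compare_errors` count or a true `invalid_archive` flag suggests the copy should be treated as suspect. Because the log stops reporting after ten compare errors, the count is a lower bound.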
When duplicating images with preserve multiplexing set to true, the duplication fails with an error like the one shown below. No data loss occurs because the duplication does not succeed.
Jun 15, 2021 4:09:57 PM - Info bpduplicate (pid=22222)
...
Jun 15, 2021 4:30:45 PM - Critical bptm (pid=6805) Amount of data read (177072739) does not match the fragment kbytes (4389120) for backup id doris09_1623259466, copy 1, fragment 1
Jun 15, 2021 4:30:46 PM - Error bpduplicate (pid=22222) host prince114vm01 backup id doris09_1623259466 read failed, media read error (85).
Jun 15, 2021 4:30:48 PM - Error bpduplicate (pid=22222) Duplicate of backupid doris09_1623259466 failed, media read error (85).
Jun 15, 2021 4:30:58 PM - Error bpduplicate (pid=22222) Status = no images were successfully processed.
Jun 15, 2021 4:30:58 PM - end Duplicate; elapsed time 0:21:01
no images were successfully processed (191)
Cause
NetBackup 8.2 allows more than one job per second to run simultaneously, so backup IDs from multiple clients can have the same UNIX time (also known as ctime).
The backup ID is a combination of the client name, an underscore character, and the UNIX ctime when the image was created (e.g., client1_1234567890 and client2_1234567890).
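To make the backup ID format concrete, here is a small sketch that splits backup IDs into client name and ctime and reports any ctime shared by more than one client. The helper names `split_backup_id` and `shared_ctimes` are hypothetical. The split is done on the last underscore, on the assumption that client names may themselves contain underscores.

```python
from collections import defaultdict


def split_backup_id(backup_id):
    """Split 'client_ctime' into (client, ctime) at the last underscore."""
    client, _, ctime = backup_id.rpartition("_")
    if not client or not ctime.isdigit():
        raise ValueError(f"not a backup ID: {backup_id!r}")
    return client, int(ctime)


def shared_ctimes(backup_ids):
    """Return {ctime: sorted client names} for ctimes used by several clients."""
    by_ctime = defaultdict(set)
    for backup_id in backup_ids:
        client, ctime = split_backup_id(backup_id)
        by_ctime[ctime].add(client)
    return {c: sorted(names) for c, names in by_ctime.items() if len(names) > 1}
```

Run against the backup IDs written to one multiplexed tape, a non-empty result indicates images that a pre-8.2 reader could confuse.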
NetBackup writes multiplexed tape images by prepending an MPX header to each tape block for identification purposes. This MPX header uses the backup ID to identify the backup to which the block belongs. When a media server running any pre-8.2 version of NetBackup reads a multiplexed tape image written by a NetBackup 8.2 or later media server, it uses only the UNIX ctime to identify the backup ID of the tape block. Because of this, if a tape fragment contains multiplexed data from clients that share the same UNIX ctime, the data from those images is mixed together when read. The result is a corrupted data stream.
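The mixing described above can be illustrated with a toy model. This sketch does not reproduce the real MPX header layout or any NetBackup code. It assumes each tape block is simply a (backup ID, payload) pair and contrasts a reader that matches on ctime alone with one that matches on the full backup ID.

```python
def demux(blocks, wanted_backup_id, match_on_ctime_only):
    """Collect the payloads of one backup from (backup_id, payload) blocks.

    match_on_ctime_only=True models the pre-8.2 read behavior described
    above; False models a reader that compares the full backup ID.
    """
    wanted_ctime = wanted_backup_id.rpartition("_")[2]
    selected = []
    for backup_id, payload in blocks:
        if match_on_ctime_only:
            matches = backup_id.rpartition("_")[2] == wanted_ctime
        else:
            matches = backup_id == wanted_backup_id
        if matches:
            selected.append(payload)
    return b"".join(selected)


# Two clients whose backups share a ctime, interleaved on one tape
# (hypothetical data).
tape = [
    ("client1_1234567890", b"A1"),
    ("client2_1234567890", b"B1"),
    ("client1_1234567890", b"A2"),
    ("client2_1234567890", b"B2"),
]
```

With `match_on_ctime_only=False`, reading client1 yields only the A blocks. With `True`, the B blocks from client2 are interleaved into the stream, which is the corruption pattern this article describes.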
Backup images that were not written multiplexed to tape storage units are not affected.
Backup image copies made by a NetBackup 8.2 (or greater) media server are not affected.
Backup image copies that had unique UNIX ctimes are not affected.
If there are multiplexed tape storage unit image copies that could potentially be affected, consider extending the retention length of the original backup images (copy 1) or any other copies made by media servers running NetBackup 8.2 or greater until any potentially corrupt copies can be identified and recreated from the original copy.
Solution
Do not use a media server running a NetBackup version earlier than 8.2 as an alternate read host to duplicate, restore, or verify multiplexed tape storage unit images written by NetBackup 8.2 or later.
Temporarily turning on preserve multiplexing when duplicating also avoids the problem: the duplication jobs fail instead of producing corrupt copies.
If you suspect that an image may be corrupted, the bpverify command can be used to check whether the image is corrupt.
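As a rough sketch of scripting that check, the helper below builds a bpverify invocation for one copy of one backup image. It assumes the default UNIX/Linux install path and the `-backupid` and `-cn` options; confirm both against the Commands Reference Guide for your release before relying on it. The function names are hypothetical.

```python
import subprocess

# Default UNIX/Linux install location (assumption; adjust as needed).
BPVERIFY = "/usr/openv/netbackup/bin/admincmd/bpverify"


def bpverify_command(backup_id, copy_number):
    """Build the argument list to verify one copy of one backup image."""
    if copy_number < 1:
        raise ValueError("copy numbers start at 1")
    return [BPVERIFY, "-backupid", backup_id, "-cn", str(copy_number)]


def verify_copy(backup_id, copy_number):
    """Run bpverify and return its exit status (0 indicates success)."""
    return subprocess.run(bpverify_command(backup_id, copy_number)).returncode
```

For the example in this article, `bpverify_command("doris09_1623259466", 2)` targets the duplicated copy. Run the verify from a media server at 8.2 or later, since a pre-8.2 reader can report errors caused by its own read behavior.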