NetBackup MS-SQL agent backups that use two or more stripes may fail on restore if the stripe images differ greatly in size.

Article: 100017403
Last Published: 2013-10-29
Product(s): NetBackup & Alta Data Protection

Problem

One of the two restore jobs fails with status 52: "timed out waiting for media manager to mount volume".

Error Message

The Activity Monitor shows:

05.09.2005 08:12:47 - begin reading
05.09.2005 16:10:32 - restored image xxxxxx_1125698814 - (timed out waiting for media manager to mount volume(52)); restore time 08:02:32

The bpbrm log shows that the first job ends successfully but the second job ends with status 52:

22:44:45.521 [5564.4152] <2> bpbrm read_media_msg: read from media manager: EXIT SQL-SERVER-NAME_1309097609 0
22:44:45.521 [5564.4152] <2> bpbrm process_media_msg: media manager for backup id SQL-SERVER-NAME_1309097609 exited with status 0: the requested operation was successfully completed
22:44:45.521 [5564.4152] <2> bpbrm signal_bpbrm_child: sending Normal Exit to bpbrm child 7388
22:49:46.020 [5564.4152] <2> bpbrm send_parent_msg: KEEP_ALIVE 155
22:49:46.145 [5564.4152] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
22:49:46.145 [5564.4152] <2> bpbrm multiplexed_restore: bpbrm.c.16271: keep-alive acknowleged: 0 0 0x00000000
22:54:47.025 [5564.4152] <2> bpbrm send_parent_msg: KEEP_ALIVE 156
22:54:47.150 [5564.4152] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
22:54:47.150 [5564.4152] <2> bpbrm multiplexed_restore: bpbrm.c.16271: keep-alive acknowleged: 0 0 0x00000000
22:59:48.031 [5564.4152] <2> bpbrm send_parent_msg: KEEP_ALIVE 157
22:59:48.156 [5564.4152] <2> bpbrm read_parent_msg: read from parent ACK_KEEP_ALIVE
22:59:48.156 [5564.4152] <2> bpbrm multiplexed_restore: bpbrm.c.16271: keep-alive acknowleged: 0 0 0x00000000
23:04:47.289 [5564.4152] <2> bpbrm brm_child_done: child done, status 52
23:04:47.289 [5564.4152] <2> bpbrm brm_child_done: child 7388 exited with status 52: timed out waiting for media manager to mount volume
23:04:47.289 [5564.4152] <2> bpbrm send_status_to_parent: EXIT SQL-SERVER-NAME_1309097609 52 sent to parent process.
 

Cause

When the restore of the first stripe finishes, it waits until the second stripe finishes as well. During this time, the drive and tape used by the first stripe remain reserved and are not available to other jobs. If, after the first stripe finishes, the second stripe still needs longer to complete than the "Media mount timeout" setting allows, the restore operation times out.

This can happen when the stripes run at different throughput during the backup, so that the resulting stripe images differ greatly in size.

For example, the two drives may be connected through two different SAN fabrics. If the fabrics carry different loads at the time of the backup, the stripe sizes can differ dramatically.
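The timing described in this section can be modelled with a short Python sketch. It is a simplified illustration, not NetBackup code: it assumes all stripes start at the same moment, that each finished stripe keeps its drive reserved until the slowest stripe finishes, and that durations and the timeout are in minutes. The function names `idle_waits` and `restore_times_out` are hypothetical, and the 3-hour/5-hour durations and 30-minute timeout are sample inputs only.

```python
from typing import List


def idle_waits(stripe_minutes: List[float]) -> List[float]:
    """Minutes each stripe's drive stays reserved after that stripe finishes.

    Simplified model (assumption): all stripes start together and the
    restore ends when the slowest stripe ends.
    """
    finish = max(stripe_minutes)
    return [finish - d for d in stripe_minutes]


def restore_times_out(stripe_minutes: List[float], mount_timeout: float) -> bool:
    """True if any finished stripe would wait longer than the mount timeout."""
    return max(idle_waits(stripe_minutes)) > mount_timeout


if __name__ == "__main__":
    # Sample input: stripe 1 restores in 3 hours, stripe 2 in 5 hours.
    durations = [180, 300]
    print(idle_waits(durations))              # [120, 0]
    print(restore_times_out(durations, 30))   # True
    print(restore_times_out(durations, 121))  # False
```

In this model the first stripe's drive sits idle for 120 minutes, so any timeout of 120 minutes or less causes the restore to fail.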

Solution

Increase the "Media mount timeout" setting (NetBackup Administration Console -> Host Properties -> Master Servers -> Timeouts) to a value greater than the difference between the longest and shortest stripe restore times.

For example, if the restore of stripe 1 takes 3 hours and the restore of stripe 2 takes 5 hours, increase the media mount timeout to a value greater than 2 hours.
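The same arithmetic extends to any number of stripes: the setting must exceed the gap between the fastest and slowest stripe. The Python sketch below assumes that stripe restore durations are available in the HH:MM:SS form used by the "restore time" field in the Activity Monitor, and that the timeout is entered in whole minutes. The helper names `to_seconds` and `minimum_mount_timeout_minutes` are hypothetical.

```python
import math
from typing import List


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration (as in the 'restore time' field) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def minimum_mount_timeout_minutes(stripe_restore_times: List[str]) -> int:
    """Smallest whole-minute timeout strictly greater than the gap between
    the fastest and slowest stripe restore (assumes the setting is in minutes)."""
    secs = [to_seconds(t) for t in stripe_restore_times]
    gap_minutes = (max(secs) - min(secs)) / 60
    # floor(x) + 1 is the smallest integer strictly greater than x.
    return math.floor(gap_minutes) + 1


# Stripe 1 takes 3 hours, stripe 2 takes 5 hours: the gap is 120 minutes.
print(minimum_mount_timeout_minutes(["03:00:00", "05:00:00"]))  # 121
```

Rounding to the next whole minute keeps the value strictly greater than the gap, as the solution requires; in practice an additional safety margin may be sensible.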
 
Alternatively, modify the SQL restore script and set STRIPES to 1.
 
