Backup jobs fail with status 50 and NBJM core dumps with stacks referencing various memory related routines.

Article: 100008521

Last Published: 2023-08-31

Product(s): NetBackup & Alta Data Protection

Problem

Backup jobs fail with status 50, and NBJM core dumps with stacks referencing various memory-related routines.

Another signature is that the nbjm log (OID 117) at DebugLevel=1 may show Resource Broker queue wait times increasing from milliseconds to seconds to minutes when the problem occurs.

The time between the queueing and sending operations may be seen to increase over time, as in this example:

02:18:55.290 queueing RB RBdbUpdate : status=KBYTES_WRITTEN 0
02:18:55.492 sending RB RBdbUpdate : status=KBYTES_WRITTEN 0
03:02:53.622 queueing RB RBdbUpdate : status=KBYTES_WRITTEN 0
03:04:03.646 sending RB RBdbUpdate : status=KBYTES_WRITTEN 0
03:05:34.496 queueing RB RBdbUpdate : status=KBYTES_WRITTEN 0
03:10:20.010 sending RB RBdbUpdate : status=KBYTES_WRITTEN 0
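
The growth in wait time can be measured rather than eyeballed. The following is a minimal sketch, not a supported tool: rb_queue_delays is a hypothetical helper name, and it assumes each log line starts with an HH:MM:SS.mmm timestamp exactly as in the excerpt above, that the lines do not cross midnight, and that each "sending" line belongs to the most recent "queueing" line. A real nbjm log may carry extra fields and interleave many jobs, so the input may need filtering first.

```shell
#!/bin/sh
# Hypothetical helper: print the time of each "sending RB RBdbUpdate" line
# and the delay in seconds since the preceding "queueing RB RBdbUpdate" line.
# Assumption: field 1 is an HH:MM:SS.mmm timestamp, as in the excerpt.
rb_queue_delays() {
    awk '
        # Convert HH:MM:SS.mmm to seconds since midnight.
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        / queueing RB RBdbUpdate / { queued = secs($1); have = 1; next }
        / sending RB RBdbUpdate / && have {
            printf "%s %.3f\n", $1, secs($1) - queued
            have = 0
        }
    ' "$@"
}

# Example (the file name is a placeholder for a saved log excerpt):
# rb_queue_delays nbjm_excerpt.log
```

Applied to the six lines above, the delays come out at roughly 0.2, 70 and 286 seconds, which matches the milliseconds-to-minutes progression described.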

Cause

This problem is commonly seen in environments where media server de-duplication (MSDP) is being used extensively.

Solution

Symptoms were alleviated by implementing tuneables on the NetBackup master and media servers.

RBALLOC_KBYTES_THRESHOLD - Throttle the frequency of RBdbUpdate messages.
By default, media servers send a disk storage capacity update message to the master server after every 1 GB of data written.
Relief occurred when this tuneable was set with a value of 5000000, raising the threshold to 5 GB.
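
To get a feel for how much message traffic the higher threshold removes, a rough estimate can be computed per job. This is only an illustrative sketch: updates_per_job is a hypothetical helper, it assumes exactly one update message per threshold of data written, and it treats 1 GB as 1000000 KB to stay consistent with the 5000000 = 5 GB figure used here.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many update messages one job sends.
# $1 = data written by the job, in KB
# $2 = threshold, in KB
# Assumption: one message is sent each time the threshold is reached.
updates_per_job() {
    echo $(( $1 / $2 ))
}

updates_per_job 1000000000 1000000   # about 1 TB at 1 GB intervals
updates_per_job 1000000000 5000000   # about 1 TB at 5 GB intervals
```

Under those assumptions, a job writing about 1 TB goes from 1000 messages to 200.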

BPSCHED_THRESHOLD - Throttle how often media servers send messages to the master server to update job status in the Activity Monitor.
By default, media servers send a job status update message to the master server for each backup and duplication job after every 200 MB, 400 MB or 600 MB of data written, depending on the type of backup (full or incremental) and the client type.
Relief occurred when this tuneable was set with a value of 5000000, raising the threshold to 5 GB.

In a Unix/Linux environment, create the following touch files on every media server.
This includes the master server if it has media server functionality.
# echo 5000000 > /usr/openv/netbackup/db/config/RBALLOC_KBYTES_THRESHOLD
# echo 5000000 > /usr/openv/netbackup/db/config/BPSCHED_THRESHOLD
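
When the same change has to be made on many media servers, a small wrapper can guard against the common mistakes (commas, spaces or units in the value, or a wrong directory). The following is a sketch only: set_tuneable is a hypothetical helper, not part of NetBackup, and it does nothing beyond validating the value, writing the file, and reading it back.

```shell
#!/bin/sh
# Hypothetical helper: write one tuneable file after checking the value.
# $1 = config directory, $2 = tuneable name, $3 = value
set_tuneable() {
    dir=$1
    name=$2
    value=$3
    # Reject anything that is not a plain decimal number
    # (no commas, spaces, or unit suffixes).
    case $value in
        ''|*[!0-9]*) echo "invalid value: '$value'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    echo "$value" > "$dir/$name" || return 1
    # Read the file back so the stored value is visible immediately.
    printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
}

# Example (run as root on each media server):
# set_tuneable /usr/openv/netbackup/db/config RBALLOC_KBYTES_THRESHOLD 5000000
# set_tuneable /usr/openv/netbackup/db/config BPSCHED_THRESHOLD 5000000
```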

In a Windows environment, create a file with the tuneable name containing the desired value in the $INSTALL\netbackup\db\config\ folder. Make sure that the file name does NOT include a file extension (e.g. ".txt").

If the files have been created correctly, then the nbjm log should report fewer "status=KBYTES_WRITTEN" messages than previously.
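
One way to compare message volume before and after the change is to count the messages per hour. This is a minimal sketch under stated assumptions: kbytes_written_per_hour is a hypothetical helper, and it assumes the first field of each log line is an HH:MM:SS.mmm timestamp as in the excerpt in the Problem section. Both "queueing" and "sending" lines are counted.

```shell
#!/bin/sh
# Hypothetical helper: count "status=KBYTES_WRITTEN" lines per hour.
# Assumption: field 1 starts with the two-digit hour (HH:MM:SS.mmm).
kbytes_written_per_hour() {
    awk '
        /status=KBYTES_WRITTEN/ { count[substr($1, 1, 2)]++ }
        END { for (h in count) print h, count[h] }
    ' "$@" | sort
}

# Example (file names are placeholders for saved log excerpts):
# kbytes_written_per_hour nbjm_before.log
# kbytes_written_per_hour nbjm_after.log
```

Comparing the hourly counts from a period before the change with a similar period after it should show the drop.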

Note: Recycle NetBackup services on the media servers (and master server) after creating the above touch files.

Note: The value is 5 million (a five followed by six zeros), entered as a plain decimal number with no commas or spaces.

Note: Increasing BPSCHED_THRESHOLD causes the media server to send progress updates less frequently. As a side effect, the Activity Monitor in the GUI may show job progress advancing in larger, less frequent increments. This may not be noticeable; it depends on the performance of the backup, the efficiency of the de-duplication, and the amount of fragmentation in the backup data stream sent by the media server.

The 5 GB value was found to be adequate in a moderately large environment. These are tuneables, so finding the optimum values for a given environment may require experimentation. As with any tuning operation, best practice is to keep records, monitor stability, and adjust (increase or decrease) as necessary.

Applies To

NetBackup 7.X, 8.X

References

Etrack : 2776304
