Partially successful DBPaaS backup due to mtstrm_close_session: a fatal error occurred in Multi-Threaded Agent
Problem
A NetBackup DBPaaS backup of a very large amount of data ends with status 1 (the requested operation was partially successful), and the job details may show an error similar to "Error occurred in Multi-Threaded Agent".
Error Message
Dec 22, 2022 7:54:47 AM - begin writing
Dec 23, 2022 2:49:44 AM - Info bpbkar (pid=891430) start to take universal share backup snapshot
Dec 23, 2022 2:49:45 AM - Info bpbkar (pid=891430) waiting universal share backup snapshot ready
Dec 23, 2022 2:51:24 AM - Info bpbkar (pid=891430) preparing universal share backup data
Dec 23, 2022 2:51:38 AM - Info bpbkar (pid=891430) ASSET GUID = 51c907ae-c9e1-5643-b617-493b2c9b363b
Dec 23, 2022 2:51:38 AM - Info bpbkar (pid=891430) starting to backup universal share for /mnt/vpfs_shares/nfsd/nfsdir1/xxxxxx_1671695673
Dec 23, 2022 2:58:54 AM - Info bpbkar (pid=891430) Successfully backed up database [xxxdb1].
Dec 23, 2022 2:58:54 AM - Info bpbkar (pid=891430) Successfully completed backup of 1 database(s) out of 1 databases from instance [xxx].
Dec 23, 2022 2:59:29 AM - Warning bpbrm (pid=891422) from client xxxxxx: WRN - Cannot complete the post-backup operation. None;None;Invoked operation: POST_BACKUP failed;None
Dec 23, 2022 2:59:29 AM - Info bptm (pid=891463) waited for full buffer 2 times, delayed 4522474 times
Dec 23, 2022 2:59:29 AM - Warning bpbrm (pid=891422) from client xxxxxx: WRN - Failed to delete universal share temp directory [xxxxxx_1671695673]. Errno = 39: Directory not empty
Dec 23, 2022 2:59:29 AM - Info bpbkar (pid=891430) universal share sent 1365888512 bytes out of 1395866138112 bytes to server, optimization 99.9%
Dec 23, 2022 2:59:35 AM - Info bpbrm (pid=891422) validating image for client xxxxxx
Dec 23, 2022 2:59:37 AM - Info bptm (pid=891463) Applying WORM locks to backupid xxxxxx copy 1 for 172800 seconds
Dec 23, 2022 3:01:29 AM - Info bptm (pid=891463) WORM locks were successfully applied to copy #1 for backupid xxxxxx_1671695677.
Dec 23, 2022 3:01:29 AM - Info bptm (pid=891463) EXITING with status 0 <----------
Dec 23, 2022 3:01:29 AM - Info yyy.zzz.com (pid=891463) StorageServer=PureDisk:yyy.zzz.com; Report=PDDO Stats for (yyy.zzz.com): scanned: 1363150844 KB, CR sent: 842247 KB, CR sent over FC: 0 KB, dedup: 99.9%, cache disabled, where dedup space saving:99.9%, compression space saving:0.0%
Dec 23, 2022 3:01:29 AM - Critical bptm (pid=891463) Storage Server Error: (Storage server: PureDisk:yyy.zzz.com) mtstrm_close_session: Fatal error occured in Multi-Threaded Agent: Close Session: storageServer=yyy.zzz.com, sessionName=xxxxxx_e092aa8727e3e29970097e3f39cfe91912a3bf017dde4c1ce75b4ccd9653794b_BACKUPNOW+3fae2efb-4a29-4ab0-850f-5bc8b523c697_3_1671695677Query reqeust 13 failed, request is Close Session: storageServer=yyy.zzz.com, sessionName=xxxxxx_BACKUPNOW+3fae2efb-4a29-4ab0-850f-5bc8b523c697_3_1671695677 V-454-95
Dec 23, 2022 3:01:30 AM - Critical bptm (pid=891463) sts_close_server failed: error 2060017 system call failed
Dec 23, 2022 3:01:31 AM - Info bpbkar (pid=891430) done. status: 1: the requested operation was partially successful
Dec 23, 2022 3:01:31 AM - end writing; write time: 19:06:44
Dec 23, 2022 3:01:32 AM - Info nbjm (pid=13273) Successfully updated last backup status for Asset: xxxxxx the requested operation was partially successful(1)
Cause
A DBPaaS backup may use a universal share (ushare) dump during the backup operation. In certain environments, delays during that operation can cause the mtstrmd session to time out.
- The error is not fatal: the timeout causes no data loss or corruption.
- It is safe to ignore the messages.
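As a rough illustration of how a delay compares with the timeout, the sketch below measures the gap between the first two job detail lines shown under Error Message and checks it against the two default SessionInactiveThreshold values. It assumes the threshold is expressed in minutes (480 = 8 hours), and it is only an assumption that this particular gap corresponds to the idle period that mtstrmd measures. The function name gap_minutes is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the job details, e.g. "Dec 22, 2022 7:54:47 AM".
FMT = "%b %d, %Y %I:%M:%S %p"

def gap_minutes(earlier: str, later: str) -> float:
    """Return the elapsed minutes between two job-detail timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds() / 60

# Copied from the "begin writing" line and the first bpbkar snapshot line.
begin_writing = "Dec 22, 2022 7:54:47 AM"
snapshot_start = "Dec 23, 2022 2:49:44 AM"

idle = gap_minutes(begin_writing, snapshot_start)

# Compare with the two default thresholds named in this article (assumed minutes).
for threshold in (480, 1440):
    print(f"gap of {idle:.0f} min exceeds {threshold}: {idle > threshold}")
```

In this job the gap is close to 19 hours, which is longer than the 480-minute default but shorter than the 1440-minute default.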
Solution
To prevent the errors, set the SessionInactiveThreshold option to a larger value:
- The default value is 480 minutes (8 hours) or 1440 minutes (24 hours), depending on the NetBackup version.
- The maximum SessionInactiveThreshold is 10080 minutes (7 days).
To increase the value:
1. Edit mtstrm.conf at the location for your platform:
- Linux/Unix: /usr/openv/lib/ost-plugins/mtstrm.conf
- Windows: install_path\Veritas\NetBackup\bin\ost-plugins\mtstrm.conf
2. Set SessionInactiveThreshold to a larger value, up to a maximum of 10080.
3. Restart the mtstrmd daemon.
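Step 2 can be sketched as a small helper that rewrites the value in the file's text. This is a minimal sketch, not a supported tool: it assumes mtstrm.conf stores the option as a single `SessionInactiveThreshold=<number>` line, and the function name and the sample content (including `OtherSetting`) are hypothetical. It works on a string so it can be tried safely before touching the real file; back up mtstrm.conf before any edit.

```python
import re

MAX_THRESHOLD = 10080  # maximum value stated in this article

def set_session_inactive_threshold(conf_text: str, minutes: int) -> str:
    """Return conf_text with SessionInactiveThreshold set to `minutes`.

    Assumes a key=value line format (an assumption, not confirmed here).
    """
    if not 0 < minutes <= MAX_THRESHOLD:
        raise ValueError(f"value must be between 1 and {MAX_THRESHOLD}")
    pattern = re.compile(r"^(\s*SessionInactiveThreshold\s*=\s*)\d+", re.MULTILINE)
    # Keep the key and spacing as found; replace only the number.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{minutes}", conf_text)
    if count != 1:
        raise ValueError("expected exactly one SessionInactiveThreshold line")
    return new_text

# Made-up sample content, for illustration only.
sample = "OtherSetting=1\nSessionInactiveThreshold=480\n"
updated = set_session_inactive_threshold(sample, 2880)
print(updated)
```

Refusing values above 10080 and refusing files with zero or several matching lines keeps the helper from silently writing a configuration that this article does not describe.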