Problem
Media Server Deduplication (MSDP) and PureDisk (with or without PDDO) perform much slower than expected in some environments.
Cause
Performance issues with MSDP or PureDisk may show up as:
1. Web GUI slowness for PureDisk.
2. "Connection reset by peer" or "broken pipe" errors while running many backup or replication jobs.
3. General slowness while accessing the system and especially the /Storage partition on PureDisk or the storage path for MSDP.
4. Slow backup performance.
5. Slow processing of tlogs. To check, monitor storaged.log (on PureDisk: /Storage/log/spoold/storaged.log; on MSDP: <storage path>\log\spoold\storaged.log). Examine several runs of queue processing and work out the number of transactions processed per minute:
<start>
July 28 17:21:52 INFO [1077967168]: Synchronization for transaction log /Storage/queue/sorted-302776-303167.tlog started, 27898688 transactions pending.
July 28 17:22:49 INFO [1077967168]: Transaction log 302776-303167: 100000 of 27898688 entries processed.
...
<end>
July 28 20:53:32 INFO [1077967168]: Transaction log 302776-303167: 27800000 of 27898688 entries processed.
July 28 20:54:17 INFO [1077967168]: Number of data store commits: 1758
July 28 21:48:30 INFO [1077967168]: Time required to build index on objects2 table: 3253.408098
July 28 21:48:32 INFO [1077967168]: Time required to drop objects table: 1.632608
July 28 21:48:32 INFO [1077967168]: Time required to rename objects2 table to objects: 0.103928
July 28 21:49:14 INFO [1077967168]: Transaction log 302776-303167 Completed.
Expect: 27898688 (1600.33MB) Commit: 27898688 (3682.29MB) Retry: 0
Log: /Storage/queue/sorted-302776-303167.tlog
SO: Add 15880, Ref Add 5195366, Ref Add Fail: 0, Ref Del 22686294
DO: Add 26, Ref Add 31, Ref Add Fail: 0, Ref Del 0
TASK: Add 37, End 27, End All 1, Del 0
DCID: SO 1026, SO Fail 0, DO 0, DO Fail 0
MARKER: 0, Fail 0
A server with the recommended performance levels should be able to do 190,000 transactions per minute or better.
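To compare a queue-processing run with that figure, divide the number of entries processed by the elapsed time between the first and last progress lines. The following is a minimal sketch, assuming both timestamps fall on the same day. tlog_rate is a hypothetical helper name used for illustration, not a NetBackup or PureDisk command:

```shell
# tlog_rate ENTRIES START END
# Prints whole transactions per minute; START and END are HH:MM:SS on the same day.
tlog_rate() {
  awk -v n="$1" -v s="$2" -v e="$3" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    secs = (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    if (secs <= 0) { print "error: END must be later than START" > "/dev/stderr"; exit 1 }
    printf "%d\n", n * 60 / secs
  }'
}

# Values taken from the log excerpt above.
tlog_rate 27800000 17:21:52 20:53:32
```

For the run shown above (27,800,000 entries between 17:21:52 and 20:53:32, which is 12,700 seconds), this works out to roughly 131,000 transactions per minute, below the 190,000 guideline.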
6. Slow cache loading after restarting the PureDisk/MSDP services or rebooting. The spoold process cannot accept connections until the container cache has finished loading into RAM.
7. With PDDO/MSDP, status 84, 83, 213, 800, 801, 2104, and possibly other status errors reporting the disk pool as down, not available, or not writable/readable.
If you experience any of these issues, check the underlying disk performance. This does not replace any other troubleshooting needed to assess the issue in your environment, but if the underlying hard disk performance is below the recommended levels, it will show up as the conditions listed above.
The requirements from Symantec NetBackup PureDisk Getting Started Guide, pages 32 - 33:
http://www.veritas.com/docs/TECH126976
Hard disk speed recommendations for PureDisk 6.6
The read and write speeds of the hard disks in the storage pool affect overall PureDisk performance.
To determine disk speed
1 To determine the speed of the drives in the storage pool, run the following command:
time (dd if=/dev/zero of=/Storage/data/xyz bs=64k count=409600; sync)
2 The output looks similar to the following:
409600+0 records in
409600+0 records out
----------------------
speed=26843545600/110.9 = 225.5MB/sec
Computers with disk speeds greater than 200MB per second have optimal read and write performance for PureDisk.
Computers with disk speeds between 150-200MB per second have sufficient read and write speed for PureDisk.
Computers with disk speeds between 100-150MB per second have some operations with degraded performance.
Computers with disk speeds less than 100MB per second experience poor performance. Improve disk reads and writes before installing and running PureDisk.
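The throughput calculation and the tiers above can be sketched as a small shell function. This is an illustration only: disk_speed_tier is a hypothetical name, and it assumes 1 MB = 1,048,576 bytes, so its result can differ by a few percent from a figure computed with another convention.

```shell
# disk_speed_tier BYTES SECONDS
# Prints the write speed in MB/sec and the matching PureDisk tier.
disk_speed_tier() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN {
    if (secs <= 0) { print "error: SECONDS must be positive" > "/dev/stderr"; exit 1 }
    mbps = bytes / 1048576 / secs
    if      (mbps > 200)  tier = "optimal"
    else if (mbps >= 150) tier = "sufficient"
    else if (mbps >= 100) tier = "degraded"
    else                  tier = "poor"
    printf "%.1f MB/sec (%s)\n", mbps, tier
  }'
}

# 409600 blocks of 64k = 26843545600 bytes, written in 110.9 seconds.
disk_speed_tier 26843545600 110.9
```

The dd test leaves a 25GB file at /Storage/data/xyz; remove it once the measurement has been recorded.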
The requirements from the Symantec NetBackup™ Deduplication Guide, pages 53 - 54:
http://www.veritas.com/docs/TECH135526
Hard disk speed recommendations for MSDP
About deduplication storage requirements
The following defines the storage for the NetBackup Media Server Deduplication Option:
Storage media: Disk
Connection: Storage area network (SAN), direct-attached storage (DAS), or internal disks
Minimum performance: 130 MB/sec minimum read and write
The storage must be configured and operational before you can configure deduplication in NetBackup.
NetBackup requires exclusive use of the disk resources. If the storage is used for purposes other than backups, NetBackup cannot manage disk pool capacity or manage storage lifecycle policies correctly. Therefore, NetBackup must be the only entity that uses the storage.
Local disk storage may leave you vulnerable in a disaster. SAN attached disk can be remounted at a newly provisioned server with the same name.
For further testing:
1. nbperfchk command
You can use the nbperfchk command to test disk performance during a backup window.
nbperfchk is located in /usr/openv/netbackup/bin/support/nbperfchk (Unix) or <install_path>\NetBackup\bin\support\nbperfchk (Windows), or under Support > Nbperfchk in the NetBackup appliance CLISH.
Additional information about the nbperfchk command can be found in article 000117032.
2. "camel" test tool for older versions of NetBackup that do not include nbperfchk
Disk I/O example
In its simplest form, camel writes data to disk and measures the performance. After extracting camel for your platform, run the command for your operating system.
On Windows use this:
camel -i zero: -o H:\Storage\camelfile.tst -s 64g -syncend
On Linux/Unix use this:
./camel.linux -i zero: -o /Storage/camelfile.tst -s 64g -syncend
NOTE: In both cases, make sure to use the correct path for your MSDP or PureDisk storage. The examples above use H:\Storage and /Storage respectively. In both cases you still need to specify the output file (camelfile.tst).
NOTE: camelfile.tst will be created in the designated path and should be 64GB. Please make sure the storage drive has enough free space to run this test.
NOTE: After testing is complete please delete camelfile.tst
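Before writing the 64GB test file on Linux/Unix, the free space on the storage path can be checked with df. The following is a sketch; has_free_space is a hypothetical helper name, and /Storage is the example path used above:

```shell
# has_free_space DIR GIGABYTES
# Succeeds when the file system holding DIR has at least GIGABYTES free.
has_free_space() {
  avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
  need_kb=$(( $2 * 1024 * 1024 ))
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}

if has_free_space /Storage 64; then
  echo "enough free space for the 64GB test file"
else
  echo "not enough free space (or path not found); do not run the test"
fi
```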
The "camel" tool can only be obtained by contacting support. However, nbperfchk is preferred and recommended.
3. If you are using Storage Foundation, VxVM, or VxFS from Symantec, run VRTSexplorer and open a support ticket with Storage Foundation to help determine the cause. They have a tool called 'firstlook' that can help identify the issue and work toward a solution.
http://www.veritas.com/docs/TECH17676
Solution
1. A single large LUN presented from the SAN with multiple disks in the same column can cause these issues.
2. Possible switch, cable or SAN issues.
3. Make sure that all components (system mainboard, HBA, switch, and SAN) are running the latest firmware, which may include fixes for the issues being experienced.
4. If using Storage Foundation, open a support ticket with our Storage Foundation group to help analyze the issue and work toward resolution.
5. If using iSCSI or NFS, make sure that all the supporting infrastructure is in place to meet the performance requirements. Possible considerations include:
- Dedicated interface for storage network.
- The interface cards, switch, storage device, and mainboard have up-to-date firmware and use the latest drivers.
- The interface cards, switch, and network in general can support a sustained throughput that meets our performance requirements.
- A single 1Gb NIC will not meet the requirements; you need at least four 1Gb NICs or a single 10Gb NIC to meet the minimum performance requirements.
Note: NFS is not supported for use with MSDP. iSCSI is supported in NetBackup versions 7.5 and above, but with the restrictions outlined in the NetBackup Deduplication Guide.
6. On Windows Server 2008 R2, make sure that SP1 is installed, as it includes many network and disk I/O performance patches:
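The NIC sizing in item 5 follows from converting link speed in gigabits per second to megabytes per second (1 Gb/sec = 1000 Mb/sec, 8 bits per byte). A short sketch of the arithmetic; nic_mb_per_sec is a hypothetical helper name, and the result is a theoretical ceiling before any protocol overhead:

```shell
# nic_mb_per_sec COUNT GBITS
# Theoretical ceiling in MB/sec for COUNT links of GBITS Gb/sec each.
nic_mb_per_sec() {
  echo $(( $1 * $2 * 1000 / 8 ))
}

nic_mb_per_sec 1 1    # one 1Gb NIC    -> 125
nic_mb_per_sec 4 1    # four 1Gb NICs  -> 500
nic_mb_per_sec 1 10   # one 10Gb NIC   -> 1250
```

A single 1Gb link tops out at 125 MB/sec, which is already below the 130 MB/sec MSDP minimum before overhead is taken into account.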
https://support.microsoft.com/kb/976932
7. Stagger the backups under NetBackup so that they are not all starting at the same time.
8. As a workaround, the following steps allow NetBackup to tolerate slower response times from the disk pool without marking it as down. This is not a permanent fix: it only hides the issue, but it allows backups to complete until the underlying problem is fixed.
Create the following files with the suggested settings:
Windows: <install path>\Veritas\NetBackup\db\config\
Linux/Unix: /usr/openv/netbackup/db/config/
Note: If your install location is different, adjust the above paths to match your NetBackup install path.
- This one is just an empty file:
DPS_PROXYNOEXPIRE
- Create this file with the value of 1800 inside:
DPS_PROXYDEFAULTSENDTMO
- Create this file with the value of 1800 inside:
DPS_PROXYDEFAULTRECVTMO
Note: On Windows, do not add a file type suffix; the files must be created with the exact names listed above.
- Restart nbrmms (NetBackup Remote Manager and Monitor Service) on the media server.
Warning: If the issue persists after one daily or nightly backup schedule window with the above configuration changes in place, remove the touch files and troubleshoot the issue further using the logs (nbrmms, dps, spoold, spad) to determine the root cause.
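On Linux/Unix, the three touch files described in step 8 can be created and later removed with a few shell commands. The following is a sketch, assuming the default config path shown above. The function names are hypothetical, and nbrmms still has to be restarted separately:

```shell
# create_dps_touchfiles [CONFIG_DIR]
# Creates the empty DPS_PROXYNOEXPIRE file and the two timeout files containing 1800.
create_dps_touchfiles() {
  dir="${1:-/usr/openv/netbackup/db/config}"
  : > "$dir/DPS_PROXYNOEXPIRE" &&
  echo 1800 > "$dir/DPS_PROXYDEFAULTSENDTMO" &&
  echo 1800 > "$dir/DPS_PROXYDEFAULTRECVTMO"
}

# remove_dps_touchfiles [CONFIG_DIR]
# Removes the three files again once the workaround is no longer wanted.
remove_dps_touchfiles() {
  dir="${1:-/usr/openv/netbackup/db/config}"
  rm -f "$dir/DPS_PROXYNOEXPIRE" \
        "$dir/DPS_PROXYDEFAULTSENDTMO" \
        "$dir/DPS_PROXYDEFAULTRECVTMO"
}

# Usage (pass a directory only if NetBackup is installed in a non-default location):
#   create_dps_touchfiles
#   remove_dps_touchfiles
```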
Applies To
MSDP = Media Server Deduplication Pool
MSDO = Media Server Deduplication Option
PDDO = PureDisk Deduplication Option