Problem
During MSDP replications, the outbound traffic from the target system may exceed the inbound traffic it receives from the source. Because QoS (Quality of Service) is applied on the network, this extends the time needed to complete the replications.
Error Message
No error messages are observed, but checking the network link usage may show the following (even on a dedicated VLAN for replications):
SOURCE 1
12:00:01 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
06:10:01 AM vlan2772 42263.43 32987.28 443733.27 5780.01 0.00 0.00 0.00
06:20:01 AM vlan2772 55340.10 43085.17 432767.03 48423.99 0.00 0.00 0.00
06:30:02 AM vlan2772 55363.54 43251.52 447104.40 51961.31 0.00 0.00 0.00
06:40:01 AM vlan2772 57134.09 43463.49 438164.16 105934.30 0.00 0.00 0.00
06:50:01 AM vlan2772 56733.02 43167.23 442824.47 109605.10 0.00 0.00 0.00
07:00:02 AM vlan2772 55809.20 43217.49 434369.30 59046.86 0.00 0.00 0.00
SOURCE 2
12:00:01 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
06:10:01 AM vlan2772 29187.42 22611.28 215826.36 51043.68 0.00 0.00 0.00
06:20:01 AM vlan2772 55521.91 42035.03 396982.37 160625.93 0.00 0.00 0.00
06:30:01 AM vlan2772 50918.65 39172.30 382247.13 119702.09 0.00 0.00 0.00
06:40:01 AM vlan2772 56342.39 41844.26 391225.37 211206.83 0.00 0.00 0.00
06:50:01 AM vlan2772 52193.11 39866.17 386502.62 130459.29 0.00 0.00 0.00
07:00:01 AM vlan2772 53836.31 41288.73 395238.77 107320.88 0.00 0.00 0.00
TARGET
12:00:01 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
06:10:01 AM vlan1317 57056.67 48684.73 57491.76 661817.07 0.00 0.00 0.00
06:20:01 AM vlan1317 89435.87 76606.59 208151.57 833275.24 0.00 0.00 0.00
06:30:01 AM vlan1317 85995.81 73120.68 170598.95 832931.39 0.00 0.00 0.00
06:40:01 AM vlan1317 92588.00 80083.90 315980.87 833101.34 0.00 0.00 0.00
06:50:01 AM vlan1317 88496.91 76426.46 239332.67 832978.66 0.00 0.00 0.00
07:00:01 AM vlan1317 87812.69 74970.78 165445.11 833251.73 0.00 0.00 0.00
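The pattern in the samples above (txkB/s well above rxkB/s on the target interface) can be picked out of saved interface statistics with a short filter. The following is a minimal sketch, not a supported tool: the function name tx_exceeds_rx is hypothetical, and it assumes the column layout shown above, with a 12-hour timestamp followed by an AM/PM field, so that rxkB/s is field 6 and txkB/s is field 7. Output in a 24-hour locale would shift the fields by one.

```shell
# Hypothetical helper: print samples for one interface where txkB/s > rxkB/s.
# Assumes the layout shown above: TIME AM/PM IFACE rxpck/s txpck/s rxkB/s txkB/s ...
# Usage: tx_exceeds_rx vlan1317 < saved_interface_stats.txt
tx_exceeds_rx() {
    awk -v iface="$1" '
        # Header lines carry the literal text IFACE in field 3, so they never match.
        $3 == iface && ($7 + 0) > ($6 + 0) {
            printf "%s %s %s rx=%s tx=%s kB/s\n", $1, $2, $3, $6, $7
        }'
}
```

Run against the TARGET samples above, every row would be reported; run against the SOURCE samples, none would, because on the sources the received rate is the larger one.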
Cause
This situation occurs when a client in the source domain is backed up very frequently with small jobs.
- As a result the __dofpcache__ file for the client/policy in the MSDP catalog on the target server grows due to the large number of images listed.
- This list of images is used during the replication process to create the temporary local fingerprint cache for the replication operation on the source MSDP server.
Solution
The solution is to add two settings to the [StorageDatabase] section of the spa.cfg file on the target system. These settings limit the number of images that are referenced.
- LastFullMaxImageCount=10
- LastFullMaxImageCountSmallDO=5
This can be accomplished with the following commands, or by manually editing the spa.cfg file (after creating a backup copy). The paths below are from a NetBackup appliance, so they may vary on BYO or Flex systems:
# /usr/openv/pdde/pdag/bin/pdcfg --write=/msdp/data/dp1/pdvol/etc/puredisk/spa.cfg --section=StorageDatabase --option=LastFullMaxImageCount --value=10
# /usr/openv/pdde/pdag/bin/pdcfg --write=/msdp/data/dp1/pdvol/etc/puredisk/spa.cfg --section=StorageDatabase --option=LastFullMaxImageCountSmallDO --value=5
Restart the MSDP Services on the target MSDP server after making the changes.
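For the manual-edit route, the backup copy and a quick check of the result can be scripted with standard tools. This is a minimal sketch under stated assumptions: the function names backup_spa_cfg and show_image_limits are hypothetical, the file path is passed in by the caller, and the check assumes each option is stored on its own line in Option=Value form, as the settings are written above.

```shell
# Hypothetical helper: copy the config file to a timestamped backup
# (preserving mode and timestamps) and print the backup path.
backup_spa_cfg() {
    cfg="$1"
    bak="$cfg.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$cfg" "$bak" || return 1
    printf '%s\n' "$bak"
}

# Hypothetical helper: show the two image-count settings if they are present.
# Assumes Option=Value lines; returns non-zero when neither option is found.
show_image_limits() {
    grep -E '^[[:space:]]*LastFullMaxImageCount(SmallDO)?[[:space:]]*=' "$1"
}

# Example (appliance path from above; adjust for BYO or Flex systems):
#   backup_spa_cfg /msdp/data/dp1/pdvol/etc/puredisk/spa.cfg
#   show_image_limits /msdp/data/dp1/pdvol/etc/puredisk/spa.cfg
```

Taking the backup before any change gives a known-good file to restore if the edit needs to be reverted.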
As an example of the result that may be achieved, the chart below shows the difference in one replication window after the above settings were adjusted.
Notice that the activity was reduced from more than 7 hours on the left to less than 2 hours on the right.