Access 3340 appliance version 7.4.2 can suffer hangs or unexpected reboot of 1 node when using Veritas Data Deduplication (VDD)

Article: 100045745
Last Published: 2019-06-21
Product(s): Appliances

Problem

When using the VDD solution on an Access 3340 appliance at version 7.4.2, the node running the VDD services can hang or reboot unexpectedly (panic).

In the case of a hang, the VDD service is not failed over to the remaining node.

Error Message

There is no error message.

Cause

Processes associated with the VDD service, spoold and spad, consume large amounts of memory. Depending on the workload and the physical RAM installed, the memory usage of these processes can become too large. Other processes then find it increasingly difficult to allocate the memory they require, and the node behaves unexpectedly. Eventually, a hang or panic can occur.

Solution

Patch 7.4.2.100 for Access 3340 has been released. It contains fixes for the VDD service and improved tuning of memory management. These fixes and tuning improve stability and decrease the likelihood of a hang or panic.

However, depending on workloads, the VDD processes may require large amounts of memory. If the Access 3340 appliance is intended for use with the VDD solution, it is recommended to have at least 380GB of RAM per node.

For appliances at 7.4.2, it is recommended to apply the 7.4.2.100 patch. If that is not possible, the following tuning can be applied to improve stability. Note that step 2 requires the VDD service to be stopped.

1. Tune vm.swappiness from zero to ten. Note that these actions need to be carried out on all nodes in the cluster.

Check that the current value is zero, adjust it to ten, and confirm the new value:
sysctl vm.swappiness
sysctl vm.swappiness=10
sysctl vm.swappiness
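
The check, set, and confirm sequence above can also be wrapped in a small helper. This is a minimal sketch, not part of the appliance tooling: the function name ensure_swappiness and its optional file argument are hypothetical, and it assumes a root shell on the node. Writing to /proc/sys/vm/swappiness has the same effect as sysctl vm.swappiness=10.

```shell
#!/bin/sh
# Sketch only: ensure_swappiness is a hypothetical helper, not a Veritas command.
# Usage: ensure_swappiness <target> [file]
# The optional file argument exists so the logic can be tried on a scratch file.
ensure_swappiness() {
    target="$1"
    file="${2:-/proc/sys/vm/swappiness}"

    # Read the current value; stop if the file cannot be read.
    current="$(cat "$file")" || return 1

    if [ "$current" = "$target" ]; then
        echo "vm.swappiness already $target"
        return 0
    fi

    echo "vm.swappiness is $current, changing to $target"
    echo "$target" > "$file" || return 1

    # Read the value back to confirm that the change took effect.
    [ "$(cat "$file")" = "$target" ]
}
```

On a node, this would be run as ensure_swappiness 10, once per node in the cluster.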

Adjust the boot script to ensure the new value persists across reboots:

cp /opt/VRTSnas/scripts/misc/nas_always.sh /opt/VRTSnas/scripts/misc/nas_always.orig
vi  /opt/VRTSnas/scripts/misc/nas_always.sh

Alter line 158 to set the swappiness value to 10:

158        echo 10 > /proc/sys/vm/swappiness
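
As an alternative to editing the file by hand, the same change could be scripted. The sketch below is an illustration under assumptions: persist_swappiness is a hypothetical helper, and it assumes the existing line has the form "echo <number> > /proc/sys/vm/swappiness". It matches on content rather than on line number 158, keeps a backup named like the one above (nas_always.orig), and relies on GNU sed.

```shell
#!/bin/sh
# Sketch only: persist_swappiness is a hypothetical helper, not a Veritas command.
# Usage: persist_swappiness <script> <value>
persist_swappiness() {
    script="$1"
    value="$2"
    backup="${script%.sh}.orig"

    # Keep the first backup; do not overwrite it on later runs.
    [ -f "$backup" ] || cp -p "$script" "$backup" || return 1

    # On lines that redirect into the swappiness file, replace the echoed number.
    sed -i -E "\|/proc/sys/vm/swappiness| s|echo[[:space:]]+[0-9]+|echo ${value}|" "$script" || return 1

    # Show the resulting line(s); a non-zero exit means no such line was found.
    grep -n "/proc/sys/vm/swappiness" "$script"
}
```

For example, persist_swappiness /opt/VRTSnas/scripts/misc/nas_always.sh 10 prints the edited line with its line number, which can be compared against line 158.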

 

2. Tune spoold MaxCacheSize from the default of 75%.

If the 3340 cluster nodes each have 350GB RAM or more, use a value of 25%.
If the 3340 cluster nodes each have less than 350GB RAM, use a value of 50%.

Note that this action only needs to be carried out on one node.

Stop the VDD service using the Access cluster CLISH:
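
The RAM rule above can be expressed as a short helper. This is a sketch under assumptions: pick_max_cache_size and mem_total_gb are hypothetical names. MemTotal in /proc/meminfo is usually somewhat lower than the installed RAM, so for a node close to the 350GB boundary, rely on the known installed RAM rather than on the computed figure.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers, not Veritas commands.

# Print the MaxCacheSize value for a node with the given RAM in whole GB.
pick_max_cache_size() {
    ram_gb="$1"
    if [ "$ram_gb" -ge 350 ]; then
        echo "25%"
    else
        echo "50%"
    fi
}

# Print MemTotal in whole GB, truncated. MemTotal is reported in kB.
# The optional argument allows reading a saved copy of /proc/meminfo.
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}
```

For example, pick_max_cache_size 384 prints 25%, and pick_max_cache_size 256 prints 50%.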

CLISH> dedupe show
CLISH> dedupe stop

Adjust the VDD config file to set the new value:

cp /vx/<FILESYSTEM>/dedupe/etc/puredisk/contentrouter.cfg  /vx/<FILESYSTEM>/dedupe/etc/puredisk/contentrouter.cfg.orig
vi  /vx/<FILESYSTEM>/dedupe/etc/puredisk/contentrouter.cfg

Alter line 401 to set MaxCacheSize from 75% to either 50% or 25%, for example:

MaxCacheSize=50%
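
The backup and edit could also be scripted instead of using vi. The following is a minimal sketch: set_max_cache_size is a hypothetical helper, and it assumes the setting appears on its own uncommented line in the form MaxCacheSize=<value>, as shown above. It uses GNU sed and should only be run while the VDD service is stopped.

```shell
#!/bin/sh
# Sketch only: set_max_cache_size is a hypothetical helper, not a Veritas command.
# Usage: set_max_cache_size <path to contentrouter.cfg> <25%|50%>
set_max_cache_size() {
    cfg="$1"
    value="$2"

    # Accept only the two values recommended in this article.
    case "$value" in
        25%|50%) ;;
        *) echo "unexpected MaxCacheSize value: $value" >&2; return 1 ;;
    esac

    # Keep the first backup; do not overwrite it on later runs.
    [ -f "$cfg.orig" ] || cp -p "$cfg" "$cfg.orig" || return 1

    # Rewrite only an uncommented MaxCacheSize line.
    sed -i -E "s|^[[:space:]]*MaxCacheSize[[:space:]]*=.*|MaxCacheSize=${value}|" "$cfg" || return 1

    # Show the resulting line(s) so the change can be checked.
    grep -n "MaxCacheSize" "$cfg"
}
```

The path argument is the contentrouter.cfg file under /vx/<FILESYSTEM>/dedupe/etc/puredisk/, with <FILESYSTEM> replaced by the file system in use.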

Restart the VDD service:

CLISH> dedupe start
CLISH> dedupe show

After making the tunable changes, monitor the memory usage of the spoold process. If it reaches 80% of RAM or more, then restart the VDD service to ensure stability of the appliance.

In the appliance CLISH, go to the Monitor section and use the Top option:
CLISH.Main_Menu> Monitor
CLISH.Monitor> Top

When top is running, press Shift+M to sort the output by memory usage.
Look for the spoold process and make a note of the value in the %MEM column.
Press 'q' to exit from the top screen and return to the CLISH prompt.

If spoold is using around 80% or more, then restart the VDD service:

CLISH> dedupe stop
CLISH> dedupe start
CLISH> dedupe show
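
The same check can be done from a shell on the node without the interactive top screen. This is a sketch under assumptions: spoold_mem_pct and over_threshold are hypothetical helper names, and it assumes the procps version of ps is available. It only reports the value; the restart itself is still done through the CLISH commands above.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers, not Veritas commands.

# Print the summed %MEM of all processes named spoold (0.0 if none are running).
spoold_mem_pct() {
    ps -C spoold -o %mem= | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# Succeed (exit 0) when the percentage is at or above the limit (default 80).
over_threshold() {
    awk -v pct="$1" -v limit="${2:-80}" 'BEGIN { exit !(pct + 0 >= limit + 0) }'
}

# Example use:
#   pct="$(spoold_mem_pct)"
#   if over_threshold "$pct"; then
#       echo "spoold is using ${pct}% of RAM; restart the VDD service"
#   fi
```

The optional second argument to over_threshold allows a lower warning level, for example over_threshold "$pct" 70.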

Reconfiguring VDD

If VDD is unconfigured and then configured again, the MaxCacheSize tunable is reset to the default of 75%. After configuring dedupe, repeat step 2 above to set MaxCacheSize.

EEB Considerations

It is highly recommended to apply patch 7.4.2.100. However, if it is not possible to apply the patch, the following EEBs are available for 7.4.2. The EEBs marked as public are available from the SORT website. Please contact Veritas Support for other EEBs.

EEB      Version  Description
3972425  3        Disable selective email notifications
                  pam.d misconfiguration prevents login
                  debuginfo appears hung
                  When balancing vipgroups, the msdp is moved between nodes
3971871  4        The EEB bundle contains fixes for VDD issues on Access 7.4.2
3971580  1        Fix Access Appliance memory corruption issue (public EEB)
3964974  6        Fix reboot/shutdown issue in cluster nodes (public EEB)
