Veritas Access Troubleshooting Guide

Product(s): Appliances (Version Not Specified)
Platform: 3340
  1. Introduction
    1. About troubleshooting
    2. General tips for the troubleshooting process
    3. General techniques for the troubleshooting process
  2. General troubleshooting procedures
    1. About general troubleshooting procedures
    2. Viewing the Veritas Access log files
    3. About event logs
    4. About shell-activity logs
    5. Setting the CIFS log level
    6. Setting the NetBackup client log levels and debugging options
    7. Retrieving and sending debugging information
    8. Insufficient delay between two successive OpenStack commands may result in failure
  3. Monitoring Veritas Access
    1. About monitoring Veritas Access operations
    2. Monitoring processor activity
    3. Generating CPU and device utilization reports
    4. Monitoring network traffic
    5. Exporting and displaying the network traffic details
  4. Common recovery procedures
    1. About common recovery procedures
    2. Restarting servers
    3. Bringing services online
      1. Using the services command
    4. Recovering from a non-graceful shutdown
    5. Testing the network connectivity
    6. Troubleshooting with traceroute
    7. Using the traceroute command
    8. Collecting the metasave image of a file system
    9. Replacing an Ethernet interface card (online mode)
    10. Replacing an Ethernet interface card (offline mode)
    11. Replacing a Veritas Access node
    12. Replacing a disk
    13. Speeding up replication
      1. About synchronizing a replication job
      2. Synchronizing an episodic replication job
    14. Uninstalling a patch release or software upgrade
  5. Troubleshooting the Veritas Access cloud as a tier feature
    1. Troubleshooting tips for cloud tiering
    2. Issues when reading or writing data from the cloud tier
    3. Log locations for checking for cloud tiering errors
  6. Troubleshooting Veritas Access installation and configuration issues
    1. How to find the management console IP
    2. Viewing the installation logs
    3. Installation fails and does not complete
    4. Excluding PCI IDs from the cluster
    5. Cannot recover from root file system corruption
    6. The storage disk list command returns nothing
  7. Troubleshooting the LTR upgrade
    1. Locating the log files for troubleshooting the LTR upgrade
    2. Troubleshooting pre-upgrade issues for LTR
    3. Troubleshooting post-upgrade issues for LTR
  8. Troubleshooting Veritas Access CIFS issues
    1. User access is denied on a CTDB directory share
  9. Troubleshooting Veritas Access GUI startup issues
    1. Resolving GUI startup issues
  10. Index

Replacing a Veritas Access node

In some cases, you may need to replace a Veritas Access node. This section describes the procedure for replacing a node.

To replace a Veritas Access node

  1. Before you delete the node from the cluster, make sure that the node you are removing is not the CVM master node. If you need to remove the current CVM master node, first switch the CVM master role to another node by switching the Management Console to that node.
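    To check which node is currently the CVM master, you can run the vxclustadm nidmap command on any node; the master node is displayed with the Joined: Master state, as shown in the example later in this section.
    # vxclustadm nidmap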
  2. If you do not want to trigger hot relocation, set the following tunable to -1 on the CVM master node:
    # vxtune node_reloc_timeout -1

    Note:

    After you set node_reloc_timeout to -1, the storage_reloc_timeout tunable is automatically set to -1 as well.

  3. Run the cluster del command for the node that is to be replaced.
    fss7310> cluster del fss7310_02
  4. Verify that all the plexes are in the NODEVICE/DISABLED state. You can use the vxprint -p command to check the plex states.
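    For example, to list only the plexes that are in the NODEVICE state:
    # vxprint -p | grep -i nodevice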
  5. Run the following command to detach the plexes of the volumes:
    # vxplex -f -g <dg-name> -o rm dis <plex-name>
  6. Remove all the disks that are in the failed was: state from the disk group by using the vxdg rmdisk command. Run this command from the CVM master node.
    # vxdg -g <dg-name> rmdisk <disk-name>
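    To identify the disks that are in the failed was: state, you can run:
    # vxdisk list | grep "failed was:"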
  7. Run the vxdisk rm command on each node in the cluster to remove the disk entries for the removed disks.
    # vxdisk rm <disk-name>

    Note:

    Run this command for each of the removed disks on every node in the cluster.

  8. After all the affected plexes are removed, add the new node to the cluster by using the following command:
    fss7310> cluster add <node-ip>
  9. Run the storage disk format command from the Veritas Access management console node for all the disks from the newly added node.
    fss7310> storage disk format <list-of-disks>
  10. Add all the disks from the newly added node to the existing Veritas Access storage pool by using the storage pool adddisk command.
    fss7310> storage pool adddisk pool1 <list-of-devices>
  11. Run the storage fs addmirror command to mirror each file system.
    fss7310> storage fs addmirror <fs-name> <pool-name>
  12. Run the vxassist command to mirror the _nlm_ volume as well.
    # vxassist -b -g <dg-name> mirror _nlm_

Example: Replacing a Veritas Access node

To replace a Veritas Access node

  1. Change the value of the vxtune tunable to disable hot relocation:
    # vxtune node_reloc_timeout -1
  2. Run the following command to remove the node from the cluster.
    fss7310> cluster del fss7310_02
    
    Veritas Access 7.4.2 Delete Node Program
    
    fss7310_02
    
    Copyright (c) 2017 Veritas Technologies LLC. All rights reserved. Veritas and the 
    Veritas Logo are trademarks or registered trademarks of Veritas Technologies LLC 
    or its affiliates in the U.S. and other countries. Other names may be trademarks 
    of their respective owners.
    
    The Licensed Software and Documentation are deemed to be "commercial computer software" 
    and "commercial computer software documentation" as defined in FAR Sections 12.212 
    and DFARS Section 227.7202.
    
    Logs are being written to /var/tmp/installaccess-201803130635kXW while installaccess 
    is in progress.
    
    Veritas Access 7.4.2 Delete Node Program
    fss7310_02
    Checking communication on fss7310_01 ........................................... Done
    Checking communication on fss7310_02 ........................................... Done
    Checking communication on fss7310_03 ........................................... Done
    Checking communication on fss7310_04 ........................................... Done
    Checking VCS running state on fss7310_01 ....................................... Done
    Checking VCS running state on fss7310_02 ....................................... Done
    Checking VCS running state on fss7310_03 ....................................... Done
    Checking VCS running state on fss7310_04 ....................................... Done
    The following changes will be made on the cluster:
    Failover service group VIPgroup4 will be switched to fss7310_01
    
    Switching failover service group(s) ............................................ Done
    Waiting for service group(s) to come online on the other sub-cluster ........... Done
    All the online failover service group(s) that can be switched have been switched to 
    the other sub-cluster.
    The following parallel service group(s) in the sub-cluster will be offline:
    fss7310_02: CanHostConsole CanHostNLM Phantomgroup_pubeth0 ReconfigGroup cvm iSCSI_INIT 
    vrts_vea_cfs_int_cfsmount1 vrts_vea_cfs_int_cfsmount2 vrts_vea_cfs_int_cfsmount3
    vrts_vea_cfs_int_cfsmount4 vrts_vea_cfs_int_cfsmount5 vrts_vea_cfs_int_cfsmount6
    Offline parallel service group(s) .............................................. Done
    Waiting for service group(s) to be taken offline on the sub-cluster ............ Done
    Stopping VCS on fss7310_02 ..................................................... Done
    Stopping Fencing on fss7310_02 ................................................. Done
    Stopping gab on fss7310_02 ..................................................... Done
    Stopping llt on fss7310_02 ..................................................... Done
    Clean up deleted nodes information on the cluster .............................. Done
    Clean up deleted nodes ......................................................... Done
    Delete node completed successfully
    installaccess log files and summary file are saved at:
    /opt/VRTS/install/logs/installaccess-201803130635kXW
  3. Verify that the plex states are set to NODEVICE/DISABLED.
    [root@fss7310_01 ~]# vxclustadm nidmap
    Name       CVM Nid CM Nid     State
    fss7310_01  2        0     Joined: Master
    fss7310_03  3        2     Joined: Slave
    fss7310_04  1        3     Joined: Slave
    
    [root@fss7310_01 ~]# vxprint -p | grep -i nodevice
    pl _nlm_-02            _nlm_            DISABLED   2097152  - NODEVICE - -
    pl _nlm__dcl-02        _nlm__dcl        DISABLED   67840    - NODEVICE - -
    pl test1_tier1-P02     test1_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test1_tier1-P04     test1_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test1_tier1-P06     test1_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test1_tier1_dcl-02  test1_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test2_tier1-P02     test2_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test2_tier1-P04     test2_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test2_tier1-P06     test2_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test2_tier1_dcl-02  test2_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test3_tier1-P02     test3_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test3_tier1-P04     test3_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test3_tier1-P06     test3_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test3_tier1_dcl-02  test3_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test4_tier1-P02     test4_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test4_tier1-P04     test4_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test4_tier1-P06     test4_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test4_tier1_dcl-02  test4_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test5_tier1-P02     test5_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test5_tier1-P04     test5_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test5_tier1-P06     test5_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test5_tier1_dcl-02  test5_tier1_dcl  DISABLED   67840    - NODEVICE - -
    
    [root@fss7310_01 ~]# vxdisk list | grep "failed was:"
    - - emc0_2256 sfsdg failed was:emc0_2256
    - - emc0_2264 sfsdg failed was:emc0_2264
    - - emc0_2272 sfsdg failed was:emc0_2272
    - - emc0_2280 sfsdg failed was:emc0_2280
    - - emc0_2288 sfsdg failed was:emc0_2288
    - - emc0_2296 sfsdg failed was:emc0_2296
    - - emc0_2304 sfsdg failed was:emc0_2304
    - - emc0_2312 sfsdg failed was:emc0_2312
    - - emc0_2320 sfsdg failed was:emc0_2320
    - - emc0_2328 sfsdg failed was:emc0_2328
    - - emc0_2336 sfsdg failed was:emc0_2336
    - - emc0_2344 sfsdg failed was:emc0_2344
    - - emc0_2352 sfsdg failed was:emc0_2352
    - - emc0_2360 sfsdg failed was:emc0_2360
  4. Remove the affected mirrors for the volumes that are present on the system.
    [root@fss7310_01 ~]# vxplex -f -g sfsdg -o rm dis test1_tier1-P02
    [root@fss7310_01 ~]# for i in `vxprint -p | grep -i NODEVICE | awk '{print $2}'`
    > do
    > echo "vxplex -f -g sfsdg -o rm dis $i"
    > vxplex -f -g sfsdg -o rm dis $i
    > done
    vxplex -f -g sfsdg -o rm dis _nlm_-02
    vxplex -f -g sfsdg -o rm dis _nlm__dcl-02
    vxplex -f -g sfsdg -o rm dis test1_tier1-P04
    vxplex -f -g sfsdg -o rm dis test1_tier1-P06
    vxplex -f -g sfsdg -o rm dis test1_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test2_tier1-P02
    vxplex -f -g sfsdg -o rm dis test2_tier1-P04
    vxplex -f -g sfsdg -o rm dis test2_tier1-P06
    vxplex -f -g sfsdg -o rm dis test2_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test3_tier1-P02
    vxplex -f -g sfsdg -o rm dis test3_tier1-P04
    vxplex -f -g sfsdg -o rm dis test3_tier1-P06
    vxplex -f -g sfsdg -o rm dis test3_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test4_tier1-P02
    vxplex -f -g sfsdg -o rm dis test4_tier1-P04
    vxplex -f -g sfsdg -o rm dis test4_tier1-P06
    vxplex -f -g sfsdg -o rm dis test4_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test5_tier1-P02
    vxplex -f -g sfsdg -o rm dis test5_tier1-P04
    vxplex -f -g sfsdg -o rm dis test5_tier1-P06
    vxplex -f -g sfsdg -o rm dis test5_tier1_dcl-02
    
    [root@fss7310_01 ~]# vxprint -p
    Disk group: sfsdg
    
    TY NAME                 ASSOC           KSTATE  LENGTH   PLOFFS  STATE   TUTIL0 PUTIL0
    pl _nlm_-01             _nlm_           ENABLED 2097152  -       ACTIVE  -      -
    pl _nlm__dcl-01         _nlm__dcl       ENABLED 67840    -       ACTIVE  -      -
    pl test1_tier1-P01      test1_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test1_tier1-P03      test1_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test1_tier1-P05      test1_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test1_tier1-03       test1_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test1_tier1_dcl-01   test1_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test2_tier1-P01      test2_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test2_tier1-P03      test2_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test2_tier1-P05      test2_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test2_tier1-03       test2_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test2_tier1_dcl-01   test2_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test3_tier1-P01      test3_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test3_tier1-P03      test3_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test3_tier1-P05      test3_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test3_tier1-03       test3_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test3_tier1_dcl-01   test3_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test4_tier1-P01      test4_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test4_tier1-P03      test4_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test4_tier1-P05      test4_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test4_tier1-03       test4_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test4_tier1_dcl-01   test4_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test5_tier1-P01      test5_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test5_tier1-P03      test5_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test5_tier1-P05      test5_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test5_tier1-03       test5_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test5_tier1_dcl-01   test5_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
  5. Remove the affected disks from the disk group by using the vxdg rmdisk command and from all the nodes in the cluster by using the vxdisk rm command.
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2288
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2272
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2280
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2296
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2304
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2312
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2320
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2328
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2336
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2344
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2352
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2360
    [root@fss7310_01 bin]# for i in `vxdisk list | grep -i error | awk '{print $1}'`; 
    do vxdisk rm $i; done
    [root@fss7310_03 ~]# for i in `vxdisk list | grep -i error | awk '{print $1}'`; 
    do vxdisk rm $i; done
    [root@fss7310_04 ~]# for i in `vxdisk list | grep -i error | awk '{print $1}'`; 
    do vxdisk rm $i; done
  6. Add the new node to the cluster by running the cluster add command with the IP address of the new node.
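    The command syntax is the same as in the procedure above; substitute the IP address of the replacement node:
    fss7310> cluster add <node-ip>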
  7. Format the disks from the newly added node and add them to the pool that is already present.
    [root@fss7310_01 scripts]# /opt/VRTSnas/clish/bin/clish -u admin -c 
    "storage disk format emc0_2257,emc0_2265,emc0_2273,emc0_2281,emc0_2289,emc0_2297,emc0_2305,
    emc0_2313,emc0_2321,emc0_2329,emc0_2337,emc0_2345,emc0_2353,emc0_2361"
    
    You may lose all the data on the disk, do you want to continue (y/n, the default is n):y
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2257 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2265 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2273 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2281 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2289 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2297 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2305 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2313 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2321 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2329 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2337 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2345 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2353 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2361 has been formatted successfully.
    
    [root@fss7310_01 scripts]# /opt/VRTSnas/clish/bin/clish -u admin -c "storage pool 
    adddisk pool1 emc0_2257,emc0_2265,emc0_2273,emc0_2281,emc0_2289,emc0_2297,emc0_2305,
    emc0_2313,emc0_2321,emc0_2329,emc0_2337,emc0_2345,emc0_2353,emc0_2361"
    
    ACCESS Pool SUCCESS V-493-10-2914 Successfully added disks to pool
  8. Mirror the file systems by using the storage fs addmirror command.
    fss7310> storage fs list
    FS    STATUS  SIZE   LAYOUT   MIRRORS COLUMNS USE% USED   NFS     CIFS    FTP     SECONDARY
                                                              SHARED  SHARED  SHARED  TIER
    ===== ======  ====   =======  ======= ======= ==== ====   ======  ======  ======  =========
    test1 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test2 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test3 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test4 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test5 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    
    fss7310> storage fs addmirror test1 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test1
    fss7310> storage fs addmirror test2 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test2
    fss7310> storage fs addmirror test3 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test3
    fss7310> storage fs addmirror test4 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test4
  9. Mirror the _nlm_ volume by using the vxassist mirror command.
    [root@fss7310_01 bin]# vxassist -b -g sfsdg mirror _nlm_
    
    [root@fss7310_01 bin]# vxprint _nlm_
    Disk group: sfsdg
     
    TY NAME          ASSOC        KSTATE  LENGTH  PLOFFS  STATE     TUTIL0  PUTIL0
    v _nlm_          fsgen        ENABLED 2097152   -     ACTIVE    ATT1    -
    pl _nlm_-01      _nlm_        ENABLED 2097152   -     ACTIVE    -       -
    sd emc0_2255-01  _nlm_-01     ENABLED 2097152   0     -         -       -
    pl _nlm_-02      _nlm_        ENABLED 2097152   -     TEMPRMSD  ATT     -
    sd emc0_2257-01  _nlm_-02     ENABLED 2097152   0     -         -       -
    dc _nlm__dco     _nlm_        -       -         -     -         -       -
    v _nlm__dcl      gen          ENABLED 67840     -     ACTIVE    -       -
    pl _nlm__dcl-01  _nlm__dcl    ENABLED 67840     -     ACTIVE    -       -
    sd emc0_2255-02  _nlm__dcl-01 ENABLED 67840     0     -         -       -
    sp _nlm__cpmap   _nlm_        -       -         -     -         -       -
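
    Note:

    Because vxassist is run with the -b option, the mirror is attached in the background. The new plex (_nlm_-02) remains in the TEMPRMSD/ATT state until the synchronization completes, after which it becomes ACTIVE. As a sketch (assuming the disk group name sfsdg used in this example), you can monitor the attach task and then re-check the plex state with:
    # vxtask -g sfsdg list
    # vxprint -g sfsdg _nlm_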