Veritas Access Troubleshooting Guide

Last Published:
Product(s): Access (7.4)
Platform: Linux
  1. Introduction
    1. About troubleshooting
    2. General tips for the troubleshooting process
    3. General techniques for the troubleshooting process
    4. About the support user account
    5. Configuring the support user account
    6. Logging in with the support account
  2. General troubleshooting procedures
    1. About general troubleshooting procedures
    2. Viewing the Veritas Access log files
    3. About event logs
    4. About shell-activity logs
    5. Setting the CIFS log level
    6. Setting the NetBackup client log levels and debugging options
    7. Retrieving and sending debugging information
    8. Insufficient delay between two successive OpenStack commands may result in failure
  3. Monitoring Veritas Access
    1. About monitoring Veritas Access operations
    2. Monitoring processor activity
    3. Generating CPU and device utilization reports
    4. Monitoring network traffic
    5. Exporting and displaying network traffic details
  4. Common recovery procedures
    1. About common recovery procedures
    2. Restarting servers
    3. Bringing services online
      1. Using the services command
    4. Recovering from a non-graceful shutdown
    5. Testing the network connectivity
    6. Troubleshooting with traceroute
    7. Using the traceroute command
    8. Collecting the metasave image of a file system
    9. Replacing an Ethernet interface card (online mode)
    10. Replacing an Ethernet interface card (offline mode)
    11. Replacing a Veritas Access node
    12. Replacing disks
    13. Speeding up replication
      1. About synchronizing replication jobs
      2. Synchronizing an episodic replication job
    14. Uninstalling a patch release or software upgrade
  5. Troubleshooting the Veritas Access cloud as a tier feature
    1. Troubleshooting tips for cloud tiering
    2. Issues when reading or writing data to or from the cloud tier
    3. Log locations for checking cloud tiering errors
  6. Troubleshooting Veritas Access installation and configuration issues
    1. How to find the management console IP
    2. Viewing the installation logs
    3. Installation fails and does not complete
    4. Excluding PCI IDs from the cluster
    5. Cannot recover from root file system corruption
    6. The storage disk list command does not return any results
  7. Troubleshooting the LTR upgrade
    1. Locating the log files to troubleshoot the LTR upgrade
    2. Troubleshooting pre-upgrade issues for LTR
    3. Troubleshooting post-upgrade issues for LTR
  8. Troubleshooting Veritas Access CIFS issues
    1. User access is denied on a CTDB directory share
  9. Troubleshooting Veritas Access GUI startup issues
    1. Resolving GUI startup issues

Replacing a Veritas Access node

In some situations, you may have to replace a Veritas Access node. This section describes the steps for replacing a Veritas Access node.

To replace a Veritas Access node

  1. Before you delete a node from the cluster, make sure that it is not the master node. To delete the master node, first switch the master role to another node by moving the management console to that node.
  2. If you do not want hot relocation to be triggered, set the following tunable to -1 from the master node.
    # vxtune node_reloc_timeout -1
    

    Note:

    After node_reloc_timeout is set, storage_reloc_timeout is automatically set to -1.

  3. Run the cluster del command for the node that you want to replace.
    fss7310> cluster del fss7310_02
  4. Verify that all the affected plexes are in the NODEVICE/DISABLED state. You can check the plex state with the # vxprint -p command.
  5. Run the following command to detach the plexes of the volumes:
    # vxplex -f -g <dg-name> -o rm dis <plex-name>
  6. Remove all the disks that are in the "failed was:" state from the disk group using the vxdg rmdisk command. This command must be run from the master node.
    # vxdg -g <dg-name> rmdisk <disk-name>
  7. Run the vxdisk rm command for the removed disks from all the nodes in the cluster.
    # vxdisk rm <disk-name>

    Note:

    This command must be run for all the disks from all the nodes in the cluster.

  8. After all the disabled plexes are removed, add the new node to the cluster with the following command:
    fss7310> cluster add <node-ip>
  9. Run the storage disk format command from the master node for all the disks of the newly added node.
    fss7310> storage disk format <list-of-disks>
  10. Add all the disks of the newly added node to the existing Veritas Access pool using the storage pool adddisk command.
    fss7310> storage pool adddisk pool1 <list-of-devices>
  11. Run the storage fs addmirror command to mirror the file systems.
    fss7310> storage fs addmirror <fs-name> <pool-name>
  12. In addition, run the vxassist command to mirror the _nlm_ volume.
    # vxassist -b -g <dg-name> mirror _nlm_
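Steps 4 through 7 above lend themselves to scripting, because the NODEVICE plexes that vxprint -p reports can be fed straight into vxplex. The sketch below is an illustration only: it parses a small embedded sample of vxprint -p output (the disk group name sfsdg is an assumption taken from the example later in this section) and prints the detach command for each affected plex rather than executing it:

```shell
#!/bin/sh
# Sample `vxprint -p` output, embedded so the pipeline can run standalone.
# On a live cluster you would pipe `vxprint -p` output directly instead.
vxprint_sample='pl _nlm_-02            _nlm_            DISABLED   2097152  - NODEVICE - -
pl test1_tier1-P02     test1_tier1-L01  DISABLED   699392   - NODEVICE - -'

# Field 2 of each `pl` record is the plex name; emit one detach command per
# NODEVICE plex. Drop the echo wrapper only after reviewing the commands.
echo "$vxprint_sample" |
  awk '/NODEVICE/ {print "vxplex -f -g sfsdg -o rm dis " $2}'
```

On a real node, substitute `vxprint -p` for the embedded sample and review every generated command before running it, since a forced plex detach is destructive.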

Example: Replacing a Veritas Access node

To replace a Veritas Access node

  1. Change the value of the vxtune tunable to disable hot relocation:
    # vxtune node_reloc_timeout -1
  2. Run the following command to delete the node from the cluster.
    fss7310> cluster del fss7310_02
    
    Veritas Access 7.4 Delete Node Program
    
    fss7310_02
    
    Copyright (c) 2017 Veritas Technologies LLC. All rights reserved. Veritas and the 
    Veritas Logo are trademarks or registered trademarks of Veritas Technologies LLC 
    or its affiliates in the U.S. and other countries. Other names may be trademarks 
    of their respective owners.
    
    The Licensed Software and Documentation are deemed to be "commercial computer software" 
    and "commercial computer software documentation" as defined in FAR Sections 12.212 
    and DFARS Section 227.7202.
    
    Logs are being written to /var/tmp/installaccess-201803130635kXW while installaccess 
    is in progress.
    
    Veritas Access 7.4 Delete Node Program
    fss7310_02
    Checking communication on fss7310_01 ........................................... Done
    Checking communication on fss7310_02 ........................................... Done
    Checking communication on fss7310_03 ........................................... Done
    Checking communication on fss7310_04 ........................................... Done
    Checking VCS running state on fss7310_01 ....................................... Done
    Checking VCS running state on fss7310_02 ....................................... Done
    Checking VCS running state on fss7310_03 ....................................... Done
    Checking VCS running state on fss7310_04 ....................................... Done
    The following changes will be made on the cluster:
    Failover service group VIPgroup4 will be switched to fss7310_01
    
    Switching failover service group(s) ............................................ Done
    Waiting for service group(s) to come online on the other sub-cluster ........... Done
    All the online failover service group(s) that can be switched have been switched to 
    the other sub-cluster.
    The following parallel service group(s) in the sub-cluster will be offline:
    fss7310_02: CanHostConsole CanHostNLM Phantomgroup_pubeth0 ReconfigGroup cvm iSCSI_INIT 
    vrts_vea_cfs_int_cfsmount1 vrts_vea_cfs_int_cfsmount2 vrts_vea_cfs_int_cfsmount3
    vrts_vea_cfs_int_cfsmount4 vrts_vea_cfs_int_cfsmount5 vrts_vea_cfs_int_cfsmount6
    Offline parallel service group(s) .............................................. Done
    Waiting for service group(s) to be taken offline on the sub-cluster ............ Done
    Stopping VCS on fss7310_02 ..................................................... Done
    Stopping Fencing on fss7310_02 ................................................. Done
    Stopping gab on fss7310_02 ..................................................... Done
    Stopping llt on fss7310_02 ..................................................... Done
    Clean up deleted nodes information on the cluster .............................. Done
    Clean up deleted nodes ......................................................... Done
    Delete node completed successfully
    installaccess log files and summary file are saved at:
    /opt/VRTS/install/logs/installaccess-201803130635kXW
  3. Verify that the plex states are set to NODEVICE/DISABLED.
    [root@fss7310_01 ~]# vxclustadm nidmap
    Name       CVM Nid CM Nid     State
    fss7310_01  2        0     Joined: Master
    fss7310_03  3        2     Joined: Slave
    fss7310_04  1        3     Joined: Slave
    
    [root@fss7310_01 ~]# vxprint -p | grep -i nodevice
    pl _nlm_-02            _nlm_            DISABLED   2097152  - NODEVICE - -
    pl _nlm__dcl-02        _nlm__dcl        DISABLED   67840    - NODEVICE - -
    pl test1_tier1-P02     test1_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test1_tier1-P04     test1_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test1_tier1-P06     test1_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test1_tier1_dcl-02  test1_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test2_tier1-P02     test2_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test2_tier1-P04     test2_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test2_tier1-P06     test2_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test2_tier1_dcl-02  test2_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test3_tier1-P02     test3_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test3_tier1-P04     test3_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test3_tier1-P06     test3_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test3_tier1_dcl-02  test3_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test4_tier1-P02     test4_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test4_tier1-P04     test4_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test4_tier1-P06     test4_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test4_tier1_dcl-02  test4_tier1_dcl  DISABLED   67840    - NODEVICE - -
    pl test5_tier1-P02     test5_tier1-L01  DISABLED   699392   - NODEVICE - -
    pl test5_tier1-P04     test5_tier1-L02  DISABLED   699392   - NODEVICE - -
    pl test5_tier1-P06     test5_tier1-L03  DISABLED   699392   - NODEVICE - -
    pl test5_tier1_dcl-02  test5_tier1_dcl  DISABLED   67840    - NODEVICE - -
    
    [root@fss7310_01 ~]# vxdisk list | grep "failed was:"
    - - emc0_2256 sfsdg failed was:emc0_2256
    - - emc0_2264 sfsdg failed was:emc0_2264
    - - emc0_2272 sfsdg failed was:emc0_2272
    - - emc0_2280 sfsdg failed was:emc0_2280
    - - emc0_2288 sfsdg failed was:emc0_2288
    - - emc0_2296 sfsdg failed was:emc0_2296
    - - emc0_2304 sfsdg failed was:emc0_2304
    - - emc0_2312 sfsdg failed was:emc0_2312
    - - emc0_2320 sfsdg failed was:emc0_2320
    - - emc0_2328 sfsdg failed was:emc0_2328
    - - emc0_2336 sfsdg failed was:emc0_2336
    - - emc0_2344 sfsdg failed was:emc0_2344
    - - emc0_2352 sfsdg failed was:emc0_2352
    - - emc0_2360 sfsdg failed was:emc0_2360
  4. Remove the affected mirrors of the volumes that are present on the system.
    [root@fss7310_01 ~]# vxplex -f -g sfsdg -o rm dis test1_tier1-P02
    [root@fss7310_01 ~]# for i in `vxprint -p | grep -i NODEVICE | awk '{print $2}'`
    > do
    > echo "vxplex -f -g sfsdg -o rm dis $i"
    > vxplex -f -g sfsdg -o rm dis $i
    > done
    vxplex -f -g sfsdg -o rm dis _nlm_-02
    vxplex -f -g sfsdg -o rm dis _nlm__dcl-02
    vxplex -f -g sfsdg -o rm dis test1_tier1-P04
    vxplex -f -g sfsdg -o rm dis test1_tier1-P06
    vxplex -f -g sfsdg -o rm dis test1_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test2_tier1-P02
    vxplex -f -g sfsdg -o rm dis test2_tier1-P04
    vxplex -f -g sfsdg -o rm dis test2_tier1-P06
    vxplex -f -g sfsdg -o rm dis test2_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test3_tier1-P02
    vxplex -f -g sfsdg -o rm dis test3_tier1-P04
    vxplex -f -g sfsdg -o rm dis test3_tier1-P06
    vxplex -f -g sfsdg -o rm dis test3_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test4_tier1-P02
    vxplex -f -g sfsdg -o rm dis test4_tier1-P04
    vxplex -f -g sfsdg -o rm dis test4_tier1-P06
    vxplex -f -g sfsdg -o rm dis test4_tier1_dcl-02
    vxplex -f -g sfsdg -o rm dis test5_tier1-P02
    vxplex -f -g sfsdg -o rm dis test5_tier1-P04
    vxplex -f -g sfsdg -o rm dis test5_tier1-P06
    vxplex -f -g sfsdg -o rm dis test5_tier1_dcl-02
    
    [root@fss7310_01 ~]# vxprint -p
    Disk group: sfsdg
    
    TY NAME                 ASSOC           KSTATE  LENGTH   PLOFFS  STATE   TUTIL0 PUTIL0
    pl _nlm_-01             _nlm_           ENABLED 2097152  -       ACTIVE  -      -
    pl _nlm__dcl-01         _nlm__dcl       ENABLED 67840    -       ACTIVE  -      -
    pl test1_tier1-P01      test1_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test1_tier1-P03      test1_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test1_tier1-P05      test1_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test1_tier1-03       test1_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test1_tier1_dcl-01   test1_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test2_tier1-P01      test2_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test2_tier1-P03      test2_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test2_tier1-P05      test2_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test2_tier1-03       test2_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test2_tier1_dcl-01   test2_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test3_tier1-P01      test3_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test3_tier1-P03      test3_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test3_tier1-P05      test3_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test3_tier1-03       test3_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test3_tier1_dcl-01   test3_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test4_tier1-P01      test4_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test4_tier1-P03      test4_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test4_tier1-P05      test4_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test4_tier1-03       test4_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test4_tier1_dcl-01   test4_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
    pl test5_tier1-P01      test5_tier1-L01 ENABLED 699392   -       ACTIVE  -      -
    pl test5_tier1-P03      test5_tier1-L02 ENABLED 699392   -       ACTIVE  -      -
    pl test5_tier1-P05      test5_tier1-L03 ENABLED 699392   -       ACTIVE  -      -
    pl test5_tier1-03       test5_tier1     ENABLED 2098176  -       ACTIVE  -      -
    pl test5_tier1_dcl-01   test5_tier1_dcl ENABLED 67840    -       ACTIVE  -      -
  5. Remove the affected disks from the disk group using the vxdg rmdisk command, and remove them from all the nodes in the cluster using the vxdisk rm command.
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2288
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2272
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2280
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2296
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2304
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2312
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2320
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2328
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2336
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2344
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2352
    [root@fss7310_01 bin]# vxdg -g sfsdg rmdisk emc0_2360
    [root@fss7310_01 bin]# for i in `vxdisk list | grep -i error | awk '{print $1}'`; 
    do vxdisk rm $i; done
    [root@fss7310_03 ~]# for i in `vxdisk list | grep -i error | awk '{print $1}'`; 
    do vxdisk rm $i; done
    [root@fss7310_04 ~]# for i in `vxdisk list | grep -i error | awk '{print $1}'`; 
    do vxdisk rm $i; done
  6. Add the new node to the cluster by running the cluster add command with the node IP.
  7. Add the disks of the newly added node to the existing pool.
    [root@fss7310_01 scripts]# /opt/VRTSnas/clish/bin/clish -u master -c 
    "storage disk format emc0_2257,emc0_2265,emc0_2273,emc0_2281,emc0_2289,emc0_2297,emc0_2305,
    emc0_2313,emc0_2321,emc0_2329,emc0_2337,emc0_2345,emc0_2353,emc0_2361"
    
    You may lose all the data on the disk, do you want to continue (y/n, the default is n):y
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2257 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2265 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2273 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2281 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2289 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2297 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2305 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2313 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2321 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2329 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2337 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2345 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2353 has been formatted successfully.
    ACCESS Disk SUCCESS V-493-10-4 disk format: emc0_2361 has been formatted successfully.
    
    [root@fss7310_01 scripts]# /opt/VRTSnas/clish/bin/clish -u master -c "storage pool 
    adddisk pool1 emc0_2257,emc0_2265,emc0_2273,emc0_2281,emc0_2289,emc0_2297,emc0_2305,
    emc0_2313,emc0_2321,emc0_2329,emc0_2337,emc0_2345,emc0_2353,emc0_2361"
    
    ACCESS Pool SUCCESS V-493-10-2914 Successfully added disks to pool
  8. Mirror the file systems using the storage fs addmirror command.
    fss7310> storage fs list
    FS    STATUS  SIZE   LAYOUT   MIRRORS COLUMNS USE% USED   NFS     CIFS    FTP     SECONDARY
                                                              SHARED  SHARED  SHARED  TIER
    ===== ======  ====   =======  ======= ======= ==== ====   ======  ======  ======  =========
    test1 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test2 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test3 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test4 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    test5 online  1.00G  striped     1       3    10%  103M     no      no      no       no
    
    fss7310> storage fs addmirror test1 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test1
    fss7310> storage fs addmirror test2 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test2
    fss7310> storage fs addmirror test3 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test3
    fss7310> storage fs addmirror test4 pool1
    100% [#] Adding mirror to filesystem
    ACCESS fs SUCCESS V-493-10-2131 Added mirror for fs test4
  9. Mirror the _nlm_ volume using the vxassist mirror command.
    [root@fss7310_01 bin]# vxassist -b -g sfsdg mirror _nlm_
    
    [root@fss7310_01 bin]# vxprint _nlm_
    Disk group: sfsdg
     
    TY NAME          ASSOC        KSTATE  LENGTH  PLOFFS  STATE    TUTIL0 PUTIL0
    v  _nlm_         fsgen        ENABLED 2097152 -       ACTIVE   ATT1   -
    pl _nlm_-01      _nlm_        ENABLED 2097152 -       ACTIVE   -      -
    sd emc0_2255-01  _nlm_-01     ENABLED 2097152 0       -        -      -
    pl _nlm_-02      _nlm_        ENABLED 2097152 -       TEMPRMSD ATT    -
    sd emc0_2257-01  _nlm_-02     ENABLED 2097152 0       -        -      -
    dc _nlm__dco     _nlm_        -       -       -       -        -      -
    v  _nlm__dcl     gen          ENABLED 67840   -       ACTIVE   -      -
    pl _nlm__dcl-01  _nlm__dcl    ENABLED 67840   -       ACTIVE   -      -
    sd emc0_2255-02  _nlm__dcl-01 ENABLED 67840   0       -        -      -
    sp _nlm__cpmap   _nlm_        -       -       -       -        -      -
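The per-disk removal commands in step 5 of the example can be generated the same way. The sketch below is illustrative only: it parses an embedded sample of the `vxdisk list | grep "failed was:"` output shown in step 3 (in that output, field 4 is the disk group and field 3 the disk media name) and prints the matching vxdg rmdisk commands instead of executing them:

```shell
#!/bin/sh
# Sample failed-disk lines as printed by `vxdisk list | grep "failed was:"`,
# embedded here so the pipeline runs standalone; on a live master node you
# would pipe the real command output instead.
vxdisk_sample='- - emc0_2256 sfsdg failed was:emc0_2256
- - emc0_2264 sfsdg failed was:emc0_2264'

# Build one `vxdg rmdisk` command per failed disk; review before executing.
echo "$vxdisk_sample" |
  awk '/failed was:/ {print "vxdg -g " $4 " rmdisk " $3}'
```

The same pattern extends to the follow-up `vxdisk rm` loop that must run on every node, as shown in the example.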