Veritas InfoScale™ 7.3.1 Troubleshooting Guide - Solaris

Product(s): InfoScale & Storage Foundation (7.3.1)
Platform: Solaris

Recovering a root disk and root mirror from a backup

This procedure assumes that you have the following resources available:

  • A listing of the partition table for the original root disk before you encapsulated it.

  • A current full backup of all the file systems on the original root disk that was under Veritas Volume Manager control. If the root file system is of type ufs, you can back it up using the ufsdump command.

    See the ufsdump(1M) manual page.

  • A new boot disk installed to replace the original failed boot disk if the original boot disk was physically damaged.
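
The first two resources could be captured ahead of time along the following lines. This is a minimal sketch, not a procedure taken from this guide: the function names, the output file, and the tape device /dev/rmt/0 are assumptions chosen for illustration. It relies only on the standard Solaris prtvtoc and ufsdump commands.

```shell
# Hypothetical helpers; names, output file, and tape device are illustrative.

# Save the partition table (VTOC) of a disk to a file.
# Run this before the root disk is encapsulated.
save_partition_table() {
    disk="$1"        # whole-disk slice, for example c0t0d0s2
    outfile="$2"     # where to write the listing
    prtvtoc "/dev/rdsk/$disk" > "$outfile"
}

# Take a full (level 0) ufsdump of one ufs file system to a dump device.
backup_ufs() {
    dumpdev="$1"     # for example /dev/rmt/0 (assumed tape device)
    fs="$2"          # mount point or raw device, for example /
    ufsdump 0uf "$dumpdev" "$fs"
}
```

For example, save_partition_table c0t0d0s2 /var/tmp/rootdisk.vtoc followed by backup_ufs /dev/rmt/0 / would record the layout and back up the root file system. Keep a copy of the partition listing somewhere other than the root disk, because it is needed only after that disk has failed.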

This procedure requires the reinstallation of the root disk. To prevent the loss of data on disks not involved in the reinstallation, only involve the root disk in the reinstallation procedure.

Several of the automatic options for installation access disks other than the root disk without requiring confirmation from the administrator. Therefore, disconnect all other disks containing volumes from the system prior to starting this procedure. This will ensure that these disks are unaffected by the reinstallation. Reconnect the disks after completing the procedure.

This procedure assumes that the device name of the new root disk is c0t0d0s2, and that you need to recover both the root (/) file system on partition s0 and the /usr file system on partition s6.

To recover a root disk and root mirror from a backup

  1. Boot the operating system into single-user mode from its installation CD-ROM using the following command at the boot prompt:
    ok boot cdrom -s
  2. Use the format command to create partitions on the new root disk (c0t0d0s2). These should be identical in size to those on the original root disk before encapsulation unless you are using this procedure to change their sizes. If you change the size of the partitions, ensure that they are large enough to store the data that is restored to them.

    See the format(1M) manual page.

    A maximum of five partitions may be created for file systems or swap areas as encapsulation reserves two partitions for Veritas Volume Manager private and public regions.

  3. Use the mkfs command to make new file systems on the root and usr partitions. For example, to make a ufs file system on the root partition, enter:
    # mkfs -F ufs /dev/rdsk/c0t0d0s0
    

    See the mkfs(1M) manual page.

    See the mkfs_ufs(1M) manual page.

  4. Mount /dev/dsk/c0t0d0s0 on a suitable mount point such as /a or /mnt:
    # mount /dev/dsk/c0t0d0s0 /a
  5. Restore the root file system from tape into the /a directory hierarchy. For example, if you used ufsdump to back up the file system, use the ufsrestore command to restore it.

    See the ufsrestore(1M) manual page.

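  6. Use the installboot command to install a boot block on the root partition of the new root disk (the file system that is mounted on /a).

    See the installboot(1M) manual page.
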
  7. If the /usr file system is separate from the root file system, use the mkdir command to create a suitable mount point, such as /a/usr/, and mount /dev/dsk/c0t0d0s6 on it:
    # mkdir -p /a/usr
    # mount /dev/dsk/c0t0d0s6 /a/usr
  8. If the /usr file system is separate from the root file system, restore the /usr file system from tape into the /a/usr directory hierarchy.
  9. Disable startup of VxVM by modifying files in the restored root file system, as described in steps 10 through 15.
  10. Create the file /a/etc/vx/reconfig.d/state.d/install-db to prevent the configuration daemon, vxconfigd, from starting:
    # touch /a/etc/vx/reconfig.d/state.d/install-db
  11. Copy /a/etc/system to a backup file such as /a/etc/system.old.
  12. Comment out the following lines in /a/etc/system by putting a * character in front of them:
    set vxio:vol_rootdev_is_volume=1
    rootdev:/pseudo/vxio@0:0

    These lines should then read:

    * set vxio:vol_rootdev_is_volume=1 
    * rootdev:/pseudo/vxio@0:0
  13. Copy /a/etc/vfstab to a backup file such as /a/etc/vfstab.old.
  14. Edit /a/etc/vfstab, and replace the volume device names (beginning with /dev/vx/dsk) for the / and /usr file system entries with their standard disk devices, /dev/dsk/c0t0d0s0 and /dev/dsk/c0t0d0s6. For example, replace the following lines:
    /dev/vx/dsk/rootvol /dev/vx/rdsk/rootvol / ufs 1 no - 
    /dev/vx/dsk/usrvol /dev/vx/rdsk/usrvol /usr ufs 1 yes -

    with these lines:

    /dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -
    /dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /usr ufs 1 yes -
  15. Remove /a/dev/vx/dsk/bootdg and /a/dev/vx/rdsk/bootdg:
    # rm /a/dev/vx/dsk/bootdg
    # rm /a/dev/vx/rdsk/bootdg
  16. Shut down the system cleanly using the init 0 command, and reboot from the new root disk. The system boots as if VxVM were not installed.
  17. If there are only root disk mirrors in the old boot disk group, remove any volumes that were associated with the encapsulated root disk (for example, rootvol, swapvol and usrvol) from the /dev/vx/dsk/bootdg and /dev/vx/rdsk/bootdg directories.
  18. If there are other disks in the old boot disk group that are not used as root disk mirrors, remove files involved with the installation that are no longer needed:
    # rm -r /etc/vx/reconfig.d/state.d/install-db

    Start the Veritas Volume Manager I/O daemons:

    # vxiod set 10

    Start the Veritas Volume Manager configuration daemon in disabled mode:

    # vxconfigd -m disable

    Initialize the volboot file:

    # vxdctl init

    Enable the old boot disk group, excluding the root disk that VxVM interprets as failed:

    # vxdctl enable

    Use the vxedit command (or the Veritas Enterprise Administrator (VEA)) to remove the old root disk volumes and the root disk itself from Veritas Volume Manager control.

  19. Use the vxdiskadm command to encapsulate the new root disk and initialize any disks that are to serve as root disk mirrors. After the required reboot, mirror the root disk onto the root disk mirrors.
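
Steps 11 through 14 edit /a/etc/system and /a/etc/vfstab by hand. As an illustration only, the same edits could be scripted roughly as follows. The function names are hypothetical, and the sketch assumes that the files contain the VxVM entries exactly as they appear in the examples in those steps; check the result against the .old copies before you reboot.

```shell
# Hypothetical helpers that mirror steps 11-14; not part of VxVM.

# Back up an /etc/system file to <file>.old, then comment out the two
# VxVM root-device lines by prefixing them with "* ".
disable_vxvm_rootdev() {
    sysfile="$1"                     # for example /a/etc/system
    cp "$sysfile" "$sysfile.old" || return 1
    sed -e 's|^set vxio:vol_rootdev_is_volume=1|* &|' \
        -e 's|^rootdev:/pseudo/vxio@0:0|* &|' \
        "$sysfile.old" > "$sysfile"
}

# Back up a vfstab file to <file>.old, then replace the rootvol and usrvol
# volume devices with slices s0 and s6 of the given disk.
use_plain_root_devices() {
    vfstab="$1"                      # for example /a/etc/vfstab
    disk="$2"                        # disk name without slice, for example c0t0d0
    cp "$vfstab" "$vfstab.old" || return 1
    sed -e "s|/dev/vx/dsk/rootvol|/dev/dsk/${disk}s0|" \
        -e "s|/dev/vx/rdsk/rootvol|/dev/rdsk/${disk}s0|" \
        -e "s|/dev/vx/dsk/usrvol|/dev/dsk/${disk}s6|" \
        -e "s|/dev/vx/rdsk/usrvol|/dev/rdsk/${disk}s6|" \
        "$vfstab.old" > "$vfstab"
}
```

With the restored root file system mounted on /a, the calls would be disable_vxvm_rootdev /a/etc/system and use_plain_root_devices /a/etc/vfstab c0t0d0. Each function writes the edited file from its backup copy instead of editing in place, so the original content stays available in the .old file.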