Veritas InfoScale™ 7.3.1 Troubleshooting Guide - Solaris
Disk failures
An uncorrectable I/O error occurs when disk failure, cabling or other problems cause the data on a disk to become unavailable. For a RAID-5 volume, this means that a subdisk becomes unavailable. The subdisk cannot be used to hold data and is considered stale and detached. If the underlying disk becomes available or is replaced, the subdisk is still considered stale and is not used.
If an attempt is made to read data contained on a stale subdisk, the data is reconstructed from data on all other stripe units in the stripe. This operation is called a reconstructing-read. This is a more expensive operation than simply reading the data and can result in degraded read performance. When a RAID-5 volume has stale subdisks, it is considered to be in degraded mode.
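As an illustration of why a reconstructing-read costs more than a plain read, the following minimal sketch assumes the usual RAID-5 scheme in which the parity unit is the bytewise XOR of the data units in a stripe. The function names and the tiny 4-byte stripe units are hypothetical and chosen only for the example; this is not VxVM code.

```python
def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all stripe units must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def reconstruct_stripe_unit(surviving_units):
    """Rebuild the one missing stripe unit from all surviving units.

    Assumes XOR parity: the missing unit equals the XOR of every other
    unit in the stripe (remaining data units plus the parity unit).
    """
    return xor_blocks(surviving_units)


# Hypothetical 3-column stripe: two data units and one parity unit.
data0 = b"\x11\x22\x33\x44"
data1 = b"\xa0\xb0\xc0\xd0"
parity = xor_blocks([data0, data1])

# The column holding data1 is stale, so it is rebuilt from the other two.
rebuilt = reconstruct_stripe_unit([data0, parity])
```

Every surviving column in the stripe has to be read to serve a single stale stripe unit, which is the source of the degraded read performance.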
A RAID-5 volume in degraded mode can be recognized from the output of the vxprint -ht command as shown in the following display:
V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
...
v  r5vol        -            ENABLED  DEGRADED 204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01    r5vol-01     disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01    r5vol-01     disk02   0        102400   1/0       c2t10d0  dS
sd disk03-01    r5vol-01     disk03   0        102400   2/0       c2t11d0  ENA
pl r5vol-02     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
sd disk04-01    r5vol-02     disk04   0        1440     0         c2t12d0  ENA
pl r5vol-03     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
sd disk05-01    r5vol-03     disk05   0        1440     0         c2t14d0  ENA
The volume r5vol is in degraded mode, as shown by the volume state, which is listed as DEGRADED. The failed subdisk is disk02-01, as shown by the MODE flags; d indicates that the subdisk is detached, and S indicates that the subdisk's contents are stale.
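The same check can be scripted. The sketch below scans captured vxprint -ht text for volumes in the DEGRADED state and for subdisks whose MODE field consists only of the d and S flags. It assumes whitespace-separated fields in the column order of the display above. The sample text, the function name find_degraded, and the decision to decode only the d and S flags are assumptions made for this example; other MODE values are left uninterpreted.

```python
# Abbreviated sample of "vxprint -ht" output for a degraded RAID-5 volume.
SAMPLE_DEGRADED = """\
v  r5vol        -            ENABLED  DEGRADED 204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01    r5vol-01     disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01    r5vol-01     disk02   0        102400   1/0       c2t10d0  dS
sd disk03-01    r5vol-01     disk03   0        102400   2/0       c2t11d0  ENA
"""

# Only the two subdisk MODE flags described in the text are decoded.
FLAG_NAMES = {"d": "detached", "S": "stale"}


def find_degraded(vxprint_output):
    """Return (degraded volume names, {subdisk name: [flag names]})."""
    degraded_volumes = []
    failed_subdisks = {}
    for line in vxprint_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank, elided ("..."), or truncated lines
        record_type = fields[0]
        if record_type == "v" and fields[4] == "DEGRADED":
            degraded_volumes.append(fields[1])
        elif record_type == "sd":
            mode = fields[8]
            # Decode the field only when it is made up of known flags;
            # whole-word states such as ENA are ignored.
            if set(mode) <= set(FLAG_NAMES):
                failed_subdisks[fields[1]] = [FLAG_NAMES[c] for c in mode]
    return degraded_volumes, failed_subdisks


degraded, failed = find_degraded(SAMPLE_DEGRADED)
```

Header records (V, PL, SD, SV) are skipped because their type field is uppercase, so the complete command output can be passed in unchanged.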
Warning:
Do not run the vxr5check command on a RAID-5 volume that is in degraded mode.
A disk containing a RAID-5 log plex can also fail. The failure of a single RAID-5 log plex has no direct effect on the operation of a volume, provided that the RAID-5 log is mirrored. However, the loss of all RAID-5 log plexes in a volume makes the volume vulnerable to a complete failure. In the output of the vxprint -ht command, failure within a RAID-5 log plex is indicated by the plex state being shown as BADLOG rather than LOG.
In the following example, the RAID-5 log plex r5vol-02 has failed:
V  NAME         RVG/VSET/CO  KSTATE   STATE    LENGTH   READPOL   PREFPLEX UTYPE
PL NAME         VOLUME       KSTATE   STATE    LENGTH   LAYOUT    NCOL/WID MODE
SD NAME         PLEX         DISK     DISKOFFS LENGTH   [COL/]OFF DEVICE   MODE
SV NAME         PLEX         VOLNAME  NVOLLAYR LENGTH   [COL/]OFF AM/NM    MODE
...
v  r5vol        -            ENABLED  ACTIVE   204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
sd disk01-01    r5vol-01     disk01   0        102400   0/0       c2t9d0   ENA
sd disk02-01    r5vol-01     disk02   0        102400   1/0       c2t10d0  ENA
sd disk03-01    r5vol-01     disk03   0        102400   2/0       c2t11d0  ENA
pl r5vol-02     r5vol        DISABLED BADLOG   1440     CONCAT    -        RW
sd disk04-01    r5vol-02     disk04   0        1440     0         c2t12d0  ENA
pl r5vol-03     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
sd disk05-01    r5vol-03     disk05   0        1440     0         c2t14d0  ENA
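Because the loss of every log plex is the dangerous case, it can be useful to count healthy and failed log plexes per volume. The following sketch does this for captured vxprint -ht text, assuming whitespace-separated plex records in which the third field is the volume name and the fifth field is the plex state. The sample text and the function name log_plex_status are hypothetical.

```python
# Abbreviated sample of "vxprint -ht" output with one failed log plex.
SAMPLE_BADLOG = """\
v  r5vol        -            ENABLED  ACTIVE   204800   RAID      -        raid5
pl r5vol-01     r5vol        ENABLED  ACTIVE   204800   RAID      3/16     RW
pl r5vol-02     r5vol        DISABLED BADLOG   1440     CONCAT    -        RW
pl r5vol-03     r5vol        ENABLED  LOG      1440     CONCAT    -        RW
"""


def log_plex_status(vxprint_output):
    """Group RAID-5 log plexes by volume and by state (LOG or BADLOG)."""
    status = {}
    for line in vxprint_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "pl":
            continue  # only plex records carry the LOG/BADLOG state
        plex, volume, state = fields[1], fields[2], fields[4]
        if state in ("LOG", "BADLOG"):
            entry = status.setdefault(volume, {"LOG": [], "BADLOG": []})
            entry[state].append(plex)
    return status


def volumes_without_healthy_log(status):
    """Return volumes that have log plexes but none in the LOG state."""
    return [volume for volume, plexes in status.items() if not plexes["LOG"]]


status = log_plex_status(SAMPLE_BADLOG)
at_risk = volumes_without_healthy_log(status)
```

In this sample, r5vol still has one healthy log plex, so it is not reported as at risk; it would be if r5vol-03 were also in the BADLOG state.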