Veritas InfoScale™ 7.3.1 Troubleshooting Guide - Solaris
- Introduction
- Section I. Troubleshooting Veritas File System
- Section II. Troubleshooting Veritas Volume Manager
- Recovering from hardware failure
- Failures on RAID-5 volumes
- Recovery from failure of a DCO volume
- Recovering from instant snapshot failure
- Recovering from failed vxresize operation
- Recovering from boot disk failure
- Hot-relocation and boot disk failure
- Recovery from boot failure
- Repair of root or /usr file systems on mirrored volumes
- Replacement of boot disks
- Recovery by reinstallation
- Managing commands, tasks, and transactions
- Backing up and restoring disk group configurations
- Troubleshooting issues with importing disk groups
- Recovering from CDS errors
- Logging and error messages
- Troubleshooting Veritas Volume Replicator
- Recovery from configuration errors
- Errors during an RLINK attach
- Errors during modification of an RVG
- Recovery on the Primary or Secondary
- Recovering from Primary data volume error
- Primary SRL volume error cleanup and restart
- Primary SRL header error cleanup and recovery
- Secondary data volume error cleanup and recovery
- Troubleshooting issues in cloud deployments
- Recovering from hardware failure
- Section III. Troubleshooting Dynamic Multi-Pathing
- Section IV. Troubleshooting Storage Foundation Cluster File System High Availability
- Troubleshooting Storage Foundation Cluster File System High Availability
- Troubleshooting CFS
- Troubleshooting fenced configurations
- Troubleshooting Cluster Volume Manager in Veritas InfoScale products clusters
- Troubleshooting Storage Foundation Cluster File System High Availability
- Section V. Troubleshooting Cluster Server
- Troubleshooting and recovery for VCS
- VCS message logging
- Gathering VCS information for support analysis
- Troubleshooting the VCS engine
- Troubleshooting Low Latency Transport (LLT)
- Troubleshooting Group Membership Services/Atomic Broadcast (GAB)
- Troubleshooting VCS startup
- Troubleshooting service groups
- Troubleshooting resources
- Troubleshooting I/O fencing
- System panics to prevent potential data corruption
- Fencing startup reports preexisting split-brain
- Troubleshooting CP server
- Troubleshooting server-based fencing on the Veritas InfoScale products cluster nodes
- Issues during online migration of coordination points
- Troubleshooting notification
- Troubleshooting and recovery for global clusters
- Troubleshooting licensing
- Licensing error messages
- VCS message logging
- Troubleshooting and recovery for VCS
- Section VI. Troubleshooting SFDB
Resynchronizing parity on a RAID-5 volume
In most cases, a RAID-5 volume does not have stale parity. Stale parity only occurs after all RAID-5 log plexes for the RAID-5 volume have failed, and then only if there is a system failure. Even if a RAID-5 volume has stale parity, it is usually repaired as part of the volume start process.
If a volume without valid RAID-5 logs is started and the process is killed before the volume is resynchronized, the result is an active volume with stale parity.
The following example is output from the vxprint -ht command for a stale RAID-5 volume:
V NAME RVG/VSET/COKSTATE STATE LENGTH READPOL PREFPLEX UTYPE PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE SV NAME PLEX VOLNAME NVOLLAYR LENGTH [COL/]OFF AM/NM MODE ... v r5vol - ENABLED NEEDSYNC 204800 RAID - raid5 pl r5vol-01 r5vol ENABLED ACTIVE 204800 RAID 3/16 RW sd disk01-01 r5vol-01disk01 0 102400 0/0 c2t9d0 ENA sd disk02-01 r5vol-01disk02 0 102400 1/0 c2t10d0 dS sd disk03-01 r5vol-01disk03 0 102400 2/0 c2t11d0 ENA ...
This output lists the volume state as NEEDSYNC, indicating that the parity needs to be resynchronized. The state could also have been SYNC, indicating that a synchronization was attempted at start time and that a synchronization process should be doing the synchronization. If no such process exists or if the volume is in the NEEDSYNC state, a synchronization can be manually started by using the resync keyword for the vxvol command.
Parity is regenerated by issuing VOL_R5_RESYNC ioctls to the RAID-5 volume. The resynchronization process starts at the beginning of the RAID-5 volume and resynchronizes a region equal to the number of sectors specified by the -o iosize option. If the -o iosize option is not specified, the default maximum I/O size is used. The resync operation then moves onto the next region until the entire length of the RAID-5 volume has been resynchronized.
For larger volumes, parity regeneration can take a long time. It is possible that the system may shut down, or the system may crashe before the operation is completed. In case of a system shutdown, the progress of parity regeneration must be kept across reboots. Otherwise, the process has to start all over again.
To avoid the restart process, parity regeneration is checkpointed. This means that the offset up to which the parity has been regenerated is saved in the configuration database. The -o checkpt=size option controls how often the checkpoint is saved. If the option is not specified, the default checkpoint size is used.
Because saving the checkpoint offset requires a transaction, making the checkpoint size too small can extend the time required to regenerate parity. After a system reboot, a RAID-5 volume that has a checkpoint offset smaller than the volume length starts a parity resynchronization at the checkpoint offset.
To resynchronize parity on a RAID-5 volume
- Type the following command:
# vxvol -g diskgroup resync r5vol