Search <book_title>...

Veritas InfoScale™ 7.3.1 Troubleshooting Guide - Solaris

Last Published: 2018-08-22

Product(s): InfoScale & Storage Foundation (7.3.1)

Platform: Solaris

Introduction
Section I. Troubleshooting Veritas File System
1. Diagnostic messages
  1. File system response to problems
    1. Recovering a disabled file system
  2. About kernel messages
Section II. Troubleshooting Veritas Volume Manager
Section III. Troubleshooting Dynamic Multi-Pathing
1. Dynamic Multi-Pathing troubleshooting
Section IV. Troubleshooting Storage Foundation Cluster File System High Availability
1. Troubleshooting Storage Foundation Cluster File System High Availability
Section V. Troubleshooting Cluster Server
1. Troubleshooting and recovery for VCS
Section VI. Troubleshooting SFDB
1. Troubleshooting SFDB
  1. About troubleshooting Storage Foundation for Databases (SFDB) tools

System failures

RAID-5 volumes are designed to remain available with a minimum of disk space overhead, if there are disk failures. However, many forms of RAID-5 can have data loss after a system failure. Data loss occurs because a system failure causes the data and parity in the RAID-5 volume to become unsynchronized. Loss of synchronization occurs because the status of writes that were outstanding at the time of the failure cannot be determined.

If a loss of sync occurs while a RAID-5 volume is being accessed, the volume is described as having stale parity. The parity must then be reconstructed by reading all the non-parity columns within each stripe, recalculating the parity, and writing out the parity stripe unit in the stripe. This must be done for every stripe in the volume, so it can take a long time to complete.

Warning:

While the resynchronization of a RAID-5 volume without log plexes is being performed, any failure of a disk within the volume causes its data to be lost.

Besides the vulnerability to failure, the resynchronization process can tax the system resources and slow down system operation.

RAID-5 logs reduce the damage that can be caused by system failures, because they maintain a copy of the data being written at the time of the failure. The process of resynchronization consists of reading that data and parity from the logs and writing it to the appropriate areas of the RAID-5 volume. This greatly reduces the amount of time needed for a resynchronization of data and parity. It also means that the volume never becomes truly stale. The data and parity for all stripes in the volume are known at all times, so the failure of a single disk cannot result in the loss of the data within the volume.