
Veritas NetBackup™ Flex Scale Administrator's Guide

Last Published: 2024-04-01

Product(s): Appliances (3.1)

Platform: NetBackup Flex Scale OS

Replacement procedure for a single OS disk

This topic describes how to replace a single OS disk that has failed or is unreachable. Each node has two OS disks.

Identifying an OS disk failure (performed by the CHS team)

The following section describes how to identify a single OS disk failure from NetBackup Flex Scale:

An alert is generated for an OS disk failure or for an unreachable disk. To view the alert, do one of the following from the NetBackup Flex Scale infrastructure management UI:

Click Dashboard in the left pane. In the Alerts area, click View details to see a complete list of alerts.
At the top of any screen, click the Bell icon.
Click Settings > Alerts management. On the Alerts management page, use the filters to locate specific types of alerts.

If SMTP is configured for AutoSupport, you receive email alerts. If Call Home is configured for your setup, diagnostic information is sent to the AutoSupport server.

Navigate to Monitor > Infrastructure > Hardware, select the node on which the OS disk failed, and then click Hard Disk. The UI shows the failure for the corresponding OS disk.

The following section describes how to identify an OS disk failure from third-party tools:

The HPE Integrated Lights-Out (iLO) remote console shows a failure. In iLO, the Health of the failed OS disk is shown as Critical, and the Health of the RAID 1 volume is shown as Warning.

The affected node is shown as unhealthy in the NetBackup Flex Scale UI. Navigate to Monitor > Infrastructure > Nodes to view the node health.

Replacement procedure (performed by the HPE vendor)

An HPE representative identifies the faulty OS disk and its physical location in the appliance, and then replaces the disk. The Active Health System (AHS) logs can be used to find the required details.

Note:

With NetBackup Flex Scale version 3.1, you can beacon the disk from the UI.

After you get the physical location of the disk on the appliance, replace the faulty OS disk with a new OS disk. Note the model number of the new disk and ensure that it matches the model number of the disk being replaced.

To replace the disk, the HPE representative completes the following steps:

Check the disk model number from the iLO remote console.
Identify the corresponding location of the OS disks in the appliance. In this example, the OS disks are in Box 6, Bay 1 and Bay 2.
Refer to the HPE procedure to replace the disk.
After the OS disk is replaced, iLO shows the Health of the OS disk as OK. The Health of the RAID 1 volume remains at Warning until the rebuild completes.

Completing the post-replacement tasks (performed by Veritas TSE)

After the hardware vendor notifies you that the hardware component is replaced, verify that the issue is resolved.

To verify that the issue is resolved, Veritas TSE completes the following steps:

Wait until the RAID controller finishes rebuilding the new OS disk. This operation takes approximately two hours. To check the rebuild progress, elevate to root access and then run the following command:

nbfs3.1> support elevate
# ssacli ctrl all show config
HPE Smart Array P816i-a SR Gen10 in Slot 0 (Embedded)  (sn: PWXLA0BRHDW07G)
   Internal Drive Cage at Port 1I, Box 2, OK
   Internal Drive Cage at Port 2I, Box 3, OK
   Internal Drive Cage at Port 3I, Box 6, OK
   Internal Drive Cage at Port 4I, Box 7, OK
   Port Name: 1I (Mixed)
   Port Name: 2I (Mixed)
   Port Name: 3I (Mixed)
   Port Name: 4I (Mixed)
   Array A (Solid State SATA, Unused Space: 0  MB)
      logicaldrive 1 (1.75 TB, RAID 1, Recovering, 4.13% complete)
      physicaldrive 3I:6:1 (port 3I:box 6:bay 1, SATA SSD, 1.9 TB, Rebuilding)
      physicaldrive 3I:6:2 (port 3I:box 6:bay 2, SATA SSD, 1.9 TB, OK)

After the rebuild completes successfully, verify that all the AutoSupport alerts are resolved and the node state shows healthy in the NetBackup Flex Scale UI. To verify, navigate to Monitor > Infrastructure > Nodes.
In iLO, verify that the Health of the Volume for the RAID 1 is set to OK.
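
The rebuild check in the first step can also be scripted. The following is a minimal sketch, not part of the product: it assumes that the text printed by ssacli ctrl all show config has already been captured into a string, and it extracts the status and percentage for each logical drive. The function names rebuild_status and rebuild_complete are hypothetical, and the sketch assumes that a fully rebuilt drive is reported in the form "logicaldrive 1 (1.75 TB, RAID 1, OK)", which is not shown in this topic.

```python
import re
from typing import Dict, List

# Sample text copied from the command output shown in this topic.
SAMPLE = """\
   Array A (Solid State SATA, Unused Space: 0  MB)
      logicaldrive 1 (1.75 TB, RAID 1, Recovering, 4.13% complete)
      physicaldrive 3I:6:1 (port 3I:box 6:bay 1, SATA SSD, 1.9 TB, Rebuilding)
      physicaldrive 3I:6:2 (port 3I:box 6:bay 2, SATA SSD, 1.9 TB, OK)
"""

# Matches a logical drive line and captures the comma-separated fields.
LOGICAL_DRIVE = re.compile(r"logicaldrive\s+(?P<id>\d+)\s+\((?P<fields>[^)]*)\)")
# Matches the optional progress field, for example "4.13% complete".
PERCENT = re.compile(r"(?P<pct>\d+(?:\.\d+)?)%\s+complete")


def rebuild_status(config_text: str) -> List[Dict[str, object]]:
    """Return one entry per logical drive found in the captured output."""
    drives: List[Dict[str, object]] = []
    for match in LOGICAL_DRIVE.finditer(config_text):
        fields = [f.strip() for f in match.group("fields").split(",")]
        progress = PERCENT.search(match.group("fields"))
        if progress:
            # Status precedes the progress field while a rebuild is running.
            status = fields[-2]
            percent = float(progress.group("pct"))
        else:
            # Assumption: without a progress field, the last field is the status.
            status = fields[-1]
            percent = None
        drives.append(
            {"id": int(match.group("id")), "status": status, "percent": percent}
        )
    return drives


def rebuild_complete(drives: List[Dict[str, object]]) -> bool:
    """True only if at least one logical drive was found and all report OK."""
    return bool(drives) and all(d["status"] == "OK" for d in drives)


if __name__ == "__main__":
    for drive in rebuild_status(SAMPLE):
        print(drive)
    print("Rebuild complete:", rebuild_complete(rebuild_status(SAMPLE)))
```

Because the sketch only parses text, it can be run against saved command output on any system. It does not replace the iLO and UI verification steps above.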