LDOM: How to handle the loss of Primary (Control) and Service I/O Domains when using Veritas DMP

Article: 100033972
Last Published: 2021-09-29
Product(s): InfoScale & Storage Foundation

Description

LDOMs, also known as Oracle VM Server for SPARC
 
Oracle's server virtualization and partitioning technology for SPARC. Provides the ability to split a single physical system into multiple, independent virtual systems (known as logical domains).
 
Figure 1.0

Enables a system to run and deploy different Operating System instances simultaneously on a single server

 

Solaris Enhancement

Oracle has released an enhancement for Solaris 11.3 with SRU 18.0.6 that enables Veritas Dynamic Multi-Pathing (DMP) to better handle the loss of the Primary (Control) and Service I/O domains.
 

Control (Primary) / Service domain is not accessible


Commands such as “echo | format” and “vxdisk scandisks” may hang until the impacted I/O domain returns.

 

Sample Configuration using Veritas DMP and InfoScale

The Control (primary) and Service (altio) I/O domains both present device paths to the logical domain,
making the devices accessible and visible from the logical domain (scooby).

Figure 2.0



 

With LDOMs, each logical domain is able to run a different Operating System (OS) release and update. Each LDOM can have CPU and memory allocated to it to match the requirements of its application environment. This allows each LDOM environment to be patched independently of the others.

 

Figure 3.0


LDOM configuration using DMP

 

In the above example, both I/O domains export the SAN disk via a DMP path from the Control and Service domains, so multiple paths are exported to the GUEST. The GUEST will only see a single OS device handle per exported DMPNODE from each I/O domain. In other words, the DMPNODE from the Control domain creates one OS device handle in the GUEST, and the DMPNODE from the Service domain creates another OS device handle in the GUEST for the same device. Because the GUEST domain never gets the failed I/O(s) back, it cannot route them through the alternate, operational I/O domain.
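As a sketch of how such a dual-path export might be built (the DMP device path, virtual disk server names, and disk names below are hypothetical, not taken from this article), each I/O domain's virtual disk server exports the same DMP node to the guest:

```shell
# Run from the Control (primary) domain.
# Export the DMP node through the primary domain's virtual disk server.
ldm add-vdsdev /dev/vx/dmp/hitachi_usp0_065a scoobydisk@primary-vds0

# Export the same LUN through the Service (altio) domain's
# virtual disk server.
ldm add-vdsdev /dev/vx/dmp/hitachi_usp0_065a scoobydisk@altio-vds0

# Present both exports to the guest "scooby"; the guest then sees two
# OS device handles for the same underlying device.
ldm add-vdisk scoobydisk-pri scoobydisk@primary-vds0 scooby
ldm add-vdisk scoobydisk-alt scoobydisk@altio-vds0 scooby
```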
 

VDC Timeout

In Oracle VM Server configurations, the Virtual Disk Client (VDC) driver timeout is set to zero by default, which signifies an infinite timeout (retry forever).

Impact:

If either the Control (primary) or Service I/O domain crashes (or reboots) unexpectedly, failed I/O may never be returned to the guest domain. Even with the VDC timeout defined, the new timeout-noretry-list parameter, made available with Solaris 11.3 SRU 18.0.6, must also be defined for all virtual disk instances.

 

DMP_NATIVE_SUPPORT

To support the handling of Solaris ZFS boot devices, the DMP tunable dmp_native_support must be enabled on Solaris 11 hosts, where the use of MPGROUPs is not permitted by the Veritas product suite. For Solaris 10 ZFS-based boot devices, dmp_native_support should not be enabled unless the boot disk is presented to the Solaris 10 LDOM GUEST via an MPGROUP.

# vxdmpadm gettune dmp_native_support
# vxdmpadm settune dmp_native_support=on


NOTE: A series of Veritas Volume Manager (VxVM) patches were released to ensure DMP imports ZFS devices using DMP.

Please contact Veritas support to ensure you are running the required Veritas Volume Manager (VxVM) patch level.


When applying the latest patch, perform the following steps:

  1. Ensure dmp_native_support is first disabled
  2. Apply the latest available VxVM patch
  3. Reboot the system and re-enable dmp_native_support
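The three steps above might look like the following on the host; this is a sketch, and the patch installation step is a placeholder (the actual install command depends on how the VxVM patch is delivered, so consult the patch README):

```shell
# 1. Disable native support before patching.
vxdmpadm settune dmp_native_support=off

# 2. Apply the latest available VxVM patch here
#    (delivery-specific command; see the patch README).

# 3. Reboot the system.
init 6

# After the system comes back up, re-enable and confirm native support.
vxdmpadm settune dmp_native_support=on
vxdmpadm gettune dmp_native_support
```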

MPGROUPs remain unsupported for Solaris 11 configurations. MPxIO is not supported with Solaris 10 or 11 LDOM configurations.

 

Additional Information (Solaris 11):

The solution for Solaris 11 deployments is documented below.

The key point is that NO MPGROUPs should be configured, and the ZFS devices presented to the Solaris 11 GUEST domain need the dmp_native_support tunable enabled for DMP.

 

Solaris 11 Sparc Enhancement

Solaris 11.3 SRU 18.0.6 provides the timeout-noretry-list parameter.
 

File location: /platform/sun4v/kernel/drv/vdc.conf

The vdc.conf file needs to be modified in the GUEST domain to reduce the chances of specific commands hanging in the logical domain when access to a service I/O domain is lost.

 

Sample vdc.conf update:

scooby # cat /platform/sun4v/kernel/drv/vdc.conf
#
# Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
#
#
# Associate the driver with devid resolution.
#
ddi-devid-registrant=1;
timeout-noretry-list=0,1,2,3,4,5;

 


Note:
The “timeout-noretry-list” setting has been added to the vdc.conf file shown above. The Virtual Disk Client (VDC) for virtual disk instances 0, 1, 2, 3, 4, and 5 will now return an I/O error (for each I/O) once the defined VDC timeout has elapsed from the time the I/O domain became inaccessible.

The VDC driver timeout must also be defined to avoid this issue.

The VDC timeout can be defined in the /etc/system file in the GUEST domain (system wide parameter).

 

Sample file output

# cat /etc/system
.
.
set vdc:vdc_timeout=30
set vdc:vdc_read_timeout=30

 

NOTE: The timeout can also be set manually at the virtual disk level, using the timeout option of the “ldm add-vdisk” command.

 

Sample syntax:

 

# ldm add-vdisk timeout=30 scoobydisk-pri scoobyboot@primary-vds0 scooby

NOTE: The ldm commands are executed from the Control (primary) domain.
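To confirm that the per-disk timeout took effect, the virtual disk configuration can be inspected from the Control (primary) domain; a brief sketch, assuming the guest domain is named scooby as in the examples above:

```shell
# List the virtual disk configuration for the guest, which includes
# the per-disk timeout value alongside each vdisk entry.
ldm list -o disk scooby
```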
