Problem
When a switch port is disabled in an NPIV environment, the system may panic.
Veritas Volume Manager (VxVM) 8.0.0.1601 provides support for Solaris NPIV functionality with Solaris 11.4 SRU 48 and higher.
What is NPIV?
NPIV (N_Port ID Virtualization) is a Fibre Channel facility that allows a single physical Fibre Channel adapter port to register multiple N_Port IDs.
Each virtual N_Port has a unique identity on the SAN (port WWN and node WWN) and can be used for zoning and LUN masking. Soft zoning, which groups ports by port WWN, is the preferred zoning method.
LDOMs (Logical Domains), now known as Oracle VM Server for SPARC.
Oracle's server virtualization and partitioning technology for SPARC. It splits a single physical system into multiple independent virtual systems, known as logical domains.
Figure 1.0
This allows different operating system instances to be deployed and run simultaneously on a single server.
MPxIO is not supported in Solaris 10 or 11 LDOM configurations and must therefore be disabled.
Error Message
Sample scat (Solaris Crash Analysis Tool) output:
void unix:panicsys+0x40((const char *)0x10bc02e7, (va_list)0x2a10e297488, (struct regs *)0x20672870, (int)1, 0x9980081607, , , , , , , , 0x10bc02e7, 0x2a10e297488)
unix:vpanic_common+0x78(0x10bc02e7, 0x2a10e297488, 0x2a10e2973f0, 0x20903000, 0x804, 0x720)
void unix:panic+0x1c((const char *)0x10bc02e7, 0x1a, 0x10bd8400, 0x10bd8400, 1, 0x10bc0000, ...)
scsi_reset_bits_t scsi:scsi_pkt_reset_helps+0x164((struct scsi_pkt *)0x18400a7077440, (scsi_reset_helps_flags_t)3)
void sd:sd_retry_command+0x9bc((struct sd_lun *)0x18400643b6040, (struct buf *)0x18400a8d4f800, (int)0x106, (void (*)())0, (void *)0, (int)5, (clock_t)0, (void (*)())0, (scsi_fm_flags_t)0, (const char *)0x12129ca5)
void sd:sd_pkt_reason_cmd_path_retrynew+0x94((struct sd_lun *)0x18400643b6040, (struct buf *)0x18400a8d4f800, (struct sd_xbuf *)0x1840071bacc40, (struct scsi_pkt *)0x18400a7077440)
void sd:sdintr+0x85c((struct scsi_pkt *)0x18400a7077440)
void scsi:scsi_hba_pkt_comp+0x5a0((struct scsi_pkt *)0x18400a7077440)
void genunix:taskq_d_thread+0xb4((taskq_ent_t *)0x184008f22c230)
unix:thread_start+4()
-- end of kernel thread's stack --
In addition, an attempt to disable MPxIO may not take effect correctly.
The following error is then reported in the NPIV-enabled LDOM guest domain, where DMP is enabled and MPxIO is disabled:
# vxdmpadm getctlr all
VxVM vxdmpadm ERROR V-5-1-10139 Invalid logical controller-name
# vxdmpadm listenclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS ARRAY_TYPE LUN_COUNT FIRMWARE
===================================================================================================
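The symptom above can be detected from a script. The following is a minimal POSIX shell sketch, assuming you pipe in the combined output of `vxdmpadm getctlr all`; the function name `has_npiv_ctlr_error` is hypothetical, and it only looks for the message ID V-5-1-10139 shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads "vxdmpadm getctlr all" output on stdin and
# succeeds (exit status 0) only if the V-5-1-10139 "Invalid logical
# controller-name" error seen in NPIV guest domains is present.
has_npiv_ctlr_error() {
    grep -q 'V-5-1-10139'
}

# Example use on a live guest domain (not executed here):
#   vxdmpadm getctlr all 2>&1 | has_npiv_ctlr_error &&
#       echo "DMP cannot parse the NPIV (vsan) controller paths"
```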
Resolving the V-5-1-10139 error also requires additional InfoScale fixes, described under Solution.
Cause
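Oracle (Solaris Support) created Bug 34100510: "panic scsi_pkt_reset_helps: CMD_PATH_RETRYNEW inappropriate".
The sd driver's sd_pkt_reason_cmd_tran_err() sets FLAG_PKT_PATH_RETRYNEW without first checking whether Solaris I/O Multipathing (MPxIO/scsi_vhci) is in use. The relevant source comment reads:
/* Set FLAG_PKT_PATH_RETRYNEW to ensure retry on a new path. */
When MPxIO is not in use, the resulting CMD_PATH_RETRYNEW retry is inappropriate and leads to the panic in scsi_pkt_reset_helps() shown in the stack trace above.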
Separately, the "stmsboot -D fp -d" utility could incorrectly conclude that MPxIO was already disabled if the configuration file /etc/driver/drv/fp.conf had been removed, leaving MPxIO enabled.
To disable MPxIO, run:
# stmsboot -D fp -d
Use the "devprop" command to verify whether Solaris I/O Multipathing (MPxIO/scsi_vhci) is enabled or disabled for a specific controller.
For example:
# devprop -v -n /dev/cfg/c6 mpxio-disable
mpxio-disable=no
Example
Use fcinfo hba-port to obtain the OS Device Name:
# fcinfo hba-port
Sample output
HBA Port WWN: 2100000e1ec705d1
Port Mode: Initiator
Port ID: 161700
OS Device Name: /dev/cfg/c3
Manufacturer: QLogic Corp.
Model: 7023303
Firmware Version: 8.08.04
FCode/BIOS Version: BIOS: 3.40; fcode: 4.10; EFI: 6.19;
Serial Number: 463916R+1549284486
Driver Name: qlc
Driver Version: 210226-5.10
Type: N-port
State: online
Supported Speeds: 4Gb 8Gb 16Gb
Current Speed: 16Gb
Node WWN: 2000000e1ec705d1
Max NPIV Ports: 7
NPIV port list:
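To pick the controller names (and the reported NPIV capacity) out of this output for scripting, a small awk-based parser can be used. This is a sketch that assumes the field labels match the sample above (`HBA Port WWN:`, `OS Device Name:`, `Max NPIV Ports:`); `fc_ports` is a hypothetical name, not a Solaris command.

```shell
#!/bin/sh
# Hypothetical helper: reads "fcinfo hba-port" output on stdin and prints
# one line per HBA port: "<OS Device Name> <Max NPIV Ports>".
# A port block without a "Max NPIV Ports" line is printed with "-".
fc_ports() {
    awk '
        BEGIN { dev = ""; npiv = "-" }
        # Each port block starts with "HBA Port WWN:"; flush the previous one.
        /HBA Port WWN:/   { if (dev != "") print dev, npiv; dev = ""; npiv = "-" }
        /OS Device Name:/ { dev = $NF }
        /Max NPIV Ports:/ { npiv = $NF }
        # Flush the last port block.
        END { if (dev != "") print dev, npiv }
    '
}
```

For the sample output above, `fcinfo hba-port | fc_ports` would print `/dev/cfg/c3 7`.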
The following command verifies whether MPxIO is disabled for a specific controller:
# devprop -v -n /dev/cfg/c3 mpxio-disable
mpxio-disable=no
In this example the value should be yes; no means MPxIO is still enabled on c3.
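Rather than running devprop by hand for each controller, the check can be looped over every controller that fcinfo reports. A minimal sketch, assuming devprop prints the property as `mpxio-disable=yes` or `mpxio-disable=no` as in the samples above; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "devprop -v ... mpxio-disable" output on stdin
# and succeeds only if it reports mpxio-disable=yes (MPxIO disabled).
mpxio_is_disabled() {
    grep -q '^[[:space:]]*mpxio-disable=yes[[:space:]]*$'
}

# Hypothetical check: report the MPxIO state of every FC controller listed
# by "fcinfo hba-port". Returns non-zero if any controller still has MPxIO on.
check_all_fc_mpxio() {
    rc=0
    for ctlr in $(fcinfo hba-port | awk '/OS Device Name:/ { print $NF }'); do
        if devprop -v -n "$ctlr" mpxio-disable | mpxio_is_disabled; then
            echo "$ctlr: MPxIO disabled"
        else
            echo "$ctlr: MPxIO still enabled (mpxio-disable is not yes)"
            rc=1
        fi
    done
    return $rc
}
```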
To disable MPxIO for Fibre Channel manually in the vsan domain, create the file /etc/driver/drv/fp.conf containing the following entry:
mpxio-disable="yes";
NOTE: The quotes around yes are required.
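The manual fp.conf step can be scripted as follows. This is a sketch under stated assumptions: it keeps any other lines already in fp.conf, drops older mpxio-disable lines, and appends the required entry. `write_fp_conf` is a hypothetical helper, and its optional path argument exists only so it can be tried against a scratch file. It also assumes the change is picked up at the next reboot of the domain; confirm afterwards with devprop.

```shell
#!/bin/sh
# Hypothetical helper: ensure fp.conf disables MPxIO for Fibre Channel.
# The target defaults to /etc/driver/drv/fp.conf; pass another path to test.
write_fp_conf() {
    conf=${1:-/etc/driver/drv/fp.conf}
    tmp="$conf.tmp.$$"
    if [ -f "$conf" ]; then
        # Keep every existing line except old mpxio-disable settings.
        # grep exits 1 when no lines remain; that is not an error here.
        grep -v '^[[:space:]]*mpxio-disable[[:space:]]*=' "$conf" > "$tmp" || true
    else
        : > "$tmp"
    fi
    # Append the required entry; the quotes around yes are required.
    printf 'mpxio-disable="yes";\n' >> "$tmp"
    mv "$tmp" "$conf"
}

# Typical use in the vsan domain, as root (not executed here):
#   write_fp_conf
#   # reboot the domain (assumed necessary for fp.conf to be re-read), then:
#   devprop -v -n /dev/cfg/c3 mpxio-disable
```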
Veritas is working with Oracle engineering to support Solaris NPIV configurations with InfoScale 8.0 onwards.
A series of fixes is being developed for Solaris 11.4 and InfoScale 8.0.
Solution
Oracle has addressed the panic scenario described above in Solaris 11.4 SRU 48.
Reference: SR 3-29178482461, "system panic after disabled switch port in NPIV environment".
The proposed fix is for vhba to map pkt_reason=CMD_PATH_RETRYNEW to STATUS_BUSY.
This fix has been integrated into Solaris trunk build 11.4.48.0.0.123.0 (SRU 48).
Do not configure NPIV without deploying SRU 48 or higher.
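Before enabling NPIV, the SRU level can be checked against this minimum. A minimal sketch, assuming a Solaris 11.4 version string in the same form as the build quoted above (11.4.48.0.0.123.0), where the third field is the SRU number. Where that string comes from is an assumption: one possible source is the Branch field printed by `pkg info entire`, so confirm the format on your system. `sru_at_least` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: given a Solaris 11.4 version string such as
# 11.4.48.0.0.123.0, succeed only if the SRU number (third dot-separated
# field) is at least the required minimum (default 48).
sru_at_least() {
    version=$1
    min=${2:-48}
    case $version in
        11.4.*) ;;
        *) return 1 ;;   # not an 11.4 version string: treat as unsupported
    esac
    sru=$(printf '%s\n' "$version" | cut -d. -f3)
    case $sru in
        ''|*[!0-9]*) return 1 ;;   # SRU field missing or not numeric
    esac
    [ "$sru" -ge "$min" ]
}
```

For example, `sru_at_least 11.4.48.0.0.123.0` succeeds, while a version string from an earlier SRU such as 11.4.42 fails.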
Veritas has also created a private Volume Manager (VxVM) hot-fix for InfoScale 8.0.x onwards that enables Veritas DMP to work with NPIV devices.
MPxIO must be disabled.
Veritas and Oracle are performing further qualification cycles to support Solaris NPIV environments with InfoScale 8.0 onwards.
Contact Veritas Technical Support to obtain the private hot-fix "Veritas Volume Manager 8.0 Hot Fix 1601".
OPERATING SYSTEMS SUPPORTED BY THE PATCH
----------------------------------------
Solaris 11 SPARC
PACKAGES AFFECTED BY THE PATCH
------------------------------
VRTSvxvm
BASE PRODUCT VERSIONS FOR THE PATCH
-----------------------------------
* InfoScale Enterprise 8.0
* InfoScale Foundation 8.0
* InfoScale Storage 8.0
SUMMARY OF INCIDENTS FIXED BY THE PATCH
---------------------------------------
Patch ID: 8.0.0.1601
* 4073314 (4072241) vxdiskadm functionality is failing due to changes in dmpdr script
* 4075623 (4075620) LDOM NPIV ENABLED (vsan) HBAs are not visible to VxVM with DMP.
* 4075623 (Tracking ID: 4075620)
SYMPTOM:
LDOM NPIV ENABLED (vsan) HBAs are not visible to VxVM with DMP with error "VxVM vxdmpadm ERROR V-5-1-10139 Invalid logical controller-name".
DESCRIPTION:
The physical path link format of these devices was new to DMP. DMP could not parse it and therefore failed to discover any devices.
RESOLUTION:
Code changes have been made to support LDOM NPIV ENABLED (vsan) HBAs.