Problem
When installing or restarting InfoScale 8.0.2 or earlier in an environment with native NVMe multipathing enabled, the system may panic and report the following stack trace when booting.
Error Message
[1260021.393081] VxDMP14: p3 p8
[1260021.394246] VxDMP15: p3 p8
[1260021.394512] VxDMP16: p3 p8
[1260021.394763] VxDMP17: p3 p8
[1260021.395075] BUG: kernel NULL pointer dereference, address: 000000000000000b
[1260021.395078] #PF: supervisor read access in kernel mode
[1260021.395080] #PF: error_code(0x0000) - not-present page
[1260021.395081] PGD 800000440a0d3067 P4D 800000440a0d3067 PUD 45003fe067 PMD 0
[1260021.395086] Oops: 0000 [#1] PREEMPT SMP PTI
[1260021.395089] CPU: 63 PID: 0 Comm: swapper/63 Kdump: loaded Tainted: P W OE --------- --- 5.14.0-70.13.1.el9_0.x86_64 #1
[1260021.395092] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 10/28/2021
[1260021.395093] RIP: 0010:gendmpiodone+0xc8/0x2c0 [vxdmp]
[1260021.395115] Code: e8 3d 7a 02 00 48 89 ea 44 89 e7 48 89 c6 49 89 c7 e8 ac e6 02 00 41 89 c6 48 f7 85 88 00 00 00 00 00 00 80 0f 85 bd 01 00 00 <41> f6 47
0b 20 0f 85 11 01 00 00 48 8b 5b 20 31 c0 f6 c7 21 0f 95
[1260021.395117] RSP: 0018:ffffa7668d224e68 EFLAGS: 00010046
[1260021.395119] RAX: 0000000000000000 RBX: ffff8a55051cb800 RCX: ffff8a58074f7600
[1260021.395121] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000001f80
[1260021.395123] RBP: ffff8a58074f7600 R08: ffff8a538a2edf80 R09: 0000000000000000
[1260021.395124] R10: 0000000000000000 R11: 000000428fbfc000 R12: 000000000000003f
[1260021.395125] R13: ffff8a4fc9c64f00 R14: 0000000000000000 R15: 0000000000000000
[1260021.395127] FS: 0000000000000000(0000) GS:ffff8a8ebfdc0000(0000) knlGS:0000000000000000
[1260021.395128] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1260021.395130] CR2: 000000000000000b CR3: 000000468ae4c006 CR4: 00000000007706e0
[1260021.395131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1260021.395132] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1260021.395134] PKRU: 55555554
[1260021.395135] Call Trace:
[1260021.395138] <IRQ>
[1260021.395141] blk_update_request+0x105/0x3d0
[1260021.395147] blk_mq_end_request+0x1c/0x140
[1260021.395150] nvme_process_cq+0x160/0x250 [nvme]
[1260021.395155] nvme_irq+0xd/0x20 [nvme]
[1260021.395158] __handle_irq_event_percpu+0x3d/0x180
[1260021.395162] handle_irq_event+0x58/0xb0
[1260021.395164] handle_edge_irq+0x93/0x240
[1260021.395169] __common_interrupt+0x41/0xa0
[1260021.395174] common_interrupt+0x7e/0xa0
[1260021.395180] </IRQ>
[1260021.395181] asm_common_interrupt+0x1e/0x40
Cause
Veritas does not currently support NVMe multipathed disks on Linux with InfoScale 8.0.2 or earlier.
How to check if native NVMe multipathing is enabled in the kernel:
# cat /sys/module/nvme_core/parameters/multipath
The command displays one of the following:
N - Native NVMe multipathing is disabled
Y - Native NVMe multipathing is enabled
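For convenience, the check above can be wrapped in a small pre-install script. The following is a minimal sketch (assuming a bash shell), not a Veritas-supplied tool:
#!/bin/bash
# Sketch: warn if native NVMe multipathing is enabled before an InfoScale install.
MP_PARAM=/sys/module/nvme_core/parameters/multipath
if [ -r "$MP_PARAM" ] && [ "$(cat "$MP_PARAM")" = "Y" ]; then
    echo "Native NVMe multipathing is enabled - disable it before installing InfoScale"
    exit 1
fi
echo "Native NVMe multipathing is not enabled (or the nvme_core module is not loaded)"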
Solution
Workaround:
Native NVMe multipathing must be disabled prior to installing Veritas InfoScale on Linux.
When native NVMe multipathing is enabled, the parameter reports the following:
# cat /sys/module/nvme_core/parameters/multipath
Y
For RHEL8 environments, the following syntax can be used to disable NVMe Multipathing:
# grubby --update-kernel=ALL --args="nvme_core.multipath=N"
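Before rebooting, the boot entries can optionally be inspected to confirm the argument was applied; this verification step is shown here for illustration and is not part of the original procedure:
# grubby --info=ALL | grep nvme_core.multipath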
The system must then be restarted.
# reboot
The NVMe multipathing file should now show “N” following the system reboot:
# cat /sys/module/nvme_core/parameters/multipath
N
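As an additional optional check, the running kernel's command line can be inspected to confirm that the parameter took effect at boot:
# grep -o "nvme_core.multipath=N" /proc/cmdline
nvme_core.multipath=N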
Veritas intends to provide support for NVMe multipathed devices in a future release, but no timeline is available at this time.
Key Points:
Veritas does not support Veritas Dynamic Multi-Pathing (DMP) and Linux Device Mapper multipathing (DM-Multipath) managing the same devices. Only a single multipathing solution can be responsible for the multipathing functionality of a given device.
Veritas Volume Manager (VxVM) does not support Linux Device Mapper multipathing (DM-Multipath) managing VxVM disks.
Linux DM-Multipath can be used to manage OS-related devices as long as the devices it should not claim are blacklisted and excluded correctly (see the example configuration after this list).
Whilst the internal boot device can be managed by Linux Device Mapper multipathing (DM-Multipath), the VxVM-configured SAN data disks must be managed by Veritas DMP or an applicable TPD driver.
VxVM works only with DMP or with supported Third-Party Drivers (TPDs) such as EMC PowerPath (with EMC storage).
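As an illustration of the blacklisting mentioned above, a DM-Multipath configuration that leaves the SAN data disks to Veritas DMP could use a blacklist stanza in /etc/multipath.conf such as the following. The WWIDs shown are placeholders and must be replaced with the actual identifiers of the VxVM data disks in the environment:
blacklist {
    # Placeholder WWIDs of the VxVM/DMP-managed SAN data disks (example values only)
    wwid "3600508b4000156d700012000000b0000"
    wwid "3600508b4000156d700012000000c0000"
}
After editing the file, the multipathd configuration must be reloaded (for example, with # systemctl reload multipathd) for the blacklist to take effect.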