"V-5-1-11092 cleanup_client: (There are minor number conflicts on a slave node)" is reported when a CVM slave fails to join the cluster because of a minor number conflict

Article: 100032022
Last Published: 2016-02-29
Product(s): InfoScale & Storage Foundation

Problem

A Cluster Volume Manager (CVM) slave fails to join the cluster because of a minor number conflict.

Error Message

The following messages are logged when the CVM slave tries to join the cluster.

Feb 29 18:53:21 server102 kernel: GAB INFO V-15-1-20036 Port v[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0652 membership 01
Feb 29 18:53:21 server102 kernel: GAB INFO V-15-1-20038 Port v[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0652 k_jeopardy ; 2
Feb 29 18:53:21 server102 kernel: GAB INFO V-15-1-20040 Port v[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0652    visible ; 2

Feb 29 18:53:21 server102 kernel: GAB INFO V-15-1-20036 Port y[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0653 membership 01
Feb 29 18:53:21 server102 kernel: GAB INFO V-15-1-20038 Port y[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0653 k_jeopardy ; 2
Feb 29 18:53:21 server102 kernel: GAB INFO V-15-1-20040 Port y[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0653    visible ; 2

Feb 29 18:53:21 server102 vxvm:vxconfigd: V-5-1-7900 CVM_VOLD_CONFIG command received

Feb 29 18:53:21 server102 kernel: VxVM vxio V-5-3-1906 vol_gab_ms_msg: ring broadcast commit completed for join/leave reconfig cs_flags=0x580802

Feb 29 18:53:22 server102 kernel: GAB INFO V-15-1-20036 Port m[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0651 membership 01
Feb 29 18:53:22 server102 kernel: GAB INFO V-15-1-20038 Port m[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0651 k_jeopardy ; 2
Feb 29 18:53:22 server102 kernel: GAB INFO V-15-1-20040 Port m[GAB_LEGACY_CLIENT (refcount 0)] gen   cf0651    visible ; 2

Feb 29 18:53:22 server102 kernel: VxVM vxio V-5-3-2015 reconfig message on port m received cf0650

Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20036 Port w[GAB_USER_CLIENT (refcount 0)] gen   cf0655 membership 01
Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20038 Port w[GAB_USER_CLIENT (refcount 0)] gen   cf0655 k_jeopardy ; 2
Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20040 Port w[GAB_USER_CLIENT (refcount 0)] gen   cf0655    visible ; 2

Feb 29 18:53:26 server102 kernel: VxVM vxio V-5-0-1910 Cleaning incomplete shared diskgroup devldg dgiid 33792.104
Feb 29 18:53:26 server102 vxvm:vxconfigd: V-5-1-11092 cleanup_client: (There are minor number conflicts on a slave node) 231
Feb 29 18:53:26 server102 vxvm:vxconfigd: V-5-1-11467 kernel_fail_join() :              Reconfiguration interrupted: Reason is retry to add a node failed (13, 0)
Feb 29 18:53:26 server102 kernel: VxVM vxio V-5-0-164 Failed to join cluster clus123, aborting

Feb 29 18:53:26 server102 kernel: VxVM vxio V-5-3-1250 joinsio_done: Node aborting, join for node 0 being failed
Feb 29 18:53:26 server102 kernel: VxVM vxio V-5-3-672 abort_joinp: aborting joinp for node 0 with err 11

Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20032 Port y closed
Feb 29 18:53:26 server102 vxvm:vxconfigd: V-5-1-7901 CVM_VOLD_STOP command received
Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20032 Port m closed
Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20032 Port v closed
Feb 29 18:53:26 server102 kernel: GAB INFO V-15-1-20032 Port w closed

Feb 29 18:53:26 server102 AgentFramework[11752]: VCS ERROR V-16-20006-1005 CVMCluster:cvm_clus:monitor:node - state: out of cluster
reason: There are minor number conflicts on a slave node: retry to add a node failed

Feb 29 18:55:26 server102 AgentFramework[11752]: VCS ERROR V-16-2-13066 Thread(4132432704) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.
Feb 29 18:55:26 server102 Had[11726]: VCS ERROR V-16-2-13066 (server102) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.
Feb 29 18:55:26 server102 AgentFramework[11752]: VCS ERROR V-16-2-13068 Thread(4132432704) Resource(cvm_clus) - clean completed successfully.
Feb 29 18:55:26 server102 AgentFramework[11752]: VCS ERROR V-16-2-13072 Thread(4132432704) Resource(cvm_clus): Agent is retrying online (attempt number 1 of 2).

 

Cause

This problem is usually caused by a minor number conflict between CVM shared diskgroup objects (such as volumes, volume sets, or Replicated Volume Groups (RVGs)) and private diskgroup objects. Confirm that on the joining CVM slave, the minor numbers of the private diskgroup objects do not overlap with those of the CVM shared diskgroup objects. The autoreminor feature, which is enabled by default, normally resolves such conflicts automatically when a diskgroup is imported. It does not reminor a diskgroup that is already imported, as the manual page extract in Figure 1 notes.


Figure 1 - Extract from the vxtune(1M) manual page
The following tunable parameters apply for Cluster Volume Manager (CVM):

    autoreminor
        Turns on or off the automatic reminor functionality. A disk group cannot be imported if the device minor numbers of the disk group or its objects conflict with those of an existing disk group. When autoreminor is on, VxVM automatically assigns new minor numbers to a disk group if VxVM detects a conflict during an import operation. The disk group is then imported. The default value is on.

        Note: VxVM does not reminor a disk group that is already imported, regardless of whether autoreminor is set to on. For example, if you attempt to add a node to a cluster and the joining node has minor numbers that conflict with a disk group in the cluster. In this case, the join operation fails. You must reminor the disk group manually.

        In some scenarios such as with NFS file systems, assigning new minor numbers may result in issues. In this case, set the tunable parameter to off. When the autoreminor parameter is set to off, attempting to import a disk group with conflicting minor numbers will fail, even when you specify the force (-f) option. You must manually reminor the disk group before you can import the disk group.



Volume Manager (VxVM) divides the minor numbers into two sets. One set is for the private diskgroups. The other set is for the CVM shared diskgroups. The two sets of minor numbers are divided by the following vxtune parameter:
  •  sharedminorstart: The starting number in the range used to assign device minor numbers in shared (CVM) disk groups. The default value is 33000.

Sometimes, when a diskgroup is first initialized as a private diskgroup (by running vxdg init without the -s option), VxVM assigns it a base_minor of less than 33000. When the diskgroup is later imported as shared, VxVM does not change the minor numbers, so they remain below the 33000 boundary. They can then collide with the minor numbers of private diskgroups on other nodes in the cluster.

The same issue also exists the other way round. If a diskgroup is first initialized as shared (vxdg -s init) but later imported as private, the private diskgroup will have minor numbers at or above 33000. In such a situation, you may want to run vxdg reminor to move the minor numbers back into the correct set and avoid conflicts.
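
The range check described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: minor_range and minor_matches_import are hypothetical function names, not VxVM commands, and 33000 is the default sharedminorstart value quoted above, so pass the actual value if the tunable has been changed on your cluster.

```shell
#!/bin/sh
# Hypothetical helper (not a VxVM command): report which minor-number set a
# given minor falls into, relative to the sharedminorstart boundary.
# $1 = minor number, $2 = boundary (assumed default: 33000).
minor_range() {
    minor=$1
    boundary=${2:-33000}
    if [ "$minor" -lt "$boundary" ]; then
        echo private
    else
        echo shared
    fi
}

# Hypothetical helper: succeed (exit status 0) when a base_minor lies in the
# set that matches how the diskgroup is imported.
# $1 = "private" or "shared", $2 = base_minor, $3 = boundary (optional).
minor_matches_import() {
    import_type=$1
    base_minor=$2
    [ "$(minor_range "$base_minor" "${3:-33000}")" = "$import_type" ]
}
```

For example, minor_range 28000 prints private, so a shared diskgroup whose base_minor is 28000 would be a candidate for vxdg reminor.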

Apart from the cause described above, there is a more obscure way for a CVM slave join to fail with a minor number conflict. If two diskgroups are reminored to the same base_minor while some of their existing in-kernel minor numbers remain unchanged, the CVM slave join fails with a minor number conflict.

The following is an example of how this can happen.

First, two diskgroups are created with different base_minor values. Then one diskgroup is reminored to 36000.
server101# vxdg -g proddg reminor 36000
server101# vxprint -m -g proddg | grep 'minor=[0-9]'
        base_minor=36000
        minor=36000

The CFS filesystems are unmounted and the diskgroup is deported.
server101# hagrp -offline SGprod -sys server101         # umounted the filesystem
server101# vxdg deport proddg


The other diskgroup is also reminored to the same base_minor 36000 while having the CFS filesystems mounted.
server101# vxdg -g devldg reminor 36000
VxVM vxdg WARNING V-5-1-3858 Volume devlvol01: Device is open, will renumber on import


Because the filesystems are still mounted, VxVM does not change the existing in-kernel minor number.
server101# ls -lR /dev/vx/rdsk
....
/dev/vx/rdsk/devldg:
total 0
crw-------. 1 root root 199, 28000 Feb 29 18:44 devlvol01  <<< in-kernel, the volume minor number is still the old one 28000


But the on-disk configuration has already been changed to 36000.
server101# vxprint -m -g devldg | grep 'minor=[0-9]'
        base_minor=36000
        minor=36000


Now, the previously deported diskgroup is imported.
server101# hagrp -online SGprod -sys server101
Feb 29 18:51:59 server101 vxvm:vxconfigd: V-5-1-11401 : dg import with I/O fence enabled
Feb 29 18:51:59 server101 vxvm:vxconfigd: V-5-1-11401 proddg: dg import with I/O fence enabled
Feb 29 18:51:59 server101 kernel: sd 3:0:0:13: reservation conflict
Feb 29 18:51:59 server101 kernel: sd 5:0:0:13: reservation conflict
Feb 29 18:51:59 server101 vxvm:vxconfigd: V-5-1-16765 Selecting configuration database copy from disk_6 from disks: disk_6
Feb 29 18:52:03 server101 vxvm:vxconfigd: V-5-1-16766 Trying to import the disk group proddg using configuration database copy from disk_6
Feb 29 18:52:04 server101 vxvm:vxconfigd: V-5-1-16254 Disk group import of proddg succeeded.


The ls -lR /dev/vx/rdsk output shows no in-kernel minor number conflict.
server101# ls -lR /dev/vx/rdsk
....
/dev/vx/rdsk/devldg:
total 0
crw-------. 1 root root 199, 28000 Feb 29 18:44 devlvol01     <<< 28000
/dev/vx/rdsk/proddg:
total 0
crw-------. 1 root root 199, 36000 Feb 29 18:52 prodvol01     <<< 36000


But now the slave cannot join, because both diskgroups have the same on-disk base_minor, 36000.
server101# hagrp -online cvm -sys server102
Feb 29 18:53:26 server102 kernel: VxVM vxio V-5-0-1910 Cleaning incomplete shared diskgroup devldg dgiid 33792.104
Feb 29 18:53:26 server102 vxvm:vxconfigd: V-5-1-11092 cleanup_client: (There are minor number conflicts on a slave node) 231

Solution

Confirm that neither the in-kernel minor numbers nor the on-disk minor numbers conflict.

For in-kernel minor numbers, you can use the following commands:
server101# ls -lR /dev/vx/rdsk
....
/dev/vx/rdsk/devldg:
total 0
crw-------. 1 root root 199, 28000 Feb 29 18:44 devlvol01
/dev/vx/rdsk/proddg:
total 0
crw-------. 1 root root 199, 36000 Feb 29 18:52 prodvol01
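
When many volumes are involved, the listing can be reduced to one line per device so that the in-kernel minor numbers are easier to compare. The following is a minimal sketch, not a VxVM tool: kernel_minors is a hypothetical helper name, and it assumes the ls -l layout shown above (major and minor numbers in the fifth and sixth fields, the volume name as the last field, and no spaces in names).

```shell
#!/bin/sh
# Hypothetical helper: read `ls -lR /dev/vx/rdsk` output on stdin and print
# one "diskgroup volume minor" line for each character device.
kernel_minors() {
    awk '
        # A directory header such as "/dev/vx/rdsk/devldg:" names the diskgroup.
        /^\/.*:$/ {
            dg = $0
            sub(/:$/, "", dg)      # drop the trailing colon
            sub(/^.*\//, "", dg)   # keep only the last path component
            next
        }
        # Character devices: field 5 is "major," and field 6 is the minor.
        /^c/ { print dg, $NF, $6 }
    '
}
```

It could be used as ls -lR /dev/vx/rdsk | kernel_minors, optionally followed by sort -k3,3n to bring equal minor numbers next to each other.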


For on-disk minor numbers, use vxprint:
server101# vxprint -m -g devldg | grep 'minor=[0-9]'
        base_minor=36000
        minor=36000
server101# vxprint -m -g proddg | grep 'minor=[0-9]'
        base_minor=36000
        minor=36000


Run the above command for all of the diskgroups that are currently imported.
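
Checking every imported diskgroup by eye is error-prone, so the comparison can be scripted. The sketch below is an illustration only: base_minor_of and duplicate_base_minors are hypothetical helper names, and the code assumes that vxprint -m prints a base_minor=<number> line as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `vxprint -m -g <diskgroup>` output on stdin and
# print the first base_minor value found.
base_minor_of() {
    sed -n 's/^[[:space:]]*base_minor=\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: read "diskgroup base_minor" pairs on stdin and print
# each base_minor used by more than one diskgroup, followed by the names.
duplicate_base_minors() {
    awk '
        {
            count[$2]++
            names[$2] = (names[$2] == "" ? $1 : names[$2] "," $1)
        }
        END {
            for (m in count)
                if (count[m] > 1)
                    print m, names[m]
        }
    ' | sort -n
}

# Possible use, with the diskgroup names from this article:
#   for dg in devldg proddg; do
#       printf '%s %s\n' "$dg" "$(vxprint -m -g "$dg" | base_minor_of)"
#   done | duplicate_base_minors
```

No output means that no two of the listed diskgroups share an on-disk base_minor.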

If any base_minor values conflict, run vxdg reminor to fix the issue.
# vxdg -g proddg reminor 37000


Choose a new minor number that is not used by any diskgroups.
Note: This number should be a multiple of 1000.
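
Picking the new number can also be sketched in shell. This is a hedged illustration rather than a VxVM feature: next_free_base is a hypothetical helper that only compares candidate values against the base_minor values you pass in. It does not know how many minor numbers each diskgroup actually occupies, and it does not check the sharedminorstart boundary, so treat its result as a suggestion to verify.

```shell
#!/bin/sh
# Hypothetical helper: print the first multiple of 1000, at or above a
# starting value, that is not among the base_minor values already in use.
# Usage: next_free_base START [USED_BASE_MINOR ...]
next_free_base() {
    # Round START up to the next multiple of 1000.
    candidate=$(( (($1 + 999) / 1000) * 1000 ))
    shift
    while :; do
        in_use=no
        for used in "$@"; do
            if [ "$used" -eq "$candidate" ]; then
                in_use=yes
            fi
        done
        if [ "$in_use" = no ]; then
            break
        fi
        candidate=$(( candidate + 1000 ))
    done
    echo "$candidate"
}
```

With the values from this article, next_free_base 36000 36000 28000 prints 37000, which matches the number used in the vxdg reminor command above.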

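To have the minor numbers change immediately, unmount the filesystems before running vxdg reminor. Otherwise, as the example in the Cause section shows, VxVM keeps the old in-kernel minor numbers for open volumes and renumbers them only on the next import.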

 
