Cluster Server 8.0 Application Note: Dynamic Reconfiguration for Oracle Servers - Solaris

Last Published:
Product(s): InfoScale & Storage Foundation (8.0)
Platform: Solaris

Performing dynamic reconfiguration on I/O boards

Under certain circumstances, you must stop VCS on the domain where you are reconfiguring a board.

See Scenarios requiring a VCS shutdown.

For s6800:

In the following scenario, a cluster consists of the dom1 and the dom3 domains. The cluster is running service groups on the dom1 domain, which includes I/O boards N0.IB8 and N0.IB6. N0.IB8 requires dynamic reconfiguration because of a malfunctioning component. The domain dom3 includes I/O boards IO14 and IO15. The disk controllers and NICs are labeled in the following diagrams.

For e12K/15K/25K: In the following scenario, a cluster consists of the dom3 and the s6800f0 domains. The cluster is running service groups on the dom3 domain, which includes I/O boards IO14 and IO15. IO15 requires dynamic reconfiguration because of a malfunctioning component. The domain s6800f0 includes I/O boards IB8 and IB6. The disk controllers and NICs are labeled in the following diagrams.

The procedure to dynamically reconfigure the I/O board (N0.IB8 in the dom1 domain for s6800, IO15 in the dom3 domain for e12K/15K/25K) consists of the following high-level steps:

  • Disabling all the active controllers on the board

  • Disabling all the NIC devices used for private communications on the board

  • Disabling all the NIC devices used for public communications on the board

  • Disabling the IO board and removing it

  • Adding the replacement IO board

  • Enabling the replacement board

  • Enabling the public NIC devices

  • Enabling the private NIC devices

  • Enabling the active controllers
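
The order of these steps can be sketched as a dry-run helper that only prints the commands used in the procedures that follow and runs nothing. The function name dr_plan and its argument order are hypothetical; the commands are the s6800 forms shown in this note, and a board with several controllers or NICs would need one line per device.

```shell
#!/bin/sh
# Hypothetical helper: print (never execute) the command sequence for
# replacing one I/O board, in the order listed above.
# Arguments: board controller private_nic private_device public_nic
dr_plan() {
    board=$1; ctlr=$2; priv=$3; privdev=$4; pub=$5
    cat <<EOF
vxdmpadm disable ctlr=$ctlr
/sbin/lltconfig -u $priv
ifconfig $pub down
ifconfig $pub unplumb
cfgadm -c disconnect $board
# physically replace the board here
cfgadm -c configure $board
ifconfig $pub plumb
ifconfig $pub up
/sbin/lltconfig -t $priv -d $privdev
vxdmpadm enable ctlr=$ctlr
EOF
}

# Example for the s6800 scenario in this note:
#   dr_plan N0.IB8 c2 qfe4 /dev/qfe:4 qfe7
```

Printing the plan first makes it easy to review the device names before any controller or link is disabled.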

To verify the status of the cluster before dynamic reconfiguration

  1. Use the VCS command hastatus -sum to verify the current state of the service groups in the cluster. Use the command before reconfiguring the I/O board and after reconfiguration to verify the cluster's state. The output is as follows with slight variations for the different Oracle servers.
    -- SYSTEM STATE
    -- System          State      Frozen
    A  dom3            RUNNING    0
    A  s6800f0         RUNNING    0
    -- GROUP STATE
    -- Group           System     Probed   AutoDisabled   State
    B  ServiceGroupA   dom3       Y        N              ONLINE
    B  ServiceGroupA   s6800f0    Y        N              OFFLINE
    B  cvm             dom3       Y        N              ONLINE
    B  cvm             s6800f0    Y        N              ONLINE
  2. For s6800: By using the cfgadm -lv command, you can show the I/O boards and cards in the dom1 domain. For example:

    # cfgadm -lv

    In the output (not shown), the board N0.IB8 is reported to be connected, configured, and ok. In addition, the condition of each of the slots on N0.IB8 is reported.

    For e12K/15K: By using the cfgadm -al command, you can show the I/O boards and cards in the dom3 domain. For example:

    # cfgadm -al

    Ap_Id Type Receptacle Occupant Condition
    IO14 HPCI connected configured ok
    IO14::pci0 io connected configured ok
    IO14::pci1 io connected configured ok
    IO14::pci2 io connected configured ok
    IO14::pci3 io connected configured ok
    IO15 HPCI connected configured ok
    IO15::pci0 io connected configured ok
    IO15::pci1 io connected configured ok
    IO15::pci2 io connected configured ok
    IO15::pci3 io connected configured ok
    SB14 CPU connected configured ok
    SB14::cpu0 cpu connected configured ok
    .
    .
    .
    pcisch1:e14b1slot0 fibre/hp connected configured ok
    pcisch2:e14b1slot3 pci-pci/hp connected configured ok
    pcisch3:e14b1slot2 ethernet/hp connected configured ok
    pcisch4:e15b1slot1 pci-pci/hp connected configured ok
    pcisch5:e15b1slot0 fibre/hp connected configured ok
    pcisch6:e15b1slot3 pci-pci/hp connected configured ok
    pcisch7:e15b1slot2 ethernet/hp connected configured ok
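
To compare the cluster state before and after reconfiguration, the group lines of the hastatus -sum listing can be filtered. The following is a minimal sketch that assumes the column layout shown in step 1 (B, group, system, probed, autodisabled, state); the function name online_groups is hypothetical. Because it uses awk -v, nawk or /usr/xpg4/bin/awk may be needed on Solaris.

```shell
#!/bin/sh
# Hypothetical helper: read "hastatus -sum" output on stdin and print the
# service groups that are ONLINE on the system named in $1.
online_groups() {
    awk -v sys="$1" '$1 == "B" && $3 == sys && $6 == "ONLINE" { print $2 }'
}

# Example:
#   hastatus -sum | online_groups dom3
```

Saving the result before the reconfiguration and comparing it with the result afterward shows whether any service group changed state.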

To determine the controllers on a board

  1. Use the command vxdmpadm listctlr all to determine all controllers in the domain. For example, on the dom3 domain:

    # vxdmpadm listctlr all

    CTLR-NAME    ENCLR-TYPE    STATE       ENCLR-NAME
    =====================================================
    c0           Disk          ENABLED     Disk
    c9           HDS9960       ENABLED     HDS99600
    c8           HDS9960       ENABLED     HDS99600
  2. To determine which controllers are on a specific board, for example IO15, use the following commands to display information about the disks in the domain, their controllers, and the location of the controllers on the IO boards.

    Use the command cfgadm -lv, which provides a verbose listing of all boards in the domain. In the output, you can see the device slots listed for the board IO15.

    # cfgadm -lv

    In the following example (not all output is shown) the listing might contain lines that resemble:

    .
    pcisch4:e15b1slot1 . . .
    /devices/pci@1fc,700000:e15b1slot1
    pcisch5:e15b1slot0 . . .
    /devices/pci@1fc,600000:e15b1slot0
    pcisch6:e15b1slot3 . . .
    /devices/pci@1fd,700000:e15b1slot3
    pcisch7:e15b1slot2 . . .
    /devices/pci@1fd,600000:e15b1slot2
    .

    The listing indicates that the device labeled pci@1fc is used by slots 0 and 1 of board 15, and the device labeled pci@1fd is used by slots 2 and 3.

    Using the format command in the domain, you can list the disk devices. The listing may be lengthy. In the output, the controller is indicated by "c#" in the first two characters of the device name, and the device path that follows corresponds to a device listed by the previous command (cfgadm -lv). For example:

    # format

    c0t0d0 <SUN18G ..... /pci@1dc,700000/pci@1.. .....
    c8t0d0 <HITACHI-OPEN ....
    /pci@1dc,600000/fibre-channel ...
    .
    c9t0d0 <HITACHI-OPEN ....
    /pci@1fc,600000/fibre-channel ...

    A comparison of the output of the previous two commands shows that board 15 slot 0 contains the controller c9.

  3. As an alternative to using the format command, you can also use the following procedure to determine which storage controllers are impacted by dynamic reconfiguration on a given slot or I/O board for e25K on Solaris.

    Verify which I/O controllers are impacted by dynamic reconfiguration on the board IO4 on a Solaris 10 host (cougar) by using the following command:

    cougar# cfgadm -s "cols=ap_id:physid" | grep IO4

    IO4 /devices/pseudo/dr@0:IO4
    IO4_C3V0 /devices/pci@9c,600000:IO4_C3V0
    IO4_C3V1 /devices/pci@9d,600000:IO4_C3V1
    IO4_C5V0 /devices/pci@9c,700000:IO4_C5V0
    IO4_C5V1 /devices/pci@9d,700000:IO4_C5V1

    The -s parameter is used to limit output to the ap_id and physical id columns.

    Note the pci@... node names in the physical IDs. Run grep again with a pattern that matches those nodes, here pci@9[cd],[67], quoted so that the shell does not expand it:

    cougar# cfgadm -s "cols=ap_id:physid" | grep 'pci@9[cd],[67]'

    IO4_C3V0 /devices/pci@9c,600000:IO4_C3V0
    IO4_C3V1 /devices/pci@9d,600000:IO4_C3V1
    IO4_C5V0 /devices/pci@9c,700000:IO4_C5V0
    IO4_C5V1 /devices/pci@9d,700000:IO4_C5V1
    c0 /devices/pci@9c,700000/pci@1/scsi@2:scsi
    c1 /devices/pci@9c,700000/pci@1/scsi@2,1:scsi
    c2 /devices/pci@9c,600000/SUNW,qlc@1,1/fp@0,0:fc
    c3 /devices/pci@9c,600000/SUNW,qlc@1/fp@0,0:fc
    c4 /devices/pci@9d,700000/SUNW,qlc@1/fp@0,0:fc
    c5 /devices/pci@9d,700000/SUNW,qlc@1,1/fp@0,0:fc

    c0 and c1 are located on IO4_C5V0, c2 and c3 are on IO4_C3V0, and c4 and c5 are on IO4_C5V1.

    On Solaris 9, the procedure is almost the same. In the following example, e17 corresponds to I/O board 17:

    jaguar# cfgadm -s "cols=ap_id:physid" | grep e17

    pcisch4:e17b1slot1 /devices/pci@23c,700000:e17b1slot1
    pcisch5:e17b1slot0 /devices/pci@23c,600000:e17b1slot0
    pcisch6:e17b1slot3 /devices/pci@23d,700000:e17b1slot3
    pcisch7:e17b1slot2 /devices/pci@23d,600000:e17b1slot2

    jaguar# cfgadm -s "cols=ap_id:physid" | grep 'pci@23[cd],[67]'

    c4 /devices/pci@23c,700000/pci@1/scsi@2:scsi
    c5 /devices/pci@23c,700000/pci@1/scsi@2,1:scsi
    c6 /devices/pci@23d,700000/SUNW,qlc@1/fp@0,0:fc
    pcisch4:e17b1slot1 /devices/pci@23c,700000:e17b1slot1
    pcisch5:e17b1slot0 /devices/pci@23c,600000:e17b1slot0
    pcisch6:e17b1slot3 /devices/pci@23d,700000:e17b1slot3
    pcisch7:e17b1slot2 /devices/pci@23d,600000:e17b1slot2

    c4 and c5 are on e17b1slot1, and c6 is on e17b1slot3.
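
The manual comparison in step 3 can also be sketched as a small filter. It assumes the two-column ap_id and physid listing shown above, treats a physical ID of the form /devices/pci@X:NAME as a slot and /devices/pci@X/... as a device below that node, and pairs them by the pci@X node. The function name ctlr_slots is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read 'cfgadm -s "cols=ap_id:physid"' output on stdin
# and print "controller slot" pairs that share the same top-level pci node.
ctlr_slots() {
    awk '
    {
        node = $2
        sub("^/devices/", "", node)            # strip the /devices/ prefix
        if (node ~ "^pci@[^/:]*:") {           # slot entry: pci@X:slotname
            sub(":.*", "", node)
            slot[node] = $1
        } else if (node ~ "^pci@[^/:]*/") {    # device below a pci node
            sub("/.*", "", node)
            dev[$1] = node
        }
    }
    END {
        for (d in dev)
            if (dev[d] in slot)
                print d, slot[dev[d]]
    }' | sort
}

# Example:
#   cfgadm -s "cols=ap_id:physid" | ctlr_slots
```

The slot entries are stored first and joined at the end, so the result does not depend on whether cfgadm lists the slots before or after the controllers.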

To determine the network interfaces on the board

  • Verify which network interfaces correspond to which slot on the I/O board (each I/O board can carry up to four PCI cards) by using the grep command to search /etc/path_to_inst for the pci identifiers of the board.

    For e25K on Solaris

    IO4_C3V0 /devices/pci@9c,600000:IO4_C3V0
    IO4_C3V1 /devices/pci@9d,600000:IO4_C3V1
    IO4_C5V0 /devices/pci@9c,700000:IO4_C5V0
    IO4_C5V1 /devices/pci@9d,700000:IO4_C5V1

    cougar# grep 'pci@9[cd],[67]' /etc/path_to_inst | grep network

    "/pci@9c,700000/network@3,1" 0 "eri"
    "/pci@9c,700000/pci@1/network@0" 0 "ce"
    "/pci@9c,700000/pci@1/network@1" 1 "ce"
    "/pci@9d,600000/pci@1/network@0" 2 "ce"

    IO4_C5V0 contains eri0, ce0, and ce1. IO4_C3V1 contains ce2.

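
The interface names in the example above are formed by joining the driver name and the instance number from each /etc/path_to_inst entry. A minimal sketch of that step follows; it assumes the three-field entry format shown above (quoted path, instance, quoted driver), and the function name nic_nodes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read /etc/path_to_inst entries on stdin and print
# "interface pci-node" for every network device.
nic_nodes() {
    awk '$1 ~ /network@/ {
        gsub(/"/, "")            # drop the quotes around path and driver
        split($1, part, "/")     # part[2] is the top-level pci node
        print $3 $2, part[2]     # driver + instance, for example ce2
    }'
}

# Example:
#   grep 'pci@9[cd],[67]' /etc/path_to_inst | nic_nodes
```

Matching the printed pci node against the cfgadm listing identifies the slot that carries each interface.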

To disable the controllers on the board

  1. Disable the active controllers on the I/O system card using the vxdmpadm command.

    vxdmpadm disable ctlr=ctlr

    For s6800:

    # vxdmpadm disable ctlr=c2

    For e12K/15K:

    # vxdmpadm disable ctlr=c9

  2. Using the vxdmpadm command, verify that the controller is disabled. The output for all Oracle servers (s6800 and e12K/15K/25K) will be similar except for minor differences.

    # vxdmpadm listctlr all

    For s6800: In this example, the only controller on board N0.IB8 is c2.

    CTLR-NAME    ENCLR-TYPE    STATE       ENCLR-NAME
    =====================================================
    c0           Disk          ENABLED     Disk
    c2           HDS9960       DISABLED    HDS99600
    c1           HDS9960       ENABLED     HDS99600

    For e12K/15K: In this example, the only controller on board IO15 is c9.

    CTLR-NAME    ENCLR-TYPE    STATE       ENCLR-NAME
    =====================================================
    c0           Disk          ENABLED     Disk
    c9           HDS9960       DISABLED    HDS99600
    c8           HDS9960       ENABLED     HDS99600
  3. If a card has more than one controller, repeat step 1 and step 2 for each controller on the card to be reconfigured.
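
When a board carries several controllers, the verification in step 2 can be sketched as a check that every named controller is reported as DISABLED. The sketch assumes the four-column vxdmpadm listctlr all layout shown above; the function name all_disabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "vxdmpadm listctlr all" output on stdin.
# Exit status 0 only if every controller named in the arguments is listed
# with the state DISABLED; a controller that is not listed counts as failure.
all_disabled() {
    awk -v want="$*" '
    BEGIN { bad = 0; n = split(want, w, " "); for (i = 1; i <= n; i++) need[w[i]] = 1 }
    ($1 in need) { seen[$1] = 1; if ($3 != "DISABLED") bad = 1 }
    END {
        for (c in need) if (!(c in seen)) bad = 1
        exit bad + 0
    }'
}

# Example:
#   vxdmpadm listctlr all | all_disabled c9 && echo "safe to continue"
```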

To list the status of the private network links and to disable them

  1. Enter the command lltstat -nv:

    The output resembles:

    For s6800:

    LLT node information:
    Node State Links
    * 0 dom1 OPEN 2
    1 dom3 OPEN 2
    2 CONNWAIT 0
    .
    .
    31 CONNWAIT 0

    The output shows that both domains have two links for private communication. Both links are "OPEN," that is, operational.

    For e12K/15K:

    LLT node information:
    Node State Links
    0 s6800f0 OPEN 2
    * 1 dom3 OPEN 2
    2 CONNWAIT 0
    .
    .
    31 CONNWAIT 0

    The output shows that both domains have two links for private communication. Both links are "OPEN," that is, operational.

  2. Display the /etc/llttab file using the following command:

    # cat /etc/llttab

    For s6800:

    set-node dom1
    set-cluster 13
    link qfe4 /dev/qfe:4 - ether - -
    link qfe0 /dev/qfe:0 - ether - -

    The devices qfe0 and qfe4 are shown as the private network links.

    For e12K/15K:

    set-node dom3
    set-cluster 13
    link ce3 /dev/ce:3 - ether - -
    link ce8 /dev/ce:8 - ether - -

    The devices ce3 and ce8 are shown as the private network links.

  3. Disable the private network link device.

    For example, for s6800, the private network link device on I/O board N0.IB8 is qfe4.

    # /sbin/lltconfig -u qfe4

    For example, for e12K/15K, the private network link device on I/O board IO15 is ce8.

    # /sbin/lltconfig -u ce8

  4. Check the status of the private network links:

    # lltstat -nv

    For s6800:

    LLT node information:
    Node State Links
    * 0 dom1 OPEN 2
    1 dom3 OPEN 1
    2 CONNWAIT 0
    .
    .
    .
    31 CONNWAIT 0

    For e12K/15K:

    LLT node information:
    Node State Links
    0 s6800f0 OPEN 1
    * 1 dom3 OPEN 2
    2 CONNWAIT 0
    .
    .
    .
    31 CONNWAIT 0
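
The link tag passed to lltconfig -u, and the tag and device needed later to restore the link, both come from the link lines of /etc/llttab. The following minimal sketch extracts them; it assumes the link line layout shown in step 2, and the function name llt_links is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read llttab content on stdin and print
# "tag device" for each configured private link.
llt_links() {
    awk '$1 == "link" { print $2, $3 }'
}

# Example:
#   llt_links < /etc/llttab
```

Recording this output before a link is disabled keeps the exact device path at hand for the later lltconfig -t tag -d device command.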

To list the status of the public NICs and to disable them

  1. Use the command ifconfig -a.

    For s6800: For example, qfe3 (on board N0.IB6) and qfe7 (on board N0.IB8), the NICs used for the public network connections, are operational.

    # ifconfig -a

    lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000
    ge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
            inet 10.182.65.99 netmask fffff000 broadcast 10.182.79.255
            ether 0:3:ba:8:ec:40
    qfe3: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
            inet 10.182.66.143 netmask ffffff00 broadcast 10.255.255.255
            groupname mn1
            ether 0:3:ba:8:ec:40
    qfe7: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
            inet 10.182.66.144 netmask ffffff00 broadcast 10.255.255.255
            groupname mn1
            ether 0:3:ba:8:ec:40
  2. For s6800: To disable the device qfe7 on board N0.IB8, use the commands:

    # ifconfig qfe7 down

    # ifconfig qfe7 unplumb

    For e12K/15K: To disable the device ce5 on board IO15, use the commands:

    # ifconfig ce5 down

    # ifconfig ce5 unplumb

  3. For s6800: Use the ifconfig -a command to verify that qfe7 is down. No information about qfe7 should appear in the output.

    For e12K/15K/25K: Use the ifconfig -a command to verify that ce5 is down. No information about ce5 should appear in the output.

    # ifconfig -a
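
The check in step 3 can be sketched as a filter that reports whether an interface still appears in the ifconfig -a listing. It assumes that each interface entry starts with the interface name followed by a colon, as in the output above; the function name nic_plumbed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ifconfig -a" output on stdin.
# Exit status 0 if the interface named in $1 is listed, 1 otherwise.
nic_plumbed() {
    awk -v nic="$1" '
    index($0, nic ":") == 1 { found = 1 }   # line begins with "name:"
    END { exit (found ? 0 : 1) }'
}

# Example:
#   ifconfig -a | nic_plumbed qfe7 || echo "qfe7 is no longer plumbed"
```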

To disable and remove the IO board

  1. When the controllers and network interface cards are disabled, disconnect the board:

    For s6800:

    # cfgadm -c disconnect N0.IB8

    For e12K/15K:

    # cfgadm -c disconnect IO15

    Note:

    The -f option is recommended only when a normal disconnect attempt fails and there is no clear way to make the command succeed gracefully.

  2. Use the cfgadm command to check the status of the I/O board:

    # cfgadm -al

    For s6800: In the output, the fields Receptacle, Occupant, and Condition for N0.IB8 show disconnected, unconfigured, and unknown respectively.

    The I/O board may be physically removed at this time. Before adding the new board to the dom1 domain, you must test it in another spare domain.

    For e12K/15K:

    Ap_Id Type Receptacle Occupant Condition
    IO14 HPCI connected configured ok
    IO14::pci0 io connected configured ok
    IO14::pci1 io connected configured ok
    IO14::pci2 io connected configured ok
    IO14::pci3 io connected configured ok
    IO15 HPCI disconnected unconfigured unknown
    SB14 CPU connected configured ok
    SB14::cpu0 cpu connected configured ok
    .
    .

    The I/O board, IO15, may be physically removed at this time.
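
Before the board is pulled, its three status fields can be read from the cfgadm -al listing with a filter such as the following minimal sketch. It assumes the five-column layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition); the function name board_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "cfgadm -al" output on stdin and print the
# Receptacle, Occupant, and Condition fields of the attachment point in $1.
board_state() {
    awk -v b="$1" '$1 == b { print $3, $4, $5 }'
}

# Example:
#   cfgadm -al | board_state IO15
```

The same filter applies after the replacement board is configured, when the three fields should read connected, configured, and ok.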

To add the new IO board

  1. Physically add the board, connecting all necessary cables, and configure it:

    For s6800:

    # cfgadm -c configure N0.IB8

    For e12K/15K:

    # cfgadm -c configure IO15

    Note:

    Before you run the configure command, make sure that the output of the cfgadm command lists the slot where the new board is added, with the status disconnected, unconfigured, and unknown.

  2. Run the cfgadm -al command to verify the board has been configured; the board should be connected, configured, and ok. If you have stopped VCS, you may skip step 3 through step 6.
  3. Reconfigure the network interface cards on the new board:

    For s6800:

    # ifconfig qfe7 plumb

    # ifconfig qfe7 up

    For e12K/15K:

    # ifconfig ce5 plumb

    # ifconfig ce5 up

  4. Run the command ifconfig -a to verify that the NICs are up and running.
  5. Reconfigure LLT to reestablish the private network links:

    For s6800:

    # /sbin/lltconfig -t qfe4 -d /dev/qfe:4

    For e12K/15K:

    # /sbin/lltconfig -t ce8 -d /dev/ce:8

  6. Verify the private network links are restored using the command lltstat -nv:

    # /sbin/lltstat -nv

  7. For s6800: Enable the controller c2 on board N0.IB8 using the vxdmpadm command:

    # vxdmpadm enable ctlr=c2

    For e12K/15K: Enable the controller c9 on board IO15 using the vxdmpadm command:

    # vxdmpadm enable ctlr=c9

  8. Verify that the controller is up and running:

    # vxdmpadm listctlr all

    If you stopped VCS before reconfiguring the I/O board, restart it. See Stopping and starting VCS.