Cluster Server 8.0 Application Note: Dynamic Reconfiguration for Oracle Servers - Solaris

Last Published:
Product(s): InfoScale & Storage Foundation (8.0)
Platform: Solaris

Performing dynamic reconfiguration on a CPU/memory board

You may want to remove a malfunctioning CPU/memory board, or move a board from one domain to another where it is needed more.

To reassign a board, you must unconfigure it from its current domain and assign it to the other domain. You can do this without physically removing the board from its slot. To replace a board, however, you must unconfigure it from its domain, physically remove it, install the replacement board, and configure the replacement in the domain.

Use the following procedures to dynamically reconfigure a CPU/memory board.

To determine the status of the board you are reconfiguring

  1. If necessary, log in as the administrator to the domain containing the CPU/memory board.
  2. Determine the attachment point of the board you are removing:

    # cfgadm

    Ap_Id Type Receptacle Occupant Cond
    .
    N0.SB2 CPU connected configured ok
    .
  3. Make sure you have checked whether the board has permanent memory.

    See “To determine if the CPU/memory board has permanent memory”.
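The status check in step 2 can be scripted. The following is a minimal sketch, not part of the product: `board_state` is a hypothetical helper name, and it assumes the five-column cfgadm layout shown above and a POSIX awk (on Solaris, typically nawk or /usr/xpg4/bin/awk).

```shell
# board_state AP_ID
# Reads `cfgadm` output on stdin and prints the receptacle state, occupant
# state, and condition of the matching attachment point.
# Assumption: columns are Ap_Id, Type, Receptacle, Occupant, Cond, in that order.
# Returns non-zero if the attachment point is not listed.
board_state() {
    awk -v id="$1" '
        $1 == id { print $3, $4, $5; found = 1 }
        END { exit !found }'
}
```

For example, `cfgadm | board_state N0.SB2` would print the three state fields for the board in the sample output above.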

To unbind processes bound to CPU on the board

  1. To determine if any processes are bound to a CPU, enter:

    # pbind -q

  2. If a process is bound to the board, the output indicates the process ID and the ID number of the CPU.
    process id 650: 0
  3. If you see no output, or the output shows no processes bound to a CPU on the board you are reconfiguring, perform the steps in “To unconfigure the board”.
  4. Unbind all processes bound to the CPU on the board. For example, enter:

    # pbind -u 650

  5. Rebind the processes to a processor on another board, if necessary. For example, to bind process 650 to the processor with ID 9, which is on another board, use the command:

    # pbind -b 650 9

  6. If you attempt to unconfigure a board with processes bound to it, you receive a message that resembles:
    cfgadm: Hardware specific failure: unconfigure SB15: Failed to
    off-line:dr@0:SB15::cpu3
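When many processes are bound, it can help to filter the pbind output by the CPU IDs that belong to the board. The sketch below assumes the "process id <pid>: <cpu>" output format shown in step 2 (other Solaris releases may print bindings differently) and a POSIX awk. `bound_pids` is a hypothetical helper name, and the CPU IDs you pass in are whatever IDs your board actually hosts.

```shell
# bound_pids CPU_ID...
# Reads `pbind -q` output on stdin and prints the PID of every process that is
# bound to one of the listed CPU IDs, one PID per line.
# Assumption: each binding is reported as "process id <pid>: <cpu>".
bound_pids() {
    awk -v cpus=" $* " '
        $1 == "process" && $2 == "id" {
            pid = $3
            sub(/:$/, "", pid)                      # drop the trailing colon
            if (index(cpus, " " $4 " ") > 0) print pid
        }'
}
```

A possible use, with CPU IDs 0 through 3 standing in for the CPUs on the board: `pbind -q | bound_pids 0 1 2 3` lists the processes that steps 4 and 5 would need to unbind or rebind.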

To unconfigure the board

  1. Unconfigure and disconnect the board:

    # cfgadm -v -c disconnect SB2

  2. If the board does not contain permanent memory, the command's output resembles the following with slight variations for each server:
    request delete capacity (4 cpus)
    request delete capacity (2097152 pages)
    request delete capacity SB2 done
    request offline SUNW_cpu/cpu448
    request offline SUNW_cpu/cpu449
    request offline SUNW_cpu/cpu450
    request offline SUNW_cpu/cpu451
    request offline SUNW_cpu/cpu448 done
    request offline SUNW_cpu/cpu449 done
    request offline SUNW_cpu/cpu450 done
    request offline SUNW_cpu/cpu451 done
    unconfigure SB2
    unconfigure SB2 done
    notify remove SUNW_cpu/cpu448
    notify remove SUNW_cpu/cpu449
    notify remove SUNW_cpu/cpu450
    notify remove SUNW_cpu/cpu451
    notify remove SUNW_cpu/cpu448 done
    notify remove SUNW_cpu/cpu449 done
    notify remove SUNW_cpu/cpu450 done
    notify remove SUNW_cpu/cpu451 done
    disconnect SB2
    disconnect SB2 done
    poweroff SB2
    poweroff SB2 done
    unassign SB2 skipped

    Skip to step 4.

  3. If the board has permanent memory, the system prompts you to proceed:
    System may be temporarily suspended; proceed (yes/no)?

    If the answer is "yes," dynamic reconfiguration proceeds. The system is suspended during reconfiguration. When the system resumes operation on another board, the board you are reconfiguring is disconnected. If the disconnect operation succeeds, the output resembles the following with slight variations for different servers:

    request suspend SUNW_OS
    request suspend SUNW_OS done
    request delete capacity (2097152 pages)
    request delete capacity SB15 done
    request offline SUNW_cpu/cpu480
    request offline SUNW_cpu/cpu481
    request offline SUNW_cpu/cpu482
    request offline SUNW_cpu/cpu483
    request offline SUNW_cpu/cpu480 done
    request offline SUNW_cpu/cpu481 done
    request offline SUNW_cpu/cpu482 done
    request offline SUNW_cpu/cpu483 done
    unconfigure SB15
    unconfigure SB15 done
    notify remove SUNW_cpu/cpu480
    notify remove SUNW_cpu/cpu481
    notify remove SUNW_cpu/cpu482
    notify remove SUNW_cpu/cpu483
    notify remove SUNW_cpu/cpu480 done
    notify remove SUNW_cpu/cpu481 done
    notify remove SUNW_cpu/cpu482 done
    notify remove SUNW_cpu/cpu483 done
    disconnect SB15
    disconnect SB15 done
    poweroff SB15

    Skip to step 4.

    Note:

    If there are real-time processes running on the board you are unconfiguring, the disconnect operation may not succeed. You must stop these processes in the appropriate manner before continuing with dynamic reconfiguration.

  4. If the board has real-time processes that must be stopped, the dynamic reconfiguration operation fails and the output indicates the PIDs of the running processes. The output may vary slightly for different Oracle Sun Enterprise servers.

    For example:

    .
    .
    notify remove SUNW_cpu/cpu481 done
    notify remove SUNW_cpu/cpu482 done
    notify remove SUNW_cpu/cpu483 done
    cfgadm: Hardware specific failure: unconfigure SB15: Cannot
    quiesce realtime thread: 621
  5. To determine the names of the processes, use the command:

    # ps -ef | grep PID

  6. Stop the processes in the appropriate manner. In this example, the processes must be stopped with the kill command:

    # kill -9 PID

  7. Retry the command in step 1.
  8. To verify the board is disconnected and unconfigured, use the cfgadm command:

    # cfgadm

    Ap_Id Type Receptacle Occupant Cond
    .
    N0.SB2 CPU disconnected unconfigured unknown
    .

    Now you can remove the board from the slot, or reassign it to another domain.

    Note:

    Do not remove the board until you have verified it is disconnected.

  9. If you are replacing the board immediately, see “To add a board to a domain”. Otherwise, return the cluster to operation without replacing the disconnected CPU/memory board using the procedure in the following section.
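If you capture the output of a failed disconnect, the PID named in the "Cannot quiesce realtime thread" message from step 4 can be pulled out for use in steps 5 and 6. This is a minimal sketch under stated assumptions: `realtime_pid` is a hypothetical helper name, the message wording matches the example above, and a POSIX awk is available. It reports only the first PID it finds.

```shell
# realtime_pid
# Reads saved cfgadm failure output on stdin and prints the PID reported in a
# "realtime thread: <pid>" message. Lines are joined first, so the message may
# be wrapped across several lines. Returns non-zero if no such message exists.
realtime_pid() {
    awk '
        { text = text " " $0 }
        END {
            if (match(text, /realtime thread: *[0-9]+/)) {
                hit = substr(text, RSTART, RLENGTH)
                sub(/^[^0-9]*/, "", hit)            # keep only the digits
                print hit
            } else {
                exit 1
            }
        }'
}
```

For example, if the failed command's output was saved to a file (the name cfgadm.out here is only illustrative), `realtime_pid < cfgadm.out` prints the PID to look up with ps.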

To add a board to a domain

  1. Log in as administrator to the domain where you plan to add or configure the boards.
  2. If you are adding a new or a replacement board to a domain (for example, dom1), verify the state of the slot that is to contain the board.

    To be configured with a new board, the slot must have the following states and condition:

    • Receptacle state: empty

    • Occupant state: unconfigured

    • Condition: unknown

    Verify this by using the cfgadm command to list the slots, as in the following example. In the dom1 domain, slot SB2 is to contain the CPU board:

  3. Use the cfgadm command to connect and configure a CPU or memory board:

    cfgadm -v -c configure SBx

    For example:

    # cfgadm -v -c configure SB2

    assign SB2
    assign SB2 done
    poweron SB2
    poweron SB2 done
    test SB2
    test SB2 done
    connect SB2
    connect SB2 done
    configure SB2
    configure SB2 done
    notify online SUNW_cpu/cpu448
    notify online SUNW_cpu/cpu449
    notify online SUNW_cpu/cpu450
    notify online SUNW_cpu/cpu451
    notify add capacity (4 cpus)
    notify add capacity (2097152 pages)
    notify add capacity SB2 done
  4. Use the cfgadm command to verify that the new board is connected and configured. For example:

    # cfgadm

    Ap_Id Type Receptacle Occupant Cond
    .
    SB2 CPU connected configured ok
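The slot precondition from step 2 (receptacle empty, occupant unconfigured, condition unknown) can also be checked in a script before the configure command is run. The sketch below assumes the same five-column cfgadm layout as the examples above and a POSIX awk. `slot_is_empty` is a hypothetical helper name.

```shell
# slot_is_empty AP_ID
# Reads `cfgadm` output on stdin. Returns 0 only if the named attachment point
# is listed with receptacle "empty", occupant "unconfigured", and condition
# "unknown". Returns non-zero otherwise, including when the slot is not listed.
slot_is_empty() {
    awk -v id="$1" '
        $1 == id && $3 == "empty" && $4 == "unconfigured" && $5 == "unknown" { ok = 1 }
        END { exit !ok }'
}
```

A possible use: `cfgadm | slot_is_empty SB2 && echo "SB2 is ready for a new board"`.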