Veritas InfoScale™ 7.3.1 Troubleshooting Guide - Solaris
Recovery from RLINK connect problems
This section describes the errors that may be encountered when connecting RLINKs. To be able to troubleshoot RLINK connect problems, it is important to understand the RLINK connection process.
Connecting the Primary and Secondary RLINKs is a two-step operation. The first step, which attaches the RLINK, is performed by issuing the vradmin startrep command. The second step, which connects the RLINKs, is performed by the kernels on the Primary and Secondary hosts.
When the vradmin startrep command is issued, VVR performs a number of checks to ensure that the operation is likely to succeed. If the checks pass, the command changes the state of the RLINKs from DETACHED/STALE to ENABLED/ACTIVE and then returns success.
If the command is successful, the kernel on the Primary is notified that the RLINK is enabled and it begins to send messages to the Secondary requesting it to connect. Under normal circumstances, the Secondary receives this message and connects. The state of the RLINKs then changes from ENABLED/ACTIVE to CONNECT/ACTIVE.
If the RLINK does not change to the CONNECT/ACTIVE state within a short time, there is a problem preventing the connection. This section describes a number of possible causes. An error message indicating the problem may be displayed on the console.
If the following error displays on the console:
VxVM VVR vxrlink INFO V-5-1-5298 Unable to establish connection with remote host <remote_host>, retrying
Make sure that the vradmind daemon is running on the Primary and the Secondary hosts; otherwise, start the vradmind daemon by issuing the following command:
# /usr/sbin/vxstart_vvr
For an RLINK in a shared disk group, make sure that the virtual IP address of the RLINK is enabled on the logowner.
If there is no self-explanatory error message, issue the following command on both the Primary and Secondary hosts:
# vxprint -g diskgroup -l rlink_name
In the output, check the following:
The remote_host of each host is the same as local_host of the other host.
The remote_dg of each host is the same as the disk group of the RVG on the other host.
The remote_dg_dgid of each host is the same as the dgid (disk group ID) of the RVG on the other host as displayed in the output of the vxprint -l diskgroup command.
The remote_rlink of each host is the same as the name of the corresponding RLINK on the other host.
The remote_rlink_rid of each host is the same as the rid of the corresponding RLINK on the other host.
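The cross-checks above can be scripted. The following is a minimal sketch, not a Veritas-supplied tool. It assumes that the vxprint -l output from each host has been saved to a file and that each attribute appears as a whitespace-separated name=value token. The function names (get_attr, expect_same, cross_check) and the file names in the usage note are hypothetical. Only the host name and rid checks are covered; the disk group and RLINK name checks would follow the same pattern.

```shell
# Sketch only: assumes attributes appear as whitespace-separated name=value
# tokens in saved "vxprint -g diskgroup -l rlink_name" output.

# get_attr NAME FILE: print the value of the first NAME=value token in FILE.
get_attr() {
    tr -s ' \t' '\n\n' < "$2" | sed -n "s/^$1=//p" | head -n 1
}

# expect_same LABEL A B: report a mismatch; return non-zero if A differs from B.
expect_same() {
    if [ "$2" = "$3" ]; then
        return 0
    fi
    echo "MISMATCH $1: '$2' vs '$3'"
    return 1
}

# cross_check LOCAL_FILE REMOTE_FILE: compare each side's remote_* attributes
# with the matching local attributes saved from the other host.
cross_check() {
    rc=0
    expect_same "remote_host/local_host" \
        "$(get_attr remote_host "$1")" "$(get_attr local_host "$2")" || rc=1
    expect_same "local_host/remote_host" \
        "$(get_attr local_host "$1")" "$(get_attr remote_host "$2")" || rc=1
    expect_same "remote_rlink_rid/rid" \
        "$(get_attr remote_rlink_rid "$1")" "$(get_attr rid "$2")" || rc=1
    expect_same "rid/remote_rlink_rid" \
        "$(get_attr rid "$1")" "$(get_attr remote_rlink_rid "$2")" || rc=1
    return "$rc"
}
```

With the Primary output saved as primary.txt and the Secondary output saved as secondary.txt (hypothetical names), cross_check primary.txt secondary.txt prints one MISMATCH line per failed comparison and returns a non-zero status if any comparison fails.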
Make sure that the network is working as expected. Network problems can prevent RLINKs from connecting or degrade VVR performance. Possible problems include high latency, low bandwidth, high collision counts, and excessive dropped packets.
For an RLINK in a private disk group, issue the following command on each host.
For an RLINK in a shared disk group, use vxprint -Vl | grep logowner to find the logowner node, then issue the following command on the logowner on the Primary and Secondary.
# ping -s remote_host
Note:
This command is only valid when ICMP ping is allowed between the VVR Primary and the VVR Secondary.
After 10 iterations, type Ctrl-C. There should be little or no packet loss. To ensure that the network can transmit large packets, issue the following command on each host for an RLINK in a private disk group.
For an RLINK in a shared disk group, issue the following command on the logowner on the Primary and Secondary:
# ping -I 2 remote_host 8192
The packet loss should be about the same as for the earlier ping command.
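To compare the two ping runs without reading the summaries by eye, the loss figure can be extracted from captured output. This is a rough sketch under two assumptions: the ping summary contains a phrase of the form "N% packet loss" with an integer N, and a tolerance of a few percentage points counts as "about the same". The function names loss_percent and loss_acceptable and the tolerance argument are hypothetical.

```shell
# loss_percent: read ping output on standard input and print the integer
# packet-loss percentage taken from the first "N% packet loss" phrase.
loss_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9]*\)% packet loss.*/\1/p' | head -n 1
}

# loss_acceptable SMALL LARGE SLACK: succeed if the large-packet loss (LARGE)
# is no more than SLACK percentage points above the small-packet loss (SMALL).
loss_acceptable() {
    [ "$2" -le $(( $1 + $3 )) ]
}
```

For example, if the saved output of the first ping yields 0 and the saved output of the 8192-byte ping yields 2, loss_acceptable 0 2 5 succeeds, while loss_acceptable 0 20 5 fails and suggests a problem with large packets.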
Issue the vxiod command on each host to ensure that there are active I/O daemons. If the output is 0 volume I/O daemons running, activate I/O daemons by issuing the following command:
# vxiod set 10
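The check and the fix can be combined in a script. The sketch below assumes only the message quoted above, "0 volume I/O daemons running", as the sign that no daemons are active. The helper name no_io_daemons is hypothetical.

```shell
# no_io_daemons TEXT: succeed if TEXT (the output of vxiod) reports that no
# volume I/O daemons are running.
no_io_daemons() {
    case "$1" in
        "0 volume I/O daemons running"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use on a host with VxVM installed (shown as a comment, not run here):
#   if no_io_daemons "$(vxiod)"; then vxiod set 10; fi
```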
VVR uses well-known ports to establish communications with other hosts.
Issue the following command to display the port number:
# vxprint -g diskgroup -l rlink_name
Issue the following command, and make sure that the heartbeat port number in its output matches the port displayed by the vxprint command:
# vrport
Confirm that the state of the heartbeat port is Idle by issuing the following command:
# netstat -an -P udp
The output looks similar to this:
UDP: IPv4
   Local Address        Remote Address      State
-------------------- -------------------- -------
*.port-number                              Idle
Check for VVR ports on the Primary and Secondary sites.
Run the vrport utility and verify that the ports are the same at both ends.
Check whether the required VVR ports are open. Check for UDP 4145, TCP 4145, TCP 8199, and the anonymous port. Enter the following commands:
# netstat -an -P udp | grep 4145
*.4145 Idle
*.4145 Idle
# netstat -an -P tcp | grep 4145
*.4145 *.* 0 0 49152 0 LISTEN
*.4145 *.* 0 0 49152 0 LISTEN
# netstat -an -P tcp | grep 8199
*.8199 *.* 0 0 49152 0 LISTEN
10.180.162.41.32990 10.180.162.42.8199 49640 0 49640 0 ESTABLISHED
*.8199 *.* 0 0 49152 0 LISTEN
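The same check can be made without reading the netstat output by eye. The following sketch assumes the output layout shown above: the local address is the first field, the state is the last field, and a bound port appears as a wildcard address of the form *.port. A daemon bound to a specific IP address would not be detected. The function name port_bound is hypothetical.

```shell
# port_bound PORT: read "netstat -an -P udp" or "netstat -an -P tcp" output on
# standard input; succeed if the wildcard local address *.PORT is in the Idle
# (UDP) or LISTEN (TCP) state.
port_bound() {
    awk -v port="$1" '
        $1 == ("*." port) && ($NF == "Idle" || $NF == "LISTEN") { found = 1 }
        END { exit !found }
    '
}

# Intended use on a Solaris host (shown as a comment, not run here):
#   netstat -an -P udp | port_bound 4145 || echo "UDP 4145 is not open"
```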
Perform a telnet test to check for open ports. For example, to determine if port 4145 is open, enter the following:
# telnet <remote> 4145
Use the netstat command to check if vradmind daemons can connect between the Primary site and the Secondary site.
# netstat -an -P tcp | grep 8199 | grep ESTABLISHED
10.180.162.41.32990 10.180.162.42.8199 49640 0 49640 0 ESTABLISHED
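A plain grep for 8199 also matches lines where those digits occur elsewhere, for example in a window size or as part of a longer port number. As a stricter alternative, the sketch below compares only the port component of each address. It assumes the Solaris address format shown above, in which the port follows the final dot of the address and the state is the last field. The function name has_established is hypothetical.

```shell
# has_established PORT: read "netstat -an -P tcp" output on standard input;
# succeed if any ESTABLISHED connection has PORT as its local or remote port.
has_established() {
    awk -v port="$1" '
        $NF == "ESTABLISHED" {
            # The port is the last dot-separated component of each address.
            nl = split($1, l, "[.]")
            nr = split($2, r, "[.]")
            if (l[nl] == port || r[nr] == port) found = 1
        }
        END { exit !found }
    '
}

# Intended use on a Solaris host (shown as a comment, not run here):
#   netstat -an -P tcp | has_established 8199 || echo "no vradmind connection"
```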
If there is no established connection, check if the /etc/hosts file has entries for the Primary and Secondary sites. Add all participating system names and IP addresses to the /etc/hosts files on each system or add the information to the name server database of your name service.
On Solaris 11, you must manually edit the /etc/hosts file to remove the hostname from the lines for loopback addresses. For example:
::1 seattle localhost
127.0.0.1 seattle loghost localhost
needs to be changed to:
::1 localhost
127.0.0.1 loghost localhost
129.148.174.232 seattle
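To find out whether a hosts file still needs this edit, the loopback lines can be scanned for the hostname. This is a minimal sketch that assumes the standard hosts file layout (address first, names after it, text after a # treated as a comment). The function name loopback_names_host is hypothetical.

```shell
# loopback_names_host HOST FILE: succeed if HOST appears as a name on a
# 127.0.0.1 or ::1 line of FILE.
loopback_names_host() {
    awk -v host="$1" '
        $1 == "127.0.0.1" || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # ignore trailing comments
                if ($i == host) found = 1
            }
        }
        END { exit !found }
    ' "$2"
}

# Intended use (shown as a comment, not run here):
#   loopback_names_host seattle /etc/hosts && echo "edit /etc/hosts"
```

Against the first example above the function succeeds for seattle, which means the file still needs editing. Against the corrected example it fails, because seattle appears only on the 129.148.174.232 line.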