Storage Foundation for Sybase ASE CE 7.4 Administrator's Guide - Linux
- Overview of Storage Foundation for Sybase ASE CE
- About Storage Foundation for Sybase ASE CE
- About SF Sybase CE components
- About optional features in SF Sybase CE
- Administering SF Sybase CE and its components
- Administering SF Sybase CE
- Starting or stopping SF Sybase CE on each node
- Administering VCS
- Administering I/O fencing
- About the vxfentsthdw utility
- Testing the coordinator disk group using the -c option of vxfentsthdw
- About the vxfenadm utility
- About the vxfenclearpre utility
- About the vxfenswap utility
- Administering CVM
- Changing the CVM master manually
- Administering CFS
- Administering the Sybase agent
- Troubleshooting SF Sybase CE
- About troubleshooting SF Sybase CE
- Troubleshooting I/O fencing
- Fencing startup reports preexisting split-brain
- Troubleshooting Cluster Volume Manager in SF Sybase CE clusters
- Troubleshooting interconnects
- Troubleshooting Sybase ASE CE
- Prevention and recovery strategies
- Managing SCSI-3 PR keys in SF Sybase CE cluster
- Tunable parameters
- Appendix A. Error messages
Sybase ASE CE components
Sybase ASE consists of a single monolithic, user-space process named dataserver. A single ASE instance may consist of multiple dataserver processes, each representing an 'engine' within that instance. The engines communicate via shared memory. ASE's internal threads run across these engines, allowing a single instance to scale to tens of thousands of concurrent users and dozens of processors on an SMP system.
Sybase ASE CE has various clustering components and a failure detection mechanism to enable multiple instances of the same database to simultaneously access it while providing protection against failures at various levels.
The following components are part of Sybase ASE CE:
CMS (Cluster Membership Service)
Membership management is provided by CMS, which is built into the dataserver binary. ASE handles only application-level membership management: it is concerned solely with the application processes, namely dataserver, running on the cluster nodes. As a result, ASE does not differentiate between a software-level failure and a physical node failure.
Quorum Device
ASE utilizes a single quorum device to assist with membership management. The quorum device serves as a membership voting area, but it also acts as a configuration repository and a semaphore for numerous operations. All access to the quorum device goes through a quorum management library, which exposes a common API. The cluster definition is stored in the configuration section of the quorum device. This definition includes the instances in the cluster, the nodes they run on, the interconnect addresses, and so on; this information is essential for bootstrapping each instance. The quorum API also provides a disk-based distributed locking mechanism. This distributed lock is implemented entirely in software and requires no network communication (a simplified sketch follows the list of uses below).
Quorum locks currently have three primary uses:
Race prevention at boot time
Configuration changes
Split brain prevention
The quorum API also provides a mechanism to query the state of each instance without needing to connect to the database server.
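The guide does not document the quorum locking algorithm itself. The following is a minimal sketch of how a purely disk-based lock can work; the device path, record offset, and claim-and-verify protocol are assumptions for illustration, not Sybase's actual implementation:

    import os
    import struct
    import time

    SECTOR = 512          # assume the lock record occupies one sector
    LOCK_OFFSET = 4096    # hypothetical offset of the lock record
    FMT = "!IQ"           # owner node id (0 = free), lease expiry (epoch secs)

    def read_lock(dev):
        os.lseek(dev, LOCK_OFFSET, os.SEEK_SET)
        return struct.unpack_from(FMT, os.read(dev, SECTOR))

    def write_lock(dev, node_id, expiry):
        rec = struct.pack(FMT, node_id, expiry).ljust(SECTOR, b"\0")
        os.lseek(dev, LOCK_OFFSET, os.SEEK_SET)
        os.write(dev, rec)
        os.fsync(dev)     # force the claim to stable storage

    def try_acquire(dev, node_id, lease_secs=30, settle=0.5):
        owner, expiry = read_lock(dev)
        if owner != 0 and expiry > time.time():
            return False  # lock held and the lease has not expired
        write_lock(dev, node_id, int(time.time()) + lease_secs)
        time.sleep(settle)    # give a racing claimant time to overwrite us
        owner, _ = read_lock(dev)
        return owner == node_id   # we hold the lock only if our claim survived

    # dev = os.open("/dev/raw/quorum", os.O_RDWR)   # hypothetical quorum device

Note that this naive claim-and-verify scheme is safe only if the settle delay exceeds the worst-case gap between a rival's read and its write; a production implementation would use a more rigorous disk-based mutual exclusion algorithm.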
CIPC
Sybase has a built-in layer known as CIPC (Cluster Inter-Process Communication) that provides message-passing capabilities to the various subsystems within the dataserver. Cluster instances communicate via connection-oriented UDP/IP, with CIPC providing reliability on top of UDP. Sybase recommends two private networks for the cluster interconnect.
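CIPC's wire protocol is not documented in this guide. The following stop-and-wait sketch only illustrates the general kind of acknowledgment and retransmission a reliability layer adds on top of UDP; the header format, timeout, and retry count are all assumptions:

    import socket
    import struct

    HDR = "!I"   # assumed header: a 4-byte sequence number per datagram

    def reliable_send(sock, seq, payload, timeout=0.2, retries=5):
        """Stop-and-wait: retransmit until the peer acknowledges this seq."""
        pkt = struct.pack(HDR, seq) + payload
        sock.settimeout(timeout)
        for _ in range(retries):
            sock.send(pkt)
            try:
                ack = sock.recv(4)
                if struct.unpack(HDR, ack)[0] == seq:
                    return True
            except socket.timeout:
                pass          # datagram or ack was lost: retransmit
        return False          # give up; a higher layer decides what this means

    def receive_loop(sock):
        """Acknowledge every datagram; deliver each payload at most once."""
        expected = 0
        while True:
            data, peer = sock.recvfrom(65535)
            (seq,) = struct.unpack_from(HDR, data)
            sock.sendto(struct.pack(HDR, seq), peer)   # re-ack duplicates too
            if seq == expected:
                expected += 1
                yield data[4:]                         # new in-order payload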
The following mechanisms are used within ASE CE:
Heart-beating among instances
ASE instances exchange periodic heartbeats over the cluster interconnect to signify instance health. The default period is 5 seconds, and it is dynamically configurable. There is also a dynamically configurable number of retries before missing heartbeats are treated as a membership failure. Although heartbeat messages are sent explicitly, "proxy heartbeating" is also supported: any message exchanged between instances during the heartbeat period can serve as a proxy for the true heartbeat message. This has improved reliability in stress situations.
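As an illustration of proxy heartbeating, here is a minimal liveness tracker in which any received message, heartbeat or not, refreshes a peer's timestamp; the retry count is an assumed value, since the text only says it is configurable:

    import time

    HEARTBEAT_PERIOD = 5   # default period from the text (dynamically configurable)
    RETRIES = 3            # assumed value; ASE makes this configurable

    class PeerMonitor:
        """Tracks instance liveness; any traffic counts as a proxy heartbeat."""
        def __init__(self, peers):
            now = time.monotonic()
            self.last_seen = {p: now for p in peers}

        def on_message(self, peer):
            # Explicit heartbeats and ordinary cluster messages both refresh
            # the timestamp -- this is the proxy-heartbeating behaviour.
            self.last_seen[peer] = time.monotonic()

        def failed_peers(self):
            # A peer is declared failed only after RETRIES periods of silence.
            cutoff = time.monotonic() - HEARTBEAT_PERIOD * RETRIES
            return [p for p, t in self.last_seen.items() if t < cutoff]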
The heartbeat interval can be bypassed for software failures, that is, failures where the underlying hardware is intact. Because Sybase CE instances use connected UDP, the UDP driver on the remote node provides notification when the ASE process exits. This allows the remaining instances to immediately go into membership failure handling. In this situation, the time from process exit to formation of the new cluster view may be under one second.
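This fast path relies on connected UDP semantics: on Linux, once a UDP socket is connected, an ICMP "port unreachable" generated when the peer process has exited is surfaced as an error on the socket. A small sketch, with a hypothetical peer host name and port:

    import socket

    # Connecting a UDP socket makes the Linux kernel report an ICMP
    # "port unreachable" from the peer as ConnectionRefusedError here.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(("peer-node", 14000))   # hypothetical peer host and port
    sock.settimeout(1.0)

    def peer_process_alive():
        try:
            sock.send(b"ping")
            sock.recv(16)                # any reply means the process is there
            return True
        except socket.timeout:
            return True                  # silence is inconclusive: fall back to
                                         # the normal heartbeat timeout instead
        except ConnectionRefusedError:
            # The remote kernel rejected the datagram: the peer process has
            # exited even though the node itself is still up and reachable.
            return False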
Monitoring the health of private interconnects
A separate mechanism called linkswitch, part of the larger CIPC module, is used to monitor the health of the two interconnect links. When multiple links are configured, linkswitch detects the loss of one of the links and switches traffic to the surviving link. It also detects when a down link comes back online.
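Linkswitch's internals are not described further here. The sketch below shows one plausible traffic-switching policy over two links, where recent inbound traffic marks a link healthy; the timeout value is an assumption:

    import time

    LINK_TIMEOUT = 10   # assumed: seconds of silence before a link is down

    class LinkSwitch:
        """Sends over the first healthy link; fails over and fails back."""
        def __init__(self, links):
            now = time.monotonic()
            self.links = links                      # e.g. two connected sockets
            self.last_ok = {l: now for l in links}

        def mark_ok(self, link):
            # Called whenever traffic arrives on a link; a link that was down
            # and starts receiving again is automatically considered up.
            self.last_ok[link] = time.monotonic()

        def send(self, payload):
            for link in self.links:                 # prefer links in list order
                if time.monotonic() - self.last_ok[link] < LINK_TIMEOUT:
                    link.send(payload)
                    return link
            raise OSError("all interconnect links are down")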
Note:
The above mechanisms of cluster heart-beating, linkswitch, and connected UDP allow CMS to detect the failure of the ASE process, of individual interconnects, and of the overall physical node (although it is not always clear which of these failures has occurred).
Monitoring the accessibility to the disk sub-system
A quorum heartbeat mechanism is used to determine when an instance has lost the ability to write to the disk subsystem. ASE periodically writes a heartbeat value to the quorum device. If this write fails, ASE assumes that it has lost access to the disk subsystem and the instance terminates. The frequency of the heartbeat writes and the number of retries are both configurable. Note that this scheme assumes that access to the quorum device uses the same fabric/SAN as the database devices.
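The following sketch shows the shape of such a disk heartbeat loop; the offset, period, and retry count are illustrative stand-ins for ASE's configurable values:

    import os
    import struct
    import sys
    import time

    HB_OFFSET = 8192    # hypothetical offset of this instance's heartbeat slot
    HB_PERIOD = 5       # illustrative write frequency (configurable in ASE)
    HB_RETRIES = 3      # illustrative retry count (configurable in ASE)

    def quorum_heartbeat_loop(dev, instance_id):
        """Write an increasing counter; terminate if the writes keep failing."""
        counter = 0
        while True:
            for _ in range(HB_RETRIES):
                try:
                    os.lseek(dev, HB_OFFSET, os.SEEK_SET)
                    os.write(dev, struct.pack("!IQ", instance_id, counter))
                    os.fsync(dev)      # ensure it reached the disk, not a cache
                    break
                except OSError:
                    time.sleep(0.2)    # transient path error: retry the write
            else:
                # Every retry failed: assume access to the disk subsystem is
                # gone (and, by the shared-fabric assumption, access to the
                # database devices as well) and terminate the instance.
                sys.exit("quorum heartbeat failed; terminating instance")
            counter += 1
            time.sleep(HB_PERIOD)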