Understanding OCFS2 Heartbeat

The O2CB cluster stack uses a heartbeat to determine whether a node is dead or alive.

Connectivity of nodes and storage devices must be assured to prevent file system corruption. O2CB uses both a disk heartbeat (regular writes to shared storage) and network keepalive packets to verify that nodes are alive. There are two heartbeat modes: local and global.

Local Heartbeat

In local heartbeat mode, a heartbeat is established when a volume is mounted by a node. The O2CB cluster stack performs disk heartbeating on a per-mount basis, using an area on disk called the heartbeat file, which is reserved during format. The heartbeat thread is started and stopped automatically during mount and unmount. Each node that has the file system mounted writes to its own block in the heartbeat file every two seconds. Each node also opens a TCP connection to every other node that is heartbeating. If the TCP connection is lost for more than 10 seconds, the node is considered dead, even if its disk heartbeat continues.
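The write-and-watch cycle above can be modeled in a few lines. The following Python sketch is purely illustrative (the real O2CB implementation writes timestamped blocks to shared storage from kernel threads, with longer dead thresholds): each node bumps a sequence counter in its own slot, and a peer whose counter has not advanced between two scans is treated as dead.

```python
# Simplified in-memory model of O2CB disk heartbeating; illustrative
# only, not the actual on-disk layout or timing.

heartbeat_file = {}        # node number -> sequence counter ("disk" slots)

def write_heartbeat(node):
    """A live node bumps its own slot every two seconds."""
    heartbeat_file[node] = heartbeat_file.get(node, 0) + 1

def scan(previous):
    """Compare slots against the previous scan; an unchanged slot
    means that node missed its heartbeat window."""
    dead = [n for n, seq in heartbeat_file.items() if previous.get(n) == seq]
    return dead, dict(heartbeat_file)

# Nodes 0, 1, and 2 all heartbeat once; then node 2 stops.
for node in (0, 1, 2):
    write_heartbeat(node)
snapshot = dict(heartbeat_file)
write_heartbeat(0)
write_heartbeat(1)         # node 2 misses its window
dead, snapshot = scan(snapshot)
print(dead)                # [2]
```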

If a node loses network connectivity to more than half of the heartbeating nodes, it has lost quorum and fences itself off from the cluster. A quorum is the group of nodes in a cluster that are allowed to operate on the shared storage. Fencing is the act of forcefully removing a node from a cluster. A node with OCFS2 mounted fences itself when it realizes that it does not have quorum in a degraded cluster. It does this so that other nodes do not continue trying to access its resources. OCFS2 panics the node that is fenced off. A surviving node then replays the journal of the fenced node to ensure that all updates reach the disk.
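The "more than half" quorum rule can be sketched as a simple predicate. This Python function is a hypothetical illustration of the arithmetic only; the real O2CB quorum code also applies tie-breaking rules, for example in an exact even split.

```python
def has_quorum(heartbeating_nodes, reachable_peers):
    """True while this node can reach more than half of the
    heartbeating nodes, counting itself. Illustrative sketch of the
    arithmetic only; real O2CB also breaks ties on an even split."""
    return (reachable_peers + 1) * 2 > heartbeating_nodes

# 10 heartbeating nodes: reaching 5 peers means 6 of 10 are connected,
# so quorum is held; reaching only 4 peers means exactly half, so the
# node loses quorum and fences itself.
print(has_quorum(10, 5))   # True
print(has_quorum(10, 4))   # False
```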

Local heartbeat requires as many heartbeat threads as there are mounts. This becomes a problem on clusters having five or more mounts. While each heartbeat I/O is small (one sector written and up to 255 sectors read every two seconds), the I/O operations per second (IOPS) can add up.
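To see how the per-mount I/O adds up, here is a back-of-the-envelope calculation in Python, assuming one write and one read per mount every two-second interval as described above (illustrative arithmetic only, not a measurement):

```python
# Rough per-node heartbeat I/O cost under local heartbeat, assuming
# one sector write plus one read per mount each two-second interval.
def local_heartbeat_iops(mounts, interval_secs=2):
    ops_per_interval = mounts * 2      # one write + one read per mount
    return ops_per_interval / interval_secs

print(local_heartbeat_iops(1))        # 1.0
print(local_heartbeat_iops(50))       # 50.0: heartbeat alone costs 50 IOPS
```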

Also, because the heartbeat is started on every mount, mounting is slower: it must wait for the heartbeat thread to stabilize. And because the number of mounts on each node in a cluster can vary, a node must self-fence if heartbeat I/O times out to even one device.

Global Heartbeat

A solution to this problem is global heartbeat. This scheme decouples the heartbeat from the mount. It allows you to mount 50 or more volumes without the additional heartbeat I/O overhead. Mounts are faster because there is no need to wait for a heartbeat thread to stabilize, and the loss of one heartbeat device need not force the node to self-fence.

With global heartbeat, you configure the same heartbeat devices on all nodes. The heartbeat is started when the O2CB cluster stack is started, and all nodes in the cluster verify that the device list is identical. A node self-fences if heartbeat I/O times out on 50% or more of the devices. Configure at least three heartbeat devices; with any fewer, the node must self-fence on losing just one device.
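The 50% self-fence threshold also explains the three-device recommendation. The following Python function is a hypothetical sketch of the rule stated above, nothing more:

```python
def must_self_fence(total_devices, timed_out):
    """Under global heartbeat, a node self-fences when heartbeat I/O
    times out on 50% or more of its heartbeat devices (sketch of the
    rule described above; not the actual O2CB code)."""
    return timed_out * 2 >= total_devices

# With two devices, a single failure already reaches the 50% threshold;
# with three devices, a single failure is survivable. Hence the
# recommendation to configure at least three heartbeat devices.
print(must_self_fence(2, 1))   # True
print(must_self_fence(3, 1))   # False
```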

The list of heartbeat devices is stored in /etc/ocfs2/cluster.conf. The notation includes a new heartbeat: stanza that holds the heartbeat region and the cluster name. The region is used rather than the device name so that stable, consistent device names are not required across the cluster. A heartbeat device is either an existing ocfs2 volume that you mount or an ocfs2 volume that is specifically formatted as a heartbeat device, using the mkfs.ocfs2 -H command.

The cluster stanza has a heartbeat_mode parameter that is set to local or global. A cluster can have up to 32 heartbeat regions. Regions are named using a universally unique identifier (UUID). The following example specifies two heartbeat stanzas for the mycluster cluster:

heartbeat:
	region = 1313112313ABCDFE309888C34A0DB6B2
	cluster = mycluster

heartbeat:
	region = 2324242424ABCDFE309888C34A0DB6B2
	cluster = mycluster

cluster:
	node_count = 10
	heartbeat_mode = global
	name = mycluster