Appendix D. Upgrading

D.1. Upgrading Cluster Software

There are three approaches to upgrading a cluster, each with advantages and disadvantages.

Table D.1. Upgrade Methods

Method	Available between all versions	Can be used with Pacemaker Remote nodes	Service outage during upgrade	Service recovery during upgrade	Exercises failover logic	Allows change of messaging layer ^[a]
Complete cluster shutdown	yes	yes	always	N/A	no	yes
Rolling (node by node)	no	yes	always ^[b]	yes	yes	no
Detach and reattach	yes	no	only due to failure	no	no	yes
^[a] For example, switching from Heartbeat to Corosync.
^[b] Any active resources will be moved off the node being upgraded, so there will be at least a brief outage unless all resources can be migrated "live".

D.1.1. Complete Cluster Shutdown

In this scenario, one shuts down all cluster nodes and resources, then upgrades all the nodes before restarting the cluster.

On each node:
1. Shut down the cluster software (pacemaker and the messaging layer).
2. Upgrade the Pacemaker software. This may also include upgrading the messaging layer and/or the underlying operating system.
3. Check the configuration with the crm_verify tool.
Once all nodes have been upgraded, on each node:
1. Start the cluster software. The messaging layer can be either Corosync or Heartbeat, and does not need to be the same one as before the upgrade.
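
The steps above can be sketched as a small script. This is only an illustration under stated assumptions: the systemd unit names (pacemaker, corosync), the CIB file location, and the helper names run, upgrade_packages, stop_and_upgrade and start_cluster are not taken from this appendix, and the package upgrade itself is left as a placeholder because it depends on the distribution. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a complete-shutdown upgrade for ONE node.
# Assumptions (not from this appendix): systemd units named "pacemaker"
# and "corosync", and the CIB file location below.

DRY_RUN="${DRY_RUN:-1}"                                  # 1 = only print commands
CIB_FILE="${CIB_FILE:-/var/lib/pacemaker/cib/cib.xml}"   # assumed location

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical placeholder: replace with your distribution's package command.
upgrade_packages() {
    echo "TODO: upgrade Pacemaker (and optionally the messaging layer) here"
}

# Phase 1: run on every node.
stop_and_upgrade() {
    run systemctl stop pacemaker
    run systemctl stop corosync
    upgrade_packages
    # The cluster is down, so check the on-disk configuration file.
    run crm_verify --xml-file "$CIB_FILE"
}

# Phase 2: run on every node, only after all nodes finished phase 1.
start_cluster() {
    run systemctl start corosync      # messaging layer first
    run systemctl start pacemaker
}
```

Calling stop_and_upgrade on every node and then start_cluster on every node mirrors the two phases; setting DRY_RUN=0 would execute the commands instead of printing them.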

One variation of this approach is to build a new cluster on new hosts. This allows the new version to be tested beforehand, and minimizes downtime by having the new nodes ready to be placed in production as soon as the old nodes are shut down.

D.1.2. Rolling (node by node)

In this scenario, each node is removed from the cluster, upgraded, and then brought back online, until all nodes are running the newest version.

If you plan to upgrade other cluster software — such as the messaging layer — at the same time, consult that software’s documentation for its compatibility with a rolling upgrade.

Pacemaker has three version numbers that affect rolling upgrades:

Pacemaker release version: Rolling upgrades are possible as long as the major version number (the x in x.y.z) stays the same. For example, a rolling upgrade may be done from 1.0.8 to 1.1.15, but not from 0.6.7 to 1.0.0.
CRM feature set: This version number applies to the communication between full cluster nodes.
It increases when a cluster node running the older version would have problems if the cluster’s Designated Controller (DC) has the newer version. To avoid these problems, Pacemaker ensures that the longest-running node is the DC, and that nodes with an older feature set cannot join the cluster.
Therefore, if the CRM feature set is changing in the Pacemaker version you are upgrading to, you should run a mixed-version cluster only during a small rolling upgrade window. If one of the older nodes drops out of the cluster for any reason, it will not be able to rejoin until it is upgraded.
LRMD protocol version: This version number applies to communication between a Pacemaker Remote node and the cluster. It increases when an older cluster node would have problems hosting the connection to a newer Pacemaker Remote node. To avoid these problems, Pacemaker Remote nodes will accept connections only from cluster nodes with the same or newer LRMD protocol version.
For rolling upgrades, this means that all cluster nodes should be upgraded before upgrading any Pacemaker Remote nodes.
Unlike with CRM feature set differences between full cluster nodes, mixed LRMD protocol versions between Pacemaker Remote nodes and full cluster nodes are fine, as long as the Pacemaker Remote nodes have the older version. This can be useful, for example, to host a legacy application in an older operating system version used as a Pacemaker Remote node.
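
The release-version rule above can be expressed as a simple comparison of major version numbers. The following is a minimal sketch; the function names same_major_version and rolling_upgrade_check are hypothetical, and the check covers only the release version. It does not look at the CRM feature set or the LRMD protocol version, so a positive result is necessary but not sufficient.

```shell
#!/bin/sh
# Succeeds (exit status 0) when both Pacemaker release versions share the
# same major number, that is, the "x" in x.y.z.
same_major_version() {
    old_major="${1%%.*}"    # strip everything from the first dot onward
    new_major="${2%%.*}"
    [ "$old_major" = "$new_major" ]
}

# Hypothetical helper: prints a human-readable verdict for two versions.
rolling_upgrade_check() {
    if same_major_version "$1" "$2"; then
        echo "$1 -> $2: rolling upgrade possible (same major version)"
    else
        echo "$1 -> $2: major version changes, use another method"
    fi
}

# The two examples from the text.
rolling_upgrade_check 1.0.8 1.1.15
rolling_upgrade_check 0.6.7 1.0.0
```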

See the ClusterLabs wiki’s Release Calendar to figure out whether the CRM feature set and/or LRMD protocol version changed between the Pacemaker release versions in your rolling upgrade.

Warning

The interpretation of the LRMD protocol version changed in Pacemaker 1.1.15. If you are planning a rolling upgrade from an earlier Pacemaker version to Pacemaker 1.1.15 or later involving Pacemaker Remote nodes, you will need to take special precautions to avoid problems. See Upgrading to Pacemaker 1.1.15 or later from an earlier version on the ClusterLabs wiki.

To perform a rolling upgrade, on each node in turn:

1. Put the node into standby mode, and wait for any active resources to be moved cleanly to another node. (This step is optional, but allows you to deal with any resource issues before the upgrade.)
2. Shut down the cluster software (pacemaker and the messaging layer) on the node.
3. Upgrade the Pacemaker software. This may also include upgrading the messaging layer and/or the underlying operating system.
4. If this is the first node to be upgraded, check the configuration with the crm_verify tool.
5. Start the messaging layer. This must be the same messaging layer (Corosync or Heartbeat) that the rest of the cluster is using.
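
A per-node sketch of these steps might look like the following. It is an illustration under assumptions that do not come from this appendix: systemd units named pacemaker and corosync, a Corosync messaging layer, the CIB file location, and the hypothetical helpers run, upgrade_packages and rolling_upgrade_node. The last two commands (starting Pacemaker and clearing standby) are an assumed follow-up that is not listed in the steps above. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the rolling-upgrade steps for ONE node; repeat node by node.
# Assumptions (not from this appendix): systemd units "pacemaker" and
# "corosync", and the CIB file location below.

DRY_RUN="${DRY_RUN:-1}"                                  # 1 = only print commands
CIB_FILE="${CIB_FILE:-/var/lib/pacemaker/cib/cib.xml}"   # assumed location

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical placeholder: replace with your distribution's package command.
upgrade_packages() {
    echo "TODO: upgrade Pacemaker packages here"
}

# $1 = node name, $2 = "first" if this is the first node being upgraded
rolling_upgrade_node() {
    node="$1"
    run crm_standby -N "$node" -v on     # optional: move resources away
    # Wait here until no resources are active on the node.
    run systemctl stop pacemaker
    run systemctl stop corosync
    upgrade_packages
    if [ "${2:-}" = "first" ]; then
        # The node is offline, so check the on-disk configuration file.
        run crm_verify --xml-file "$CIB_FILE"
    fi
    run systemctl start corosync         # same messaging layer as the cluster
    # Assumed follow-up, not listed in the steps above:
    run systemctl start pacemaker
    run crm_standby -N "$node" -D        # clear standby once the node rejoined
}
```

For example, rolling_upgrade_node node1 first would print the full sequence for the first node, including the crm_verify call, while later nodes omit the second argument.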

Note

Rolling upgrades were not always possible with older Heartbeat and Pacemaker versions. Rolling upgrades that cross compatibility boundaries listed in the following table must be performed in multiple steps.

Table D.2. Version Compatibility Table

Version being Installed	Oldest Compatible Version
Pacemaker 1.x.y	Pacemaker 1.0.0
Pacemaker 0.7.x	Pacemaker 0.6 or Heartbeat 2.1.3
Pacemaker 0.6.x	Heartbeat 2.0.8
Heartbeat 2.1.3 (or less)	Heartbeat 2.0.4
Heartbeat 2.0.4 (or less)	Heartbeat 2.0.0
Heartbeat 2.0.0	None. Use an alternate upgrade strategy.

D.1.3. Detach and Reattach

The reattach method is a variant of a complete cluster shutdown, where the resources are left active and get re-detected when the cluster is restarted.

This method cannot be used if the cluster contains any Pacemaker Remote nodes.

1. Tell the cluster to stop managing services. This is required to allow the services to remain active after the cluster shuts down.
```
# crm_attribute --name maintenance-mode --update true
```
2. On each node, shut down the cluster software (pacemaker and the messaging layer), and upgrade the Pacemaker software. This may also include upgrading the messaging layer. The underlying operating system may be upgraded at the same time, but doing so is more likely to cause outages in the detached services, and certainly will if a reboot is required.
3. Check the configuration with the crm_verify tool.
4. On each node, start the cluster software. The messaging layer can be either Corosync or Heartbeat, and does not need to be the same one as before the upgrade.
5. Verify that the cluster re-detected all resources correctly.
6. Allow the cluster to resume managing resources:
```
# crm_attribute --name maintenance-mode --delete
```
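
Before taking the cluster out of maintenance mode, the verification step can be approached with a few read-only commands. The sketch below is only one possible check, run once the cluster has been restarted; the helper names run and verify_reattach are hypothetical, and interpreting the status output is still left to the administrator. By default the script only prints the commands.

```shell
#!/bin/sh
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: read-only checks after the cluster has been restarted.
verify_reattach() {
    # Confirm that maintenance mode is still set.
    run crm_attribute --name maintenance-mode --query
    # One-shot cluster status, including inactive resources.
    run crm_mon -1 -r
    # Check the live configuration for errors.
    run crm_verify --live-check
}
```

Running verify_reattach with DRY_RUN=0 on one cluster node executes the three checks in order.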

Note

Support for maintenance mode was added in Pacemaker 1.0.0. If you are upgrading from an earlier version, you can detach by setting is-managed to false for all resources.