High availability
To ensure the high availability of deployed applications, the Juju controller must itself be highly available. This is accomplished by activating 'highly available' mode. With this mode enabled, multiple redundant controllers are instantiated while the single active controller is defined as the master. Automatic failover occurs should the master lose connectivity.
Juju HA and MongoDB
Juju HA is tightly integrated with the MongoDB database since that is where all environment data is stored. The MongoDB software has a native HA implementation and this is what Juju HA uses. A MongoDB cluster is called a replica set and, in the context of Juju HA, i) each of its members corresponds to a different controller and ii) the MongoDB replica set master corresponds to the controller master.
The number of controllers must be an odd number to prevent ties during voting for master, and the number of controllers cannot exceed 7, due to MongoDB limitations. This means a Juju HA cluster can have 3, 5 or 7 controllers.
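The sizing rule above can be written as a small check. This is a minimal sketch, not part of Juju: the function name `is_valid_controller_count` and the constant `MAX_CONTROLLERS` are hypothetical names chosen for illustration.

```python
# Hypothetical helper illustrating the sizing rule; not part of Juju.
MAX_CONTROLLERS = 7  # upper bound stated in the text (MongoDB limitation)


def is_valid_controller_count(n: int) -> bool:
    """Return True if n is an allowed Juju HA cluster size (3, 5 or 7)."""
    # An odd count prevents ties when voting for the master.
    return 3 <= n <= MAX_CONTROLLERS and n % 2 == 1


if __name__ == "__main__":
    print([n for n in range(1, 10) if is_valid_controller_count(n)])  # [3, 5, 7]
```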
Activating and modifying HA
Juju HA is activated and modified with the juju enable-ha command. As will be shown in the next section, it is also used to recover from failed controllers.
When activating HA, by default, this command sets the level of controller redundancy in the environment to 3. The optional -n switch can modify this number.
When modifying HA, the -n switch can be used to increase the number of redundant controllers. The only way to decrease is to create a backup of your environment and then restore the backup to a new environment, which starts with a single controller. You can then increase to the desired number.
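As an illustration of the rules above, the following sketch builds the argument list for the command and refuses a decrease. It assumes you already know the current controller count; the function name `enable_ha_args` and its parameters are hypothetical. Only the juju enable-ha command and its -n switch come from this page, and the sketch does not run the command.

```python
from typing import List, Optional


def enable_ha_args(current: int, target: Optional[int] = None) -> List[str]:
    """Build the argument list for `juju enable-ha` (hypothetical helper).

    current: number of controllers already in the environment.
    target:  desired number of controllers; None relies on the default of 3.
    """
    args = ["juju", "enable-ha"]
    if target is None:
        return args
    if target < current:
        # Decreasing requires a backup and restore, not enable-ha.
        raise ValueError("enable-ha cannot decrease the number of controllers")
    return args + ["-n", str(target)]


if __name__ == "__main__":
    print(enable_ha_args(current=1))            # ['juju', 'enable-ha']
    print(enable_ha_args(current=3, target=5))  # ['juju', 'enable-ha', '-n', '5']
```

The returned list could be passed to a process runner such as subprocess.run; that step is left out here so the sketch has no side effects.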
Whenever you run enable-ha, the command reports the changes it intends to make; these are applied shortly afterward.
For complete syntax, see the command reference page.
Recovering from controller failure
In the event of failed controllers, Juju neither re-spawns new controllers automatically nor removes the failed ones. However, as long as more than half of the original number of controllers remain available, you can recover manually. The process is detailed below.
- Run juju enable-ha.
- Verify that the output of juju status shows a value of has-vote for the controller-member-status attribute for each new server and a value of no-vote for each old server. Once confirmed, the new servers are fully operational as cluster members and the old servers have been demoted (no longer part of HA). This process can take between 30 seconds and 20 minutes depending on machine resources and Juju data volume.
- Use the juju remove-machine command to remove the old machines entirely.
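The verification in the second step can be sketched as a simple check. This sketch assumes you have already extracted the controller-member-status value for each machine from the juju status output into a dictionary; how that extraction is done is not shown, and the function name `ha_recovery_complete` and the machine IDs in the example are hypothetical.

```python
from typing import Dict, Iterable


def ha_recovery_complete(
    status_by_machine: Dict[str, str],
    new_ids: Iterable[str],
    old_ids: Iterable[str],
) -> bool:
    """Return True once every new server votes and every old one is demoted.

    status_by_machine maps a machine ID to its controller-member-status
    value ("has-vote" or "no-vote"), as read from `juju status`.
    """
    new_ready = all(status_by_machine.get(m) == "has-vote" for m in new_ids)
    old_demoted = all(status_by_machine.get(m) == "no-vote" for m in old_ids)
    return new_ready and old_demoted


if __name__ == "__main__":
    # Hypothetical example: machine 1 failed and machine 3 replaces it.
    statuses = {"0": "has-vote", "1": "no-vote", "2": "has-vote", "3": "has-vote"}
    print(ha_recovery_complete(statuses, new_ids=["3"], old_ids=["1"]))  # True
```

Only when this check passes is it safe to move on to removing the old machines.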
You cannot repair the cluster as outlined above if fewer than half of the original controllers remain available because the MongoDB replica set will not have the quorum necessary to elect a new master. You must restore from backups in this case.
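The quorum condition reduces to a strict-majority test on the original controller count. The sketch below is illustrative only; the function name `can_recover_manually` is hypothetical.

```python
def can_recover_manually(original: int, available: int) -> bool:
    """Return True if more than half of the original controllers remain.

    A strict majority is what lets the MongoDB replica set elect a new
    master; without it, the cluster must be restored from backups.
    """
    return 2 * available > original


if __name__ == "__main__":
    print(can_recover_manually(original=3, available=2))  # True
    print(can_recover_manually(original=5, available=2))  # False
```

For example, a 5-controller cluster tolerates the loss of 2 controllers, while a 3-controller cluster tolerates the loss of 1.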
HA and logging
All Juju machines send their logs to a controller in the HA cluster. Each controller, in turn, sends those logs to a MongoDB database which is synchronized across controllers. The user reads logging information with the juju debug-log command as normal. See Viewing logs.