...
The inventory may define these hosts:
observers
: Set of hosts to observe the cluster (only the first is considered)
hypervisors
: Set of machines to host VMs.
Info |
---|
Remember that the cluster must contain an odd number of machines. For example, three hypervisors or one observer and two hypervisors. |
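
The odd-size rule above can be checked with a few lines of shell before running any playbook. This is only a sketch: the function names `cluster_size` and `has_odd_size` are hypothetical helpers, not part of SEAPATH, and the host counts are passed by hand rather than read from the inventory.

```bash
#!/bin/sh
# Hypothetical helpers, not shipped with SEAPATH.

# cluster_size HYPERVISORS OBSERVERS: number of machines counted in the cluster.
cluster_size() {
    hypervisors="$1"
    observers="$2"
    # Only the first observer is considered, so observers add at most one machine.
    if [ "$observers" -gt 0 ]; then
        echo $((hypervisors + 1))
    else
        echo "$hypervisors"
    fi
}

# has_odd_size HYPERVISORS OBSERVERS: succeeds when the cluster size is odd.
has_odd_size() {
    [ $(( $(cluster_size "$1" "$2") % 2 )) -eq 1 ]
}
```

For example, `has_odd_size 2 1` (two hypervisors and one observer) succeeds, while `has_odd_size 2 0` fails.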
...
All nodes in the cluster have access to shared storage through Ceph (see the Shared storage section). This puts the cluster in N-to-N redundancy mode.
Corosync provides messaging and membership services.
Pacemaker manages the cluster (it synchronizes resources between the nodes).
...
More details on Pacemaker here and Corosync here.
...
For instance, with pacemaker-remote, Pacemaker can monitor services and containers directly inside a VM.
For more information about pacemaker-remote refer to https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Remote/singlehtml/.
Management tool
The vm_manager project is a high-level interface to Pacemaker and Ceph for managing a VM like a resource. It is installed during the installation step and provides the vm-mgr command.
Sub-command
All sub-commands have a required -n, --name option to specify which resource should be used.
add_colocation
: Adds a colocation constraint between resources
clone
: Creates a copy of the VM
create
: Generates a new resource from a VM
create_snapshot
: Creates a snapshot of a resource
disable
: Stops and removes the resource on the cluster
enable
: Adds and starts the resource on the cluster
get_metadata
: Gets a metadata value of a resource
list
: Lists all resources
list_metadata
: Lists all metadata keys of a resource
list_snapshots
: Lists all created snapshots
purge
: Deletes all snapshots of a resource
remove
: Removes the resource
remove_snapshot
: Removes a snapshot of a resource
rollback
: Rolls back a resource to a snapshot
set_metadata
: Sets a metadata value of a resource
start
: Starts a resource
status
: Gets the status of a resource
stop
: Stops a resource
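
As an illustration of how these sub-commands combine, the sketch below restarts a resource by chaining `stop`, `start` and `status`. The `restart_vm` function and the `VM_MGR` variable are hypothetical conveniences, not part of vm_manager. The only vm-mgr syntax assumed is the sub-command name followed by the `-n` option described above.

```bash
#!/bin/sh
# Hypothetical wrapper around vm-mgr; not part of vm_manager itself.
# VM_MGR can be overridden (for example VM_MGR=echo) to preview the
# commands without touching a cluster.
VM_MGR="${VM_MGR:-vm-mgr}"

# restart_vm NAME: stop the resource, start it again, then show its status.
restart_vm() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: restart_vm NAME" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    "$VM_MGR" stop -n "$name" &&
    "$VM_MGR" start -n "$name" &&
    "$VM_MGR" status -n "$name"
}
```

Calling `restart_vm guest0` on a hypervisor would act on a resource named `guest0` (a placeholder name).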
Resources status
...
Please refer to How to manage VM in SEAPATH for more details.
Node replacement
If one of the nodes suffers a failure that is difficult to repair (for example, a lost motherboard, or a lost disk with no RAID), it might become necessary to replace the server with a blank one.
From the cluster's point of view, the old node must be removed and the new one added, for both Corosync/Pacemaker and Ceph.
The ansible/playbooks/replace_machine_remove_machine_from_cluster.yaml playbook can remove a node from the cluster. To use it, set the machine_to_remove variable to the hostname of the node to remove.
Run the command below from the ansible project:

```bash
cqfd run ansible-playbook -i /path/to/inventory.yaml -e machine_to_remove=HOSTNAME playbooks/replace_machine_remove_machine_from_cluster.yaml
```
A new host should be installed with the ISO installer, using the same hostname, IP address, etc. as the old node.
Make the "cluster network" connections between hosts.
Rerun the cluster_setup_debian.yml playbook to configure the new host in the cluster (more details here).
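
The order of the Ansible steps in this procedure can be summarized with a small helper that prints the two invocations without running them. It is a sketch only: `replace_node_commands` is a hypothetical function, and the `playbooks/` location of cluster_setup_debian.yml is an assumption that may not match your checkout, so it is exposed as the `SETUP_PLAYBOOK` variable.

```bash
#!/bin/sh
# Hypothetical helper: prints the commands, does not execute them.
# ASSUMPTION: the setup playbook lives under playbooks/; override if not.
SETUP_PLAYBOOK="${SETUP_PLAYBOOK:-playbooks/cluster_setup_debian.yml}"
REMOVE_PLAYBOOK="playbooks/replace_machine_remove_machine_from_cluster.yaml"

# replace_node_commands INVENTORY HOSTNAME
replace_node_commands() {
    inventory="$1"
    hostname="$2"
    if [ -z "$inventory" ] || [ -z "$hostname" ]; then
        echo "usage: replace_node_commands INVENTORY HOSTNAME" >&2
        return 1
    fi
    # First: remove the old node from the cluster.
    echo "cqfd run ansible-playbook -i $inventory -e machine_to_remove=$hostname $REMOVE_PLAYBOOK"
    # In between (manual): install the new host from the ISO with the same
    # hostname and IP address, and make the cluster network connections.
    # Last: configure the new host in the cluster.
    echo "cqfd run ansible-playbook -i $inventory $SETUP_PLAYBOOK"
}
```

Printing the commands first makes it easy to review the inventory path and hostname before anything is removed from the cluster.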
...