...
The inventory may define these hosts:
observers
: Set of hosts that observe the cluster (only the first one is considered)
...
hypervisors
: Set of machines that host the VMs.
Info: Remember that the cluster must contain an odd number of machines. For example, three hypervisors, or one observer and two hypervisors.
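To check which hosts end up in the observers and hypervisors groups, the parsed inventory can be dumped with ansible-inventory (a quick illustration; the inventory path is a placeholder):
# Show the inventory as Ansible sees it, including group membership
ansible-inventory -i /path/to/inventory.yaml --list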
Node redundancy
All nodes in the cluster have access to shared storage via Ceph (see the Shared storage section). With it, the cluster is in N-to-N redundancy mode.
Corosync provides messaging and membership services.
Pacemaker manages the cluster (it synchronizes resources between the nodes).
...
More details on Pacemaker here and Corosync here.
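As a quick sanity check, the state of each layer can be inspected from any node with the standard tools (a minimal sketch; crm is the crmsh command used elsewhere in this document, the other two are the stock Corosync and Ceph CLIs):
# Pacemaker: cluster membership and resource state
crm status
# Corosync: quorum and membership information
corosync-quorumtool -s
# Ceph: overall health of the shared storage
ceph status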
pacemaker-remote
pacemaker-remote is a component that can be installed inside a VM to allow Pacemaker to manage and monitor resources inside that VM.
For instance, with pacemaker-remote, Pacemaker can monitor services and containers directly inside a VM.
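As an illustration, turning a guest into a Pacemaker remote node essentially means installing the pacemaker-remote package inside the VM and enabling its service. The sketch below assumes a Debian-based guest; package and unit names may differ on other systems, and the authkey exchange with the cluster is not shown:
# Run as root inside the guest VM (Debian-based example)
apt-get install pacemaker-remote
systemctl enable --now pacemaker_remote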
(Drawio diagram)
For more information about pacemaker-remote, refer to https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Remote/singlehtml/.
libvirt
Pacemaker manages the VMs through libvirt on the hypervisors.
Management tool
The vm_manager project is a high-level interface to Pacemaker and Ceph that manages a VM as a cluster resource. It is installed during the installation step and provides the vm-mgr command.
Sub-commands
All sub-commands have a required -n, --name option to specify which resource should be used.
add_colocation
: Adds a colocation constraint between resources
clone
: Creates a copy of the VM
create
: Generates a new resource from a VM
create_snapshot
: Creates a snapshot of a resource
disable
: Stops and removes the resource on the cluster
enable
: Adds and starts the resource on the cluster
get_metadata
: Gets a metadata value of a resource
list
: Lists all resources
list_metadata
: Lists all metadata keys of a resource
list_snapshots
: Lists all created snapshots
purge
: Deletes all snapshots of a resource
remove
: Removes the resource
remove_snapshot
: Removes a snapshot of a resource
rollback
: Rolls back a resource to a snapshot
set_metadata
: Sets a metadata value of a resource
start
: Starts a resource
status
: Gets the status of a resource
stop
: Stops a resource
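For example, a short vm-mgr session could look like the sketch below (guest0 is a placeholder resource name; some sub-commands may take additional options not listed here):
# Check the status of a VM resource
vm-mgr status --name guest0
# Stop the VM resource, then start it again
vm-mgr stop --name guest0
vm-mgr start --name guest0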
Resource statuses
A resource can be in one of the following states:
Undefined
Disabled
Failed
Started
Starting
Stopping
Stopped
Manage VM
...
Please refer to How to manage VM in SEAPATH for more details.
Node replacement
In case one of the nodes suffers a failure that is difficult to repair (a dead motherboard, for example, or a lost disk with no RAID), it might become necessary to replace the server with a blank one.
From the cluster's point of view, the old node must be removed and the new one added, for both Corosync/Pacemaker and Ceph.
The ansible/playbooks/replace_machine_remove_machine_from_cluster.yaml playbook can remove a node from the cluster. For this, the machine_to_remove variable should be set to the hostname of the node to remove.
The command below should be launched from the ansible project:
cqfd run ansible-playbook -i /path/to/inventory.yaml -e machine_to_remove=HOSTNAME playbooks/replace_machine_remove_machine_from_cluster.yaml
Check the execution of the resource:
crm status
Get the status of the resource:
vm-mgr status --name NAME
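Since the node also has to be removed on the Ceph side, the storage layout can be verified with the standard Ceph CLI (shown only as an illustration, not a SEAPATH-specific command):
# The removed node and its OSDs should no longer appear in the tree
ceph osd tree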
Delete VM in the cluster:
...
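The exact command is not shown here; based on the remove sub-command listed earlier, deleting a VM resource would presumably look like the following (guest0 is a placeholder name, not from the original text):
# Hypothetical illustration: remove the VM resource named guest0 from the cluster
vm-mgr remove --name guest0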
A new host should then be installed with the ISO installer, using the same hostname, IP address, etc. as the old node.
Make the "cluster network" connections between hosts.
Re-run the cluster_setup_debian.yml playbook to configure the new host in the cluster (more details here).
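Following the same pattern as the removal command above, re-running the setup playbook presumably looks like this (assuming the playbook lives under playbooks/ like the previous one; the inventory path is a placeholder):
cqfd run ansible-playbook -i /path/to/inventory.yaml playbooks/cluster_setup_debian.yml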
...