How to manage VMs in SEAPATH

This page describes the vm_manager tool: https://github.com/seapath/vm_manager

Deploying a virtual machine on a SEAPATH cluster involves several components: Ceph, QEMU, libvirt and Corosync. vm_manager was created as a wrapper around these components.

vm_manager is not useful for a standalone SEAPATH hypervisor. If you wish to deploy a VM for that use case, refer to the deploy_vms_standalone.yaml playbook.

The cluster_vm Ansible module is a wrapper around the vm_manager CLI. More detailed documentation can be generated from the sources.

It can be called from a playbook to perform actions on VMs. For instance, a playbook task that creates a VM from a predefined disk image and a libvirt XML configuration would be:

    - name: Create and start guest0
      cluster_vm:
        name: guest0
        command: create
        system_image: my_disk.qcow2
        xml: "{{ lookup('file', 'my_vm_config.xml', errors='strict') }}"

Playbooks can be executed on any hypervisor. Other playbook examples are stored in the inventories/examples directory.

This section describes the VM architecture and the cluster_vm commands from a high-level point of view. Please read the cluster_vm module documentation for further information.

Like other Ansible modules, the cluster_vm documentation can also be displayed by running the ansible-doc cluster_vm command from the root of the Ansible repository.

Information on troubleshooting VM management problems is also available on the VM Troubleshooting page.

VM status

In the SEAPATH cluster the VMs can have several statuses:

  • Undefined: The VM does not exist yet.

  • Disabled: The VM exists and its data disk has been created, but it is not enabled to be used on the cluster.

  • Starting: The VM is enabled and is being started.

  • Started: The VM is enabled and started. Note: This doesn’t mean that the VM is ready and has finished booting, which can take some time.

  • Stopping: The VM is enabled and performing a power-off action.

  • Stopped: The VM is enabled and stopped.

  • Failed: The VM is enabled, but it has failed to start.
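
As a sketch of how a status could be read from a playbook, the task below assumes that the cluster_vm module exposes a status command mirroring the vm_manager one. The registered variable name guest0_status is purely illustrative, and the exact structure of the returned value should be checked with ansible-doc cluster_vm.

```yaml
# Assumption: cluster_vm accepts "status" as a command, like vm_manager does.
- name: Query the status of guest0
  cluster_vm:
    name: guest0
    command: status
  register: guest0_status  # illustrative variable name

# Print the raw module result rather than guessing its field names.
- name: Show the raw module result
  ansible.builtin.debug:
    var: guest0_status
```

Because Started does not mean that the guest has finished booting, a playbook that needs a fully booted VM would still have to wait on something inside the guest, such as an open SSH port.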

VM Manager commands

All sub-commands have a required -n, --name option that specifies which resource to use.

  • add_colocation: Adds a colocation constraint between resources

  • clone: Creates a copy of the VM

  • create: Generates a new resource from a VM

  • create_snapshot: Creates a snapshot of a resource

  • disable: Stops and removes the resource on the cluster

  • enable: Adds and starts the resource on the cluster

  • get_metadata: Gets a metadata value of a resource

  • list: Lists all resources

  • list_metadata: Lists all metadata keys of a resource

  • list_snapshots: Lists all created snapshots

  • purge: Deletes all snapshots of a resource

  • remove: Removes the resource

  • remove_snapshot: Removes a snapshot of a resource

  • rollback: Rolls back a resource to a snapshot

  • set_metadata: Sets a metadata value of a resource

  • start: Starts a resource

  • status: Gets the status of a resource

  • stop: Stops a resource

VM architecture

The diagram below describes how a VM is stored in the SEAPATH cluster. All non-volatile VM data is stored using Ceph, which is in charge of the maintenance of the data-store and data replication between all the hypervisors.

  • A VM is stored in a Ceph RBD group named after the VM

  • A VM contains:

    • Metadata

    • Image data disk

    • Image data snapshots

Metadata provides information associated with a VM. It consists of a list of pairs (key, value) that are set at the moment of the VM creation. You can define as many metadata fields as you want but some keys are reserved:

KEY        | VALUE MEANING

vm_name    | VM name

_base_xml  | Initial Libvirt XML VM configuration

xml        | Libvirt XML file used for the VM configuration. It is autogenerated by modifying the _base_xml file.
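
A minimal sketch of setting custom metadata at creation time is shown below. It assumes that the cluster_vm module takes a metadata parameter holding a dictionary of key/value pairs; the keys owner and purpose and their values are hypothetical. Check ansible-doc cluster_vm for the actual parameter name.

```yaml
- name: Create guest0 with custom metadata
  cluster_vm:
    name: guest0
    command: create
    system_image: my_disk.qcow2
    xml: "{{ lookup('file', 'my_vm_config.xml', errors='strict') }}"
    # Assumed parameter name; keys and values below are illustrative only.
    # The reserved keys (vm_name, _base_xml, xml) must not be used here.
    metadata:
      owner: teamA
      purpose: demo
```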

VM deployment

The VM data disk is set when creating a new VM or cloning an existing one, as described in the diagrams below.

Create a VM

Create a VM from scratch by importing a disk image with the create command:

    - name: Create and start guest0
      cluster_vm:
        name: guest0
        command: create
        system_image: my_disk.qcow2
        xml: "{{ lookup('file', 'my_vm_config.xml', errors='strict') }}"

Clone a VM

Copy an existing VM with the clone command:

    - name: Clone guest0 into guest1
      cluster_vm:
        name: guest1
        src_name: guest0
        command: clone

VM network configuration

The network configuration inside the VMs is applied with the cluster_setup_network.yaml playbook. Use an inventory that describes the VMs rather than the cluster, as in the vms_inventory_example.yaml example file.

VM snapshots

Disk snapshots save the disk image data at a given moment so that it can be recovered later.

Snapshot creation

Snapshots can be created while the VM is stopped or running. If you take a snapshot while the VM is running, only the data already written to the disk is saved.

Volatile data, such as the content of the RAM or data not yet written to the disk, is not stored in the snapshot.
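
As an illustration, a snapshot could be created with a task like the following. The create_snapshot command and the snapshot_name parameter are the ones referenced in the troubleshooting tables of this page; the snapshot name snap1 is arbitrary.

```yaml
- name: Create a snapshot of the guest0 disk
  cluster_vm:
    name: guest0
    command: create_snapshot
    snapshot_name: snap1  # arbitrary name, letters and numbers only
```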

 

Snapshot rollback

You can restore the VM to a previous state by rolling back to a snapshot. The data saved in the snapshot replaces the current disk image data, and all current disk image data is lost. The rollback operation does not remove the snapshot, so the same snapshot can be reused for a later rollback.

The rollback operation must be applied to a disabled VM. If the VM is enabled, it is automatically disabled before the rollback and re-enabled once the operation has finished.
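
A rollback task might look like the sketch below, which assumes a snapshot named snap1 already exists on guest0. The rollback_snapshot command and the snapshot_name parameter are taken from the troubleshooting tables of this page.

```yaml
# All disk data written after snap1 was taken is lost by this operation.
- name: Roll back guest0 to the snap1 snapshot
  cluster_vm:
    name: guest0
    command: rollback_snapshot
    snapshot_name: snap1
```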

Other snapshot operations

With the cluster_vm module it is also possible to:

  • List all snapshots

  • Remove a particular snapshot

  • Remove multiple snapshots by purging:

    • All of them

    • The n oldest ones

    • Those older than a specific date

An example playbook that removes the snapshots created before a determined date would be:
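
A minimal sketch of such a task is given here. It assumes that the purge_image command accepts a purge_date parameter with date and time fields; these parameter names and formats are assumptions to be verified with ansible-doc cluster_vm, and the date itself is arbitrary.

```yaml
- name: Remove guest0 snapshots created before 2021-01-24 08:00
  cluster_vm:
    name: guest0
    command: purge_image
    # Assumed parameter layout; verify against the module documentation.
    purge_date:
      date: '2021-01-24'
      time: '08:00'
```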

The purge operation can be performed regularly to avoid running out of storage space. This can be easily scheduled with a tool like Ansible Tower or AWX.
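
The other snapshot operations listed above can be sketched in the same way. The list_snapshots and remove_snapshot commands are those named in the troubleshooting tables of this page; the registered variable name is illustrative and the format of the returned list is not assumed here.

```yaml
- name: List the snapshots of guest0
  cluster_vm:
    name: guest0
    command: list_snapshots
  register: guest0_snapshots  # illustrative variable name

- name: Show the raw module result
  ansible.builtin.debug:
    var: guest0_snapshots

- name: Remove the snap1 snapshot from guest0
  cluster_vm:
    name: guest0
    command: remove_snapshot
    snapshot_name: snap1
```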

Update a VM

Updating the VM data inside the VM

The cluster_vm module cannot update the data inside a VM, but its snapshot system can be used to revert the update if an error occurs, as described in the diagram below. To achieve this, you can base your playbook on the update skeleton example.
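
The idea can be sketched with an Ansible block and rescue section, as below. This is not the update skeleton example itself: the snapshot name preupdate and the included file update_guest0.yaml (which would contain the actual update tasks targeting the guest) are hypothetical.

```yaml
- name: Update guest0 with a snapshot as a safety net
  block:
    - name: Snapshot the disk before updating
      cluster_vm:
        name: guest0
        command: create_snapshot
        snapshot_name: preupdate  # hypothetical snapshot name

    # Hypothetical file holding the tasks that update the guest itself.
    - name: Run the update inside the VM
      ansible.builtin.include_tasks: update_guest0.yaml
  rescue:
    # Runs only if a task in the block failed: restore the saved disk state.
    - name: Roll back to the pre-update snapshot
      cluster_vm:
        name: guest0
        command: rollback_snapshot
        snapshot_name: preupdate
```

Once the update has been validated, the snapshot can be removed with remove_snapshot to free storage space.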

Updating VM configuration or metadata

The VM configuration and metadata are immutable. To change them, you must create a new VM from the existing one with the clone command.

The update configuration example file can help you create a playbook that performs this operation, following the diagram below.
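
As a rough sketch of the clone-based approach, the task below creates a new VM from guest0 with a different libvirt configuration. It assumes that the clone command accepts an xml parameter in the same way as create; the target name guest0v2 and the file my_new_vm_config.xml are hypothetical.

```yaml
- name: Clone guest0 into guest0v2 with an updated configuration
  cluster_vm:
    name: guest0v2       # hypothetical name of the new VM
    src_name: guest0
    command: clone
    # Assumption: clone takes the new libvirt XML through "xml", like create.
    xml: "{{ lookup('file', 'my_new_vm_config.xml', errors='strict') }}"
```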

Troubleshooting

This section describes the unstable situations that can occur while executing Ansible commands on the cluster and the operations to perform to return to a stable state.

Ansible command is interrupted

The execution of a cluster_vm command can be interrupted for various reasons: a hypervisor crash, a network failure, a manual stop of the operation… For commands that modify the system, the interruption might leave the cluster in an undesirable state that requires a corrective action:

Command | How to fix

create, clone | Re-run the command with the force parameter set to true.

remove, start, stop, create_snapshot, rollback_snapshot, remove_snapshot, enable, disable, purge_image | Re-run the command.

Note: Purging snapshots by number or date is not transactional. If the operation is interrupted, only some of the snapshots might have been removed. In that case, run the command again.
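
For an interrupted create, the recovery could look like the following sketch, which repeats the original task with the force parameter set to true. The image and XML file names are the ones used in the examples of this page.

```yaml
- name: Re-create guest0 after an interrupted create
  cluster_vm:
    name: guest0
    command: create
    system_image: my_disk.qcow2
    xml: "{{ lookup('file', 'my_vm_config.xml', errors='strict') }}"
    force: true  # removes any partially created guest0 before creating it
```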

VM cannot be enabled

Enabling a VM on the Pacemaker cluster might fail if its XML configuration is invalid. Pacemaker will detect it and the VM will remain in a Stopped or Failed state, triggering a Timeout error. The commands that can enable a VM are:

Command | How to fix

create, clone, rollback_snapshot, enable | Remove the VM (*), fix the configuration and try creating it again.

(*) Note: Calling the create or clone commands with the force parameter set to true will automatically remove the VM before its creation.

“VM is not on the cluster” error

If the VM is not enabled on the Pacemaker cluster, three commands fail with the “VM is not on the cluster” error.

Command | Error message | How to fix

start, stop, disable | VM is not on the cluster. | VM has to be created and enabled on the cluster.
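
For an existing but disabled VM, the fix can be sketched as enabling it before issuing the failing command again. The enable and stop commands are those named on this page; guest0 is an example name.

```yaml
# Adds guest0 to the cluster and starts it.
- name: Enable guest0 on the cluster
  cluster_vm:
    name: guest0
    command: enable

# Once the VM is on the cluster, start, stop and disable can be used.
- name: Stop guest0
  cluster_vm:
    name: guest0
    command: stop
```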

Unnecessary action / accessing nonexistent VM, snapshot or metadata

Creating a VM or snapshot that already exists, or trying to access a nonexistent VM, snapshot or metadata, fails with the following errors:

Command           | Error message                             | How to fix

create            | VM already exists.                        | Choose a nonexistent VM name.

clone             | VM already exists.                        | Choose a nonexistent VM name.

clone             | Error opening image.                      | Choose an existing VM name.

remove            | VM does not exist.                        | Choose an existing VM name.

list_snapshots    | Error opening image.                      | Choose an existing VM name.

create_snapshot   | Error opening image.                      | Choose an existing VM name.

create_snapshot   | Snapshot already exists.                  | Choose a nonexistent snapshot_name.

rollback_snapshot | Error opening image.                      | Choose an existing VM name.

rollback_snapshot | Snapshot does not exist on VM.            | Choose an existing snapshot_name.

remove_snapshot   | Error opening image.                      | Choose an existing VM name.

remove_snapshot   | Error checking if snapshot is protected.  | Choose an existing snapshot_name.

purge_image       | Error opening image.                      | Choose an existing VM name.

get_metadata      | Error opening image.                      | Choose an existing VM name.

get_metadata      | No metadata for image.                    | Choose an existing metadata_name.

Invalid parameter name

Names for VMs, snapshots and metadata keys must contain only letters and numbers, without spaces. In addition, some metadata keys are reserved and cannot be used. If these rules are not followed, the create, clone and create_snapshot commands fail with the error “Parameter must not contain spaces or special chars”.

Command         | Error message                                        | How to fix

create          | Parameters must not contain spaces or special chars. | Verify VM name and metadata keys.

clone           | Parameters must not contain spaces or special chars. | Verify VM name and metadata keys (src_name and name cannot be the same).

create_snapshot | Parameters must not contain spaces or special chars. | Verify snapshot_name.
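
Names can also be checked in the playbook before cluster_vm is called, as in the sketch below. The regular expression is an assumption that interprets "letters and numbers" as ASCII letters and digits, and the variables vm_name and snapshot_name are illustrative; the module's own validation remains authoritative.

```yaml
- name: Check names before calling cluster_vm
  vars:
    vm_name: guest0        # illustrative values
    snapshot_name: snap1
  ansible.builtin.assert:
    that:
      # Assumed rule: one or more ASCII letters or digits, nothing else.
      - "vm_name is match('^[A-Za-z0-9]+$')"
      - "snapshot_name is match('^[A-Za-z0-9]+$')"
    fail_msg: "VM and snapshot names must only contain letters and numbers"
```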