Cephadm: Goodbye ceph-deploy

As you probably know, ceph-deploy, the beloved deployment utility for Ceph, is no longer maintained. Cephadm is the new tool/package to deploy Ceph clusters.

CERN has a pretty good intro PDF to it. Cephadm has many nice features, including the ability to adopt running Ceph clusters.

Two quick notes that’ll save you some time:

  • When adding hosts

If you add a host with

ceph orch host add hostname

you may need to specify the IP address of the host as well:

ceph orch host add hostname IP.IP.IP.IP

Do this if you get the following error despite having injected the SSH keys correctly:

Error ENOENT: Failed to connect to hostname (hostname).  Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run: 
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@hostname
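
The ssh command suggested in that error message uses =( ... ), which is zsh syntax and will not work in bash. A minimal sketch of an equivalent for bash/sh follows, assuming it runs on a node where the ceph CLI works. The function name check_cephadm_ssh is hypothetical and only used for illustration.

```shell
# check_cephadm_ssh HOST: try to log in to HOST the same way cephadm does.
# (Hypothetical helper; it only wraps the two ceph commands from the error hint.)
check_cephadm_ssh() {
    host="$1"
    tmpdir=$(mktemp -d) || return 1
    # Dump the SSH config and private key that cephadm itself uses.
    ceph cephadm get-ssh-config > "$tmpdir/ssh_config"
    ceph config-key get mgr/cephadm/ssh_identity_key > "$tmpdir/key"
    chmod 0600 "$tmpdir/key"
    # Run a no-op command on the remote host and remember the result.
    status=0
    ssh -F "$tmpdir/ssh_config" -i "$tmpdir/key" "root@$host" true || status=$?
    # Do not leave a copy of the private key lying around.
    rm -rf "$tmpdir"
    return "$status"
}
```

Something like "check_cephadm_ssh hostname && echo reachable" then tells you whether the key itself is the problem or whether the hostname simply does not resolve from the manager, in which case passing the IP as shown above is the fix.
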
  • When adding OSDs

If you are deploying a cluster with a moderate or larger number of OSDs per host, you may run into the following error scenario while using:

 ceph orch apply osd --all-available-devices

The command adds the available HDDs/SSDs to your cluster. Under the hood, each OSD gets its own Docker container that is in charge of it. Essentially, the following command is run:

/bin/bash /var/lib/ceph/{FSID}/osd.{NUM}/unit.run

It does that for every available OSD on your hosts. You may find that some of the OSDs don’t start and stay stuck in an error state despite your efforts to use

ceph orch daemon restart osd.xx

If you dig deeper (by running a shell in the Docker container directly or by looking into the logs), you will find the following self-explanatory error:

 /var/lib/ceph/osd/ceph-xx/block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr

The solution is simply to raise the system-wide limit on asynchronous I/O requests (fs.aio-max-nr) using

sudo sysctl -w fs.aio-max-nr=1048576

If that solves your issue, add the setting to /etc/sysctl.conf so that it persists across reboots.
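
As a rough sketch of the persisting step, the helpers below write the setting to a drop-in file under /etc/sysctl.d (an alternative to editing sysctl.conf itself) and report how much room is left under the current limit. Both function names and the file name 90-ceph-aio.conf are hypothetical choices for illustration.

```shell
# persist_aio_max_nr VALUE [FILE]: write the setting to a sysctl drop-in file.
# The default file name is an arbitrary, hypothetical choice.
persist_aio_max_nr() {
    value="$1"
    conf="${2:-/etc/sysctl.d/90-ceph-aio.conf}"
    printf 'fs.aio-max-nr = %s\n' "$value" > "$conf"
}

# aio_headroom [NR_FILE] [MAX_FILE]: print the limit minus the AIO events
# currently reserved. The paths are parameters so they can be overridden.
aio_headroom() {
    nr=$(cat "${1:-/proc/sys/fs/aio-nr}")
    max=$(cat "${2:-/proc/sys/fs/aio-max-nr}")
    echo $((max - nr))
}
```

Run as root, "persist_aio_max_nr 1048576" followed by "sysctl --system" should load the value from the drop-in file. If aio_headroom prints a number close to zero on a host with failing OSDs, the limit is very likely what you are hitting.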

Happy cephadmining 🙂




Migrating VMs with attached RBDs

As the title suggests, migrating a VM with attached RBD volumes is a very common operation. One thing we rarely think about when creating volumes, though, is the “backend” each volume lives on.

When you create a volume, it is created on a Cinder backend and stays tied to that backend until it is deleted or migrated to another backend. Backends are defined in the Cinder configuration and are provided by the host(s) running the cinder-volume service. To find your backends, run the following command:

cinder get-pools

When you attach the volume to a VM, the volume keeps its backend and relies on it for every operation on that volume. This includes migrating the VM from one host to another.

You may run into a scenario where you get this error when trying to migrate a VM with an attached RBD volume:

 CinderConnectionFailed: Connection to cinder host failed: Unable to establish connection to

But when you check, Cinder is working correctly: you can create new volumes and attach them to instances, yet one particular VM cannot be migrated. You may also find that you are unable to snapshot the volume attached to that VM. The thing to check here is the RBD backend of the volume.

You can find this using

cinder show VOLUME_ID

This will show you a lot of details about the volume, including the following attribute:

| os-vol-host-attr:host | HOSTNAME@ceph#RBD |

HOSTNAME will likely be one of your controllers. Check that the cinder-volume service is running correctly on that controller. If it is down, you cannot perform any operation on that volume (snapshot, attach/detach, or migrate).
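
The attribute value has the form HOST@BACKEND#POOL. A small sketch for splitting it, so that the host part can be compared against your running cinder-volume services, might look like this. The function name split_volume_host is hypothetical.

```shell
# split_volume_host "HOST@BACKEND#POOL": print host, backend and pool,
# one per line. (Hypothetical helper, for illustration only.)
split_volume_host() {
    attr="$1"
    host=$(printf '%s\n' "$attr" | cut -d'@' -f1)      # before the '@'
    rest=$(printf '%s\n' "$attr" | cut -d'@' -f2-)     # after the '@'
    backend=$(printf '%s\n' "$rest" | cut -d'#' -f1)   # between '@' and '#'
    pool=$(printf '%s\n' "$rest" | cut -s -d'#' -f2-)  # after the '#', if any
    printf '%s\n%s\n%s\n' "$host" "$backend" "$pool"
}
```

For the example above, split_volume_host "HOSTNAME@ceph#RBD" prints HOSTNAME, ceph and RBD on separate lines. The first line is the host to look for in the output of "cinder service-list", where its cinder-volume entry should be reported as up.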

If you’ve lost your controller forever, or you were testing a new backend that no longer exists, then you might want to migrate the volume from the dead backend. This is detailed in the following manual


Happy VM migrations!

Busy Cinder volumes & Ceph

You may run into an issue where a Cinder volume that was attached to a VM cannot be deleted even after detaching it, and the logs show something like:

ERROR cinder.volume.manager ....... Unable to delete busy volume.


WARNING cinder.volume.drivers.rbd ......... ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.

There are multiple scenarios that might cause these errors, among which are:

  • Scenario 1 (the first error message above): You might have created a snapshot of the volume, either through Cinder or directly with the rbd command line. Ceph will not let you delete an image that still has snapshots. The snapshots on the volume can be listed with
    • rbd snap ls POOLNAME/VOLUMEid
    • The snapshots can then be purged (only if they were created outside Cinder) with
    • rbd snap purge POOLNAME/VOLUMEid

      If the volume snapshots were created through Cinder, it is definitely better to delete them through Cinder instead.

  • Scenario 2 (the second error message above): libvirt on one of the compute nodes is still attached to the volume. This can happen if the VM did not terminate correctly or the detachment did not actually happen. To verify this, list the watchers of the RBD image using
    • rbd status POOLNAME/VOLUMEid
    • This will show you the IP of the watcher (the compute node in this case) and the cookie used for the connection

One possible cause of this scenario is a VM that did not fully release (detach) the volume. To release it, restart the VM and make sure the qemu process no longer holds a reference to the volume ID. You may have read that you need to reboot the compute node to release the attachment, but that is not necessary if you can restart the VM and confirm that its qemu process no longer references the volume.
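
The checks for both scenarios can be sketched as two small filters over the rbd output. These are assumptions rather than anything Cinder or Ceph ship. The first assumes the table layout of "rbd snap ls" (a header line, with the snapshot name in the second column) and Cinder's default snapshot naming of snapshot-<UUID>; other Cinder-created names, such as clone snapshots, are not recognised. The second assumes IPv4 watcher addresses. Both function names are hypothetical.

```shell
# non_cinder_snapshots: read `rbd snap ls` table output on stdin and print the
# snapshot names that do NOT match Cinder's default snapshot-<UUID> naming.
non_cinder_snapshots() {
    awk 'NR > 1 && $2 !~ /^snapshot-[0-9a-f-]+$/ { print $2 }'
}

# watcher_ips: read `rbd status` output on stdin and print the IP address of
# each watcher (IPv4 only in this sketch).
watcher_ips() {
    sed -n 's/^[[:space:]]*watcher=\([^:]*\):.*/\1/p'
}
```

Usage would be along the lines of "rbd snap ls POOLNAME/VOLUMEid | non_cinder_snapshots" and "rbd status POOLNAME/VOLUMEid | watcher_ips". Review the first list by hand before purging anything. An IP printed by the second command is the compute node on which to look for a qemu process that still references the volume.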

Hope that helps!