Mohamed Elsakhawy, PhD

Cephadm: Goodbye ceph-deploy

23 June 2020

As you may know, ceph-deploy, the beloved deployment utility for Ceph, is no longer maintained. Cephadm is the new tool for deploying Ceph clusters.

CERN has a pretty good introductory PDF on it. Cephadm includes many nice features, including the ability to adopt running Ceph clusters.

Two quick notes that'll save you some time:

  • When adding hosts

Adding a host by name alone, using

ceph orch host add hostname

may fail even though the SSH keys were injected correctly. If it does, specify the IP of the host as well:

ceph orch host add hostname IP.IP.IP.IP

The error you get without the IP looks like this:

Error ENOENT: Failed to connect to hostname (hostname).  Check that the host is reachable and accepts connections using the cephadm SSH key
you may want to run: 
> ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@hostname
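
When several hosts have to be added, the name-plus-IP form of the command can be generated from a small list. The following is a minimal sketch, assuming a hypothetical plain-text input with one "hostname IP" pair per line (this format is made up for the example, it is not a cephadm convention). It only prints the commands, so they can be reviewed before anything is run:

```shell
# Read "hostname IP" pairs from stdin and print one
# "ceph orch host add" command per pair.
# Blank lines and lines starting with '#' are skipped;
# hosts without an IP are reported on stderr and skipped.
print_host_add_cmds() {
    while read -r host ip rest; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        if [ -z "$ip" ]; then
            echo "skipping $host: no IP given" >&2
            continue
        fi
        echo "ceph orch host add $host $ip"
    done
}
```

For example, print_host_add_cmds < hosts.txt (hosts.txt being a hypothetical file name) prints the commands; once the output looks right, it can be piped to sh on a node that has the ceph CLI.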
  • When adding OSDs

If you are deploying a cluster with even a relatively moderate number of OSDs per host, you may run into the following error scenario when running:

 ceph orch apply osd --all-available-devices

The command adds the available HDDs/SSDs on your hosts to the cluster. Under the hood, each OSD gets its own Docker container, which is started by running the following command:

/bin/bash /var/lib/ceph/{FSID}/osd.{NUM}/unit.run

It does that for every available OSD on your hosts. You may find that some of the OSDs don't start and stay stuck in an error state, despite your efforts to restart them with

ceph orch daemon restart osd.xx

If you dig deeper (by opening a shell in the Docker container directly or by looking into the logs), you will find the following self-explanatory error:

 /var/lib/ceph/osd/ceph-xx/block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr
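
One way to look for this message in the logs is sketched below. It assumes the OSD runs under the systemd unit that cephadm normally creates, named ceph-{FSID}@osd.{NUM}.service; the helper names are hypothetical and {FSID} and {NUM} stand for your own cluster's values.

```shell
# Build the systemd unit name of a cephadm-managed OSD.
# $1 = cluster FSID, $2 = OSD number
osd_unit() {
    echo "ceph-$1@osd.$2.service"
}

# Read log text on stdin; succeed only if it contains the
# AIO error shown above (matched as a fixed string).
has_aio_error() {
    grep -qF 'io_setup(2) failed with EAGAIN'
}
```

A possible use, with FSID set to the cluster's FSID: journalctl -u "$(osd_unit "$FSID" 12)" --no-pager | has_aio_error && echo "osd.12 hit the aio-max-nr limit"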

The solution is simply to raise fs.aio-max-nr, the system-wide limit on asynchronous I/O requests, using

sudo sysctl -w fs.aio-max-nr=1048576

If that solves your issue, add the setting to /etc/sysctl.conf so that it persists across reboots.
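
To see how close a host is to the limit, and to produce the line that makes the change permanent, something like the following could be used. This is only a sketch: the helper names are made up for the example, and the default value is the one used in the sysctl command above.

```shell
# Print the number of AIO events currently reserved (aio-nr)
# next to the system-wide limit (aio-max-nr).
# The two optional arguments let the function read other files;
# by default it reads the real kernel counters.
aio_usage() (
    nr_file=${1:-/proc/sys/fs/aio-nr}
    max_file=${2:-/proc/sys/fs/aio-max-nr}
    echo "aio-nr=$(cat "$nr_file") aio-max-nr=$(cat "$max_file")"
)

# Print the sysctl configuration line for the new limit.
# $1 = desired limit (defaults to 1048576)
aio_sysctl_line() {
    echo "fs.aio-max-nr = ${1:-1048576}"
}
```

Running aio_usage on an affected host shows whether aio-nr is sitting at or near aio-max-nr. To persist the setting, the output of aio_sysctl_line can be written to a drop-in file, for example aio_sysctl_line | sudo tee /etc/sysctl.d/90-ceph-aio.conf (the file name is a hypothetical choice), followed by sudo sysctl --system to reload.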

Happy cephadmining 🙂
