Cephadm: Goodbye ceph-deploy
As you may know, ceph-deploy, the beloved deployment utility for Ceph, is no longer maintained. Cephadm is the new tool for deploying Ceph clusters.
CERN has a pretty good intro PDF to it. Cephadm includes many nice features, including the ability to adopt running Ceph clusters.
Two quick notes that’ll save you some time:
- When adding hosts using
ceph orch host add hostname
you also need to specify the IP of the host, as follows:
ceph orch host add hostname IP.IP.IP.IP
If you omit the IP, you may get the following error, despite having injected the SSH keys correctly:
Error ENOENT: Failed to connect to hostname (hostname). Check that the host is reachable and accepts connections using the cephadm SSH key you may want to run: > ssh -F =(ceph cephadm get-ssh-config) -i =(ceph config-key get mgr/cephadm/ssh_identity_key) root@hostname
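For illustration, here is a minimal sketch of the full host-adding workflow (node2 and 192.168.0.12 are hypothetical; adjust to your environment):
# Install the cluster's public SSH key on the new host
ssh-copy-id -f -i /etc/ceph/ceph.pub root@node2
# Add the host with an explicit IP so cephadm doesn't depend on DNS
ceph orch host add node2 192.168.0.12
# Verify the host appears in the inventory
ceph orch host ls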
- When adding OSDs
If you are deploying a cluster with even a “relatively” moderate number of OSDs per host, you may run into the following error scenario while using:
ceph orch apply osd --all-available-devices
This command adds all available HDDs/SSDs to your cluster as OSDs. Under the hood, each OSD gets its own Docker container, started by the following command:
/bin/bash /var/lib/ceph/{FSID}/osd.{NUM}/unit.run
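As an aside, if you’d rather preview which devices the orchestrator would claim before anything is created, cephadm can list the candidate devices and (per the Ceph docs) do a dry run of the apply:
# Show devices cephadm considers available
ceph orch device ls
# Preview the OSD placement without creating anything
ceph orch apply osd --all-available-devices --dry-run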
One such container is started for every available OSD on your hosts. You may find that some of the OSDs don’t start and remain stuck in an error state, despite your efforts to restart them with
ceph orch daemon restart osd.xx
If you dig deeper (by opening a shell in the container directly or looking into the logs), you will find the following self-explanatory error:
/var/lib/ceph/osd/ceph-xx/block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr
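To confirm the limit really is exhausted, compare the number of currently allocated aio contexts against the ceiling (the default limit is 65536 on most kernels):
cat /proc/sys/fs/aio-nr      # aio contexts currently in use
cat /proc/sys/fs/aio-max-nr  # system-wide ceiling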
The solution is simply to raise the kernel’s limit on concurrent asynchronous (non-blocking) IO requests:
sudo sysctl -w fs.aio-max-nr=1048576
If that solves your issue, add the setting to sysctl.conf so it persists across reboots.
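For example:
# Persist the setting and reload it without rebooting
echo 'fs.aio-max-nr = 1048576' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p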
Happy cephadmining 🙂