Ceph RBD snapshots for an attached volume

You might find yourself in a scenario where you need to back up a Ceph volume attached to an OpenStack instance. Ceph snapshots come to mind automatically as the “point-in-time” solution. Once you take a Ceph snapshot, you can export it and back up the volume either as a physical file or at the file system level, possibly by mounting it.

OpenStack allows you to use Cinder to initiate the volume snapshots. The other option is to initiate Ceph snapshots yourself using the “rbd snap create” command. In either case, taking a Ceph snapshot gives you a point-in-time state of the volume which you can later export using “rbd export”. The one drawback with snapshotting a volume attached to a running VM is that the snapshot happens without the VM knowing about it. This inherently might cause file system consistency issues in the backup snapshot, and it can cause the VM to freeze as the volume becomes briefly unavailable while the snapshot is taken.
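As a sketch, the manual snapshot-and-export workflow looks like the following (the pool name and volume name are placeholders; adjust them to your environment):

```shell
# Take a point-in-time snapshot of the attached volume
rbd snap create volumes/volume-<uuid>@backup

# Export the snapshot to a file that can be archived or mounted
rbd export volumes/volume-<uuid>@backup /backups/volume-<uuid>.img

# Remove the snapshot once the export is done
rbd snap rm volumes/volume-<uuid>@backup
```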

The solution to the VM freezing issue is to instruct libvirt to enable RBD caching. This can be achieved by adding the following line under the libvirt section in nova.conf on the compute node.


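The relevant option is disk_cachemodes; a sketch is shown below (the exact value may vary between Nova releases, so verify it against the documentation for your version):

```ini
[libvirt]
disk_cachemodes = "network=writeback"
```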
You will need to restart the Nova services on the compute host. After that, RBD caching will be enabled for Nova on the compute host and will prevent the VM from freezing after the snapshot is taken. You can find more on RBD caching configuration options in the Ceph documentation.




VM getting a DHCP address

DHCP requests are broadcast requests sent by the VM to its broadcast domain. If a DHCP server exists in this domain, it responds by providing a DHCP IP lease, following the DHCP protocol. In OpenStack, the same procedure is followed. A VM starts by sending its DHCP request to its broadcast domain, which goes through br-int. Since this is a broadcast, it exits br-int to br-tun as well and gets sent to all hosts in the environment using the dedicated tunnel ID for the network.

Once the request reaches the network node, it arrives at a network namespace created specifically to handle DHCP requests. This DHCP namespace is named qdhcp-{UUID}. The qdhcp namespace looks as follows


Individually, it looks like this


As you can see, the DHCP namespace has a tap interface which is attached to the br-int bridge on the network node. The tap interface is attached to a dnsmasq process. dnsmasq is a service that does many things (DNS included, obviously), but it can also hand out DHCP addresses when acting as a DHCP server.

On the network node, if you run ps -ef | grep dns, you will see the following


If you would like to see the DHCP namespaces on the network node, you can use ip netns


and if you go inside any of these namespaces, you will see the tap interface that is attached to the dnsmasq process
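The inspection steps above can be sketched with the following commands (the UUID is a placeholder for the actual namespace name on your network node):

```shell
# List the DHCP namespaces on the network node
ip netns | grep qdhcp

# Show the tap interface and its IP inside one namespace
ip netns exec qdhcp-<uuid> ip addr

# Find the dnsmasq process serving that namespace
ps -ef | grep dnsmasq
```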


The IP attached to the DHCP namespace is assigned by default to the tap interface. Note that when you look into the flow rules on br-tun on any compute host, you may find an entry for the MAC address of this tap interface. This prevents the DHCP request from being sent to every compute host and network host in the environment, since the flow rules direct the DHCP request only to the VXLAN port connecting the compute host to the network node.
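You can check for such a flow rule yourself; a sketch follows (the MAC address is a placeholder — fa:16:3e is the default OpenStack MAC prefix):

```shell
# On a compute host: dump the br-tun flows and look for the
# MAC address of the DHCP namespace's tap interface
ovs-ofctl dump-flows br-tun | grep fa:16:3e:aa:bb:cc
```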





Adding a new node to Ceph

If you are expanding your Ceph cluster with extra nodes, you will need to prepare each node to have Ceph installed and prepare its OSDs to become part of the cluster. To do this, you can use ceph-deploy to install Ceph on the new nodes and prepare/activate the OSDs on them. The procedure is pretty straightforward.

On the new node, first create the ceph user and enable key-based login to it.
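As a sketch, assuming the conventional ceph deployment user (adjust the user and host names to your setup):

```shell
# On the new node, as root: create the ceph user with passwordless sudo
useradd -m -s /bin/bash ceph
passwd ceph
echo "ceph ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/ceph
chmod 0440 /etc/sudoers.d/ceph

# From the deploy node: enable key-based SSH login for the ceph user
ssh-copy-id ceph@<newnodename>
```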

On the node where you have ceph.conf available (normally the node you deploy from), cd into its directory and execute the following as the ceph user:

ceph-deploy install #Newnodename

ceph-deploy osd prepare #Newnodename:#Osdmountpoint

ceph-deploy osd activate #Newnodename:#Osdmountpoint

You can verify that the new OSDs are active via

ceph osd tree

If the OSDs are showing as down and out of the cluster, you can add them in and bring them up using:

ceph osd in osd.#ID


systemctl start ceph-osd@#ID

The above two commands are to be executed on the new node.

VM to VM communication: different networks

So far we have only spoken about VM communication when the VMs belong to the same network. But what happens when a VM has to communicate with another VM on a different network? The common rule of networking is that changing networks requires routing, and this is exactly what neutron does to allow such VMs to communicate.

One thing to note here is that a VM in OpenStack is attached to a subnet, which specifies which IP block it will get an IP from. The rule of thumb is: if the subnet changes, then routing has to be involved. This is because an L2 switch doesn’t understand IPs, so moving between the IP blocks of different subnets is not something a switch can handle. Remember: switches understand MAC addresses, routers understand IPs.

So let’s look at how two VMs belonging to different networks actually communicate.


The logical diagram is similar to what we saw above. Packets flow from VM1 down through the tap, qbr, qvb-qvo, br-int and br-tun. This time, though, they have to go through routing before reaching VM2. Routing is done in the router namespaces created by the l3-agent on the network node. To see this in more detail, let’s look at a more realistic diagram of the communication


As you can see in the diagram above, packets flow through br-tun from the first compute host to the network node. The network node has a logical layout very similar to the compute node: it has a br-tun and br-int combination that allows it to establish tunnels to the compute and network hosts in the environment and to VLAN-tag local traffic. It has some new entities, though: the qrouter namespace and the qdhcp namespace. As their names suggest, they are responsible for routing and DHCP.

Traffic from VM1 reaches br-tun on the network node over its dedicated tunnel (remember: dedicated tunnel ID per network). br-tun at the network node maps the VXLAN tunnel ID to a VLAN and pushes the traffic up to br-int on the network node. br-int pushes the traffic to the router namespace named qrouter-{UUID}. The traffic received by the qrouter namespace is routed and then pushed down again through br-int and br-tun. This time it leaves the network node over a dedicated VXLAN tunnel ID which is different from VM1’s tunnel ID (remember: different networks, so different tunnel IDs). The packets are received by the br-tun of VM2’s compute host and then go up through br-int the same way until they reach VM2.

The only case where the tunnel IDs are the same is if the router is connected to two subnets within the SAME network.

Let’s look more into how qrouter namespace is designed


The qrouter namespace has two kinds of interfaces.

  • qr interfaces: These are the interfaces that connect the qrouter to the subnets it routes between
  • qg interface: This is the interface that connects the qrouter to the router’s gateway


If we assume a single subnet per network, we can simplify the qrouter namespace diagram as follows


As you can see above, traffic arrives from a certain network on a qr interface. It hits the routing table of the qrouter namespace and then either goes out the other qr interface, if it’s destined to the other network, or out the qg interface, if it’s destined to the gateway (for example, to reach the public network)

Let’s look at a practical scenario, two VMs connected to two different networks.


A qrouter namespace physically looks like this on the network node.


If we look inside the qrouter namespace we can see the qr and qg interfaces


qr interfaces are assigned the IPs of the gateways of each connected network, and if we look at the routing table inside the qrouter namespace, we will see that the qg interface is the default gateway device and that each qr interface is the gateway for the network it’s connected to
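The inspection above can be sketched with the following commands (the UUID is a placeholder for the actual router namespace name):

```shell
# List the router namespaces on the network node
ip netns | grep qrouter

# Show the qr and qg interfaces and their IPs
ip netns exec qrouter-<uuid> ip addr

# Show the namespace's routing table: the default route points out
# the qg interface, and each connected subnet out a qr interface
ip netns exec qrouter-<uuid> ip route
```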


A couple of things to remember

  • qrouter namespaces are only created when subnets are attached to routers, not when routers are created, i.e. an empty router (no connections to a gateway or to networks) will have no namespace on the network nodes
  • qrouter namespaces are created by the l3-agent
  • Although there can be more than one qr interface in the qrouter namespace, there’s only one qg interface. This is because the router has only one gateway
  • the network node will have as many qrouter namespaces as there are routers for your tenants. The traffic flowing through those routers is fully isolated via the combination of VXLAN tunnel ID, VLAN tag and network namespaces

In the next post we will explain the DHCP address acquisition process of a VM