cinder-manage: Did you know about it?

A lesser-known tool in the cinder ecosystem is cinder-manage. You might have run into it during upgrades. The most common use case is

cinder-manage db sync

This is normally executed during upgrades to bring the database schema to the latest version, or to create the schema for a new installation. But there are additional uses for it. A few of them are

cinder-manage service list

The output will look like this:

Binary Host Zone Status State Updated At RPC Version Object Version Cluster 
cinder-scheduler controller-server nova enabled :-) 2017-10-15 19:45:37 4.5 4.5 
cinder-volume controller-server@ceph nova enabled :-) 2017-10-15 19:45:31 4.6 4.6

The output can be used to diagnose issues when cinder-scheduler reports that the volume backend is down although cinder-volume is up. The output of the above command is the most reliable place to check the status of cinder-scheduler, cinder-volume and cinder-backup.

If you have multiple backends for cinder, or run multiple cinder-scheduler/cinder-volume services across multiple controller nodes, the output will look like this:

Binary Host Zone Status State Updated At RPC Version Object Version Cluster 
cinder-scheduler controller-server1 nova enabled :-) 2017-10-15 19:45:37 4.5 4.5 
cinder-volume controller-server1@ceph nova enabled :-) 2017-10-15 19:45:31 4.6 4.6
cinder-volume controller-server1@ceph2 nova enabled :-) 2017-10-15 19:45:31 4.6 4.6
cinder-scheduler controller-server2 nova enabled :-) 2017-10-15 19:45:37 4.5 4.5
cinder-volume controller-server2@ceph nova enabled XX 2017-10-15 19:45:37 4.6 4.6

As you can see above, there are multiple backends for cinder-volume on controller-server1: one is ceph and the other is ceph2, and both are enabled and up. It’s easy to spot that cinder-volume on controller-server2 is showing as down, so you should expect its ceph backend to be unavailable. If you check the cinder-volume service using systemctl status, the service itself might still be running. If that is the case, you need to dig deeper into why the ceph backend for cinder-volume is down.
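If systemctl says the service is running, the backend driver’s initialization messages in the cinder-volume log are usually the next place to look. For example (the unit name and log path below are typical for RPM-based installs and may differ on your distribution):

systemctl status openstack-cinder-volume
grep -iE 'error|failed' /var/log/cinder/volume.log | tail -20

For a ceph backend, authentication problems with the configured rbd user are a common culprit.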

If you decide to remove a certain cinder-volume/cinder-scheduler/cinder-backup service from your deployment, you can do that by stopping the service on the controller host, and then removing it using

cinder-manage service remove cinder-scheduler controller-server2
cinder-manage service remove cinder-volume controller-server2
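Before stopping and removing a service, you can also disable it through the API so the scheduler stops sending requests to it. With the openstack client that looks like this (the host@backend notation matches the listing above):

openstack volume service set --disable controller-server2@ceph cinder-volume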

If this small use case got you excited, check out the following uses as well:

cinder-manage logs errors
cinder-manage logs syslog
cinder-manage volume delete --> Important in the case of stuck volumes
cinder-manage host list
cinder-manage config list --> you can use it to verify what the running configuration for cinder is
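As an example of the last one, you can combine it with grep to verify a single option in the running configuration:

cinder-manage config list | grep debug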

The manual for cinder-manage is at

Have fun!

OpenStack Performance tuning

So, you’ve managed to deploy OpenStack in a production environment, and now you would like to make sure that your precious investment in hardware doesn’t get ruined by poor performance. You might want to consider reading this post.

You have to remember first that OpenStack is a cloud-computing enabler framework, i.e. none of the computation or file transfers done by your VMs is processed by OpenStack services. OpenStack relies on native Linux technologies such as libvirt, qemu, KVM, network namespaces and so on to implement various features. So your targets for performance tuning are NOT ONLY OpenStack services; they can be native Linux services as well.

Performance tuning for OpenStack services

To do that, consider the following enhancements:

  • One thing that is often forgotten after deploying a production environment is disabling verbose and debug logging. You probably spent some time deploying your environment and getting it to where it is, and during this cycle you likely had to enable debug logging in some services and verbose logging in others. Remember to go back and disable all of these; the extra file IOPS will certainly reduce performance. You can do that by setting those options to false in nova.conf, neutron.conf, cinder.conf and keystone.conf.
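In each of those files, that comes down to the following in the [DEFAULT] section (note that the verbose option is deprecated in newer releases, where setting debug alone is enough):

[DEFAULT]
debug = false
verbose = false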
  • OpenStack supporting services: Don’t forget that mariadb/mysql and rabbitmq/amqp are still your environment’s backbone. Slowness in any of the supporting services will directly slow down every OpenStack service that uses them. Pay close attention to your mysql buffers and rabbitmq caches. A good feature in mysql/mariadb is the slow query log. The long_query_time variable sets a threshold: any query that exceeds it gets logged to the slow query log. This is a good way to find out whether you have database slowness.
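For example, in my.cnf (the two-second threshold here is only an illustrative starting point; pick one that matches your workload):

[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2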
  • Is keystone accessing the database for every single transaction? Keystone sits at the heart of all of the services as it’s the auth service. Too many users accessing the environment and generating lots of tokens may choke your mysql database performance. Make sure that keystone is configured to store its tokens in memcached instead.
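A sketch of the relevant keystone.conf settings (the memcached endpoint is an assumption for your deployment; adjust as needed):

[cache]
enabled = true
backend = dogpile.cache.memcached
memcache_servers = 127.0.0.1:11211

[token]
caching = true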
  • How large is your nova database ? Did you know that OpenStack keeps a record for every instance you create ? I am sure you knew that, but did you know that it also keeps a record after you delete that instance ? Check out this tool to clean up the nova database
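Whichever tool you pick, nova-manage itself can archive soft-deleted rows into shadow tables rather than dropping them outright (the batch size below is arbitrary):

nova-manage db archive_deleted_rows --max_rows 1000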
  • Do you have any backends in cinder.conf that you’re not using? Have you configured multiple of these but are only using one? Consider cleaning this out. A quick look in the cinder logs will show it complaining about the unused backend.
  • What’s your store for glance images? Is it a filesystem_store_datadir sitting on the controller node? If you don’t have the option to change this to network-based storage, ensure it sits on a different LUN/HDD/SSD than the OS and OpenStack services on the controller. Glance is not only used for providing images during a VM’s boot; those images eventually get cached at the compute hosts, so that path is not a big deal for performance. But glance is also used when users take snapshots of their ephemeral VMs. Don’t leave the environment prone to slowness as users take snapshots of their VMs.
  • Adding to the previous point, it’s always better to keep the OpenStack service APIs on a network other than the data network, i.e. the network where glance transfers images, CEPH transfers its RBDs, and so on.
  • Are you using CEPH? Is your OSD replication traffic sharing the same network as RBD client traffic? Consider splitting it out into two different networks.
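In ceph.conf this split is expressed with the public and cluster network options (the subnets below are placeholders):

[global]
public_network = 10.0.1.0/24
cluster_network = 10.0.2.0/24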

This is not all, but it covers to some extent the common architecture and implementation issues that can affect performance. There are plenty of tutorials out there that discuss how to optimize libvirt, qemu/kvm, Apache, the network stack, and CEPH. These are all your targets if you want to optimize the actual performance of VMs and volumes.