Quota usage refresh in OpenStack

OpenStack stores per-tenant quota usage in the quota_usages database table. Nova and cinder each have their own separate database by default, and each database contains its own quota_usages table.

The structure of the quota_usages table is as follows:

+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| created_at    | datetime     | YES  |     | NULL    |                |
| updated_at    | datetime     | YES  |     | NULL    |                |
| deleted_at    | datetime     | YES  |     | NULL    |                |
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| project_id    | varchar(255) | YES  | MUL | NULL    |                |
| resource      | varchar(255) | NO   |     | NULL    |                |
| in_use        | int(11)      | NO   |     | NULL    |                |
| reserved      | int(11)      | NO   |     | NULL    |                |
| until_refresh | int(11)      | YES  |     | NULL    |                |
| deleted       | int(11)      | YES  |     | NULL    |                |
| user_id       | varchar(255) | YES  | MUL | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

Remember that quotas are managed per project, so project_id is the key you will navigate this table by. For a particular project, you can retrieve the project ID using

openstack project list | grep $PROJECT_NAME

The other interesting fields in the quota_usages table are:

resource: the resource type; in nova, for example, it can be "instances", "ram", "cores" or "security_groups"

in_use: the amount of each resource that OpenStack "thinks" the project is using

Occasionally the in_use field is not updated properly, and you might find yourself in a situation where OpenStack reports usage that doesn't exist. You have two options at this point:

  • Use the nova-manage project quota_usage_refresh command to try to refresh the quota usage for a specific project. The syntax is something like
nova-manage project quota_usage_refresh --project PROJECT_ID --user USER_ID --key cores
  • If that doesn't help, you may have to update the MySQL database directly with an UPDATE statement. You will need to restart the respective service afterwards to see the change
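If you do go the direct-SQL route, a minimal sketch could look like this. Everything here is a placeholder example (the project ID, the resource, the value 0): back up the database and verify the real usage before writing anything.

```sql
-- Hypothetical example: reset a stale "cores" counter for one project.
-- PROJECT_ID is a placeholder; verify the correct value first.
UPDATE quota_usages
   SET in_use = 0, updated_at = NOW()
 WHERE project_id = 'PROJECT_ID'
   AND resource = 'cores'
   AND deleted = 0;
```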

Glance and CEPH backend

Using CEPH as a backend for glance images has slowly become the default deployment methodology in many production deployments. It is usually as easy as creating a new pool in CEPH (a glance pool) and creating a user to be associated with glance. The glance CEPH user will normally authenticate using cephx and store images and snapshots in CEPH.

The configuration in glance-api.conf on the controller(s) looks something like this:

[glance_store]
stores = glance.store.rbd.Store
default_store = rbd
rbd_store_pool = POOLNAME
rbd_store_user = USERNAME
rbd_store_ceph_conf = /etc/ceph/ceph.conf

Then, in the default CEPH keyring location (/etc/ceph), you will need to add the keyring for the CEPH user associated with glance.

On the CEPH nodes, don't forget to grant the glance user permissions on the glance pool. The permissions need to be read/write so that glance can create new images and read existing ones. The default command to create a new user and grant read/write permissions on the pool is:

ceph auth get-or-create client.user mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=glancepool' -o /etc/ceph/ceph.client.images.keyring

If, after you create the pool and configure glance-api and the keyring properly on the controller node, you get something like this in nova-conductor.log when provisioning new VMs:

WARNING nova.scheduler.utils [req-a3c6f93e-484a-43e0-9e73-5bbdc451b2c6   - - -] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance ID. Last exception: HTTPInternalServerError (HTTP 500)
ERROR nova.scheduler.utils [req-a3c6f93e-484a-43e0-9e73-5bbdc451b2c6  - - -] [instance: ID] Error from last host: HOST (node HOST): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1780, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2016, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance ID was re-scheduled: HTTPInternalServerError (HTTP 500)\n']

check the glance logs as well; you will most likely find a 500 error there (api.log):

INFO eventlet.wsgi.server [req-a3c6f93e-484a-43e0-9e73-5bbdc451b2c6 ] IP "GET /v2/images/2IDfile HTTP/1.1" 500 139 0.182237

This error is not very indicative, but it means you should check the permissions on the glance user's keyring file in /etc/ceph and make sure that the user running glance (by default "glance") has at least read permission on it.

For example

ls -l /etc/ceph/ceph.client.glancepool.keyring 
-r--r----- 1 glance glance 64 Oct 27 11:12 /etc/ceph/ceph.client.glancepool.keyring

This is the minimum permission with which glance can access the glance pool in CEPH.
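As a sketch, the mode check and fix could look like this. The keyring path is an example; the snippet defaults to a scratch file so it can be tried safely, and on a real controller you would point KEYRING at the actual file and also chown it to the glance user.

```shell
# The real keyring lives in /etc/ceph; we default to a scratch file here
# so the commands are safe to try. On a controller, set e.g.:
#   KEYRING=/etc/ceph/ceph.client.glancepool.keyring
KEYRING=${KEYRING:-$(mktemp)}
chmod 640 "$KEYRING"           # owner rw, group r, no world access
stat -c '%a' "$KEYRING"        # verify the mode; prints 640
# On the controller you would also ensure ownership, e.g.:
# chown glance:glance "$KEYRING"
```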

Have fun !

VM Cold migrations/resizing in openstack

Cold migrations are an integral piece of any QEMU/KVM deployment. The migration is "cold", or non-live, because you have to power down the VM, move it to the new host and power it back up. OpenStack follows the same procedure when it comes to migrating VMs.

Cold migrations in OpenStack are performed by the user running the openstack-nova-compute process. This user is "nova" in most cases. In order to allow cold migrations, the "nova" user has to be able to ssh, password-less, to the other compute hosts in the environment. This is why this part of the OpenStack configuration is needed:

https://docs.openstack.org/nova/pike/admin/ssh-configuration.html#cli-os-migrate-cfg-ssh

The basic steps for a cold migration are as follows:

  • An admin user initiates the migration using either "openstack server migrate", "nova migrate" or the dashboard
  • nova-scheduler tries to identify a target host based on the scheduler configuration and the VM specs
  • openstack-nova-compute uses the "nova" user to ssh to the target host, create the needed /var/lib/nova/instances directories, and copy the VM definition and disk over to the target system
  • openstack-nova-compute on the target host then starts the VM

A common error that you might see if you have the ssh keys misconfigured is a variation of the following:

2017-10-17 16:38:27.149 8053 ERROR oslo_messaging.rpc.server ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
2017-10-17 16:38:27.149 8053 ERROR oslo_messaging.rpc.server Command: ssh -o BatchMode=yes IP mkdir -p /var/lib/nova/instances/ID
2017-10-17 16:38:27.149 8053 ERROR oslo_messaging.rpc.server Exit code: 255
2017-10-17 16:38:27.149 8053 ERROR oslo_messaging.rpc.server Stdout: u''
2017-10-17 16:38:27.149 8053 ERROR oslo_messaging.rpc.server Stderr: u'Host key verification failed.\r\n'

To test it, try switching to the nova user on the source compute host using "su - nova" and ssh to the target host.

su - nova
ssh target-host

If you get a message similar to:

Warning: Permanently added the RSA host key for IP address 'IP' to the list of known hosts.

This indicates that although you have set up password-less ssh properly, the host key of the target compute host was not yet trusted on the source compute host. This will mostly happen if you copied the keys manually and did not use the ssh-copy-id command.
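The key setup the nova user needs can be sketched as follows. NOVA_SSH defaults to a scratch directory so the sketch is safe to run anywhere; on a real compute host it would be nova's home .ssh directory (typically /var/lib/nova/.ssh, owned by nova), and target-compute is a stand-in host name.

```shell
# Sketch of the password-less key setup for the nova user.
# NOVA_SSH defaults to a scratch dir; on a compute host use /var/lib/nova/.ssh
NOVA_SSH=${NOVA_SSH:-$(mktemp -d)}
chmod 700 "$NOVA_SSH"
# Generate a key pair without a passphrase (skip if one already exists)
[ -f "$NOVA_SSH/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$NOVA_SSH/id_rsa" -q
# ssh-copy-id installs the public key on the target host AND records the
# target's host key in known_hosts -- exactly the step manual copying misses:
# ssh-copy-id -i "$NOVA_SSH/id_rsa.pub" nova@target-compute
```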

cinder-manage: Did you know about it ?

A lesser-known tool for cinder is cinder-manage. You might have run into it during upgrades. The most common use case is

cinder-manage db sync

This is normally executed during upgrades to bring the database schema to the latest version, or to create the schema for a new installation. But there are actually additional uses for it. A few of them are:

cinder-manage service list

the output will look like this:

Binary Host Zone Status State Updated At RPC Version Object Version Cluster 
cinder-scheduler controller-server nova enabled :-) 2017-10-15 19:45:37 4.5 4.5 
cinder-volume controller-server@ceph nova enabled :-) 2017-10-15 19:45:31 4.6 4.6

The output can be used to diagnose issues when cinder-scheduler reports that a volume backend is down although cinder-volume is up. The output of the above command is the most reliable way to see the status of cinder-scheduler, cinder-volume and cinder-backup.

If you have multiple backends for cinder, or run multiple cinder-scheduler/cinder-volume services on multiple controller nodes, the output will look like this:

Binary Host Zone Status State Updated At RPC Version Object Version Cluster 
cinder-scheduler controller-server1 nova enabled :-) 2017-10-15 19:45:37 4.5 4.5 
cinder-volume controller-server1@ceph nova enabled :-) 2017-10-15 19:45:31 4.6 4.6
cinder-volume controller-server1@ceph2 nova enabled :-) 2017-10-15 19:45:31 4.6 4.6
cinder-scheduler controller-server2 nova enabled :-) 2017-10-15 19:45:37 4.5 4.5
cinder-volume controller-server2@ceph nova enabled XX 2017-10-15 19:45:37 4.6 4.6

As you can see above, there are multiple backends for cinder-volume on controller-server1: one of them is ceph and the other is ceph2, and both are enabled and up. It's easy to spot that cinder-volume on controller-server2 is showing as down, so you should expect its ceph backend to be unavailable. If you check the cinder-volume service using systemctl status, the service itself might still be running. If that is the case, you need to look deeper into why the ceph backend for cinder-volume is down.
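When the listing grows, a small filter helps spot the down services. The sample below inlines captured output so the sketch is self-contained; in practice you would pipe `cinder-manage service list` into the same awk expression (NR > 1 skips the header, and the State column is field 5 in the data rows).

```shell
# Flag every service whose State column is not ":-)".
# Sample listing inlined; in practice:
#   cinder-manage service list | awk 'NR > 1 && $5 != ":-)" {print $1, $2}'
cat <<'EOF' | awk 'NR > 1 && $5 != ":-)" {print $1, $2}'
Binary Host Zone Status State Updated_At
cinder-scheduler controller-server1 nova enabled :-) 2017-10-15
cinder-volume controller-server1@ceph nova enabled :-) 2017-10-15
cinder-volume controller-server2@ceph nova enabled XX 2017-10-15
EOF
```

This prints `cinder-volume controller-server2@ceph`, the one service reporting down.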

If you decide to remove a certain cinder-volume/cinder-scheduler/cinder-backup service from your deployment, you can do that by stopping the service on the controller host, and then removing it using

cinder-manage service remove cinder-scheduler controller-server2
cinder-manage service remove cinder-volume controller-server2@ceph

If this small use case got you excited, check out the following uses as well

cinder-manage logs errors
cinder-manage logs syslog
cinder-manage volume delete --> Important in the case of stuck volumes
cinder-manage host list
cinder-manage config list --> you can use it to verify what the running configuration for cinder is

The manual for cinder-manage is at

https://docs.openstack.org/cinder/latest/man/cinder-manage.html

Have fun !

OpenStack Performance tuning

So, you've managed to deploy OpenStack in a production environment, and now you would like to make sure that your precious investment in hardware doesn't get ruined by poor performance tuning. You might want to consider reading this post.

You have to remember first that OpenStack is a Cloud Computing Enabler framework, i.e. none of the computations/file transfers done by your VMs are processed by OpenStack services. OpenStack relies on native Linux technologies such as libvirt, qemu, KVM, network namespaces and so on to implement various features. So your targets for performance tuning are NOT ONLY OpenStack services; they can be native Linux services as well.

Performance tuning for OpenStack services

To do that, consider the following enhancements:

  • One thing that is often forgotten after deploying a production environment is disabling verbose and debug logging. You probably spent some time deploying your environment and getting it to where it is, and during this cycle you probably had to enable debugging in some services and verbose logging in others. Remember to go back and disable all of these; the extra file IOPS will for sure reduce performance. You can do that by setting those options to false in nova.conf, neutron.conf, cinder.conf and keystone.conf
verbose=false
debug=false
  • OpenStack supporting services: don't forget that mariadb/mysql and rabbitmq/amqp are still your environment's backbone. Slowness in any of the supporting services will directly make every OpenStack service that uses them slow. Pay close attention to your mysql buffers and rabbitmq caches. A good feature in mysql/mariadb is the slow query log. Set the long_query_time variable to a threshold; any query that exceeds it gets written to the slow query log. This is good to have if you are chasing database slowness.
  • Is keystone accessing the database for every single transaction? Keystone sits at the heart of all of the services as it's the auth service. Too many users accessing the environment and generating lots of tokens may choke your mysql database. Make sure that keystone is configured to store its tokens in memcached instead.
  • How large is your nova database? Did you know that OpenStack keeps a record of every instance you create? I am sure you knew that, but did you know that it also keeps the record after you delete that instance? Check out this tool to clean up the nova database: https://gist.github.com/mousavian/d68bcd903207366c1bfd
  • Do you have any backends in cinder.conf that you're not using? Have you configured several but are only using one? Consider cleaning these out. A short look at the cinder logs will show it complaining about the unused backend
  • What's your store for glance images? Is it a filesystem_store_datadir sitting on the controller node? If you don't have the option to change it to network-based storage, ensure it sits on a different LUN/HDD/SSD than the OS and OpenStack services on the controller. Glance is not only used for providing images during a VM's boot (those images eventually get cached on the compute hosts, so that part is not a big deal for performance); it is also used when users take snapshots of their ephemeral VMs. Don't leave the environment prone to slowness as users take snapshots of their VMs
  • Adding to the previous point, it's always better to keep the APIs for OpenStack services on a network other than the data network, i.e. the network where glance transfers images, CEPH transfers its RBDs and so on.
  • Are you using CEPH? Is your OSD replication traffic and RBD traffic going over the same network? Consider splitting them onto two different networks
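The slow query log mentioned above can be enabled with a my.cnf fragment along these lines (the threshold and file path are example values; tune long_query_time for your workload):

```ini
# my.cnf sketch -- values are examples, not recommendations
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 1   ; seconds; queries slower than this are logged
```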

This is not all, but it covers to some extent the common architecture and implementation issues that can affect performance. There are plenty of tutorials out there that discuss how to optimize libvirt, qemu/kvm, Apache, the network stack and CEPH. These are all your targets if you want to optimize the actual performance of VMs and volumes.

Private External Networks in Neutron

You might find yourself in a position where you need to restrict tenants' access to specific external networks. In OpenStack there's the notion that external networks are accessible by all tenants and anyone can attach their private router to them. This might not be what you want if you only want specific users to access a specific external network.

There is no way to configure this directly in neutron; i.e., any external network in your deployment can have tenants attach their routers to it and make it their default gateway. To work around this, let's look at how neutron stores routers and ports in its database schema. A router is defined as follows:

MariaDB [neutron]> desc routers$$
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| project_id       | varchar(255) | YES  | MUL | NULL    |       |
| id               | varchar(36)  | NO   | PRI | NULL    |       |
| name             | varchar(255) | YES  |     | NULL    |       |
| status           | varchar(16)  | YES  |     | NULL    |       |
| admin_state_up   | tinyint(1)   | YES  |     | NULL    |       |
| gw_port_id       | varchar(36)  | YES  | MUL | NULL    |       |
| enable_snat      | tinyint(1)   | NO   |     | 1       |       |
| standard_attr_id | bigint(20)   | NO   | UNI | NULL    |       |
| flavor_id        | varchar(36)  | YES  | MUL | NULL    |       |
+------------------+--------------+------+-----+---------+-------+

Each router has an id, a name and the project ID it was created under. You will also notice the field gw_port_id. This is the port that connects the tenant router to its default gateway, i.e. your external network.

Each router has a unique gateway port; tenant routers do not share a common port. Let's look at what a port looks like in the database schema:

MariaDB [neutron]> desc ports$$
+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| project_id       | varchar(255) | YES  | MUL | NULL    |       |
| id               | varchar(36)  | NO   | PRI | NULL    |       |
| name             | varchar(255) | YES  |     | NULL    |       |
| network_id       | varchar(36)  | NO   | MUL | NULL    |       |
| mac_address      | varchar(32)  | NO   |     | NULL    |       |
| admin_state_up   | tinyint(1)   | NO   |     | NULL    |       |
| status           | varchar(16)  | NO   |     | NULL    |       |
| device_id        | varchar(255) | NO   | MUL | NULL    |       |
| device_owner     | varchar(255) | NO   |     | NULL    |       |
| standard_attr_id | bigint(20)   | NO   | UNI | NULL    |       |
| ip_allocation    | varchar(16)  | YES  |     | NULL    |       |
+------------------+--------------+------+-----+---------+-------+

As you can see, a port has an id and the network_id it is attached to. Note that in the ports table, network_id refers to both external and "tenant" networks.

If we know our external network IDs, we can tell which ports are attached to them, and possibly enable/disable future attachments. Finding the external network IDs is easy:

(neutron) net-external-list

This will show you the IDs of the external networks, and then with a simple query you can select from the ports table the ports attached to your external network:

select id from ports where network_id='$NETWORK_ID' $$

This returns a list of the ports currently connected to your external network.

If you want to prevent tenants from attaching anything (routers or floating IPs) to this external network, you can achieve this by using a BEFORE INSERT trigger in MySQL:

DELIMITER $$

create trigger ports_insert before insert on ports for each row begin IF (new.network_id = '$NETWORK_ID') then set new.id = NULL ; END IF ; END $$

This trigger basically intercepts the insert statement that neutron writes to the database when a tenant attaches a router to your external network. It sets the id of the new port to NULL, which is invalid for this field, as seen in the description of the ports table above, so the insert fails. This effectively prevents any routers or floating IPs from being attached to the external network you choose. But remember, you're included in that too: you can't attach anything to this external network even as admin. You can always tweak the trigger to check the project_id field and only restrict access for specific projects.
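As a sketch of that tweak, a project-aware version of the trigger could look like this. Both IDs are placeholders you substitute yourself, and as with any direct database change, test it in a non-production environment first.

```sql
DELIMITER $$
CREATE TRIGGER ports_insert BEFORE INSERT ON ports
FOR EACH ROW
BEGIN
  -- Reject attachments to the restricted network unless they come
  -- from the allowed project (both IDs are placeholders).
  IF (NEW.network_id = '$NETWORK_ID'
      AND NEW.project_id <> '$ALLOWED_PROJECT_ID') THEN
    SET NEW.id = NULL;  -- NULL primary key makes the INSERT fail
  END IF;
END $$
DELIMITER ;
```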

Busy Cinder volumes & Ceph

If you run into an issue where a Cinder volume you attached to a VM cannot be deleted even after detaching it from the VM, and when you look into the logs you find something like

ERROR cinder.volume.manager ....... Unable to delete busy volume.

or

WARNING cinder.volume.drivers.rbd ......... ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.

There are multiple scenarios that might cause these errors, among which are:

  • Scenario 1: this corresponds to the first error message above. You might have created a snapshot of the volume, whether inside cinder or directly from the ceph rbd command line. Ceph will not allow you to delete a volume that has snapshots attached to it. The snapshots on the volume can be listed by
    • rbd snap ls POOLNAME/VOLUMEid
    • And then the snapshots can be purged by (only if the snapshots were created outside cinder):
    • rbd snap purge POOLNAME/VOLUMEid

      If the volume snapshots were created inside cinder, it's definitely better to clear them from inside cinder instead.

  • Scenario 2: the other scenario (the second error message above) is that libvirt on one of the compute nodes is still attached to the volume. This can happen if the VM did not terminate correctly or the detachment didn't actually happen. To verify, list the watchers of the rbd image using
    • rbd status POOLNAME/VOLUMEid
    • This will show you the IP of the watcher (the compute node in this case) and the cookie used for the connection

One possibility in this scenario is that a VM did not fully release (i.e. detach) the volume. To release it, you will have to restart the VM while making sure the qemu process has no remaining reference to the volume ID. You might have read that you need to reboot the compute node to release the attachment, but you don't have to if you can simply restart the VM and confirm the qemu process no longer references the volume.
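A quick sketch of that check is below. The process list is inlined sample data so the snippet is self-contained; on the compute node you would pipe `ps -eo pid,args` instead, and volume-aaaa stands in for the real Cinder volume UUID.

```shell
# On a compute node the real check would be:
#   ps -eo pid,args | grep qemu | grep "$VOLUME_ID"
VOLUME_ID=${VOLUME_ID:-volume-aaaa}   # placeholder for the real volume UUID
printf '%s\n' \
  '2101 /usr/libexec/qemu-kvm -name guest=instance-00000042 -drive file=rbd:volumes/volume-aaaa' \
  '2188 sshd: root@pts/0' \
  | grep qemu | grep "$VOLUME_ID"
# A match means that qemu process (and its VM) still holds the volume.
```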

Hope that helps !