Here are the slides from my presentation at both ApacheCon and CloudStack Day in Austin, TX.
As KVM seems more and more interesting, at work we wanted to do a Proof-of-Concept. The KVM hypervisor cluster had to be controlled by CloudStack and also integrate with NSX (formerly known as Nicira).
NSX is owned by VMware these days and is one of the first Software Defined Networking solutions. At Schuberg Philis we have been using it since early 2012.
Choosing an OS
To me, the most interesting part of KVM is that you only need a very basic Linux box with some tooling and you have a nice, modern hypervisor ready to rock. Since we’re using CloudStack to orchestrate everything, we don’t need cluster features. In fact, this prevents the “two captains” problem that we sometimes encounter with XenServer and VMware ESX. We compared Ubuntu with CentOS/RHEL and both work fine; it all depends on your needs.
Installing the software is pretty straightforward:
# CentOS/RHEL
yum install kvm libvirt python-virtinst qemu-kvm bridge-utils pciutils
# Ubuntu
apt-get install qemu-kvm libvirt-bin bridge-utils virt-manager openntpd
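A few quick sanity checks can save you debugging time later. These are not from the original setup, just a sketch of how I would verify the host is ready:

# kvm_intel or kvm_amd should be loaded
lsmod | grep kvm
# the KVM device node should exist
ls -l /dev/kvm
# libvirt should respond
virsh version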
Installing Open vSwitch
Open vSwitch is a multilayer virtual switch and it brings a lot of flexibility in the way you can create interfaces and bridges in Linux. There are two options here. If you need STT tunnels, you need the NSX patched version of Open vSwitch. If you need VXLAN or GRE tunnels, you can use the open source version that comes with Ubuntu and CentOS. Both ship version 2.3.1 which works perfectly fine.
# CentOS/RHEL
yum install openvswitch kmod-openvswitch
# Ubuntu
apt-get install openvswitch-switch
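Before configuring anything, make sure the Open vSwitch daemon is actually running. A quick sketch (service names may vary per distro and version):

# CentOS/RHEL
service openvswitch start
# Ubuntu
service openvswitch-switch start
# verify the tools can talk to the daemon
ovs-vsctl --version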
Configuring Open vSwitch
Instead of classic Linux bridges, we use Open vSwitch bridges. In our POC lab environment we used HP DL380 G9 servers, each with two 10Gbit NICs connected to two Arista switches. They run an LACP bond, and on top of this we create the bridges for KVM to use. Because we set up the Open vSwitch networking over and over again while debugging and testing different OSes, I wrote a script that can quickly configure networking. You can find it on GitHub.
To give some quick pointers:
Create a bridge:
ovs-vsctl add-br cloudbr0
Create an LACP bond:
ovs-vsctl add-bond cloudbr0 bond0 eno49 eno50 \
  bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
Create a so-called fake bridge (with a VLAN tag):
ovs-vsctl add-br mgmt0 cloudbr0 123
Get an overview of the current configuration:
ovs-vsctl show
Get an overview of the current bond status:
ovs-appctl bond/show bond0
---- bond0 ----
bond_mode: balance-tcp
bond may use recirculation: yes, Recirc-ID : 300
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 9887 ms
lacp_status: negotiated
active slave mac: fc:00:00:f2:00(eno50)

slave eno49: enabled
  may_enable: true
  hash 139: 3 kB load

slave eno50: enabled
  active slave
  may_enable: true
  hash 101: 1 kB load
  hash 143: 5 kB load
  hash 214: 5 kB load
  hash 240: 5 kB load
Add hypervisor to NSX
For now, I assume you already have an NSX cluster running that is capable of acting as a controller/manager for Open vSwitch. If you don’t know NSX, have a look, because it’s awesome.
We need to connect our Open vSwitch to the NSX cluster. To do that, you need an SSL certificate. This is how you generate one:
cd /etc/openvswitch
ovs-pki req ovsclient
ovs-pki self-sign ovsclient
ovs-vsctl -- --bootstrap set-ssl \
  "/etc/openvswitch/ovsclient-privkey.pem" \
  "/etc/openvswitch/ovsclient-cert.pem" \
  /etc/openvswitch/vswitchd.cacert
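You can verify which key and certificate Open vSwitch is now configured with (a quick check, not in the original post):

ovs-vsctl get-ssl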
Next, add the hypervisor to NSX. You can either set the authentication to be IP-address based (certificates will then be exchanged on connect) or copy/paste the certificate (ovsclient-cert.pem) into NSX directly. The first method allows for easier automation. I’m showing the UI here, but of course you can also use the API to add the hypervisor.
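If you go the copy/paste route, simply print the certificate and paste it into NSX:

cat /etc/openvswitch/ovsclient-cert.pem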
The final step is to connect Open vSwitch to NSX:
ovs-vsctl set-manager ssl:10.10.10.10:6632
Then NSX should show green lights and tunnels are being created.
To get an idea of what’s going on, you can run:
ovs-vsctl list manager
ovs-vsctl list controller
ovs-vsctl show
Debugging this can be done from the command line (check Open vSwitch logs) or from NSX.
At this point, NSX is controlling our Open vSwitch.
Setting up the CloudStack agent
When running KVM, CloudStack runs an agent on the hypervisor in order to configure VMs.
Installing the agent is simply a matter of installing an RPM or DEB package. Depending on the version, you will use different repositories. At Schuberg Philis, we build our own packages and serve them from our own repository.
Because we’re using Open vSwitch, some settings need to be tweaked in the agent.properties file, found in /etc/cloudstack/agent.
echo "libvirt.vif.driver=com.cloud.hypervisor.kvm.resource.OvsVifDriver" \ >> /etc/cloudstack/agent/agent.properties echo "network.bridge.type=openvswitch" \ >> /etc/cloudstack/agent/agent.properties
You may also want to set the log level to debug:
sed -i 's/INFO/DEBUG/g' /etc/cloudstack/agent/log4j-cloud.xml
CloudStack requires some KVM-related settings to be tweaked:
# Libvirtd
echo 'listen_tls = 0' >> /etc/libvirt/libvirtd.conf
echo 'listen_tcp = 1' >> /etc/libvirt/libvirtd.conf
echo 'tcp_port = "16509"' >> /etc/libvirt/libvirtd.conf
echo 'mdns_adv = 0' >> /etc/libvirt/libvirtd.conf
echo 'auth_tcp = "none"' >> /etc/libvirt/libvirtd.conf

# libvirt-bin.conf
sed -i -e 's/libvirtd_opts="-d"/libvirtd_opts="-d -l"/' \
  /etc/init/libvirt-bin.conf
service libvirt-bin restart

# qemu.conf
sed -i -e 's/\#vnc_listen.*$/vnc_listen = "0.0.0.0"/g' \
  /etc/libvirt/qemu.conf
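Once libvirt has been restarted with these settings, you can check that the TCP socket actually works. A quick sanity check (not from the original post), run on the hypervisor itself:

virsh -c qemu+tcp://127.0.0.1:16509/system list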
On CentOS 7, systemd ‘co-mounts’ the cpu and cpuacct cgroups, which causes issues when launching a VM with libvirt. On the mailing list, this was the suggested fix:
Edit /etc/systemd/system.conf and pass an empty string to the JoinControllers parameter. Then rebuild the initramfs via 'new-kernel-pkg --mkinitrd --install `uname -r`'.
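In shell form, a minimal sketch of that fix, assuming the default CentOS 7 system.conf layout (double-check the file before running this):

# pass an empty string to the JoinControllers parameter
sed -i 's/^#\?JoinControllers=.*/JoinControllers=/' /etc/systemd/system.conf
# rebuild the initramfs for the running kernel
new-kernel-pkg --mkinitrd --install `uname -r`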
Something I currently don’t like: SELinux and AppArmor need to be disabled. I will dive into this and get it fixed. For now, let’s continue:
# AppArmor (Ubuntu)
ln -s /etc/apparmor.d/usr.sbin.libvirtd /etc/apparmor.d/disable/
ln -s /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.sbin.libvirtd
apparmor_parser -R /etc/apparmor.d/usr.lib.libvirt.virt-aa-helper

# SELinux (CentOS)
setenforce permissive
# and make it persistent by setting this in /etc/selinux/config:
# SELINUX=permissive
You can now add the host to CloudStack, either via the UI or the API.
Keep an eye on the agent log file while it connects:
tail -f /var/log/cloudstack/agent/agent.log
After a few minutes, the hypervisor is added and you should be able to spin up virtual machines! 🙂
When we spin up a VM, CloudStack does the orchestration. CloudStack talks to NSX to provide the network (lswitch), and NSX communicates with the Open vSwitch on the hypervisor. The VM is provisioned by CloudStack, and KVM/libvirt makes sure the right virtual interfaces are plugged into Open vSwitch. This way, VMs on different hypervisors can communicate over their own private guest network, all created dynamically without manual configuration. No more VLANs!
If it does not work right away, look at the various log files and see what happens. There are usually hints that help you solve the problem.
KVM hypervisors can be connected to NSX, and with Open vSwitch you can build a Software Defined Networking setup. CloudStack is the orchestrator that connects the dots for us. I’ve played with this setup for some time now and find it very fast. We’ll keep testing and will probably create some patches for CloudStack. Great to see that the first KVM-related pull request I sent has already been merged 🙂
Looking forward to more KVM!
We had to migrate from one controller to another, which could easily be done by changing the Open vSwitch configuration on the hypervisors, like this:
ovs-vsctl set-manager ssl:10.18.59.84:6632
It will then get a list of all nodes and use those to communicate.
Although this works, I found that the hypervisor would revert to the old setting after a reboot. Also, when a pool master fail-over happened, Xapi ran a ‘xe toolstack-restart’, which caused the whole cluster to revert to the old setting. Oops.
Changing it in Xapi was the solution:
xe pool-set-vswitch-controller address=10.18.59.84
Now the change is persistent 🙂
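To verify which controller the pool currently points to, a quick check (not in the original post) is to ask Xapi for the pool’s vswitch-controller parameter:

xe pool-list params=vswitch-controller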
Due to the Ghost bug (aka CVE-2015-0235), we had to upgrade 500+ system VMs. We’re running CloudStack 4.4.2, but the systemvm template it used was still version 4.4.1, so we created a 4.4.2 template and used that instead. It was quite some work to get done, so we thought it was worth sharing how we did it in this blog. I did this work together with my Schuberg Philis colleague Daan Hoogland.
1. Build new CloudStack RPMs with MinVRVersion set to 4.4.2
Basically, this was a single-digit change in a single file (api/src/com/cloud/network/VirtualNetworkApplianceService.java).
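A sketch of what that change looks like; the exact line format in the file is an assumption here, so verify it in your source tree before running this:

sed -i 's/MinVRVersion = "4.4.1"/MinVRVersion = "4.4.2"/' \
  api/src/com/cloud/network/VirtualNetworkApplianceService.java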
2. Build new systemvm with latest patches
Obviously, we had to build a new systemvm template with this same version. We used the latest Debian 7 release and set the template version to 4.4.2 as well.
3. Upload the template to CloudStack
Upload the template as the admin user. We couldn’t use systemvm-xenserver-4.4 as a name because it already existed, so we gave it a temporary name: systemvm-xenserver-4.4.2.
Wait until the templates are in the READY state.
4. Stop CloudStack management servers
Unfortunately, you need to stop CloudStack: first because we’re going to upgrade the RPMs, and second to get the new template registered.
Also, since we will hack the SQL database in the next step (or should I say: we used the SQL API), it’s better to do this while CloudStack is not running.
5. Hack the SQL database
Once the templates were downloaded, we made the following changes:
– renamed systemvm-xenserver-4.4 to systemvm-xenserver-4.4-old
– renamed systemvm-xenserver-4.4.2 to systemvm-xenserver-4.4
– set the type of the new template (now named systemvm-xenserver-4.4) to SYSTEM
Get an overview with this query:
SELECT * FROM cloud.vm_template where type='SYSTEM';
Example update queries:
UPDATE `cloud`.`vm_template` SET `name`='systemvm-xenserver-4.4', `type`='SYSTEM' WHERE `id`='2152';
UPDATE `cloud`.`vm_template` SET `name`='systemvm-vmware-4.4', `type`='SYSTEM' WHERE `id`='2153';
As you can see, in our case the old and new template ids were as follows:
– 1952: old XenServer template, 2152: new
– 1953: old VMware template, 2153: new
Finally, you need to update the vm_template_id of the existing system VMs from old to new, as in this example:
UPDATE `cloud`.`vm_instance` SET `vm_template_id`='2152' WHERE `vm_template_id`='1952' and removed is NULL;
UPDATE `cloud`.`vm_instance` SET `vm_template_id`='2153' WHERE `vm_template_id`='1953' and removed is NULL;
6. Install new CloudStack RPMs
While CloudStack is still down, upgrade the RPMs. This is a quick install, as there are almost no changes.
7. Start the management servers
It’s time to start the management servers again. When they’re ready, check the virtual routers:
All RequiresUpgrade flags are set!
8. Destroy SSVM and CP
We had to destroy the Secondary Storage VMs and Console Proxies for them to be recreated from the new template. Rebooting did not work.
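One way to do that by hand is with CloudMonkey, the CloudStack CLI. A sketch; look up the actual UUIDs with the list command first:

cloudmonkey list systemvms
cloudmonkey destroy systemvm id=<uuid-of-ssvm-or-console-proxy>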
9. Reboot routers
Just reboot your routers and they will be upgraded automatically!
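If you don’t have tooling for this, a CloudMonkey sketch of the same idea (using the rebootRouter API; the UUID is a placeholder):

cloudmonkey list routers
cloudmonkey reboot router id=<router-uuid>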
We used internally developed tooling to automate this. The tools send maintenance notifications to tenants when their router is being upgraded (and when it’s finished). We’ll open source the tools in the coming months, so stay tuned!
I think we need an easier way to do this 😉
When contributing to open source projects, it’s pretty common these days to fork the project on Github, add your contribution, and then send your work as a so-called “pull request” to the project for inclusion. It’s nice, clean and fast. I did this last week to contribute to Apache CloudStack. When I wanted to contribute again today, I had to figure out how to get my “forked” repo up-to-date before I could send a new contribution.
Remember: you can read from and write to your fork, but you can only read from the upstream repository.
Adding upstream as a remote
When you clone your forked repo to a local development machine, it is set up like this:
git remote -v
origin  git@github.com:remibergsma/cloudstack.git (fetch)
origin  git@github.com:remibergsma/cloudstack.git (push)
As this refers to the “static” forked version, no new commits come in. For that to happen, we need to add the original repo as an extra “remote” that we’ll call “upstream”:
git remote add upstream https://github.com/apache/cloudstack
Now, run the same command again and you’ll see two:
git remote -v
origin    git@github.com:remibergsma/cloudstack.git (fetch)
origin    git@github.com:remibergsma/cloudstack.git (push)
upstream  https://github.com/apache/cloudstack (fetch)
upstream  https://github.com/apache/cloudstack (push)
The cloned git repo is now configured with both the forked and the upstream repo.
Let’s fetch the updates from upstream:
git fetch upstream
remote: Counting objects: 151, done.
remote: Compressing objects: 100% (123/123), done.
remote: Total 151 (delta 39), reused 0 (delta 0)
Receiving objects: 100% (151/151), 153.30 KiB | 0 bytes/s, done.
Resolving deltas: 100% (39/39), done.
From https://github.com/apache/cloudstack
   2f2ff4b..49cf2ac  4.4           -> upstream/4.4
   aca0f79..66b7738  4.5           -> upstream/4.5
 * [new branch]      hotfix/4.4/CLOUDSTACK-8073 -> upstream/hotfix/4.4/CLOUDSTACK-8073
   85bb685..356793d  master        -> upstream/master
   b963bb1..36c0c38  volume-upload -> upstream/volume-upload
We now have the new updates in locally. Before you continue, make sure you are on the master branch:
git checkout master
Then we will rebase the new changes to our own master branch:
git rebase upstream/master
You can achieve the same by merging, but rebasing is usually cleaner and doesn’t add the extra merge commit.
Updating 4e1527e..356793d
Fast-forward
 SSHKeyPairResponse.java                       | 12 ++++++++++++
 SolidFireSharedPrimaryDataStoreLifeCycle.java | 33 +++++++++++++++++++++++++++++++++
 RulesManagerImpl.java                         |  2 +-
 ManagementServerImpl.java                     |  5 +----
 4 files changed, 47 insertions(+), 5 deletions(-)
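For completeness, the merge-based alternative (run instead of the rebase above) would be:

git merge upstream/master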
Finally, update your fork at Github with the new commits:
git push origin master
Branches other than master
Imagine you want to track another branch and sync that as well.
git checkout -b 4.5 origin/4.5
This will set up a local branch called ‘4.5’ that is linked to ‘origin/4.5’.
If you want to get them in sync again later on, the workflow is similar to above:
git checkout 4.5
git fetch upstream
git rebase upstream/4.5
git push origin 4.5
Automating this process
I wrote this script to synchronise my clones with upstream:
#!/bin/bash
# Sync a forked repo with its upstream repo
# Requires the upstream remote to be defined

# Check if local repo is specified and exists
if [ -z "$1" ]; then
  echo "Please specify repo to sync: $0 <dir>"
  exit 1
fi
if [ ! -d "$1" ]; then
  echo "Dir $1 does not exist!"
  exit 1
fi

# Go into the git repo
cd "$1"

# Check that the upstream remote is defined
git remote -v | grep upstream >/dev/null 2>&1
RES=$?
if [ $RES -gt 0 ]; then
  echo "Upstream repo not defined. Please add it: git remote add upstream https://github.com/..."
  exit 1
fi

# Update and push
git fetch upstream
git rebase upstream/master
git push origin master
Execute it like this (the script name and path are just examples, assuming you saved it as sync_repo.sh):
./sync_repo.sh ~/cloudstack
Watch my talk “Start using Configuration Management in 5 steps” at the CloudStack Collaboration Conference, Denver, CO, USA (April 9-11, 2014).
Here are the slides from my presentation at the CloudStack Collaboration Conference in Denver: