Archives For 30 November 2011

DRBD (Distributed Replicated Block Device) is an open source storage solution that is best compared with a RAID-1 (mirror) between two servers. I’ve implemented it for both our cloud storage and our cloud management servers.

We’re in the process of replacing both cloud storage nodes (that is: everything except the disks and their RAID arrays), and of course no downtime is allowed. Although DRBD is made for redundancy (one node can be offline without impact), completely replacing a node is a bit tricky.

Preventing a ‘split brain’ situation
The most important thing to remember is that only one storage node may be the active (primary) node at any time. If this is violated, a so-called ‘split brain’ happens. DRBD has methods for recovering from such a state, but it is best to prevent it from happening.

When discussing this project in our team, it was suggested to boot the replaced storage node without any network cables. Usually this is a safe way to prevent a node from interacting with others. In this case, it is not such a good idea: since the new secondary server has the same disks and configuration as the old one, unexpected things may happen. When booted without network cables attached, the nodes cannot find one another and the newly booted secondary may decide to become primary itself. A split brain situation will then occur: both nodes are primary at the same time and both access the data. You won’t be able to recover from this unless you manually decide which node is the master (and lose the changes on the other node). In our case that decision would be easy, but it’s a lot of unnecessary trouble.
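If you do end up in a split brain, the manual recovery comes down to a few drbdadm commands. The sketch below only prints them so you can review them before running anything. It assumes DRBD 8.3 syntax (DRBD 8.4 and later spell the discard step as drbdadm connect --discard-my-data <resource>); the function name and the resource name r0 used in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the manual split-brain recovery
# commands for one DRBD resource. Assumes DRBD 8.3 drbdadm syntax.
split_brain_recovery_cmds() {
  local role="$1" res="$2"   # role: "victim" (changes discarded) or "survivor"
  case "$role" in
    victim)
      # Demote the node whose changes will be thrown away...
      echo "drbdadm secondary $res"
      # ...and reconnect, telling DRBD to discard its local modifications.
      echo "drbdadm -- --discard-my-data connect $res"
      ;;
    survivor)
      # Only needed if the surviving node is in StandAlone state as well.
      echo "drbdadm connect $res"
      ;;
    *)
      echo "usage: split_brain_recovery_cmds victim|survivor <resource>" >&2
      return 1
      ;;
  esac
}
```

For example, split_brain_recovery_cmds victim r0 prints the two commands to run on the node whose data you decided to give up.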

Instead, boot the replaced node with the network cables connected, so the replication network is up and both nodes immediately see each other. In our case, this means connecting the 10Gbps connection between the nodes. This connection is used by DRBD for syncing and by Heartbeat for sending the heartbeats. This prevents entering the ‘split brain’ state and immediately starts syncing.

Note: If you want to replace everything including the disks, you’ll have to manually join the cluster with the new secondary node and then sync the data. In this case it doesn’t matter whether the networking cables are connected or not, since this new node won’t be able to become primary anyway.

The procedure
Back to our case: replacing all hardware except for the disks. We managed to successfully replace both nodes using this procedure:

  1. shut down the secondary node
  2. replace the hardware, install the existing disks and 10Gbps card
  3. boot the node with at least the 10Gbps connection active
  4. the node should sync with the primary
  5. when syncing finishes, redundancy is restored
  6. make sure all other network connections are working. Since the main board was replaced, some MAC addresses changed; update the udev rules accordingly
  7. when all is fine, check if DRBD and Heartbeat are running without errors on both nodes
  8. then stop Heartbeat on the primary node. A fail-over to the new secondary node will occur
  9. if all went well, you can now safely shut down the old primary
  10. replace the hardware, install the existing disks and 10Gbps card
  11. boot the node with at least the 10Gbps connection active
  12. the node should sync with the primary
  13. when syncing finishes, redundancy is restored
  14. make sure all other network connections are working. Since the main board was replaced, some MAC addresses changed; update the udev rules accordingly
  15. the old primary is now secondary
  16. if you want, initiate another fail-over (in our case we didn’t, since both nodes are equally powerful)
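The “when syncing finishes” checks in the procedure are easy to get wrong by eye. A minimal sketch, assuming DRBD 8.x and its /proc/drbd status line format (for example  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----): the hypothetical helper drbd_in_sync reads that text on stdin and succeeds only when the given minor device is Connected and both disks are UpToDate. Minor 0 is an assumption; check which minor your resource uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given DRBD minor (default 0) is
# connected to its peer and both sides are UpToDate, based on DRBD 8.x
# /proc/drbd-style text read from stdin.
drbd_in_sync() {
  local minor="${1:-0}" line
  # Pick the status line of the requested minor device, e.g. " 0: cs:...".
  line=$(grep -E "^ *${minor}: cs:") || return 1
  # While resyncing, cs: is SyncSource/SyncTarget and one disk is Inconsistent.
  case "$line" in
    *"cs:Connected "*"ds:UpToDate/UpToDate"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, until drbd_in_sync 0 < /proc/drbd; do sleep 10; done waits until the resync of minor 0 has finished; only then is it time to move on to the fail-over step.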

Congratulations: the cluster is redundant again with the new hardware!

Using the above procedure, we replaced both nodes of our DRBD storage cluster without any downtime.

I’ve been working a lot with CloudStack Advanced Networking and find it very flexible. Recently, I had another opportunity to test its flexibility when a customer called: “We want VMs in your CloudStack cloud, but these VMs may only be reachable from our office, not from the public internet.” Firewalling? No, they required us to use their VPN solution.

Is CloudStack flexible enough for this to work? Yes, it is. In this post I’ll tell you how we did it. And it doesn’t even matter which VPN device you use: this works regardless of brand and features, as long as the device can use a public ip-address to connect over the internet to another VPN device and has a private network behind it. All VPN devices I know of support these basic features.

VPN (Virtual Private Networking)
The client’s office is connected to the internet and has a VPN device. We received a second device to host in our data center, and the two talk to each other securely over the public internet. They probably speak IPsec or something similar, but that is beyond the scope of this post.

The VPN device in the data center has a public ip-address on its WAN port but also has some ports for the internal network. We configured it to use the same network CIDR as we did in the CloudStack network we created for this customer. Let’s use 10.10.16.0/24 as an example in this blog. And now the problem: this cloud network is a tagged network and the VPN device we received is not VLAN-capable.

VLANs in the Cloud
CloudStack Advanced Networking relies on VLANs. Every VLAN has its own unique ID. Switches use this VLAN ID to keep the VLAN networks apart and make sure they’re isolated from each other. Most managed switches support VLANs as well, and that’s where we’ll find the solution to this problem.

Configuring the switch

We connected the VPN device to our switch and set its port to UNTAGGED for the VLAN ID the CloudStack network uses. In other words, a device connected to this port does not need to know about the VLAN: the switch adds the VLAN tag to traffic coming in on this port and strips it from traffic going out. This means the VPN device can use an ip-address in the 10.10.16.0/24 range and communicate with the VMs in the same network. The CloudStack compute nodes have their switch ports set to TAGGED, and the switch makes communication between them possible.

Overview of ip-addresses:

  • 10.10.16.1 – the internal VPN device ip-address in the data center
  • 10.10.16.11 – the first VM’s ip-address
  • 10.10.16.12 – the second VM’s ip-address

The VMs have their default gateway set to the VPN device’s 10.10.16.1 address. Also, the office network needs to be configured so that it knows the 10.10.16.0/24 network is reached via the VPN device located there. Users in the office will then be able to access the VMs on the 10.10.16.0/24 network.
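Because everything depends on the VPN device and the VMs sharing one subnet, a quick sanity check can help before debugging the switch. A minimal bash sketch using the example addresses from this post; the helper names ip_to_int and in_cidr are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check that an IPv4 address lies inside a CIDR block,
# e.g. that the VPN device and the VMs share 10.10.16.0/24. Pure bash.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"          # split the dotted quad on "."
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}"
  # Build the netmask from the prefix length (0..32).
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  # Same network if both addresses are equal after masking.
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}
```

For example, in_cidr 10.10.16.11 10.10.16.0/24 succeeds, while in_cidr 10.10.17.1 10.10.16.0/24 fails. On a Linux VM the gateway itself can be set with ip route replace default via 10.10.16.1; making that persistent depends on the distribution.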

Conclusion
While the VMs are hosted on our CloudStack cloud on the internet, they do not have public ip-addresses and thus are not reachable from the internet. The only public ip-address for this customer is the one configured on the VPN device in the data center. This provides the same level of security as you’d have with physical servers but adds the power of a cloud solution.

Thanks to the flexibility of CloudStack Advanced Networking this could be done!

When migrating an ip-address to another server, you will notice it can take anywhere between 1 and 15 minutes before the ip-address works on the new server. This is caused by the ARP cache of the switch or gateway on the network. But don’t worry: you don’t have to just wait for it to expire.

Why it happens
ARP (Address Resolution Protocol) translates ip-addresses into mac-addresses. Since the new server has a different mac-address and the old one stays in the cache for some time, connections will not work yet. Cache entries usually live for only a few minutes; the cache prevents asking for the mac-address of a certain ip-address over and over again.

One solution to this problem is to send an unsolicited ARP packet to the gateway, telling it to update its cached mac-address. You need the ‘arping’ utility for this.

Installing arping
There are two packages in Debian that contain arping:

arping - sends IP and/or ARP pings (to the mac address)
iputils-arping - Tool to send ICMP echo requests to an ARP address

I’ve had the best results with the ‘iputils’ one, so I recommend installing that one, mainly because the other package’s command does not implement the required -U flag.

aptitude install iputils-arping

I haven’t installed arping on CentOS yet, but was told the package is in the RPMForge repository.

Using arping
The command looks like this:

arping -s ip_address -c1 -U ip_address_of_gateway

Explanation:
-s is the source ip-address, the one you want to update the mac-address of
-c1 sends just one arping
-U is Unsolicited arp mode to update neighbours’ arp caches
This is followed by the ip-address of the gateway you want to update. In most cases this is your default gateway for this network.

Example: if you moved 192.168.0.100 to a new server and your gateway is 192.168.0.254, you’d run:

arping -s 192.168.0.100 -c1 -U 192.168.0.254

After you’ve sent the arping, the gateway will update the mac-address it knows for the ip-address, and traffic for this ip-address will start flowing to the new server.

Bottom line: whenever you migrate an ip-address to another server, use arping to minimize downtime.
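When several addresses move at once (for example during a fail-over), a small wrapper saves typing. This is a sketch assuming the iputils arping and the -s, -c1 and -U flags explained above; the function name announce_ips and the DRY_RUN switch are hypothetical, and depending on your arping version you may also need -I <interface> to pick the outgoing interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one unsolicited ARP (iputils arping -U) to the
# gateway for every migrated address. Usually needs root.
# Set DRY_RUN=1 to only print the commands.
announce_ips() {
  local gateway="$1"; shift
  local ip
  for ip in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "arping -s $ip -c1 -U $gateway"
    else
      arping -s "$ip" -c1 -U "$gateway"
    fi
  done
}
```

For example, run announce_ips 192.168.0.254 192.168.0.100 192.168.0.101 on the new server right after the addresses have been configured there.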

Push a specific commit (actually: everything up to and including this commit):

git push origin dc97ad23ab79a2538d1370733aec984fc0dd83e1:master

Push everything except the last commit:

git push origin HEAD~:master

The same, now without the last two commits:

git push origin HEAD~2:master
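To see what HEAD~:master does without touching a real remote, you can try it against a throwaway bare repository. A sketch (the helper name is hypothetical; it needs git and creates and removes a temporary directory). Because the branch does not exist on the fresh remote yet, git needs the full ref name refs/heads/master there; against an existing master on origin the short form above works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: push all but the last commit to a throwaway bare
# repository and print the subject of the commit that arrived there.
demo_push_all_but_last() {
  local tmp rc
  tmp=$(mktemp -d) || return 1
  (
    set -e
    cd "$tmp"
    # Throwaway identity so commits work without a global git config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
    git init -q --bare remote.git
    git init -q work
    cd work
    for n in 1 2 3; do
      echo "$n" > file.txt
      git add file.txt
      git commit -q -m "commit $n"
    done
    # The branch does not exist on the new remote yet, so use the full ref name.
    git push -q ../remote.git HEAD~:refs/heads/master
    # The remote master now points at "commit 2"; "commit 3" is still local only.
    git --git-dir=../remote.git log -1 --format=%s master
  )
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

Running demo_push_all_but_last prints commit 2, the subject of the second-to-last commit.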

Reorder commits, aka rebasing:

git rebase -i origin

Pulling commits from repo to local

git pull --rebase

When a conflict occurs, solve it, then continue:

git rebase --continue

Put local changes aside (stash them):

git stash save stashname

Show all stashes

git stash list

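Retrieve a stash (use the stash@{n} name shown by git stash list; git stash pop also removes it from the list):

git stash apply stash@{0}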
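git stash list shows entries like stash@{0}: On master: stashname. To retrieve a stash by the name you gave it rather than by its position, one option is to look the name up in that list first. A sketch in a throwaway repository (the helper name is hypothetical; newer git versions prefer git stash push -m stashname over git stash save):

```shell
#!/usr/bin/env bash
# Hypothetical helper: stash a change under a name, then find and apply that
# stash by name instead of by its stash@{n} position.
demo_stash_by_name() {
  local tmp rc
  tmp=$(mktemp -d) || return 1
  (
    set -e
    cd "$tmp"
    # Throwaway identity so commit and stash work without a global git config.
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
    git init -q
    echo original > file.txt
    git add file.txt
    git commit -q -m initial
    echo "work in progress" > file.txt
    git stash save -q mywork            # working tree is clean again
    # Look up the stash@{n} entry whose message is exactly "mywork".
    ref=$(git stash list | grep ': mywork$' | cut -d: -f1)
    git stash apply -q "$ref"           # restores the change, keeps the stash
    cat file.txt
  )
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

Running demo_stash_by_name prints work in progress, showing the stashed change was restored.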

When you have committed a change and want to revert it:

Make sure all work is committed or stashed!

git checkout 748796f8f2919de87f4b60b7abd7923adda4f835^ file.pp
git commit
git revert HEAD
git rebase -i
git commit --amend

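Explanation:
– Check out file.pp as it was before the commit that changed it (line 1)
– Commit it; this new commit undoes your change to file.pp (line 2)
– Revert this new commit, which re-applies the change as a separate commit (line 3)
– In the interactive rebase, move the commit from line 2 directly below the commit that contained the change you want to remove and mark it as fixup, so that commit no longer contains the change (line 4)
– Finally, rewrite the commit message of the remaining revert commit and you’re done (line 5)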

Git rocks!

A few days ago we installed a ‘Battery Backup Unit’ (BBU) in our secondary storage server. This allows us to turn on the ‘Write Back Cache’. The performance impact was impressive!

Enabling the Write Back Cache means writes are committed to the RAID controller’s cache (which is much faster), so you don’t have to wait for the data to be written to disk. Normally this is risky, because when the power goes down unexpectedly, the data in the controller’s cache is lost. The battery keeps the cache contents alive during a power failure, so the controller can still write them to disk once power returns.

Have a look at the graph below. It shows that the load dropped significantly after we installed the battery.

Starting on the left you see normal operations until we switched off the server around midnight. All services kept working, by the way, but more about that redundancy magic in another post. The big spike around 1am was caused by syncing the data with the primary storage after the server came back online. We had not turned on the Write Back Cache at that point. When syncing finished, we rebooted the server once more, upgraded the firmware and activated the Write Back Cache. We immediately saw a performance boost of around 20 times! The small spike around 2am was again syncing with the primary storage, but this time with the Write Back Cache enabled. Our load averages now peak at 1 or 2 instead of >20.

Lesson learned: always install a Battery Backup Unit so you can safely turn the Write Back Cache on!