Archives For 31 May 2012

This is what our Cloud looks like in the CloudStack Dashboard. Pretty powerful 🙂


These are the boxes the hardware was in.


This image is taken in our lab while testing CloudStack.


Very handy: a tray for spares in your rack!

 

This is the front: 6 compute nodes, 2 management servers and 2 Linux routers that manage all traffic. We also have 2 big storage servers that you cannot see on this image.

This is how the back of the rack in the data center looks like after we’ve built everything in.


Close up of power management and storage network.


And the final image shows the serial, public, and manage networks.

We’ve labelled and documented every cable and used a separate color for each connection type (i.e. mgt network, storage network, uplinks, cross links, serial connections, etc).

I’m running Keepalived on our loadbalancers for many years and I’m really happy with it. Today I run into an issue that took me some time to solve. I thought I’d share it 🙂

In my current setup I have a pair of Debian Squeeze boxes running version 1.1.20. Since I’m rolling out ipv6 at the moment, I need to upgrade to 1.2.2. Fortunately, Debian provides this version in Squeeze-Backports.

So, I decided to upgrade the Backup loadbalancer first (lb-1). It was a simple ‘apt-get’ procedure to get it installed. But soon these errors popped up in my syslog:

Jun 16 21:12:25 lb-1 Keepalived_vrrp: bogus VRRP packet received on bond1 !!!
Jun 16 21:12:25 lb-1 Keepalived_vrrp: VRRP_Instance(CLOUD_MGT_GW) ignoring received advertisment...
Jun 16 21:12:25 lb-1 Keepalived_vrrp: receive an invalid passwd!

No messages were to be found in the primary loadbalancer, lb-0. The two loadbalancers weren’t talking to each other any more. I did’t try a failover, as this apparently wouldn’t work. To be sure, I stopped keepalived on the lb-1.

Using tcpdump I found the problem: version 1.1.20 uses a password in its broadcast advertisement that was truncated to the first 7 characters, while the version 1.2.2 uses the full length password, as configured in /etc/keepalived/keepalived.conf. This of course did not match, and so they refused to talk to each other.

The solution was simple: I changed the password in the version 1.2.2 loadbalancer, to be 7 characters long. Then restarted keepalived, and all was working again. After upgrading both loadbalancers, I changed back the password to the longer version and since the versions are now both 1.2.2 it still worked 🙂