Archives For August 2012

Back in March I wrote a blog on how to create a network without a Virtual Router. I received a lot of questions about it. It’s also a question that pops up now and then on the CloudStack forums. In the meantime I’ve worked hard to implement this setup at work. In this blog I’ll describe the concept of working with a CloudStack setup that has no Virtual Router.

First some background. In Advanced Networking, VLAN’s are used for isolation. This way, multiple separated networks can exist over the same wire. More about VLAN technology in general on this wikipedia page. For VLAN’s to work, you need to configure your switch so it knows about the VLAN you use. VLAN’s have a unique id between 1 and 4094. CloudStack configures this all automatically, except for the switch. Communication between Virtual Machines in the same CloudStack network (aka VLAN) is done using the corresponding VLAN-id. This all works out-of-the-box.

It took me some time to realize how powerful this actually is. One can now combine both VM’s and physical servers in the same network, by using the same VLAN for both. Think about it for a moment. You’re now able to replace the Virtual Router with a Linux router simply by having it join the same VLAN(s) and using the Linux routing tools.

Time for an example. Say we have a CloudStack network using VLAN-id 1234, and this network is created without a Virtual Router (see instructions here). Make sure you have at least 2 VM’s deployed and make sure they’re able to talk to each other over this network. Don’t forget to configure your switch. If both VM’s are on the same compute node, networking between the VM’s works, but you won’t be able to reach the Linux router later on if the switch doesn’t know the VLAN-id.

Have a separate physical server available running Linux and connect it to the same physical network as your compute nodes are connected to. Make sure the ip’s used here are private addresses. In this example I use:

compute1: 10.0.0.1
compute2: 10.0.0.2
router1: 10.0.0.10
vm1: 10.1.1.1
vm2: 10.1.1.2

The Linux router needs two network interfaces: one to the public internet (eth0 for example) and one to the internal network, where it connects to the compute nodes (say eth1). The eth1 interface on the router has ip-address 10.0.0.10 and it should be able to ping the compute node(s). When this works, add a VLAN interface on the router called eth1.1234 (where 1234 is the VLAN-id CloudStack uses). Like this:

vconfig add eth1 1234
ifconfig eth1.1234 10.1.1.10/24 up

Make sure you use the correct ip-address range and netmask. They should match the ones CloudStack uses for the network. Also, note the ‘.’ between the eth1 and the VLAN-id. Don’t confuse this with ‘:’ which just adds an alias ip.
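On systems that ship iproute2, the same tagged interface can also be created with the ip tool. Below is a minimal sketch, assuming eth1 as the parent interface and the example VLAN-id and address used in this post. The helper name vlan_commands is made up for this example, and it only prints the commands so you can review them before running them as root.

```shell
# Print the iproute2 commands that create a tagged VLAN interface.
# Nothing is executed here; the function only writes the commands to stdout.
#   $1 = parent interface   (eth1 in this post)
#   $2 = VLAN-id            (1234 in this post)
#   $3 = address/prefix     (10.1.1.10/24 in this post)
vlan_commands() {
    parent="$1"
    vid="$2"
    addr="$3"
    echo "ip link add link $parent name $parent.$vid type vlan id $vid"
    echo "ip addr add $addr dev $parent.$vid"
    echo "ip link set dev $parent.$vid up"
}

# Usage (review the output first, then run the printed lines as root):
#   vlan_commands eth1 1234 10.1.1.10/24
```

Printing instead of executing is a deliberate choice here: a typo in the VLAN-id or netmask is easier to spot on screen than to debug afterwards.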

To check if the VLAN was added, run:

cat /proc/net/vlan/eth1.1234

It should return something like this:

eth1.1234 VID: 1234 REORDER_HDR: 1 dev->priv_flags: 1
 total frames received 14517733268
 total bytes received 8891809451162
 Broadcast/Multicast Rcvd 264737
 total frames transmitted 6922695522
 total bytes transmitted 1927515823138
 total headroom inc 0
 total encap on xmit 0
Device: eth1
INGRESS priority mappings: 0:0 1:0 2:0 3:0 4:0 5:0 6:0 7:0
 EGRESS priority mappings:

Tip: if this command does not work, make sure the VLAN software is installed. In Debian you’d simply run:

apt-get install vlan

Another check:

ifconfig eth1.1234

It should return something like this:

eth1.1234 Link encap:Ethernet HWaddr 00:15:16:66:36:ee 
 inet addr:10.1.1.10 Bcast:0.0.0.0 Mask:255.255.255.0
 inet6 addr: fe80::215:17ff:fe69:b63e/64 Scope:Link
 UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
 RX packets:14518848183 errors:0 dropped:0 overruns:0 frame:0
 TX packets:6925460628 errors:0 dropped:15 overruns:0 carrier:0
 collisions:0 txqueuelen:0 
 RX bytes:8892566186128 (8.0 TiB) TX bytes:1927937684747 (1.7 TiB)

Now, the most interesting tests: ping vm1 and vm2 from the linux router, and vice versa. It should work, because they are all using the same VLAN-id. Isn’t this cool? You just connected a physical server to a virtual one! 🙂

You now have two options to go from here:

1. Use a LoadBalancer (like Keepalived) and keep the ip’s on the VLAN private using Keepalived’s NAT routing. The configuration is exactly the same as if you had all physical servers or all virtual servers.

2. Directly route public ip’s to the VM’s. This is the most interesting one to explain a bit further. In the example above we’ve used private ip’s for the VLAN. Imagine you’d use public ip addresses instead. For example:

vm1: 8.254.123.1
vm2: 8.254.123.2
router1: 8.254.123.10 (eth1.1234; eth1 itself remains private)

This also works: vm1, vm2 and router1 are now able to ping each other. A few more things need to be done on the Linux router to allow it to route the traffic:

echo 1 > /proc/sys/net/ipv4/ip_forward
echo 1 > /proc/sys/net/ipv4/conf/eth1/proxy_arp

Finally, on vm1 and vm2, set the default gateway to router1; 8.254.123.10 in this example.

How does this work? The Linux router also answers arp requests for the ip’s in the VLAN. Whenever traffic comes by for vm1, router1 answers the arp request and routes the traffic over the VLAN to vm1. When you run a traceroute, you’ll see the Linux router appear as well. Of course you need to have a subnet of routable public ip’s assigned by your provider for this to work properly.

To me this setup has two major advantages:

1. No wasted resources for Virtual Routers (one for each network)
2. Public ip’s can be assigned directly to VM’s; you can even assign multiple if you like.

The drawbacks? Well, this is neither officially supported nor documented. And since you are not using the Virtual Router, you’ll have to implement a lot of services on your own that were normally provided by the Virtual Router. Also, deploying VM’s in a network like this only works using the API. To me these are all challenges that make my job more interesting 😉

I’ve implemented this in production at work and we successfully run over 25 networks like this with about 100-125 VM’s. It was a lot of work to configure it all properly and to come up with a working solution. Now that it is live, I’m really happy with it!

I realize this is not a complete step-by-step howto. But I do hope this blog will serve as inspiration for others to come up with great solutions built on top of the awesome CloudStack software. Please let me know in the comments what you’ve come up with! Also, feel free to ask questions: I’ll do my best to give you some directions.

Enjoy!

Now that I have some time off, I enjoy spending some of it on my blog. I played with some WordPress settings and redesigned the right column. My most important goal here is to make sharing easy.

Why? Because I benefit a lot from what others post and want to give back a bit. That’s why I write my blogs. If you find them useful, it’s now easy to share them on Twitter, Facebook, LinkedIn and others. The more people benefit, the better, don’t you think?

Finally, to give my blog a personal touch, I changed the address bar to: blog.remibergsma.com

How do you think I can improve my blog? I appreciate your feedback 🙂

After all the hard work in the past months it’s now time for some weeks off 🙂 One last thing to solve was the Zimbra auto responder: I had enabled it through the web interface but while testing it, I found it only replied to mails sent to my main e-mail address. There is no way to fix this in the web interface, so I had a look at the command line options and found a way to add extra e-mail addresses. Ssh to the machine, become user ‘zimbra’ and run this command:

zmprov ma user@domain.com +zimbraPrefOutOfOfficeDirectAddress user@otherdomain.com

By running this command you tell Zimbra to also reply to mail sent to user@otherdomain.com instead of only user@domain.com. By putting a + before the attribute name, you can run this command multiple times to add more e-mail addresses, if you have them. Without the + you’ll overwrite the previous setting. I have added all 3 e-mail addresses I use, so they’ll get auto responses while I’m out of the office. Hope this is helpful to others as well.
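If you have several extra addresses, a small loop saves some typing. This is a minimal sketch that only prints one zmprov command per address, so you can check them before running them as the ‘zimbra’ user. The function name ooo_commands and the example addresses are made up for illustration.

```shell
# Print one zmprov command per extra out-of-office address.
# Nothing is executed; the commands are only written to stdout.
#   $1     = the account to modify
#   $2...  = extra addresses that should also trigger the auto responder
ooo_commands() {
    account="$1"
    shift
    for addr in "$@"; do
        # The leading + adds a value instead of replacing the existing ones.
        printf 'zmprov ma %s +zimbraPrefOutOfOfficeDirectAddress %s\n' "$account" "$addr"
    done
}

# Usage:
#   ooo_commands user@domain.com user@otherdomain.com user@thirddomain.com
```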

Sometimes files may be filled up with null characters that look like ^@ when you open them in a text editor. This may happen when a disk becomes full, or when you rename a logfile while an application is still writing to it.

I ran into this problem today, and I fixed it using a command called ‘tr’. This is a utility capable of translating or deleting characters from standard input/output. It means you can use it to ‘pipe’ input to it, and send the output to a new file. For example:

cat file.log | tr -d '\000'  > new_file.log 

Note: when using this in a script, you might need to escape that backslash.

What does this command do? Using the -d switch we delete a character. A backslash followed by three 0’s represents the null character. This just deletes these characters and writes the result to a new file. Problem solved!
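Before cleaning a file it can be useful to know how many null characters are in there. Here is a minimal sketch of two small helpers: one counts the null bytes, the other writes a cleaned copy. The function names count_nulls and strip_nulls are made up for this example. Both read the file through a redirect instead of cat, which does the same thing with one process less.

```shell
# Count the NUL bytes in a file: delete everything except NUL (-c complements
# the set), then count the bytes that are left.
count_nulls() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Write a copy of $1 without NUL bytes to $2. The input file is not changed.
strip_nulls() {
    tr -d '\000' < "$1" > "$2"
}

# Usage:
#   count_nulls file.log
#   strip_nulls file.log new_file.log
```

Writing to a new file is on purpose: redirecting the output to the same file you are reading from would truncate it before tr gets to read it.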

At work, it is one of my tasks to make sure we keep the mailboxes of our clients free of spam. Some weeks ago the number of spam mails went up massively and we worked hard to update the filters to keep unwanted mails out. In this blog I’ll describe a few of the things we did.

Using the famous SpamAssassin tool it is possible to score e-mails. One can score on the contents of subject, body, headers etcetera. A lot of good rules are already supplied and it’s possible to write your own. When a new spam run comes in, we used to create new rules for the spam mails that slipped through. That works, and afterwards the mails are tagged as spam.

As you can imagine, this procedure is both time consuming and a bit late: only after we see mails slipping through can we create rules to catch them. Of course this procedure will always be some sort of a last resort if all else fails, but I wanted to set up something more proactive.

To start from the beginning, how is all this spam sent?
It can’t be sent from one or a few locations only, because then it’d be easy to block. Instead, most spam is sent by botnets these days. Botnets usually have hundreds of thousands of pc’s under control and one of the main things they do is sending spam. For example to advertise online casino’s, fake banking sites or other scams. Because there are so many infected PC’s, it’s not easy to block them all. Or is it?

When thinking about this, I realized most (if not all) members of these botnets are infected Windows pc’s. Also, these mails are often directly sent from the PC to the final destination mailserver (instead of using the SMTP server of their ISP).

If we could detect the OS of the client that connects to our mailserver, we could then apply certain actions based on the OS. The idea here is that most ISP’s use Linux, Unix or Mac servers. And if they are using Windows, it is likely to be some Server version instead of ‘Windows Vista’, ‘Windows XP’, etcetera. Interesting!

What we want to do here is known as Passive OS Fingerprinting. A tool that implements this is for example P0f. You run P0f as a daemon on the mailserver that accepts the incoming connections. Based on the traffic that flows by, P0f is able to guess the OS of the client that connects. It is passive, so the client never knows we’re doing this. Nothing is in between the client and the mailserver; P0f is just analyzing the traffic. Now that P0f knows the OS of the client, we can decide what to do with this information. In our setup it works like this:

1. When the OS is Windows, but not Windows Server, activate Greylisting. When another or unknown OS is detected, start mail delivery immediately;

The idea behind this is that the software sending mail from infected Windows PC’s is usually poorly written. It cannot handle the mailserver answering with a temporary error (an SMTP 4xx reply) and mostly gives up after just one attempt. This technique is called Greylisting and it alone reduces the number of spam mails significantly. But Greylisting has drawbacks as well. The biggest drawback in my opinion is that it can delay mail by 30 minutes or more. Most customers we serve find this annoying.

2. At the time the connection is accepted and the mail is delivered, we set a ‘X-P0f-OS:’-header with the detected OS;

Combining Greylisting and P0f creates a more ideal solution: Windows PC’s should not send mail directly to the recipient’s mailserver, but use their provider’s SMTP server instead. One could say that when such a PC is sending mails directly, it is at least suspicious. That is, in my opinion, enough reason to Greylist them. There must be some spam software that understands Greylisting (now or in the future) and that will eventually connect again after some time and deliver the mail. That’s why there is an action #3:

3. This header is consulted later on in the delivery process, and when Windows appears (again, not Windows Server), Spamassassin assigns some points to the spam-score.

Mail that is sent directly from a Windows PC is suspicious to me, and the OS score helps reach the score needed for Spamassassin to tag it. The interesting thing is that this is proactive: you just don’t know what new mail spammers will send, but what you do know is that the next mail is probably sent by an infected Windows PC.
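The decision in steps 1 and 3 comes down to one test on the detected OS string. Here is a minimal sketch of that test, assuming the OS label contains the words ‘Windows’ and ‘Server’ as plain text. The labels P0f really reports may look different, and the function name p0f_action is made up for this example; it is not the configuration we run in production.

```shell
# Decide what to do with a connection based on the detected OS label.
# Prints "greylist" for Windows desktop systems and "deliver" for anything
# else, including Windows Server and unknown or empty labels.
p0f_action() {
    case "$1" in
        *Windows*Server*) echo deliver ;;   # server editions are trusted
        *Windows*)        echo greylist ;;  # desktop Windows sending directly
        *)                echo deliver ;;   # other or unknown OS
    esac
}

# Usage:
#   p0f_action "Windows XP"
#   p0f_action "Linux 2.6"
```

The order of the patterns matters: the Windows Server check has to come before the generic Windows check, otherwise servers would be greylisted too.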

This setup is now up and running, so I’ll let you all know what my experiences are after some time. When I find the time, I will also write some blogs in more detail on how to set up such a system.

Any other methods you use to stop spam effectively?

I had an interesting problem lately regarding AWStats. Due to some delay, the log files weren’t processed in the right order and then AWStats ignored all old logs. This resulted in some days being blank in the stats and of course this is not something we want. Since we also have multiple web servers in our cluster, things started to get a bit complicated.

The log files from each of the web servers were concatenated and then split to a separate log file for each virtual host using the Apache2 split-logfile script.

The logs for an example virtual host looked like this:

1.2.2.1 - - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
1.2.2.1 - - [01/Aug/2012:05:50:51 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"

As you can see, AWStats processes August 1 and then refuses the older July records. To re-sort the log files, I ran:

cat website.unsorted.log | sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n > website.log

As an alternative, the AWStats script logresolvemerge.pl can be used as well. Since I already had concatenated the log files and split them, the sort option above was faster to implement.
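The sort keys look cryptic, so here is the same command wrapped in a function with each key explained. It is a minimal sketch, assuming the timestamp is the fourth space-separated field in the form [28/Jul/2012:04:02:06, as in the example logs above. The function name sort_access_log is made up, and LC_ALL=C is my addition so the month names are always matched in English.

```shell
# Sort an Apache access log from stdin by timestamp and write it to stdout.
# Field 4 looks like "[28/Jul/2012:04:02:06"; the keys pick out character
# ranges of that field, most significant part first:
#   4.9-4.12  year (numeric)      4.5-4.7   month name (-M understands Jan..Dec)
#   4.2-4.3   day (numeric)       4.14-4.15 hour
#   4.17-4.18 minute              4.20-4.21 second
sort_access_log() {
    LC_ALL=C sort -t ' ' \
        -k 4.9,4.12n \
        -k 4.5,4.7M \
        -k 4.2,4.3n \
        -k 4.14,4.15n \
        -k 4.17,4.18n \
        -k 4.20,4.21n
}

# Usage:
#   sort_access_log < website.unsorted.log > website.log
```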

Now the log file looks like this:

1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
1.2.2.1 - - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
1.2.2.1 - - [01/Aug/2012:05:50:51 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"

One last thing to solve was the AWStats history file. Since it had run before but with the wrongly ordered logfile, it had a wrong ‘LastLine’ setting. Experimenting with this showed it was best to blank out the line, leaving an empty line in its place (so we won’t break the indexes). I used sed to fix it:

sed -i \
-e 's/^LastLine .*//' \
awstats072012.*

AWStats now updates the stats correctly and everybody is happy! Thanks to my colleagues Pim, Vincent and Mischa because they all helped solve some pieces of the puzzle. Yes, it’s nice having some technically skilled colleagues 🙂

 

My mother’s iPhone had issues and was replaced by a new one. Apple handled that very well 🙂 The down side of a new iPhone is, of course, that all settings and photo’s were lost. Fortunately, I was smart enough to enable iCloud backups last year when I set up her iPhone.

So, when the new iPhone came in today, all I did was restore the iCloud backup in a few swipes and clicks. An hour or so later (due to downloading) everything was as before. Very cool!

I recommend everybody to set up iCloud backups. You find the settings in the iCloud screen. It’s easy and you never know…