As a sysadmin I have many things to take care of. One of the most important is backups. As websites and mail archives grow larger and larger, it is an ongoing challenge to fit as many backups as possible into the available backup space.
In the early days we made our backups using rsync, tar and gzip. The biggest drawback was that this takes a lot of space. On the bright side, it’s plain simple and just always works. All you have to do is untar an archive and everything is there again (i.e. happy customer!). It helped me on many occasions, so I kept this old method for a long time while looking around for alternatives.
I’ve experimented with tools like rdiff-backup, but didn’t feel comfortable with it. Rdiff-backup simply disappointed me too many times. The client and server versions need to be exactly the same. So during an upgrade from, say, Debian Lenny to Debian Squeeze, you either have no backups of the freshly upgraded machines or, once you’ve upgraded the backup server too, no more backups of the not-yet-upgraded machines. That may be no problem for a few servers, but I’m managing many servers and this just doesn’t work. Another problem was that the rdiff-backup data would get corrupted in some cases. When that happened, only the last backup was usable; the others were gone. So the rdiff-backup experiment didn’t work out.
Last week, while googling ‘snapshots’ for another project, I ran into rsnapshot. Wow, that looked cool and simple! And since our backup server was suffering from low disk space, which takes a lot of time to resolve each time, I decided to implement rsnapshot and see if it would work for my environment.
Installation is simple:
aptitude install rsnapshot
Then edit /etc/rsnapshot.conf and tell the program what to back up, how many backups to keep, what to include/exclude and some more details. I found it very simple and powerful. The only thing you need to know is that values are separated by tabs (not spaces) and directory paths have a trailing slash.
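To make the tab rule concrete, here is a minimal sketch that writes a small config fragment to a scratch file and counts lines that have no tab in them. The paths, the exclude pattern and the retention count are my own illustrative assumptions, not a complete or recommended config, and older rsnapshot versions call the `retain` directive `interval`.

```shell
# Write an illustrative fragment to a scratch file (NOT the real /etc/rsnapshot.conf).
# printf makes the tab separators explicit as \t; all values are example assumptions.
conf=$(mktemp)
{
  printf 'snapshot_root\t/var/cache/rsnapshot/\n'
  printf 'retain\tdaily\t6\n'
  printf 'exclude\t*.tmp\n'
  printf 'backup\t/var/www/\tlocalhost/\n'
} > "$conf"

# Count non-empty, non-comment lines that contain no tab at all.
# A directive separated by spaces instead of tabs would show up here.
bad_lines=$(awk 'NF > 0 && !/^#/ && !/\t/ { n++ } END { print n + 0 }' "$conf")
echo "lines without a tab separator: $bad_lines"
```

On a machine with rsnapshot installed, running `rsnapshot configtest` against the real config file is the more thorough check.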
The magic behind rsnapshot is called ‘hardlinks’. When rsnapshot finds that a file is the same in two backups (i.e. unchanged), it just makes a hardlink instead of saving two copies. This saves a lot of backup space!
This is what it looks like after rsnapshot has been running for some time:
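If hardlinks are new to you, the following sketch shows the idea by hand in a throwaway directory. The `daily.0` and `daily.1` names and the file name are just illustrative; this only mimics what a hardlink is, not how rsnapshot itself creates them.

```shell
# Scratch directories standing in for two snapshots (names are illustrative).
work=$(mktemp -d)
mkdir "$work/daily.0" "$work/daily.1"

# An "unchanged" file exists in the older snapshot...
echo "hello world" > "$work/daily.1/index.html"
# ...and the newer snapshot gets a hardlink to it instead of a second copy.
ln "$work/daily.1/index.html" "$work/daily.0/index.html"

# Both names carry the same inode number, so the data is stored only once.
inode_old=$(ls -i "$work/daily.1/index.html" | awk '{print $1}')
inode_new=$(ls -i "$work/daily.0/index.html" | awk '{print $1}')
echo "inodes: $inode_old $inode_new"

# Removing one name leaves the other fully intact.
rm "$work/daily.1/index.html"
cat "$work/daily.0/index.html"
```

This is why an old snapshot can be deleted without hurting the newer ones: the data only disappears once the last name pointing at it is gone.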
This website is 215MB. Saving 6 backups would normally cost 6 x 215MB = 1290MB, which is about 1.3GB. With rsnapshot, only the changed (added, deleted, updated) files are saved; the rest are hardlinks. That turns out to be a great idea, since the backups now use only 219MB instead of 1.3GB, roughly 17% of the space!
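You can reproduce the effect on a small scale without rsnapshot. The sketch below fakes six snapshots of a 1MB "website" using plain hardlinks and then compares disk usage with `du`; the directory and file names are made up, and the exact numbers depend on your filesystem.

```shell
# Fake snapshot tree in a scratch directory (all names are illustrative).
root=$(mktemp -d)
mkdir "$root/daily.0"

# 1 MiB of random data standing in for the website.
dd if=/dev/urandom of="$root/daily.0/site.dat" bs=1024 count=1024 2>/dev/null

# Five more "snapshots" in which the file is unchanged, so each is only a hardlink.
for n in 1 2 3 4 5; do
  mkdir "$root/daily.$n"
  ln "$root/daily.0/site.dat" "$root/daily.$n/site.dat"
done

# The link count shows how many snapshots share the same data.
links=$(ls -l "$root/daily.0/site.dat" | awk '{print $2}')

# du counts hardlinked data once per run, so the total stays close to one copy.
one_kb=$(du -sk "$root/daily.0" | awk '{print $1}')
all_kb=$(du -sk "$root" | awk '{print $1}')
echo "links: $links"
echo "one snapshot: ${one_kb}K, six full copies would be: $((one_kb * 6))K, actual total: ${all_kb}K"
```

On a real backup server, `rsnapshot du` gives the same kind of overview for the actual snapshot root.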
Using less space per backup means we’re able to save more backups for our customers 🙂