Sorting Apache2 logfiles from multiple webservers by date so AWStats processes them correctly

11 August 2012

I recently ran into an interesting problem with AWStats. Because of a delay, the log files weren’t processed in chronological order, and AWStats then ignored all the older records. This left some days blank in the stats, which is of course not what we want. Since we also have multiple web servers in our cluster, things got a bit complicated.

The log files from each of the web servers were concatenated and then split into a separate log file for each virtual host using the Apache2 split-logfile script.

The logs for an example virtual host looked like this:

1.2.2.1 - - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
1.2.2.1 - - [01/Aug/2012:05:50:51 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"

As you can see, AWStats processes August 1 first and then refuses the older July records. To re-sort the log files, I ran:

sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n website.unsorted.log > website.log
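
The key positions in that sort command are easier to read when you see which characters each one covers. The following is only an illustrative sketch: it takes one line from the example above and uses cut to print the part of field 4 that each key selects. The show_key helper is a made-up name for this example, not part of sort or AWStats.

```shell
#!/bin/sh
# Sample line taken from the log excerpt above.
line='1.2.2.1 - - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"'

# Field 4 when splitting on single spaces, which is what -t ' ' does.
# It holds "[01/Aug/2012:05:50:50", including the opening bracket.
field4=$(printf '%s\n' "$line" | cut -d ' ' -f 4)

# Hypothetical helper: print the characters of field 4 that one key covers.
# $1 = label, $2 = character range within field 4
show_key() {
    printf '%-7s %s\n' "$1" "$(printf '%s\n' "$field4" | cut -c "$2")"
}

show_key year   9-12     # -k 4.9,4.12n    numeric
show_key month  5-7      # -k 4.5,4.7M     month name
show_key day    2-3      # -k 4.2,4.3n     numeric
show_key hour   14-15    # -k 4.14,4.15n   numeric
show_key minute 17-18    # -k 4.17,4.18n   numeric
show_key second 20-21    # -k 4.20,4.21n   numeric
```

The keys are listed from most to least significant, so sort compares the year first and only falls through to the seconds when everything before them is equal. The M key matches month abbreviations according to the current locale, so on a system with a non-English locale it may be necessary to run sort with LC_ALL=C.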

As an alternative, the AWStats script logresolvemerge.pl can be used as well. Since I had already concatenated the log files and split them, the sort option above was faster to implement.

Now the log file looks like this:

1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
1.2.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
1.2.2.1 - - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
1.2.2.1 - - [01/Aug/2012:05:50:51 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
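
Before feeding a log to AWStats, it can be worth checking that it really is in chronological order. The sketch below reuses the same sort keys in check mode. It assumes GNU sort, and log_is_sorted is a hypothetical helper name chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the access log given as $1
# is already in chronological order.
# Assumes GNU sort: -c only checks the order and writes nothing, -s turns
# off the whole-line tie-breaker so that two requests logged in the same
# second never count as disorder.
# LC_ALL=C makes the M key recognise English month abbreviations.
log_is_sorted() {
    LC_ALL=C sort -c -s -t ' ' \
        -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
        -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n \
        "$1" 2>/dev/null
}

# Example: report on the file named on the command line, if any.
if [ -n "${1:-}" ]; then
    if log_is_sorted "$1"; then
        echo "$1: in chronological order"
    else
        echo "$1: out of order (or unreadable)"
    fi
fi
```

The keys ignore the timezone offset in field 5, so this check (like the sort itself) assumes that all web servers log in the same timezone.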

One last thing to solve was the AWStats history file. Since AWStats had already run on the wrongly ordered log file, the history file contained a wrong ‘LastLine’ setting. Experimenting with this showed it was best to remove the contents of that line and leave an empty line in its place (so we won’t break the indexes). I used sed to fix it:

sed -i \
-e 's/^LastLine .*//' \
awstats072012.*
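
Since this edits the history files in place, a more cautious variant could keep a copy first and confirm that the number of lines is unchanged afterwards. This is only a sketch: clear_lastline and the backup directory argument are hypothetical names for this example, and it assumes GNU sed, as the command above does.

```shell
#!/bin/sh
# Hypothetical helper: copy each history file to a backup directory, then
# blank its LastLine entry in place with the same sed expression as above.
# Usage: clear_lastline BACKUP_DIR FILE...
# Assumes GNU sed (-i without a suffix argument).
clear_lastline() {
    backup_dir=$1
    shift
    mkdir -p "$backup_dir" || return 1
    for f in "$@"; do
        cp -p "$f" "$backup_dir/" || return 1
        before=$(wc -l < "$f")
        sed -i -e 's/^LastLine .*//' "$f" || return 1
        after=$(wc -l < "$f")
        # The substitution empties the line but keeps it, so the line
        # count should be the same before and after.
        if [ "$before" -ne "$after" ]; then
            echo "$f: line count changed" >&2
            return 1
        fi
    done
}

# Example call (the backup path is a placeholder):
#   clear_lastline /path/to/backup awstats072012.*
```

Keeping the backups in a separate directory means they do not match the awstats072012.* pattern if the command is run again.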

AWStats now updates the stats correctly and everybody is happy! Thanks to my colleagues Pim, Vincent and Mischa, who all helped solve pieces of the puzzle. Yes, it’s nice having some technically skilled colleagues 🙂
