
Sed is very powerful, and I use it a lot on the Linux servers I manage. Today I was working on a local git repository on Mac OSX Mountain Lion when I ran into some trouble.

Usually, to replace text in a file with new text I run:

sed -i 's/Find this text/Replace with this/' file_to_replace_in.txt

While this works on Linux, it does not on Mac OSX:

sed: -i may not be used with stdin

The manpage on OSX says:

Edit files in-place, saving backups with the specified extension.
If a zero-length extension is given, no backup will be saved.

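Aha, it wants to save a backup file. On OSX the -i option always takes a backup extension as its argument, so my sed script was being read as the extension and the file name as the script, leaving sed without an input file. So I changed my command to: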

sed -i '.bak' 's/Find this text/Replace with this/' file_to_replace_in.txt

This works, although it leaves a backup file ‘file_to_replace_in.txt.bak’ behind. That is great if you’re not sure about the change, but it can be annoying as well. To stop sed from making backups, specify an empty extension, like so:

sed -i '' 's/Find this text/Replace with this/' file_to_replace_in.txt

This allows me to quickly find & replace again, like when working on Linux 🙂
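If the same command has to run on both Linux and Mac OSX, one approach that should work is to attach the backup extension directly to -i (no space) and delete the backup afterwards, since both GNU sed and the OSX sed accept that form. This is a minimal sketch; the temporary directory and the demo file contents are made up for illustration:

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
demo_file="$workdir/file_to_replace_in.txt"   # hypothetical demo file
printf 'Find this text\n' > "$demo_file"

# -i.bak (suffix attached, no space) is understood by GNU sed and by the OSX sed.
# The backup is only removed when sed itself succeeded.
sed -i.bak 's/Find this text/Replace with this/' "$demo_file" && rm -f "$demo_file.bak"

result=$(cat "$demo_file")
echo "$result"
```

The trade-off is a short-lived .bak file next to the original, which may matter in directories watched by other tools.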

I had an interesting problem lately regarding AWStats. Due to some delay, the log files weren’t processed in the right order and then AWStats ignored all old logs. This resulted in some days being blank in the stats and of course this is not something we want. Since we also have multiple web servers in our cluster, things started to get a bit complicated.

The log files from each of the web servers were concatenated and then split to a separate log file for each virtual host using the Apache2 split-logfile script.

The logs for an example virtual host looked like this:

- - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
- - [01/Aug/2012:05:50:51 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
- - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
- - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"

As you can see, AWStats processes August 1 and then refuses the older July records. To re-sort the log file chronologically, I ran:

cat website.unsorted.log | sort -t ' ' -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n > website.log

As an alternative, AWStats' own tooling can be used as well (it ships a logresolvemerge.pl script that merges log files in date order). Since I had already concatenated the log files and split them, the sort option above was faster to implement.

Now the log file looks like this:

- - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
- - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
- - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
- - [01/Aug/2012:05:50:51 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
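The sort keys used above address character positions inside the fourth space-separated field, the one that starts with "[": characters 9-12 are the year, 5-7 the month name, 2-3 the day, and 14-15, 17-18 and 20-21 the hour, minute and second. Here is a small sketch to try this on sample data. The client address 192.0.2.1 and the temporary directory are assumptions made for the demo, and LC_ALL=C is set because month-name sorting depends on the locale:

```shell
# Build a tiny unsorted sample log (192.0.2.1 is a made-up documentation address).
workdir=$(mktemp -d)
cat > "$workdir/website.unsorted.log" <<'EOF'
192.0.2.1 - - [01/Aug/2012:05:50:50 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_04"
192.0.2.1 - - [28/Jul/2012:04:02:06 +0200] "GET /nonexistent_page.html HTTP/1.1" 404 224 "-" "Java/1.6.0_32"
EOF

# Same keys as in the post: year, month name, day, hour, minute, second.
LC_ALL=C sort -t ' ' \
  -k 4.9,4.12n -k 4.5,4.7M -k 4.2,4.3n \
  -k 4.14,4.15n -k 4.17,4.18n -k 4.20,4.21n \
  "$workdir/website.unsorted.log" > "$workdir/website.log"

# The timestamp field of the first line should now be the July entry.
first_date=$(head -n 1 "$workdir/website.log" | cut -d ' ' -f 4)
echo "$first_date"
```

Passing the file name to sort directly also avoids the extra cat process, although the result is the same either way.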

One last thing to solve was the AWStats history file. Since AWStats had already run with the wrongly ordered log file, the history file contained a wrong ‘LastLine’ setting. Experimenting showed it was best to blank out that line, leaving an empty line in its place (so we won’t break the indexes). I used sed to fix it:

sed -i \
-e 's/^LastLine .*//' \

AWStats now updates the stats correctly and everybody is happy! Thanks to my colleagues Pim, Vincent and Mischa, who all helped solve pieces of the puzzle. Yes, it’s nice having some technically skilled colleagues 🙂
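For anyone who wants to see what the LastLine substitution does before running it on a real history file, here is a minimal sketch on a stand-in file. The file name and its three lines are placeholders, not real AWStats data, and -i.bak is used so the command also runs on Mac OSX:

```shell
# Create a stand-in file; these lines are placeholders, not a real AWStats history file.
workdir=$(mktemp -d)
history_file="$workdir/history.demo.txt"   # hypothetical file name
printf 'FirstKey some value\nLastLine placeholder value\nOtherKey another value\n' > "$history_file"

lines_before=$(wc -l < "$history_file" | tr -d ' ')

# Blank out the LastLine entry; the substitution leaves an empty line behind,
# so the number of lines in the file does not change.
sed -i.bak -e 's/^LastLine .*//' "$history_file"

lines_after=$(wc -l < "$history_file" | tr -d ' ')
second_line=$(sed -n '2p' "$history_file")
echo "lines before: $lines_before, lines after: $lines_after"
```

The .bak copy is kept here on purpose, since it is a cheap way to roll back if AWStats does not like the edited file.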