
After restoring an OpenLDAP server I found these lines in the logs:

Mar  5 06:50:03 ldap slapd[4815]: <= bdb_equality_candidates: (uidNumber) index_param failed (13)
Mar  5 06:50:04 ldap slapd[4815]: <= bdb_equality_candidates: (uid) index_param failed (13)

This means OpenLDAP is querying its database, but finds no index for fields it uses often, in this case 'uid' and 'uidNumber'. It seems these indexes got lost when the backup was restored.
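Before adding anything, you can check which indexes are currently configured. Assuming the same Debian-style cn=config layout used in the steps below, a quick grep on the database config file shows them:

grep olcDbIndex /etc/ldap/slapd.d/cn\=config/olcDatabase\=\{1\}hdb.ldif

Here is how to add the indexes again: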

Stop the OpenLDAP server:

/etc/init.d/slapd stop

Open the config file where we’ll add the indexes:

vim /etc/ldap/slapd.d/cn\=config/olcDatabase\=\{1\}hdb.ldif

Add the new indexes after the existing 'olcDbIndex: objectClass eq' line. In my case the file contained:

...
olcDbIndex: objectClass eq
...

And I changed that to:

...
olcDbIndex: objectClass eq
olcDbIndex: uid eq
olcDbIndex: uidNumber eq
olcDbIndex: uniqueMember eq
olcDbIndex: gidNumber eq
...

Be sure not to touch the other settings in that file; just add the lines after the existing index. After that, make sure to reindex the database:

slapindex -F /etc/ldap/slapd.d/

Since I ran that as the root user, I needed to fix the permissions afterwards:

chown -R openldap:openldap /var/lib/ldap

Make sure that when you do an 'ls -la' on /var/lib/ldap, all files (including the directory itself) are owned by user and group 'openldap'; otherwise OpenLDAP will not start.
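A quick way to double-check (just a convenience; the ls above works fine too) is a find one-liner that prints anything with the wrong owner or group, so its output should be empty:

find /var/lib/ldap ! -user openldap -o ! -group openldap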

Now it’s time to start OpenLDAP again:

/etc/init.d/slapd start

And all should be well again! If it does not start and the log shows something like this:

PANIC: fatal region error detected; run recovery

Be sure to check the permissions as stated above!

Just a quick instruction on how to restore an OpenLDAP server from an 'ldif' backup file:

1. Setup the server

2. Configure the 'slapd' package, and be sure to use the right database name. It's a bit confusing: you enter it as a domain, e.g. ldaptree.company.com, but that is turned into the base DN (dc=ldaptree,dc=company,dc=com) of a new, empty database. Make sure it matches the structure of your backup ldif.
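On Debian/Ubuntu (which I assume here, given the slapd packaging) you can re-run this configuration dialog at any time:

dpkg-reconfigure slapd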

3. Make sure OpenLDAP is stopped

4. Read the backup data into the OpenLDAP database using the slapadd program:

slapadd -c -l backup.ldif

The -c flag makes slapadd continue on errors, which might be necessary because, for example, the root entry already exists. You can also run without it and fix any errors by hand in the backup.ldif file. -l specifies the file to read from.

5. Fix the permissions, making sure 'openldap' is both the user and group owner of the database files
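Assuming the default /var/lib/ldap database directory, that is the same chown as in the previous post:

chown -R openldap:openldap /var/lib/ldap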

6. Start OpenLDAP and the server should be up & running again!

Perl processes that got stuck kept me from sleep last night. I'm not sure what happened, but they were probably waiting for the database, which had reached its max_connections limit for this particular user. When Apache reached its maximum number of processes, monitoring paged me. Fortunately this all runs in a cluster, so no service was interrupted.

A few days ago I found myself killing Apache processes from a user whose script had created an endless loop. Also, some scripts our users upload generate a lot of load (which is OK for a while, but not for too long). So I thought I'd write something in bash to help me with all this 🙂

The script below is what I've come up with so far, and it seems to work pretty well. What it does is allow user processes to run for 10 minutes, then kick in and kill any process that is still there.

Apache processes run as the owner of the account (using mod_ruid), so I can actually see which user the Apache processes belong to. Perl, of course, also runs as the account user. Making use of this is the basic idea of the script. Sample 'ps aux' output:

user1        24891  1.1  1.8 112132 61000 ?        S    18:54   0:02 /usr/sbin/apache2 -k start
www-data     24894  0.1  0.8  82624 30240 ?        S    18:54   0:00 /usr/sbin/apache2 -k start
user2        24900  0.2  0.9  82984 31552 ?        S    18:54   0:00 /usr/sbin/apache2 -k start
www-data     25201  4.8  1.2  95540 43296 ?        S    18:57   0:00 /usr/sbin/apache2 -k start
www-data     25202  1.0  0.8  82972 30016 ?        S    18:57   0:00 /usr/sbin/apache2 -k start
user2        25213  6.0  0.1   8992  5692 ?        S    18:57   0:00 /usr/local/bin/perl -- /var/www/site.com/HTML/script.cgi

For safety, I ignore FTP (pure-ftpd in my case) and SSH processes. And of course only UIDs from 1000 and up are taken into account. So the 'www-data' processes never get killed; 'user1' and 'user2' processes will be killed when they do not complete within 10 minutes.

#!/bin/bash

# 2012-03-04, remi: Kill processes from users (id 1000 and up) that have been running for more than 600 seconds
# For safety, I exclude some processes that users might run that are OK (ftp,ssh,etc)
#
# get processes (--no-heading already suppresses the header line)
ps -eo uid,pid,cmd:9,lstart --no-heading |
    # we never want to kill pure-ftp
    grep -v "pure-ftpd" |
    #  nor sshd
    grep -v "sshd" |
    #  nor bash
    grep -v "bash" |
    # ignore all sbin processes, incl Apache
    grep -v "/usr/sbin" |
    # loop remaining processes; lstart is multi-word, so read collects it in the last variable
    while read PROC_UID PROC_PID PROC_CMD PROC_LSTART; do
        # only interested in user processes, so ignore system processes
        if [ "$PROC_UID" -ge 1000 ]; then
                # how long has this process been running?
                RUNTIME=$(( $(date +%s) - $(date -d "$PROC_LSTART" +%s) ))
                # 600 seconds should be more than enough
                if [ "$RUNTIME" -gt 600 ]; then
                        # output pid's to be killed by xargs on the final line of this script
                        echo "$PROC_PID"
                        # do save a log for debugging (cmdline is NUL-separated, hence the tr)
                        tr '\0' ' ' < "/proc/$PROC_PID/cmdline" >> /var/log/killed.log 2>&1
                        echo ", details: " >> /var/log/killed.log
                        date >> /var/log/killed.log
                        ls -la "/proc/$PROC_PID/" >> /var/log/killed.log 2>&1
                fi
        fi
     done |
     # finally, kill them! (-r skips kill when there is nothing to kill)
     xargs -r kill

Problems occur when the 'cmd' field contains spaces: since the script also splits its fields on spaces, the extra words shift into the variables that follow and the date parsing breaks. I worked around that by limiting the output to only 9 characters ('cmd:9'). In my specific case that did the trick, but I'd like to know if there's a better way to handle it 🙂
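One idea I'm toying with (a sketch only, not battle-tested): newer procps versions have an 'etimes' output field that prints the elapsed runtime in seconds as a single word. That removes the multi-word 'lstart' problem, and with the command placed last in the format, read stuffs the whole thing, spaces included, into the final variable:

ps -eo uid=,pid=,etimes=,args= |
    while read PROC_UID PROC_PID PROC_RUNTIME PROC_CMD; do
        # PROC_CMD holds the full command line, spaces and all, because
        # read assigns everything left over to the last variable
        if [ "$PROC_UID" -ge 1000 ] && [ "$PROC_RUNTIME" -gt 600 ]; then
            echo "$PROC_PID would be killed: $PROC_CMD"
        fi
    done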

Just run this script from cron every minute:

* * * * *  /usr/local/bin/killSlowUserProcesses.sh > /dev/null 2>&1

I hope this will bring me a good night’s sleep tonight 😉

Update: I've disabled the killing of Apache processes, since mod_ruid switches users around, including back and forth to www-data. A process might have been running for some time, but it is not certain that the current user has been running it all along. I need to think about this one for some more time 😉 I updated the above script by adding this line:

    grep -v "/usr/sbin" |

Adding a route manually can be necessary sometimes. On Linux, I know the command by heart:

sudo route add -net 10.67.0.0/16 gw 192.168.120.254

On the Mac the command is similar, but a bit different 🙂 Just as a note to myself and anyone else interested:

sudo route -n add -net 10.67.0.0/16  192.168.120.254

Both set up a route to the 10.67.0.0/16 network through gateway 192.168.120.254: the first one on Linux, the second one on Mac OS X.
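On either system you can verify that the route is actually there by printing the routing table:

netstat -rn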