Allowing user processes to run for 10 minutes max on Linux – a first approach

4 March 2012 — 2 Comments

Perl processes that got stuck kept me from sleeping last night. I’m not sure what happened, but they were probably waiting for the database, which had reached its max_connections limit for this particular user. When Apache reached its maximum number of processes, monitoring paged me. Fortunately this all runs in a cluster, so no service was interrupted.

A few days ago I found myself killing Apache processes belonging to a user who had created an endless loop. Also, some scripts our users upload generate a lot of load (which is OK for a while, but not for too long). So I thought I’d write something in bash to help me with all this 🙂

The script below is what I’ve come up with so far, and it seems to work pretty well. It allows user processes to run for 10 minutes and then kills any process that is still there.

Apache processes run as the owner of the account (using mod_ruid), so I can actually see which user each Apache process belongs to. Perl, of course, also runs as the account user. Making use of this is the basic idea of the script. Sample ‘ps aux’:

user1        24891  1.1  1.8 112132 61000 ?        S    18:54   0:02 /usr/sbin/apache2 -k start
www-data     24894  0.1  0.8  82624 30240 ?        S    18:54   0:00 /usr/sbin/apache2 -k start
user2        24900  0.2  0.9  82984 31552 ?        S    18:54   0:00 /usr/sbin/apache2 -k start
www-data     25201  4.8  1.2  95540 43296 ?        S    18:57   0:00 /usr/sbin/apache2 -k start
www-data     25202  1.0  0.8  82972 30016 ?        S    18:57   0:00 /usr/sbin/apache2 -k start
user2        25213  6.0  0.1   8992  5692 ?        S    18:57   0:00 /usr/local/bin/perl -- /var/www/site.com/HTML/script.cgi

For safety, I ignore FTP (pure-ftpd in my case) and SSH processes. And of course only UIDs from 1000 and up are taken into account. So the ‘www-data’ processes never get killed; ‘user1’ and ‘user2’ processes will be killed when they do not complete within 10 minutes.

#!/bin/bash

# 2012-03-04, remi: Kill processes from users (id 1000 and up) that have been running for more than 600 seconds
# For safety, I exclude some processes that users might run that are OK (ftp,ssh,etc)
#
# get processes (--no-heading already suppresses the header line,
# so no 'tail' is needed to skip it)
ps -eo uid,pid,cmd:9,lstart --no-heading |
    # we never want to kill pure-ftp
    grep -v "pure-ftpd" |
    #  nor sshd
    grep -v "sshd" |
    #  nor bash
    grep -v "bash" |
    # ignore all sbin processes, incl Apache
    grep -v "/usr/sbin" |
    # loop remaining processes
    while read -r PROC_UID PROC_PID PROC_CMD PROC_LSTART; do
        # only interested in user processes, so ignore system processes
        if [ "$PROC_UID" -ge 1000 ]; then
            # how long has this process been running?
            # (not stored in SECONDS, which is a special bash variable)
            RUNTIME=$(( $(date +%s) - $(date -d "$PROC_LSTART" +%s) ))
            # 600 seconds should be more than enough
            if [ "$RUNTIME" -gt 600 ]; then
                # now, output pid's to be killed on the final line of this script
                echo "$PROC_PID"
                # do save log for debugging
                cat "/proc/$PROC_PID/cmdline" >> /var/log/killed.log 2>&1
                echo ", details: " >> /var/log/killed.log
                date >> /var/log/killed.log
                ls -la "/proc/$PROC_PID/" >> /var/log/killed.log 2>&1
            fi
        fi
    done |
    # finally, kill them! (-r: run nothing when there are no PIDs)
    xargs -r kill
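
The age calculation in the script relies on GNU date being able to parse the ‘lstart’ timestamp that ps prints. As a minimal sketch, assuming GNU date is available, that step can be tried on its own. The helper name seconds_between is made up for this illustration and is not part of the script:

```shell
#!/bin/bash
# Sketch: number of seconds between two timestamps in ps 'lstart' format.
# Assumes GNU date, whose -d option parses strings such as
# "Sun Mar  4 18:54:00 2012". seconds_between is a hypothetical helper.
seconds_between() {
    local start_epoch end_epoch
    # convert both timestamps to seconds since the epoch
    start_epoch=$(date -d "$1" +%s) || return 1
    # the second argument is optional and defaults to the current time
    end_epoch=$(date -d "${2:-now}" +%s) || return 1
    echo $(( end_epoch - start_epoch ))
}
```

Calling it with a single argument, for example seconds_between "Sun Mar  4 18:54:00 2012", gives the age in seconds relative to now, which is the number the script compares against 600.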

Problems occur when the ‘cmd’ column contains spaces. Since the script also splits its fields on spaces, a syntax error occurs. I worked around that by limiting that column to only 9 characters (‘cmd:9’). In my specific case that did the trick, but I’d like to know if there’s a better way to handle it 🙂
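
One possible answer, as a sketch rather than a tested replacement: put the free-form command column last, because ‘read’ assigns the whole rest of the line to its final variable. The sketch also assumes a procps-ng version of ps that supports the ‘etimes’ column (elapsed time in seconds), which removes the date arithmetic. The function name select_old_pids and its two arguments are made up for this illustration:

```shell
#!/bin/bash
# Sketch: print PIDs of user processes that are older than a limit.
# Expects lines of the form "UID PID ELAPSED_SECONDS COMMAND..." on stdin.
# select_old_pids is a hypothetical helper, not part of the original script.
select_old_pids() {
    local limit="${1:-600}" min_uid="${2:-1000}"
    local proc_uid proc_pid proc_age proc_cmd
    # read splits on whitespace; the last variable receives the rest of
    # the line, so spaces inside the command no longer break the parsing
    while read -r proc_uid proc_pid proc_age proc_cmd; do
        if [ "$proc_uid" -ge "$min_uid" ] && [ "$proc_age" -gt "$limit" ]; then
            echo "$proc_pid"
        fi
    done
}

# Usage on a live system (assumes ps supports 'etimes'; the trailing '='
# on each column name suppresses the header line):
#   ps -eo uid=,pid=,etimes=,args= | select_old_pids 600 1000
```

The grep filters from the script can still sit between ps and the function, and with the full command line available they no longer depend on a 9-character cut-off.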

Just run this script from cron every minute:

* * * * *  /usr/local/bin/killSlowUserProcesses.sh > /dev/null 2>&1

I hope this will bring me a good night’s sleep tonight 😉

Update: I’ve disabled the killing of Apache processes, since mod_ruid switches users around, including back and forth to www-data. A process might have been running for some time, but it is not certain that the current user has been running it all that time. I need to think about this one some more 😉 I updated the script above by adding this line:

    grep -v "/usr/sbin" |

2 responses to Allowing user processes to run for 10 minutes max on Linux – a first approach

  1. 

    Hi,

    I’ve had a similar issue, though in our case I want to kill ANY Apache process that is more than 2 minutes old. The user is therefore “nobody” – how can I tweak this, please?

    • 

      Hi Tim,
      Please be careful, as killing any Apache process after 2 minutes is a little ‘wild’. Is there something specific you can grep for? Like a script name or so. Look at ‘ps aux’ and see if there’s such a thing.

      You’d then use a one-liner like this:

      find /proc -maxdepth 1 -user nobody -type d -mmin +2 -exec basename {} \; | xargs ps h | grep 'some_script.pl' | awk '{ print $1 }' | xargs kill

      This generates a list of PIDs of processes owned by ‘nobody’ that have been running for more than 2 minutes (-mmin +2). It does some filtering, and finally kills the remaining PIDs.
      Replace ‘some_script.pl’ with something that matches the processes you want to kill.

      Test this first by removing the last ‘xargs kill’ part. It will then only print PIDs you can verify. If you’re sure, run this in cron every minute.

      Good luck,
      Remi
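
The advice in the reply above, to print the PIDs first and only kill once they have been verified, could also be built into the pipeline. The following is only a sketch: the function name kill_or_print and the DRY_RUN variable are made up for this illustration and do not appear in the original one-liner.

```shell
#!/bin/bash
# Sketch: read PIDs on stdin and either report them or kill them.
# kill_or_print and DRY_RUN are hypothetical names for this example.
# Dry-run is the default; set DRY_RUN=0 to really send the signal.
kill_or_print() {
    local pid
    while read -r pid; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # only report what would happen
            echo "would kill $pid"
        else
            kill "$pid"
        fi
    done
}
```

It would take the place of the final ‘xargs kill’ in the pipeline. Once the printed PIDs look right, running the same pipeline with DRY_RUN=0 exported in the environment does the actual killing.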
