How I ensure CFEngine is always running while also allowing an easy way to disable it

14 January 2014 — 2 Comments

I’m currently finalizing the CFEngine 3 setup at my $current_work because by the end of the month I will start a new job. In a little over a year, I fully automated the Linux sysadmin team. From now on, only 2 sysadmins are needed to keep everything running. Since almost everything is automated using CFEngine 3, it’s very important that CFEngine is running at all times so it can keep an eye on the systems and thus prevent problems from happening.

I’ve developed an init script, that makes sure CFEngine is installed and bootstrapped to the production CFEngine policy server. This init script is added in the post-install phase of the automatic installation. This gets everything started and from there on CFEngine kicks in and takes control. That same init script is also maintained with CFEngine. This is done so it cannot easily be removed or disabled.

Also, when CFEngine is not running (anymore) it should be restarted. A cron job is setup to do this. This cron job is also setup using CFEngine. It is using regular cron on the OS, of course. If all else fails, this cron job can also install CFEngine in the event it might be removed. Last thing it does, is automatically recover from ‘SIGPIPE’ bug we sometimes encounter on SLES 11.

To summarize:
– an init script (runs every boot) makes sure CFEngine is installed and bootstrapped
– a hourly cron job makes sure the CFEngine daemons are actually running
– CFEngine itself ensures both the cron job and init script are properly configured

This makes it a bit harder to (accidentally) remove CFEngine, don’t you think?!

Reporting servers that do not talk to the Policy server anymore
Now, imagine someone figures a way to disable CFEngine anyway. How would we know? The CFEngine Policy server can report this using a promise. It reports it via syslog, so it will show up in Logstash. The bundle looks like this:

bundle agent notseenreport
{
        classes:
                "display_report" expression => "Hr08.Min00_05";

        vars:
                # Default to empty list
                "myhosts" slist => { };

                display_report::
                        "myhosts" slist => { hostsseen("24","notseen","name") };

        reports:
                "CFHub-Production: Did not talk to $(myhosts) for over 24 hours now";
}

We’ve set this up on both Production and Pre-Production Policy servers.

How to temporary disable CFEngine?
On the other side, sometimes you want to temporary disable CFEngine. For example to debug an issue on a test server. After a discussion in our team, we came up with an easy solution: when a so-called ‘Do Not Run‘ file exists on a server, we should instruct CFEngine to do nothing. We use the file ‘/etc/CFEngine_donotrun‘ for this, so you’d need ‘root‘ privileges or equal to use it.

In ‘promise.cf‘ a class is set when the file is found:

"donotrun" expression => fileexists("/etc/CFEngine_donotrun");

For our setup we’re using a method detailed in ‘A CFEngine Case Study‘. We added the new class:

!donotrun::
        "sequence"  slist => getindices("bundles");
        "inputs" slist => getvalues("bundles");

donotrun::
        "sequence"  slist => {};
        "inputs" slist => {};

reports:
   donotrun::
        "The 'DoNotRun' file was found at /etc/CFEngine_donotrun, exiting.";

In other words, when the ‘Do Not Run‘ file is found, this is reported to syslog and no bundles are included for execution: CFEngine then does effectively nothing.

An overview of servers that have a ‘Do Not Run‘ file appears in our Logstash dashboard. This makes them visible and we look into then on a regular basis. It’s good practice to put the reason why in the ‘Do Not Run‘ file, so you know why it was disabled and when. Of course, this should only be used for a small period of time.

Making sure CFEngine runs at all times makes your setup more robust, because CFEngine fixes a lot of common problems that might occur. On the other hand, having an easy way to temporary disable CFEngine also prevents all kind of hacks to ‘get rid of it’ while debugging stuff (and then forgetting to turn it back on). I’ve found this approach to work pretty good.

Update:
After publishing this post, I got some nice feedback. Both Nick Anderson (in the comments) and Brian Bennett (via twitter) pointed me into the direction of CFEngine’s so called ‘abortclasses‘ feature. The documentation can be found on the CFEngine site. To implement it, you need to add the following to a ‘body agent control‘ statement. There’s one defined in ‘cf_agent.cf‘, so you could simply add:

abortclasses => { "donotrun" };

Another nice thing to note, is that others have also implemented similar solutions. Mitch Lewandowski told me via twitter he uses a filed simply called ‘/nocf‘ for this purpose and Nick Anderson (in the comments) came up with an even funnier name: ‘/COWBOY‘.

Thanks for all the nice feedback! 🙂

2 responses to How I ensure CFEngine is always running while also allowing an easy way to disable it

  1. 

    Nice, I hope you will consider submitting a Pull Request against masterfiles[0] or contributing a sketch to design-center[1] for your cron job entries to ensure cfengine is continually running. I have used the same method you describe as /etc/CFEngine_donotrun (abortclasses), we called it /COWBOY. In fact there is an aborclasses bundle to make this very easy. It really helped in emergencies when other team members needed to make a maual change for a short period until it could be integrated back into the larger policy set. Indeed you have to be careful with it, as others have noted (namely Uncle Ben from SpiderMan), “With great power comes great responsibility”.

    I hope you reported your SIGPIPE issue in the bug tracker.

    Congratulations on our new position! I hope you continue to be a part of the CFEngine community, your blog post contributions did not go un-noticed, and are greatly appriciated by the community. I hope to see more contributions from you in the future.

    [0] https://github.com/cfengine/masterfiles
    [1] https://github.com/cfengine/design-center

  2. 

    Hi Remi, all the best at your new job! Hopefully you’ll be able to continue using CFEngine 🙂 Thank you for all your contributions to the CFEngine community!

Leave a reply to nickanderson042 Cancel reply