High CPU load on CloudStack management servers after leap second 30/6/2012@23:59:60 UTC

1 July 2012 — 5 Comments

After the leap second was inserted last night, my CloudStack 3.0 servers (or rather their Java processes) started using a lot of CPU. Here’s how to fix it by setting the date again:

date ; date -s "`date -u`" ; date

Just run this as root on your management or compute nodes. On our CloudStack system it occurred on both the management and the compute nodes. No restart is required afterwards; the load drops immediately.
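If you want to roll this out to several nodes, here is a minimal sketch of the same fix wrapped in a shell function. It assumes GNU date, as in the one-liner above. The DRY_RUN variable is a hypothetical safety switch added for this example, not part of the original fix; it defaults to only printing the command.

```shell
#!/bin/sh
# Sketch: re-apply the current time, as the one-liner above does.
# DRY_RUN is a hypothetical switch for this example (default: 1 = only print).

reset_clock() {
    now="$(date -u)"    # current time in UTC, as printed by date
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: date -s \"$now\""
    else
        date            # time before
        date -s "$now"  # set the clock to the value just read (needs root)
        date            # time after
    fi
}
```

Calling `reset_clock` prints what would be executed; `DRY_RUN=0 reset_clock` as root actually sets the date.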

Note: restarting cloud-management alone does not fix the issue. Rebooting the machine does, but I’d prefer not to reboot them 🙂

MySQL seems to be affected as well, though I didn’t experience problems with it. Thanks to the guys @Mozilla for blogging about this problem and suggesting a fix.

I’ve also posted this to the CloudStack forums, so there might be some discussion there as well.

5 responses to High CPU load on CloudStack management servers after leap second 30/6/2012@23:59:60 UTC

  1. 

    Thanks! Works great. Debian, 2.6.32-5

  2. 

    Thanks for this information. Fixing this problem without restarting the server is definitely preferred.

    Still, I would like to know what the root cause was. We manage our own Java installations (because we want complete control over the Java environment). Do we need to apply regular timezone updates for Java, for instance? Looking at the updates that yum provides for the Java environment on CentOS 6.2, there are regular timezone updates. Would those have prevented the problem?

  3. 

    “Me too!” I am half relieved it is not just me, half concerned. Off to follow your links now to discover the root cause as I have to explain this all to the pointy haired boss.

  4. 

    As I understood it, this happened (please correct me if I’m wrong):
    1. The NTP daemon scheduled a ‘leap second’ sometime on June 30th, so the kernel could handle it at midnight
    2. Due to a bug in the kernel, it didn’t handle it properly
    3. This caused certain CPU tasks to enter an endless loop
    4. Java and MySQL for example had this problem, where they’d use 100% cpu
    5. Setting the date once again cleared the ‘leap second’ bit, and also stopped the endless loops. This is what I describe in this article.
    6. Everything returned to normal operations
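To check whether a machine went through step 1 and 2 at all, you can look for the kernel's leap second notice in its log. This is a small sketch, assuming the message text is "Clock: inserting leap second 23:59:60 UTC" as printed by the Linux kernels of this period; the function name saw_leap_insert is made up for this example.

```shell
#!/bin/sh
# Sketch: read log lines on stdin and succeed if the kernel reported
# inserting a leap second. The message text is an assumption based on
# kernels of this era; saw_leap_insert is a hypothetical helper name.

saw_leap_insert() {
    grep -q "inserting leap second 23:59:60 UTC"
}
```

For example: `dmesg | saw_leap_insert && echo "leap second was inserted on this host"`. A host that logged the message and shows high CPU afterwards is a likely candidate for the date fix.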

    I’ll do some more research and will probably write another post if I have more in depth information.

    Also, leap seconds are normally scheduled for the end of June or December [1], so the earliest the next one could happen is December 31st, 2012 at midnight UTC. We’d better be prepared for it 🙂

    [1] ftp://tycho.usno.navy.mil/pub/ntp/leap-seconds.3535142400
