Remote Linux systems usually have to be available all the time. Although Linux is rock solid and stable, an occasional crash can occur. For example when there’s a problem with hardware or software. The last thing you want, is that a Linux server got a kernel panic and then waits forever for someone to reboot it.
Of course there’s monitoring, and there are APC’s too, which can reboot the server. This is like pulling the plug. It usually takes some time for monitoring to notify a sysadmin and then for the sysadmin to reboot the server. And, when using IPMI devices (especially the older ones that share a network connection with the server), a kernel panic could make them unavailable, too. I’ve had that on many occasions. Then you end up driving to the data center or call someone to reset the server. I hate that
In our setup, most servers are in clusters. This means losing one server should not be a problem. But you still want a server to be available again as soon as possible, to be able to handle future problems.
There’s a way to configure Linux to reboot automatically, say 10, seconds after a kernel panic occurs. This will quickly and automatically have the server up again. Should rebooting not help, then bad luck and there’s probably some failing hardware part. You’ve then to drive to the data center anyway.
So, how to do that? Well, there are several options to set this parameter. To test this out, use this command:
/sbin/sysctl -w kernel.panic=10
Note that this setting will not survive a reboot. If you want it to remain active, add this line to /etc/sysctl.conf:
To check the current setting, issue:
It is wise to implement some sort of monitoring on reboots or uptime. You definitely want to read the logfiles and find out what exactly happened that led to the kernel panic.