[BBLISA] simpler alternative to Nagios

Bill Bogstad bogstad at pobox.com
Wed Sep 1 13:19:48 EDT 2010


On Wed, Sep 1, 2010 at 12:26 PM, Brian O'Neill <oneill at oinc.net> wrote:
> I haven't experienced this, but there are many reasons why a linux box
> (or solaris, etc.) can respond to a ping but nothing else is working -
> out of process slots, file descriptors, etc.
>
> ping only means your network connectivity between here and there works,
> and the box is at least powered on and the network stack is functioning.

As you said, all kinds of things can be broken while the basic network
stack is still up. Even without a kernel crash, there is no reason to
assume that any user process is running just because pings work. What
if init went on a rampage, killed all user processes, and went into a
busy loop? That isn't a kernel crash, but it is close enough to one
for most purposes. I vaguely recall reading about people doing this
deliberately on Linux-based routers that used static configurations.
Everything was stored on a floppy. When you wanted to change the
config, you would pull the floppy, modify it, and reboot the router.
Everything ran in memory after boot, so the floppy wasn't needed
except at boot time. Not good for remote management, but it made for a
damn secure router config.
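
To make the ping point concrete, here is a rough Python sketch of a
check that goes one step past ping. It is only an illustration under
some assumptions: the function name service_responds and its
parameters are made up for this example, and it assumes the service
you care about speaks first (as sshd does with its version string).
A bare TCP connect can succeed purely in the kernel's listen queue
even when the daemon itself is wedged, so the optional banner read is
the part that actually shows a user process is alive.

```python
import socket


def service_responds(host, port, timeout=3.0, expect_banner=False):
    """Return True if host:port accepts a TCP connection.

    With expect_banner=True, also require that the peer sends at least
    one byte within the timeout, which needs a live user process on
    the other end rather than just a working network stack.
    (Hypothetical helper for illustration only.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if not expect_banner:
                # The kernel completed the handshake; that is all we know.
                return True
            conn.settimeout(timeout)
            # An empty read means the peer closed without saying anything.
            return len(conn.recv(64)) > 0
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

Something like service_responds("somehost", 22, expect_banner=True)
would be the usage, with the host name being a placeholder. It still
proves much less than a real login or an application-level request,
but it does fail in the "init killed everything" case where ping
keeps answering.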

Anecdotally, I had to wipe the disks on some Linux machines a long
time ago and did a "dd < /dev/zero > /dev/root &" from the console.
It was kind of interesting to watch as the system slowly lost all
knowledge of files, etc.  The shell was already in memory so it was
happy to execute any command that was built in right up until dd
finished.  I think I even did some of them via a remote ssh login.
Still worked fine as the ssh daemon was already in memory as well.

Bill Bogstad


