[BBLISA] simpler alternative to Nagios

Dean Anderson dean at av8.com
Wed Sep 1 15:10:43 EDT 2010


Oh yeah. I forgot to point out in my last message that because Linux
answers pings in the interrupt handler, a response to ping _does_not_ mean
the network stack is functioning.

The kernel is traditionally divided into a lower half (interrupt
handlers and the code that supports them) and an upper half
(everything else).  The modularity of the kernel depended on all of the
machine-independent code being in the upper half, with the drivers
handling the interface between the upper and lower halves.  When the
kernel crashes, it is a catastrophic failure of code in either the upper
half or the lower half.  If the upper half fails by, e.g., holding a
critical lock and spinning in an infinite loop instead of calling
panic(), the lower-half interrupt handlers may continue to run.  The
network stack is in the upper half.  Nothing in the upper half can run,
but Linux will still answer pings.
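The same effect is easy to reproduce one layer up, without crashing anything. Below is a minimal Python sketch (an illustration only; the function name is hypothetical). A server opens a listening TCP socket and then never calls accept(), standing in for a hung application. The kernel still completes the TCP handshake and queues the connection in the listen backlog, so the client's connect succeeds even though no user process ever looks at it.

```python
import socket

def kernel_answers_without_app(timeout=1.0):
    """Return (connected, got_data) for a server that listens but never accepts."""
    # "Hung" server: bound and listening, but it never calls accept() or send().
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    try:
        # The kernel completes the handshake and parks the connection in
        # the listen backlog with no help from the server process.
        client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
        connected = True
        try:
            # Wait for any byte from the "application"; none will come.
            got_data = bool(client.recv(1))
        except socket.timeout:
            got_data = False
        finally:
            client.close()
    finally:
        server.close()
    return connected, got_data

print(kernel_answers_without_app())
```

The connection is established but no data ever arrives. This is the TCP analogue of a box that answers ping while nothing useful is running.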

In non-Linux systems, a ping reply implies more about the kernel not
having crashed.  But it was never a good idea to depend on ping as an
indication of system state.  The IBM mainframe has a 3172(#?) IP
offload device that handles all kinds of brainless IP work (up to TCP
windowing and retransmission) without intervention from the main
processor, so the processor can be powered off while the 3172 carries
on.  Recent Intel architecture is moving in much the same direction.
PC network cards have also been able to wake up the PC for a long time
now (Wake-on-LAN), though that isn't done in the interrupt handler.
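For a simple monitoring script, the practical consequence is to check the service itself rather than ICMP. Below is a minimal Python sketch. It assumes the service speaks first, as SSH servers do with their "SSH-" identification string and SMTP servers do with a "220" greeting. The function name and parameters are hypothetical and not taken from Nagios or any other tool. Because the kernel can complete a TCP connect on behalf of a hung process, the check only passes once some user process has actually written bytes to the socket.

```python
import socket

def service_responds(host, port, expect=b"", timeout=3.0):
    """Hypothetical health check: connect AND read a greeting from the service.

    A successful connect alone only shows the kernel is up. Reading bytes
    shows that a user process accepted the connection and wrote to it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # May return a partial greeting; fine for a short prefix match.
            banner = sock.recv(256)
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
    # b"" means the peer closed without saying anything.
    return bool(banner) and banner.startswith(expect)
```

For example, `service_responds("localhost", 22, expect=b"SSH-")` exercises the SSH daemon's user-space code path, which a ping never touches.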

		--Dean



On Wed, 1 Sep 2010, Bill Bogstad wrote:

> On Wed, Sep 1, 2010 at 12:26 PM, Brian O'Neill <oneill at oinc.net> wrote:
> > I haven't experienced this, but there are many reasons why a linux box
> > (or solaris, etc.) can respond to a ping but nothing else is working -
> > out of process slots, file descriptors, etc.
> >
> > ping only means your network connectivity between here and there works,
> > and the box is at least powered on and the network stack is functioning.
> 
> As you said, all kinds of things can be broken and the basic network
> stack can still be up.   Even without a kernel crash, there is no
> reason to assume that because pings work that any
> user process is running.   What if init went on a rampage, killed all
> user processes, and went into a busy loop?  This isn't the same thing
> as a kernel crash, but is close enough to be the same
> for most purposes.  I vaguely recall reading about people doing this
> deliberately on Linux-based routers which were being configured with
> static configurations.   Everything was stored on a floppy.  When you
> wanted to change the config you would pull the floppy, modify it, and
> reboot the router.  Everything ran in memory after the boot, so the
> floppy wasn't needed except at boot time.
> Not good for remote management, but made for a damn secure router config.
> 
> Anecdotally, I had to wipe the disks on some Linux machines a long
> time ago and did a "dd < /dev/zero > /dev/root &" from the console.
> It was kind of interesting to watch as the system slowly lost all
> knowledge of files, etc.  The shell was already in memory so it was
> happy to execute any command that was built in right up until dd
> finished.  I think I even did some of them via a remote ssh login.
> Still worked fine as the ssh daemon was already in memory as well.
> 
> Bill Bogstad

-- 
Av8 Internet   Prepared to pay a premium for better service?
www.av8.net         faster, more reliable, better service
617 256 5494



