[BBLISA] bblisa Digest, Vol 117, Issue 1

Marc Chiarini (school) chiarini at seas.harvard.edu
Fri Aug 9 16:19:29 EDT 2013


There is a very important academic and practical discussion to be had about
this.  In fact, Alva Couch and I and others have been examining similar
topics for years.  Unfortunately I don't have the bandwidth right now to
get into it; perhaps in a few months.  I'll leave you with these two
tidbits.  First, thresholds are no good in these circumstances (except as a
coarse lower/upper bound)...you need to combine learning (small amounts of
hysteresis) and highly reactive management.  Second, one might be able to
obtain unrefined but useful estimates of performance in various components
(e.g., CPU, disk, network) without an agent -- via analysis of
response-time and other statistics...essentially building a black-box model
over time of how the system is *expected* to work.
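
To make the two tidbits concrete, here is a minimal sketch and not
anyone's actual implementation.  It learns a running baseline of a single
metric (say, response time in ms) and uses two limits, so that raising and
clearing an alert happen at different levels (hysteresis).  The class name
Baseline, the parameter values, and the choice of an exponentially
weighted mean and variance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Running estimate of how one metric is *expected* to behave."""
    alpha: float = 0.1    # learning rate (assumed value)
    enter: float = 4.0    # deviations needed to raise an alert (assumed)
    leave: float = 2.0    # deviations needed to keep it raised (assumed)
    warmup: int = 5       # samples to learn from before judging anything
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0
    alerting: bool = False

    def update(self, x: float) -> bool:
        """Feed one sample; return True while the metric looks abnormal."""
        if self.seen == 0:
            self.mean = x
        self.seen += 1

        # Distance from the expected value, in units of the learned spread.
        spread = max(self.var ** 0.5, 1e-9)
        score = abs(x - self.mean) / spread

        if self.seen > self.warmup:
            # Hysteresis: a raised alert clears only below the lower limit.
            limit = self.leave if self.alerting else self.enter
            self.alerting = score > limit

        # Learn only from samples that look normal, so that an outage is
        # not absorbed into the baseline.
        if not self.alerting:
            diff = x - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return self.alerting
```

Feeding it one sample per poll is enough: call update() with each new
measurement and act on the returned flag.  Because of the two limits, a
reading that would not be enough to raise an alert on its own can still
keep an existing alert raised until things settle.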

Regards,
Marc

On Sun, Aug 4, 2013 at 12:00 PM, <bblisa-request at bblisa.org> wrote:

> Today's Topics:
>
>    1. statistics-based zero config network management: why doesnt
>       this exist? (Alex Aminoff)
>    2. Re: statistics-based zero config network management: why
>       doesnt this exist? (Brian O'Neill)
>    3. Re: statistics-based zero config network management: why
>       doesnt this exist? (kurin at delete.org)
>    4. Re: statistics-based zero config network management: why
>       doesnt this exist? (Matt Simmons)
>    5. Re: statistics-based zero config network management: why
>       doesnt this exist? (Edward Ned Harvey (bblisa4))
>    6. Re: statistics-based zero config network management: why
>       doesnt this exist? (Brian O'Neill)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 03 Aug 2013 15:52:41 -0400
> From: Alex Aminoff <alex at basespace.net>
> Subject: [BBLISA] statistics-based zero config network management: why
>         doesnt this exist?
> To: bblisa at bblisa.org
> Message-ID: <51FD5F89.2010907 at basespace.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> I'm looking at SNMP-based network monitoring systems: cacti, zabbix,
> some other similar ones. All of them seem to require you to configure
> your devices on the system. There are some auto-discovery functions, but
> they only work if you have loaded up the "profile" or "template" for
> your particular network hardware.
>
> So why is this necessary? Suppose instead there was a network monitoring
> system that worked like this:
>
>   - Find any SNMP device on your subnet
>   - Walk its SNMP tree, collecting all data, no matter what it is:
> interface counters, manufacturer's serial number, I don't care
>   - Save this data in some sort of time series storage, like RRD
>   - Then use statistics to throw an alert when a new value (or more
> likely a group of new values) differs sufficiently in statistical terms
> from the history of that value.
>
> The great thing about this plan is you don't need to configure in
> advance the MIBs and OIDs. When an alert happens, the system can include
> the OID in the message. A human can then look it up or otherwise deal.
>
> There will be false positives, but one should be able to filter those
> out once they happen. In my experience, a real network problem involved
> some values jumping from a history like 0, 1, 2, 0 to 1,234,567, so you
> can dial the sensitivity way down on the statistical tests.
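
A rough sketch of the statistical test described above, under stated
assumptions: values are already polled and numeric, history is held in
memory rather than in RRD, and "differs sufficiently" is taken to mean
"more than SENSITIVITY standard deviations from the mean of the stored
history".  The function name check(), the constants, and the floor on the
spread are illustrative choices, not part of any existing tool.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

HISTORY = 288       # samples kept per OID (assumed: one day of 5-minute polls)
MIN_SAMPLES = 4     # do not judge an OID until this much history exists
SENSITIVITY = 20.0  # deliberately insensitive: only huge jumps alert

# One bounded history per OID; nothing about the OID is configured ahead.
history = defaultdict(lambda: deque(maxlen=HISTORY))


def check(oid: str, value: float) -> bool:
    """Record a polled value and report whether it is a statistical outlier."""
    past = history[oid]
    outlier = False
    if len(past) >= MIN_SAMPLES:
        centre = mean(past)
        # Floor the spread at 1.0 so that a nearly flat history does not
        # make every tiny change look enormously surprising.
        spread = max(pstdev(past), 1.0)
        outlier = abs(value - centre) > SENSITIVITY * spread
    past.append(value)
    return outlier


# Example OID string only; any key that identifies the value would do.
OID = ".1.3.6.1.2.1.2.2.1.14.1"
for v in (0, 1, 2, 0):
    check(OID, v)
alert = check(OID, 1234567)   # the jump from the example above
```

Note that this version also stores the outlier in the history, which
widens the spread for later samples; whether to do that is a design
choice the sketch does not try to settle.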
>
> My question is, why does this not exist? Is there some reason I have
> overlooked why this would be impractical? Or does it exist and I just
> have not found it?
>
>   - Alex
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 03 Aug 2013 16:09:05 -0400
> From: Brian O'Neill <oneill at oinc.net>
> Subject: Re: [BBLISA] statistics-based zero config network management:
>         why doesnt this exist?
> To: bblisa at bblisa.org
> Message-ID: <51FD6361.8020606 at oinc.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> 90% of the data available in SNMP isn't generally relevant, and it can
> be massive on some systems and take a long time to poll.
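
One possible way to cut that volume down without templates, offered only
as a sketch: after a few polls, keep an OID as a time series only if its
values are numeric and have actually changed.  The function name
worth_tracking() and the rule itself are assumptions for illustration;
a value that is constant today may of course matter tomorrow.

```python
def worth_tracking(samples: list[str]) -> bool:
    """Guess whether an OID's polled values deserve a time series."""
    try:
        numbers = [float(s) for s in samples]
    except ValueError:
        # Non-numeric values, such as a manufacturer's serial number.
        return False
    # A value that never changed carries no trend information so far.
    return len(set(numbers)) > 1
```

Applied to the raw strings returned by a walk, this drops serial numbers
and fixed settings while keeping counters and gauges that move.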
>
> Zenoss seems to have a decent auto-discovery system, and will do its
> best to detect the type of system and apply a template. That template
> defines what is relevant to monitor. I personally didn't like it as it
> was rather complicated to do anything "out of the box", and the kind of
> monitoring I generally dealt with needed more flexibility (based on my
> evaluation).
>
> Aside from auto-discovery, I like cacti for its presentation and seeing
> trend data. I like nagios for its flexibility and notification handling,
> but I think it requires a lot of scripting to use effectively. I may
> soon be investigating some autodiscovery-type systems, or designing one
> on my own. I do like that nagios can be provisioned by outside systems -
> even multiple ones.
>
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Sat, 3 Aug 2013 20:13:26 +0000
> From: kurin at delete.org
> Subject: Re: [BBLISA] statistics-based zero config network management:
>         why doesnt this exist?
> To: Alex Aminoff <alex at basespace.net>
> Cc: bblisa at bblisa.org
> Message-ID: <20130803201326.GC4490 at delete.org>
> Content-Type: text/plain; charset=us-ascii
>
> I've toyed with the idea of applying machine learning to syslog alerts,
> trying to predict failures, but I never got off the ground.  The whole
> thing has to be unsupervised, unless you're willing to sit there
> classifying every event.
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Sat, 3 Aug 2013 19:10:32 -0400
> From: Matt Simmons <bandman at gmail.com>
> Subject: Re: [BBLISA] statistics-based zero config network management:
>         why doesnt this exist?
> To: kurin at delete.org
> Cc: bblisa at bblisa.org, Alex Aminoff <alex at basespace.net>
> Message-ID:
>         <CAL0sVA_FZ-Qbo8iZMe-NrFKOQjodrGPW2r1mC1B-mBwar=CgZw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Have you looked into any of the Windows-based solutions like Spiceworks
> (free ad-supported)? They do an amazing job with autodiscovery, not just of
> SNMP-enabled devices, but also UNIX/Linux and other Windows machines. I've
> been impressed. Although I've never actually found that the tools fit into
> my workflow, I appreciate what they do.
>
> --Matt
>
>
>
>
>
>
> --
> "Today, vegetables... Tomorrow, the world!"
>
> ------------------------------
>
> Message: 5
> Date: Sun, 4 Aug 2013 12:21:43 +0000
> From: "Edward Ned Harvey (bblisa4)" <bblisa4 at nedharvey.com>
> Subject: Re: [BBLISA] statistics-based zero config network management:
>         why doesnt this exist?
> To: Alex Aminoff <alex at basespace.net>, "bblisa at bblisa.org"
>         <bblisa at bblisa.org>
> Message-ID:
>         <54dc388cccfc4970a726fdc3bad7c891 at BLUPR04MB040.namprd04.prod.outlook.com>
>
> Content-Type: text/plain; charset="us-ascii"
>
> > From: bblisa-bounces at bblisa.org [mailto:bblisa-bounces at bblisa.org] On
> > Behalf Of Alex Aminoff
> >
> > I'm looking at SNMP-based network monitoring systems: cacti, zabbix,
> > some other similar ones. All of them seem to require you to configure
> > your devices on the system. There are some auto-discovery functions, but
> > they only work if you have loaded up the "profile" or "template" for
> > your particular network hardware.
>
> I don't think that's correct.  I think SNMP auto discover does exactly as
> you said.  It just walks the system, discovers whatever it can discover,
> and there you have it.
>
> The thing is:  Very rarely is SNMP sufficient.  For most devices, it
> counts for no more than a ping monitor.  If you want reliable statistics
> on CPU, disk, network, and memory usage, you have to install an agent.  I
> emphasize reliable, because although SNMP technically supports all that,
> I've never seen it usable for that purpose.
>
>
>
> ------------------------------
>
> Message: 6
> Date: Sun, 04 Aug 2013 10:14:24 -0400
> From: Brian O'Neill <oneill at oinc.net>
> Subject: Re: [BBLISA] statistics-based zero config network management:
>         why doesnt this exist?
> To: bblisa at bblisa.org
> Message-ID: <51FE61C0.2040600 at oinc.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 8/4/2013 8:21 AM, Edward Ned Harvey (bblisa4) wrote:
> > The thing is:  Very rarely is SNMP sufficient.  For most devices, it
> > counts no more than a ping monitor.  If you want reliable statistics of
> > cpu, disk, network, memory usage, you have to install an agent.  I
> > emphasize reliable.  Because although SNMP technically supports all that,
> > I've never seen it usable for that purpose.
> >
>
> I use it all the time for network, disk, CPU and memory monitoring on my
> Linux boxes using Net-SNMP.
>
> Windows, on the other hand, is more of a problem. SNMP out of the box on
> Windows only exposes network info. You can add SNMP-Informant - the free
> version adds disk space, CPU and memory, but the memory monitoring isn't
> terribly useful from what I've found. And I'm running into problems with
> reliability, due not to SNMP itself but to the Windows implementation of
> it. On some systems, it is just really slow at times getting even a small
> amount of info, like space used on a single-volume system. It also
> doesn't appear to be a complete version 2 implementation (it does not
> support GetBulkRequest). Windows doesn't seem to want to support
> anything for monitoring but their own, like WMI, and even then they seem
> non-committal - we investigated Exchange 2010 (or maybe 2007 - I forget
> how long ago) monitoring via WMI, and there were indications they didn't
> plan on providing the data...I think they did eventually provide it. But
> WMI can also be slow when accessing remotely, and sometimes requires
> elevated credentials that, depending on the local bureaucracy, might not
> be obtainable.
>
> On network devices, it depends on the manufacturer, but most of the big
> ones will give you decent info. Finding the MIBs to know what the info
> actually is can be a challenge.
>
>
>
> ------------------------------
>
> _______________________________________________
> bblisa mailing list
> bblisa at bblisa.org
> http://www.bblisa.org/mailman/listinfo/bblisa
>
> End of bblisa Digest, Vol 117, Issue 1
> **************************************
>