[BBLISA] My notes from Nagios and SEC talk

johnmalloy at comcast.net johnmalloy at comcast.net
Thu Jan 11 14:12:18 EST 2007


I appreciate it!

Are there any slides available for the talk?

--
Thanks!

John Malloy
johnmalloy AT comcast DOT net

 -------------- Original message ----------------------
From: "Daniel Clark" <dclark at pobox.com>
> Sort of garbled, but thought they might be of use to someone
> 
> Are the slides available anywhere?
> 
> = John Rouillard - Nagios and SEC =
> 
> == Nagios ==
> 
> Nagios, unlike some other FLOSS software, has Correlation - parents and others
>  * Limited cause/effect detection
> 
> Don't use host_name in "define service" stanza -- use hostgroup_name instead!
>  * Has a test on each host where it looks up its own name to make sure
> DNS is working on that host
>  * Flap detection is problematic - he leaves it turned off
>  * Nagios can put performance data "somewhere" - DB, RRD etc.
>  * is_volatile is useful in special cases
>  * read the manual - twice
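The "host looks up its own name" test noted above could be sketched as a small Nagios-style plugin. This is only an illustration, not the speaker's actual check: the function names and the injectable `resolver` parameter are assumptions made here, and on many systems the lookup may be answered from /etc/hosts rather than DNS.

```python
import socket

# Conventional Nagios plugin exit codes.
OK, CRITICAL = 0, 2

def check_own_name(hostname=None, resolver=socket.gethostbyname):
    """Return (exit_code, message) for a lookup of this host's own name.

    `resolver` is injectable (an assumption of this sketch) so the logic
    can be exercised without touching the real resolver.
    """
    name = hostname or socket.gethostname()
    try:
        address = resolver(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return CRITICAL, f"DNS CRITICAL - cannot resolve {name}: {exc}"
    return OK, f"DNS OK - {name} resolves to {address}"

def main():
    """Print the one-line plugin output and return the exit code."""
    code, message = check_own_name()
    print(message)
    return code
```

A real plugin script would finish with `sys.exit(main())` so that Nagios sees the exit code.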
> 
> Correlation - find the fingerprint - only be notified of things that matter
> 
> Nagios 3 will support defining your own variables - a write-up of a
> hack to do this now, in the context of monitoring SSL (check_ldaps),
> is on the nagios-users list (find post)
>  * I think this post is:
> http://article.gmane.org/gmane.network.nagios.user/40093/match=ldaps
> 
> Servicegroup - a bundle of services that provide a
> customer-visible server (e.g. db2, websphere app server, apache)
> 
> Serviceextinfo/Hostextinfo going away in Nagios 3 -- info shifts to
> becoming attributes of service and host objects
> 
> Nagios 3 in alpha now.
> 
>  * Nagios really a service monitoring program, not a host monitoring system
> 
> Many other monitoring projects are missing correlation.
> 
> Nagios 2 - host checks are done in series (in Nagios 3, they will be in parallel)
> 
> Correlation includes (slide) Topology, Thresholds, Service, Cluster
> (meta) plugin, Flap detection (doesn't quite work, but SEC replaces
> it)
> 
> Tricks:
>  * Links to TWiki for a knowledge base for services, hosts, additional commands
>  * Can change html pages - he has "Unack Svc Probs" - on call person
> lives in this screen
>  * Downtime scheduling
> 
>  * He uses cacti and rt integrated with twiki - interesting feature -
> find last ticket in RT that mentions system
>    * connect via (ajaxterm?)
>    * look at nagios definitions
>    * (cacti not from nagios - he doesn't like nagios for rrd stuff - he
> uses drraw instead)
>    * Also have wiki pages for services
>    * Nagios just has link - no dual-way automation, but don't really
> need it in this case - wiki-side template for hosts and services do
> exist however
> 
> == SEC ==
>  * Is very passive
>  * often you may need to hook rule types together -- in groups
>  * only usable in real time at the moment
>  * can do everything that nagios does except topology
> 
> Plugin talks to device, sec determines severity level, gives data back
> to nagios (nagios not time aware, sec is)
> 
>  * He has created a patch to Nagios that allows the active events to be
> passed through to sec - patch is in beta this month, still 2 open
> slots for more beta testers - beta period will last at least 2 months.
> 
> When used with nagios his patch adds:
>  * counting ok states before rearming
>  * different triggers or polling interval on analysis of error, not
> just non-ok severity
>  * changing trouble thresholds per time period/activity
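"Counting ok states before rearming" can be illustrated with a small state machine. This is a hedged sketch of the general idea only; it is neither SEC rule syntax nor the speaker's patch, and the class name and the `ok_needed` parameter are made up for illustration.

```python
class RearmingAlert:
    """Notify on a problem, then stay quiet until enough OKs re-arm the alert."""

    def __init__(self, ok_needed=3):
        self.ok_needed = ok_needed  # consecutive OKs required to re-arm (assumed value)
        self.armed = True           # an armed alert fires on the next problem
        self.ok_streak = 0          # consecutive OK results seen so far

    def observe(self, state):
        """Feed one check result; return True if a notification should go out."""
        if state == "OK":
            self.ok_streak += 1
            if not self.armed and self.ok_streak >= self.ok_needed:
                self.armed = True
            return False
        # Any non-OK result breaks the OK streak.
        self.ok_streak = 0
        if self.armed:
            self.armed = False
            return True
        return False
```

With `ok_needed=2`, a single OK between two CRITICAL results does not re-arm the alert, so a briefly recovering service does not trigger a second notification.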
> 
>  * SEC also monitors nagios log file - often this file will show
> nagios configuration errors
> 
> Contexts
>  * See ssh example in 2004 lisa paper (http://www.cs.umb.edu/~rouilj/sec/)
> 
> Nagios is good at "what is happening now"; sec is good at figuring out
> "how I got to now"
>  * His patch will be released under GPL
> 
>  * Personal Website: www.cs.umb.edu/~rouilj
> 
>  * easy: passive service event -> nagios
>  * trick here is getting active stream from Nagios
> 
> OpenNMS (in 2004) - didn't have good correlation compared to nagios,
> and certainly not comparable to SEC
>  * Does it have correlation now?
>  * It used to have thresholding issues as well, and may still
> 
> ZenOSS:
>  * He couldn't see correlation aspects that he really needed.
> 
> Temperature sensors - lm_sensors and smartctl can be used instead
> of stand-alone devices in some cases
> 
> Some tricks:
>  * Rack as host - if 3 boxes in rack have high temp, rack is overheated
>  * Room as host - "room is on fire" alert if 3 racks have high temp
>    * But really needed "room is underwater" alert :-)
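The rack-as-host and room-as-host tricks above amount to counting hot members per group. A minimal sketch follows, assuming the threshold of 3 from the notes; the function name and the dictionary layout are hypothetical and not how the speaker implemented it in Nagios.

```python
from collections import defaultdict

def overheated_groups(member_hot, member_group, threshold=3):
    """Return the set of groups that have at least `threshold` hot members.

    member_hot:   maps member name -> True if it reports high temperature
    member_group: maps member name -> the group (rack or room) it belongs to
    """
    counts = defaultdict(int)
    for member, hot in member_hot.items():
        if hot:
            counts[member_group[member]] += 1
    return {group for group, n in counts.items() if n >= threshold}
```

The same function can be applied twice: once with hosts grouped by rack to find overheated racks, and again with those racks grouped by room to decide whether to raise the room-level alert.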
> 
>  * Q: lots of hosts - does he manually edit? A: Yes, but working towards
> defining every host once in config (his config mgmt app, akin to
> cfengine/puppet/bcfg2/lcfg)
>    * automation issue: Think of a host group as a set, nagios only has
> set subtraction - makes automation very difficult
>    * could just not use hostgroups, but then that makes the nagios web GUI suck
>    * hostgroups for admin data
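One way around the set-subtraction-only limitation mentioned above is to do the set arithmetic outside Nagios and emit hostgroups with explicit member lists. The sketch below assumes a made-up inventory and a hypothetical helper name; it is not the speaker's config management tool.

```python
def hostgroup_stanza(name, alias, members):
    """Render a Nagios-style hostgroup definition with an explicit member list."""
    return (
        "define hostgroup {\n"
        f"    hostgroup_name  {name}\n"
        f"    alias           {alias}\n"
        f"    members         {','.join(sorted(members))}\n"
        "}\n"
    )

# Hypothetical inventory: host names and groupings are invented for illustration.
linux = {"web1", "web2", "db1"}
production = {"web1", "db1", "mail1"}

# Intersection is computed here, in the generator, rather than in Nagios.
linux_production = linux & production

stanza = hostgroup_stanza("linux-production", "Linux production hosts", linux_production)
```

The generated text would then be written into a config file that Nagios reads, so the web GUI still sees ordinary hostgroups.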
> 
>  * GroundWork stuff may be pretty good for automating config for lots
> of machines - http://www.groundworkopensource.com/products/os-overview.html
>    * Nagios 3 isn't going to push config into DB - Nagios 4 might.
> 
>  * Oreon graphical interface for nagios - out of france - might be
> nice - http://www.oreon-project.org/
> 
> -- 
> Daniel Clark # http://dclark.us # http://opensysadmin.com
> 



