[BBLISA] My notes from Nagios and SEC talk

Daniel Clark dclark at pobox.com
Thu Jan 11 12:34:13 EST 2007


Sort of garbled, but thought they might be of use to someone

Are the slides available anywhere?

= John Rouillard - Nagios and SEC =

== Nagios ==

Nagios, unlike some other FLOSS software, has Correlation - parents and others
 * Limited cause/effect detection

Don't use host_name in "define service" stanza -- use hostgroup_name instead!
 * Has a test on each host where it looks up its own name to make sure
DNS is working on that host
 * Flap detection is problematic - he leaves it turned off
 * Nagios can put performance data "somewhere" - DB, RRD etc.
 * is_volatile is useful in special cases
 * read the manual - twice
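
The per-host DNS self-test mentioned above could look roughly like the
sketch below. This is not the speaker's actual check: the function name
check_own_name and its injectable resolver argument are made up for
illustration. Only the exit-code convention (0 = OK, 2 = CRITICAL) is the
usual Nagios plugin one.

```python
import socket

# Usual Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_own_name(hostname=None, resolver=socket.gethostbyname):
    """Return (exit_code, message) after resolving this host's own name.

    `resolver` is a hypothetical injection point so the logic can be
    exercised without touching real DNS.
    """
    name = hostname or socket.gethostname()
    try:
        address = resolver(name)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return CRITICAL, f"DNS CRITICAL - cannot resolve {name}: {exc}"
    return OK, f"DNS OK - {name} resolves to {address}"
```

A wrapper script would print the message and exit with the returned code.
Note that a lookup of the local name may be answered from /etc/hosts
rather than DNS, depending on the resolver configuration.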

Correlation - find the fingerprint - only be notified of things that matter

Nagios 3 will support defining your own variables - a write-up of a hack
to do this now, and of how to monitor SSL (check_ldaps), is on the
nagios-users list (find post)
 * I think this post is:
http://article.gmane.org/gmane.network.nagios.user/40093/match=ldaps

Servicegroup - a bundle of services that provide a
customer-visible server (e.g. db2, websphere app server, apache)

Serviceextinfo/Hostextinfo going away in Nagios 3 -- info shifts to
becoming attributes of service and host objects

Nagios 3 in alpha now.

 * Nagios really a service monitoring program, not a host monitoring system

Many other monitoring projects are missing correlation.

Nagios 2 - host checks are done in series (in Nagios 3, they will be in parallel)

Correlation includes (slide) Topology, Thresholds, Service, Cluster
(meta) plugin, Flap detection (doesn't quite work, but SEC replaces
it)

Tricks:
 * Links to TWiki for a knowledge base for services, hosts, additional commands
 * Can change html pages - he has "Unack Svc Probs" - on call person
lives in this screen
 * Downtime scheduling

 * He uses cacti and rt integrated with twiki - interesting feature -
find last ticket in RT that mentions system
   * connect via (ajaxterm?)
   * look at nagios definitions
   * (cacti not from nagios - he doesn't like nagios for rrd stuff - he
uses drraw instead)
   * Also have wiki pages for services
   * Nagios just has a link - no two-way automation, but don't really
need it in this case - wiki-side templates for hosts and services do
exist however

== SEC ==
 * Is very passive
 * often you may need to hook rule types together -- in groups
 * only usable in real time at the moment
 * can do everything that nagios does except topology

Plugin talks to device, SEC determines severity level, gives data back
to nagios (nagios is not time aware, SEC is)

 * He has created a patch to Nagios that allows the active events to be
passed through to SEC - patch is in beta this month, still 2 open
slots for more beta testers - beta period will last at least 2 months.

When used with nagios his patch adds:
 * counting ok states before rearming
 * different triggers or polling interval based on analysis of the error,
not just non-ok severity
 * changing trouble thresholds per time period/activity

 * SEC also monitors nagios log file - often this file will show
nagios configuration errors

Contexts
 * See ssh example in 2004 lisa paper (http://www.cs.umb.edu/~rouilj/sec/)

Nagios is good at "what is happening now"; SEC is good at figuring out
"how I got to now"
 * His patch will be released under GPL

 * Personal Website: www.cs.umb.edu/~rouilj

 * easy: passive service event -> nagios
 * trick here is getting active stream from Nagios
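
The easy direction above, a passive service event handed to Nagios, goes
through Nagios' external command interface. A minimal sketch of formatting
such a line follows; the helper name passive_service_result is made up,
and the host/service values in the usage note are placeholders.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line.

    `now` is an optional epoch timestamp (hypothetical parameter, useful
    for testing); the current time is used when it is omitted.
    """
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )
```

For example, passive_service_result("web1", "HTTP", 2, "timeout") yields a
line that would be written to the Nagios external command file, whose
location is site-specific. The output text must not contain semicolons or
newlines.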

OpenNMS (in 2004) - didn't have good correlation compared to nagios,
and certainly not comparable to SEC
 * Does it have correlation now?
 * It used to have thresholding issues as well, and may still

ZenOSS:
 * He couldn't see correlation aspects that he really needed.

Temperature sensors - lm-sensors and smartctl can be used instead
of stand-alone devices in some cases

Some tricks:
 * Rack as host - if 3 boxes in rack have high temp, rack is overheated
 * Room as host - "room is on fire" alert if 3 racks have high temp
   * But really needed "room is underwater" alert :-)
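
The rack-as-host and room-as-host tricks above are the same aggregation
applied twice. A rough sketch, with made-up names (overheated,
room_status) and an assumed 35 C "high temp" cutoff that was not given in
the talk:

```python
def overheated(children_hot, threshold=3):
    """A parent (rack or room) is in trouble when `threshold` or more children are."""
    return sum(1 for hot in children_hot if hot) >= threshold


def room_status(room, high_temp=35.0, threshold=3):
    """`room` maps rack name -> {box name: temperature in Celsius}.

    Returns (per-rack overheated flags, room-level flag).
    `high_temp` is an assumed cutoff, not a value from the talk.
    """
    hot_racks = {
        rack: overheated((t >= high_temp for t in boxes.values()), threshold)
        for rack, boxes in room.items()
    }
    return hot_racks, overheated(hot_racks.values(), threshold)
```

In Nagios terms each rack and the room would be a host whose state comes
from a check like this over its members.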

 * Q: lots of hosts - does he edit manually? A: Yes, but working towards
defining every host once in config (his config mgmt app, akin to
cfengine/puppet/bcfg2/lcfg)
   * automation issue: Think of a host group as a set, nagios only has
set subtraction - makes automation very difficult
   * could just not use hostgroups, but then that makes the nagios web GUI suck
   * hostgroups for admin data

 * GroundWork stuff may be pretty good for automating config for lots
of machines - http://www.groundworkopensource.com/products/os-overview.html
   * Nagios 3 isn't going to push config into DB - Nagios 4 might.

 * Oreon graphical interface for nagios - out of France - might be
nice - http://www.oreon-project.org/

-- 
Daniel Clark # http://dclark.us # http://opensysadmin.com



