[BBLISA] UPS relative merits

Rich Braun richb at pioneer.ci.net
Thu Sep 1 11:04:38 EDT 2011


Edward Ned Harvey observed:
> In the case of backups, I noticed two things.  Failure modes.
>  (1) ... some backups that couldn't
> maintain 5 seconds of power, and still didn't alert me to bad batteries.
>...In other words, the backups caused more power outages than
> they prevented for me.

Alas, that's what I've concluded over a long career in data center management.
If you want high-quality battery backup, you have to go for high-end units
(usually of the rack-mount or central hardwired variety costing $3000 to
$300,000) and keep the batteries maintained far more often than most of us
ever bother with.  Just a few weeks ago, I had a similar episode:  an
APC-branded unit abruptly died with no advance warning, just a fault light,
high-pitched alarm and loss of power.

These days I'm leaning more toward the Google/Facebook route:  go cheap on the
hardware with consumer-grade stuff and design in two of everything (or 3 or 4
or more), "HAL 9000" style, with diverse cable routing and geographic
separation, so you can go into any data center, start yanking cables, and have
utterly no impact on operations.  Expect lots of failures behind the scenes
each year, but it costs a heckuva lot less and provides equivalent overall
reliability.  My last dev/QA lab design included only enough rack-mount UPS to
operate about 30% of the servers, letting the others die during power
outages and forcing users/administrators to decide which machines are actually
mission-critical.
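
A back-of-the-envelope sketch of why two-of-everything can match a single
expensive box.  It assumes failures are independent (which is what diverse
cabling and geographic separation are trying to buy you) and that any one
surviving copy is enough to keep the service up.  The 99% figure and the
function name are made up for illustration, not measurements:

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when any one suffices.

    Assumes independent failures; correlated failures (shared power feed,
    shared switch) make the real number worse than this.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical cheap machine that is up 99% of the time.
    for n in (1, 2, 3, 4):
        print(n, f"{combined_availability(0.99, n):.8f}")
```

Each extra copy multiplies the chance of a total outage by the single-unit
failure probability, which is why cheap gear in quantity can compete with one
high-end unit, as long as the copies really do fail independently.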

Disk drives (even consumer-grade ones) are still sold with standard 5-year
warranties, so if you make the machines double-redundant, you just keep
shipping cartons handy to RMA failed drives as they crap out.  Double-redundant
here means RAID1/RAID10 on each machine, plus disk clustering across 2 or more
machines in separate locations; all of that can be done with open-source
software and obsolete hardware if you have a near-zero budget, or with high-end
new gear if you have a big one.  Swapping out the failed units takes very
little labor, and if you standardize your drive capacities and keep some
spares, it's even easier.  Works whether you have 10 machines or 10,000.
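
For the "keep some spares" part, a rough way to size the spares shelf is to
estimate how many drives you'd expect to fail while a replacement is in
transit.  This is only a sketch:  the annual failure rate and RMA turnaround
below are placeholder assumptions, and it uses the expected value only, so it
ignores bad batches and bursts of failures:

```python
import math


def spares_needed(drives: int, annual_failure_rate: float, rma_days: float) -> int:
    """Expected drive failures during one RMA round trip, rounded up.

    All inputs are assumptions you supply; always keeps at least one spare.
    """
    if drives < 0 or annual_failure_rate < 0 or rma_days < 0:
        raise ValueError("inputs must be non-negative")
    # Failures per year, scaled to the fraction of a year an RMA takes.
    expected = drives * annual_failure_rate * (rma_days / 365.0)
    return max(1, math.ceil(expected))


if __name__ == "__main__":
    # Hypothetical: 3% annual failure rate, two-week RMA turnaround.
    for fleet in (20, 1000, 20000):
        print(fleet, "drives ->", spares_needed(fleet, 0.03, 14), "spares")
```

Standardizing on one drive capacity is what makes a single small pool like
this workable; with mixed sizes you'd need a pool per size.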

My larger point is that hardware redundancy and battery backup serve two very
different needs.  If you need to maintain all your machines through power
outages then you need standby generation and high-end UPS.  If you have a home
or desktop computer then you can get by with a low-end UPS but you should
probably at the very least install software RAID1 on it.  If you're looking
for the most cost-effective way to keep a roomful of computers in good repair,
the cost of UPS (whether high-end or low-end) outweighs the cost of spares, if
you set up an efficient fault-tolerant design and keep track of equipment
warranties.

-rich




