[BBLISA] UPS relative merits

Grant Young grant at toaster-repair.com
Thu Sep 1 17:59:44 EDT 2011


My experience is that you tend to get better resilience out of enterprise gear; presumably that's what you're paying the premium for.  On the other hand, that can get expensive, so the cheap redundant hardware route is reasonable if you have the resources to swap things out and maintain your redundancy.  You're trading some labor for hardware.  Philosophically, cheap hardware, redundancy, and open source are an agreeable combination.
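
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. The failure probabilities are made up purely for illustration, the function names are my own, and it assumes units fail independently, which shared power or a bad batch of drives would break.

```python
def all_fail_probability(unit_failure_prob: float, copies: int) -> float:
    """Chance that every one of `copies` independent units fails in the same window."""
    if not 0.0 <= unit_failure_prob <= 1.0:
        raise ValueError("unit_failure_prob must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Independent failures multiply: p * p * ... * p
    return unit_failure_prob ** copies


def copies_needed(cheap_failure_prob: float, target_prob: float) -> int:
    """Smallest number of cheap units whose combined failure chance is <= target_prob."""
    if not 0.0 <= cheap_failure_prob < 1.0:
        raise ValueError("cheap_failure_prob must be in [0, 1)")
    if not 0.0 < target_prob <= 1.0:
        raise ValueError("target_prob must be in (0, 1]")
    copies = 1
    while all_fail_probability(cheap_failure_prob, copies) > target_prob:
        copies += 1
    return copies


if __name__ == "__main__":
    # Hypothetical figures: a cheap box fails 5% of the time in some window,
    # an enterprise box 1% of the time.
    cheap, enterprise = 0.05, 0.01
    n = copies_needed(cheap, enterprise)
    print(f"{n} cheap units -> {all_fail_probability(cheap, n):.4%} chance all are down")
```

With those invented numbers, two cheap units already beat the single enterprise unit on paper (roughly 0.25% versus 1%). The catch is the labor of replacing whichever one died before the second one goes too.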

As for disk reliability, clean power is your friend.  I worked in one datacenter where we had persistent disk failures in one array, and they went away when a facility UPS was installed.  While wiring things up, they discovered that the array had been on street power.  Clean power is one of the nice things about facility UPSes; it saves a lot of wear and tear on your hardware.  When you've got generators backing it up, you get peace of mind.  (Our datacenter lost commercial power last weekend and it was a non-event for our business.)

I haven't had much luck with the little UPSes.  Battery conditioning is kind of important, and I don't think they do it very well, so they crap out when you need them most.

My 2 cents… 

On Sep 1, 2011, at 11:04 AM, Rich Braun wrote:

> Edward Ned Harvey observed:
>> In the case of backups, I noticed two things.  Failure modes.
>> (1) ... some backups that couldn't
>> maintain 5 seconds of power, and still didn't alert me to bad batteries.
>> ...In other words, the backups caused more power outages than
>> they prevented for me.
> 
> Alas that's what I've concluded over a long career in data center management. 
> If you want high-quality battery backup, you have to go for high-end units
> (usually of the rack-mount or central hardwired variety costing $3000 to
> $300,000) and keep the batteries maintained far more often than most of us
> ever bother with.  Just a few weeks ago, I had a similar episode:  an
> APC-branded unit abruptly died with no advance warning, just a fault light,
> high-pitched alarm and loss of power.
> 
> These days I'm leaning more toward the Google/Facebook route:  go cheap on the
> hardware with consumer-grade stuff, design two-of-everything (or 3 or 4 or
> more) a la "HAL 9000" with diverse cable routing and geographic
> separation so you can go into any data center, start yanking cables, and have
> utterly no impact on operations.  Expect lots of failures behind the scenes
> each year, but it costs a heckuva lot less and provides equivalent overall
> reliability.  My last dev/QA lab design included enough rack-mount UPS to
> operate only about 30% of the servers, letting the others die during power
> outages and forcing users/administrators to decide which machines are actually
> mission-critical.
> 
> Disk drives (even consumer-grade ones) are still sold with standard 5-year
> warranties so if you make the machines double-redundant (i.e. RAID1/RAID10 on
> each machine, plus disk clustering across 2 or more machines in separate
> locations, all of which can be done with open-source software and obsolete
> hardware if you have near-zero budget or with high-end new gear if you have a
> big budget) then you just keep shipping cartons handy to RMA failed drives as
> they crap out--swapping out the failed units takes very little labor effort, and if
> you standardize your drive capacities and keep some spares, it's even easier. 
> Works whether you have 10 machines or 10,000.
> 
> My larger point is that hardware redundancy and battery-backup serve two very
> different needs.  If you need to maintain all your machines through power
> outages then you need standby generation and high-end UPS.  If you have a home
> or desktop computer then you can get by with a low-end UPS but you should
> probably at the very least install software RAID1 on it.  If you're looking
> for the most cost-effective way to keep a roomful of computers in good repair,
> the cost of UPS (whether high-end or low-end) outweighs the cost of spares, if
> you set up an efficient fault-tolerant design and keep track of equipment
> warranties.
> 
> -rich
> 
> 
> 
> _______________________________________________
> bblisa mailing list
> bblisa at bblisa.org
> http://www.bblisa.org/mailman/listinfo/bblisa


