[BBLISA] PE2950 and Linux virtualization

John Stoffel john at stoffel.org
Fri Jun 6 10:50:24 EDT 2008


Scott> What are people's experiences with CentOS 5.0 64-bit installed
Scott> on dual 3 GHz quad-core PE2950 systems with 32 GB RAM each,
Scott> high-performance computing (applications that tax both the CPUs
Scott> and RAM), not currently in a Beowulf cluster but could adapt to
Scott> that, and doing so with VMware or other virtualization software
Scott> vs activity being done directly in the OS?

I work at a place which has racks of dual Opteron boxes with 16 GB of
RAM, and others that are 4-CPU, 4-core machines with 128 GB of memory,
etc.  We're doing ASIC design and simulations, so speed and memory are
important to us.

Scott> How much of a performance hit, or gain (I'd presume hit), does
Scott> virtualization cause an application, resulting in what
Scott> percentage poorer or better (I'd presume poorer) performance vs
Scott> dealing directly with the OS?

Umm... why do you want to virtualize compute nodes?  What are you
trying to achieve?  

Scott> It would be nice to have a VM perform some work, and if a
Scott> person's code or application breaks, have it take down a VM
Scott> while keeping a machine up, and not affecting other people's
Scott> work.

Umm... generally, if code breaks in userspace, the OS won't crash.
We've never experienced user code taking down one of our boxes and we
do lots of runs here, with systems up and running for months at a
time, with hundreds or thousands of jobs running through them.

In general, they go down due to hardware problems, not software,
especially with compute jobs.

Why do you think the entire system will go down when someone's code
breaks?  Or are you worried that someone will write code which fills
up all the memory on a machine due to a bug?  In that case, resource
limits and strict overcommit limits are the way to go.
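
To make the resource-limit idea concrete, here is a rough sketch,
assuming Linux and Python 3.  The helper name run_with_memory_cap and
the deliberately buggy "hog" job are made up for illustration; the
calls themselves (resource.setrlimit, subprocess.run) are from the
standard library.  It caps a child's address space so that a runaway
allocation fails inside the job instead of eating the whole machine.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd in a child whose address space is capped at max_bytes.

    Hypothetical helper for illustration; Linux/POSIX only.
    """
    def apply_limit():
        # Runs in the child after fork() and before exec().
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes
        if hard != resource.RLIM_INFINITY:
            # Never try to raise an existing hard limit.
            cap = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(cmd, preexec_fn=apply_limit,
                          capture_output=True, text=True)


# Made-up buggy job: tries to grab 4 GiB in one allocation.
hog = [sys.executable, "-c",
       "x = bytearray(4 * 1024**3); print('allocated')"]

# Cap the job at 1 GiB of address space.
result = run_with_memory_cap(hog, 1024**3)
print("exit status:", result.returncode)
```

The job dies with a MemoryError and a non-zero exit status while
everything else on the box keeps running.  Strict overcommit is a
separate, system-wide knob (the vm.overcommit_memory sysctl set to 2),
and a batch system will normally set per-job limits like this for you.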

Scott> It may also depend on if an application or code is written
Scott> directly with/for the physical cpu/hardware vs more general use
Scott> (VM).

Scott> Thanks for insights and experiences.

Personally, I wouldn't bother to virtualize my compute cluster at
all.  I'd just put it all into a batch scheduling system (Condor, Sun
Grid Engine, or LSF if you have money) and let the batch system
load-level the resources.
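
For a feel of what load leveling means, here is a toy sketch in
Python.  It is not how Condor, Sun Grid Engine, or LSF actually work
(they also handle queues, priorities, memory requests, and so on); the
function level_load, the job names, and the cost numbers are all
invented for illustration.  Each job simply goes to whichever node has
the least work assigned so far.

```python
import heapq


def level_load(jobs, nodes):
    """Greedy toy scheduler: give each job to the least-loaded node.

    jobs  -- list of (name, cost) tuples; cost is an arbitrary estimate
    nodes -- list of node names
    """
    # Min-heap of (assigned_cost, node), so the idlest node pops first.
    heap = [(0, node) for node in nodes]
    heapq.heapify(heap)
    placement = {node: [] for node in nodes}

    # Place the biggest jobs first, which tends to balance better.
    for name, cost in sorted(jobs, key=lambda j: j[1], reverse=True):
        load, node = heapq.heappop(heap)
        placement[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return placement


jobs = [("sim_a", 8), ("sim_b", 4), ("sim_c", 4), ("sim_d", 2)]
placement = level_load(jobs, ["node1", "node2"])
print(placement)
```

Here node1 ends up with sim_a and sim_d, and node2 with sim_b and
sim_c, so neither box sits idle while the other is swamped.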

John



