[BBLISA-jobs] Part time contracting in Marlborough, Massachusetts

Konrad Rzeszutek Wilk bblisa at darnok.org
Wed May 6 20:47:37 EDT 2015


The Xen Project has a number of internet-facing systems and also an
automated test facility.

We need someone to do our system administration.  We have several
weeks' worth of setup/fixup work, with more later in the year, and
probably ~0.5-1 day a week of ongoing admin tasks.

We would like to contract a small company, ideally.  If we hire an
individual we will need some kind of backup for when they're
away/sick/whatever, because we don't have anyone else physically
nearby.

We need someone who is physically near enough Marlborough, MA, that
they can easily visit our rack there.


[...]

The resources to be managed in more detail:


1. Internet-facing systems

About half a dozen Rackspace VMs (Rackspace give us complimentary
hosting), running Debian (mostly wheezy right now), performing a
variety of tasks:

 * source code hosting (shell accounts, git, gitweb, some hg,
   homegrown commit email generator)
 * blog hosting (wordpress)
 * wiki (mediawiki)
 * role mail aliases, dns, etc.
 * mailing lists (mailman, mhonarc)
 * a few other minor VMs

We currently have no automation (!) for these and due to a shortage of
effort have been doing essential things only.

Obvious tasks that need doing:

 * Review how everything is set up, fix any obvious problems
 * Set up some automation
 * Set up some monitoring
 * Backups have been done by Citrix IT staff, an arrangement which
   ought to be fixed
 * Ongoing maintenance (e.g. upgrades to jessie in due course),
   firefighting, etc.


2. The test facility

This is a single rack in Earthlink's data centre in Marlborough.
We plan to add a second rack this year.

In terms of hardware:

  - Two moderate-sized servers running Xen with VMs for
    infrastructure services, test controller, database, etc.
    These servers have Rocketport multiport serial cards to
    provide serial console logging etc.
  - External (8-port) and internal (48-port) managed switches
  - APC PDUs
  - 24 x86 test boxes: 12 different kinds of machine, in pairs
  - One 4U homebrew ARM crate containing 4x arndale and 4x cubietruck
    devboards, PDU relay board, etc. etc.
  - Rack is not very tidy; previous hardware installers were less than
    ideal.

Software:

  - VMs are Debian wheezy.
  - Controller VM runs homegrown test system called `osstest'
  - Also: postgresql, dhcp, apache2 (for publishing logs)
  - I have a minimal ansible setup to do the things that I
    needed to do

Tasks that need doing:

  - Help with the procurement of a second round of test hardware,
    including specifying support equipment (more switches, cable
    management, PDUs), and perhaps help with specification of
    test boxes.
  - Install the second round of test hardware in the second rack
  - Review the physical organisation of the rack and decide what needs
    to be fixed and how, and work with us to fix it (given that this
    is a live system).
  - Double-check the switch and PDU configurations
  - Review the software organisation and decide what needs fixing,
    etc.
  - Make sure that we have proper backups (!) and do a test restore.
  - Check our arrangements for manual failover to running on only one
    of the two servers if one of the servers should die.  (Our uptime
    requirement is fairly low but we wouldn't want to be down for days
    or weeks.)  Actually test the failover.


