[BBLISA] 10GBE NICS

Daniel Feenberg feenberg at nber.org
Wed Oct 10 09:51:52 EDT 2012



On Wed, 10 Oct 2012, John Stoffel wrote:

>>>>>> "Daniel" == Daniel Feenberg <feenberg at nber.org> writes:
>
>> We want to PXE boot a dozen compute and file servers over 10GBE
>> ethernet. All of them boot fine with the motherboard NICs. We have
>> a Brocade Ironport switch and a dozen direct attach cables. We also
>> have samples of 3 brands of 10GBE NICs to test.
>
> Why are you booting file servers via PXE?  Shouldn't these be your core
> servers which stay up all the time and provide services?
>

Until now we haven't had any trouble booting PXE, and it makes it
easy to keep the systems up to date and consistent. For FreeBSD we
posted some notes at http://www.nber.org/sys-admin/FreeBSD-diskless.html
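
For readers setting this up, the server side is just ordinary DHCP and
TFTP. A minimal ISC dhcpd stanza for PXE clients might look like the
following; the addresses and paths are illustrative placeholders, not
our actual configuration:

```
# Illustrative dhcpd.conf stanza for PXE-booting diskless clients.
# All addresses and paths below are placeholders, not our real values.
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.150;
    next-server 192.168.1.1;            # TFTP server that serves the loader
    filename "pxeboot";                 # FreeBSD PXE loader in the tftp root
    option root-path "192.168.1.1:/diskless/root";  # NFS root for clients
}
```

In the failure cases described below the NIC never brings the link up,
so the card's firmware fails before any of this is even consulted.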

>> 1) The HP card boots correctly when PXE is enabled on the NIC.
>
>> 2) Chelsio N320E gives a "PXE-E61 - Media Failure" error and
>>     the NIC link light never comes on.
>
> It could be a bad card, how many have you tested?
>

We actually tried several of the Chelsio cards.

>> 3) Brocade 1010 - says "Adaptor 1/0 link initialization failed.
>>     Disabling BIOS" or "No target devices or link down or init
>>     failed" depending on the NIC BIOS setting. Again, the NIC link
>>     light does not come on.
>
>> We discount a bad cable (it works with the HP, and we have tried
>> several) or a motherboard incompatibility (if we boot RHEL from
>> a local drive and enable eth2 we can use the Chelsio or Brocade
>> cards). Is there some configuration issue we are missing? Chelsio
>> support did not offer a solution, we haven't contacted Brocade yet.
>
> I'm not sure you can discount motherboard issues, since the PXE stuff
> is directly tied into the BIOS and how the BIOS initialized the card
> and how the on-card BIOS handles PXE support.
>

Something to try, but we have tried the Chelsio card in many machines, and
the Brocade in two.

>> The motherboard is a Gigabyte GA-P55M-UB2. All the cards have the
>> latest firmware.
>
> Is the motherboard at the latest level of firmware?
>

No.

>> Since the failure occurs before any packets are sent it can't
>> be a dhcpd or tftp problem. Is the problem that some cards
>> offer less than full support for direct attach? Or is direct
>> attach not fully standardized? Should we try fiber optic
>> cables? The documentation for all the cards suggests that
>> their primary purpose is ethernet SANs. Perhaps the vendors
>> don't care about other uses?
>
>> We only need a single port per server, while the HP offers
>> 2. Because of heat, power and cost reasons, we would prefer a
>> single port card.
>
> How much have you spent already in terms of your time debugging this?
> I'd just go with the HP cards and get on with life.  Get the systems
> up and running and then keep playing with other cards in the lab.

Several days wasted.

>
> I don't think you're going to notice heat and power by just having a
> second port on the system which isn't used.  It will be in the noise I
> suspect.
>

The HP card is too hot to touch.

>> Any wisdom greatly appreciated. This is our first experience
>> with 10GBE.  We could boot over 1Gb and then switch to 10GBE
>> for file service, but we wish to reduce the amount of cabling.
>
> Are you also running a management network on these systems for IPMI or
> remote console stuff?  Could you boot off that?

We probably could if we knew how.
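
For the archives: on boards whose BMC speaks standard IPMI, the next
boot device can be forced to PXE over the management network with
ipmitool. A sketch, assuming a reachable BMC; the address and
credentials are placeholders:

```
# Tell the BMC to PXE-boot on the next power-up, then cycle power.
# The BMC address, user, and password are placeholders.
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret chassis bootdev pxe
ipmitool -I lanplus -H 10.0.0.50 -U admin -P secret chassis power cycle
```

Whether the BIOS then PXEs from the onboard NIC or the add-in card
still depends on its own boot-order settings.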

>
>> Two odd details - even booting from the local drive the
>> Chelsio card fails on intensive use if IRQPOLL is enabled. An
>> additional advantage of the HP card is that brief network
>> interruptions do not affect it, while the Chelsio card will
>> hang the computer if the switch reboots or a cable is moved
>> from one port to another.
>
> Sounds like you need to toss that Chelsio card out the window, it's
> not reliable and you'll just end up with all kinds of black magic
> voodoo.
>
> Now of course it could just be a motherboard interaction issue.  Have
> you tried another brand of motherboard?
>

We have tried multiple motherboards. I expect it is a configuration issue, 
but not one that support is willing to reveal. Or maybe Brocade will
fess up.

I found that someone with an Emulex card has the same problem we see (no 
packets move for PXE, otherwise works) - 
http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01574.html

So this seems to be a generic problem with most 10GBE NICs. Mysterious.

dan feenberg

> John
>
> _______________________________________________
> bblisa mailing list
> bblisa at bblisa.org
> http://www.bblisa.org/mailman/listinfo/bblisa
>


