[BBLISA] slow wan link

Edward Ned Harvey bblisa4 at nedharvey.com
Fri Jun 8 08:35:34 EDT 2012


> From: Bill Bogstad [mailto:bogstad at pobox.com]
> Sent: Thursday, June 07, 2012 11:35 AM
> 
> I'm going to have to disagree with this.   A congested link SHOULD
> drop TCP packets so that congestion control knows to slow down.
> It's actually this thinking which results in deploying equipment and
> software that creates buffer bloat.   

What are you calling buffer bloat?  You should indeed have some buffer, and
you should indeed use it to keep the pipe fully utilized.  When the buffer
gets full, it's because the total outbound traffic of all the LAN machines
is higher than the WAN bandwidth.  Assuming the LAN users are sending via
TCP, the router responds on the LAN with flow control, so the LAN TCP sender
slows down.  If the LAN sender is sending UDP, then packets get dropped, and
that behavior is expected by a UDP sender.

I have seen so many networks where somebody implemented QoS to drop packets
under certain circumstances, and I'm always called in to diagnose why some
application is so unreliable.  Let's imagine the packet you drop was the DNS
query for www.microsoft.com.  Then what happens is the user who's trying to
browse that webpage experiences either a 2-minute timeout, or a "Page Not
Found" error.  I can't tell you how many times I've been called in to figure
out this type of problem, and it's almost always when some newbie turns on
QoS on the WAN connection.  

My response is like this:  First of all, it's not a reproducible problem.
So the first time I encountered this, it took many hours of me randomly
encountering failures and timeouts, trying to find the cause of the problem.
The second time it happened, I still wasn't expecting it.  But now I've
seen it so many times that whenever I experience weird random timeouts or
various forms of network disconnections (such as ssh disconnections, X
disconnections, or the failure of a WebEx, Glance, or VPN client to get
connected or stay connected) ... I have learned to always follow this
process:  I start up some transfer, and run either a ping monitor or a TCP
ping monitor (syn/synack/ack/disconnect, repeat) to some service.  I start
another file transfer, continue monitoring, start another one, continue
monitoring, etc.  As soon as I start seeing packet loss on any of these
metrics, I know the cause of the problem.
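
For anyone who wants to try the TCP ping monitor part of that process, here
is a minimal sketch in Python.  It is only an illustration, not the tool I
actually use: the function names (tcp_ping, monitor, loss_rate), the
2-second timeout, and the 1-second interval are all assumptions.  Each probe
opens a TCP connection (syn/synack/ack), closes it, and records how long the
handshake took, or None if it failed.

```python
import socket
import time


def tcp_ping(host, port, timeout=2.0):
    """Open and close one TCP connection.

    Returns the handshake time in seconds, or None if the connection
    failed or timed out (which is treated here as a lost probe).
    """
    start = time.monotonic()
    try:
        # create_connection performs the full syn/synack/ack handshake;
        # leaving the "with" block closes the socket (the disconnect).
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return None


def loss_rate(results):
    """Fraction of probes that failed (None), between 0.0 and 1.0."""
    if not results:
        return 0.0
    return sum(1 for r in results if r is None) / len(results)


def monitor(host, port, count=10, interval=1.0):
    """Probe repeatedly, print each result, and return the list of results."""
    results = []
    for i in range(count):
        rtt = tcp_ping(host, port)
        results.append(rtt)
        if rtt is None:
            print(f"probe {i}: FAILED")
        else:
            print(f"probe {i}: {rtt * 1000:.1f} ms")
        time.sleep(interval)
    print(f"loss: {loss_rate(results) * 100:.0f}%")
    return results


# Example (hypothetical host and port, substitute a service you care about):
#   monitor("server.example.com", 22, count=30)
```

Run it while starting the file transfers one at a time.  Besides outright
failures, watch for handshake times that suddenly jump by a second or more,
since the sending host may be quietly retrying a dropped SYN.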

So far, I've never been wrong.  So far, every single time, I get into some
big argument with some network administrator somewhere, and we have to
escalate to his boss, and then to his boss's boss, and eventually we agree
to implement a test scenario, and after that, the problem is fixed.


> if you don't drop packets you end up with
> retransmitted copies of the same TCP packet sitting in equipment
> buffers which doesn't help anyone.  

If your router is doing some flow control, you don't need to retransmit.
The TCP client knows the packet it sent last is in the queue until the
backpressure is released, at which point it's free to send another packet.


> You not only get lousy latency
> (due to long queuing times in equipment buffers), 

Long queueing times in buffer, eh?  I suppose that depends on the size of
the buffer.  If you allow 10 clients to all queue up 1MB, and then another
client injects a 64-byte packet, then I agree, your poor little 64-byte packet
will see a long latency.  Maybe some routers are dumb and allow these huge
buffers, regardless of the apparent bandwidth on the WAN side, but I haven't
seen one yet.  Maybe I'm just lucky at buying good brands of routers.
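
To put rough numbers on that example, here is a small sketch of the
arithmetic.  The WAN speeds below are assumptions picked for illustration
(nobody in this thread stated one), and the model is the simplest possible:
a single first-in-first-out queue that must drain completely before the
late packet gets out.

```python
def queue_delay_seconds(queued_bytes, link_bits_per_second):
    """Time to drain a FIFO queue of queued_bytes over the given link."""
    return queued_bytes * 8 / link_bits_per_second


# 10 clients x 1 MB each already sitting in the buffer (from the example).
queued = 10 * 1_000_000

# Hypothetical WAN speeds in Mbit/s, chosen only for illustration.
for mbps in (1.5, 10, 100):
    delay = queue_delay_seconds(queued, mbps * 1_000_000)
    print(f"{mbps:>5} Mbit/s WAN: late packet waits about {delay:.1f} s")
```

On a 10 Mbit/s link, for instance, that 10MB backlog takes 8 seconds to
drain, so how bad the latency gets depends on the buffer size relative to
the WAN bandwidth.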


> but you also get
> lower effective bandwidth (since the replicated packets are a waste of
> bandwidth when you need it most  (under overload conditions)).

There should be no need to retransmit, unless you're dropping packets.


