Large number of http connections immediately dropped


Bugzilla from astrange@ithinksw.com
We're running a rather high-load webserver using FreeBSD
7-RELEASE/amd64/nginx on an Intel em gigabit connection.
Performance is good for our current bandwidth use (about 20Mbit and
~2000 connections/sec at the moment), but a large number of HTTP
requests are being immediately dropped before getting to nginx. I see
complaints about this with earlier versions of FreeBSD
(http://forum.lighttpd.net/topic/171) but no solutions. Does anyone
know what could be the problem, or anything we could do about it?

There are several other servers running earlier FreeBSDs on i386 which  
don't seem to have this problem, but I still haven't ruled out  
upstream hardware problems or Sandvine yet.

On the server:
-nginx's error log is full of "accept() failed (53: Software caused  
connection abort)", sometimes printing three or four at the same time.

-messages is full of:
Limiting open port RST response from 441 to 200 packets/sec
Limiting open port RST response from 488 to 200 packets/sec
Limiting open port RST response from 399 to 200 packets/sec
Limiting open port RST response from 434 to 200 packets/sec
Limiting open port RST response from 308 to 200 packets/sec
I'm not sure if that's related or not.

-sysctl.conf:

net.inet.tcp.tso=1
kern.ipc.somaxconn=10240
kern.ipc.nmbclusters=65536
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.tcp.rfc1323=1
kern.ipc.maxsockbuf=262144
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.tcp.msl=7500
net.inet.icmp.icmplim=400
net.inet.tcp.drop_synfin=1
net.inet.tcp.icmp_may_rst=0
net.inet.tcp.fast_finwait2_recycle=1

-netstat -m:
4677/6603/11280 mbufs in use (current/cache/total)
1017/2643/3660/65536 mbuf clusters in use (current/cache/total/max)
1017/1961 mbuf+clusters out of packet secondary zone in use (current/cache)
9/514/523/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
3239K/8992K/12232K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
9204 requests for I/O initiated by sendfile
0 calls to protocol drain routines
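
The "denied" counters in the netstat -m output above are all zero, which
argues against mbuf or cluster exhaustion as the cause of the drops. A
minimal Python sketch of that check follows. The names denied_counters and
any_denied are made up for illustration, and the regular expression assumes
the line layout shown above.

```python
import re

# Matches lines such as "0/0/0 requests for mbufs denied (...)".
DENIED_RE = re.compile(r"\s*([\d/]+) requests for (.+?) denied")


def denied_counters(netstat_m_output):
    """Map each 'requests for X denied' label to its list of counters."""
    result = {}
    for line in netstat_m_output.splitlines():
        match = DENIED_RE.match(line)
        if match:
            result[match.group(2)] = [int(n) for n in match.group(1).split("/")]
    return result


def any_denied(netstat_m_output):
    """True if any allocation request was denied."""
    counters = denied_counters(netstat_m_output)
    return any(n > 0 for counts in counters.values() for n in counts)


# Lines copied from the netstat -m output in this message.
SAMPLE = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
"""
```

Calling any_denied(SAMPLE) on the lines above returns False; a non-zero
counter would be a reason to look at kern.ipc.nmbclusters first.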

nginx is not running any accept filters.

Locally, after sending an HTTP request, I get a normal connection  
close, then one RST with sequence 1, then another (possibly more than  
one) RST with sequence 2. I can post a tcpdump sequence if necessary,  
after I sanitize some cookies away.
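
On FreeBSD, errno 53 is ECONNABORTED. accept() reports it when a connection
that was already queued is torn down (for example by a reset from the
client) before the server dequeues it. The following is a minimal sketch in
Python, not nginx's actual code, of an accept loop that counts such aborts
and carries on. accept_skipping_aborts and max_aborts are hypothetical
names.

```python
def accept_skipping_aborts(listener, max_aborts=100):
    """Accept one connection, retrying when accept() reports ECONNABORTED.

    `listener` is anything with a socket-like accept() method.
    Returns (conn, addr, aborted), where `aborted` is the number of
    connections that vanished from the queue before they could be accepted.
    """
    aborted = 0
    while True:
        try:
            conn, addr = listener.accept()
            return conn, addr, aborted
        except ConnectionAbortedError:
            # Python raises ConnectionAbortedError for errno ECONNABORTED.
            # The peer is already gone, so there is nothing to clean up.
            aborted += 1
            if aborted >= max_aborts:
                raise
```

The error by itself is harmless to the server. What it does show is that
clients (or something on the path) are resetting connections between the
handshake and the accept() call, which is why the rate matters more than any
single log line.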
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[hidden email]"

Re: Large number of http connections immediately dropped

István
Hi!

Something to read:

http://people.freebsd.org/~hmp/utilities/satbl/sysctl-net.html

I have these in the sysctl.conf

kern.ipc.somaxconn=4096
net.inet.tcp.recvspace=78840
net.inet.tcp.sendspace=78840
kern.ipc.shmmax=67108864
kern.ipc.shmmni=200
kern.ipc.shmseg=128
kern.ipc.semmni=70

net.local.stream.sendspace=82320
net.local.stream.recvspace=82320
net.inet.tcp.local_slowstart_flightsize=10
net.inet.tcp.nolocaltimewait=1
net.inet.tcp.hostcache.expire=3900

and the loader.conf

kern.maxusers=512

kern.ipc.nmbclusters=32768
kern.ipc.maxsockets=81920
kern.ipc.maxsockbuf=1048576

net.inet.tcp.tcbhashsize=4096
net.inet.tcp.hostcache.hashsize=1024

Regards,
Istvan





Re: Large number of http connections immediately dropped

Ivan Voras
Alexander Strange wrote:

> We're running a rather high-load webserver using FreeBSD
> 7-RELEASE/amd64/nginx on an Intel em gigabit connection.
> Performance is good for our current bandwidth use (about 20Mbit and
> ~2000 connections/sec at the moment), but a large number of HTTP
> requests are being immediately dropped before getting to nginx. I see
> complaints about this with earlier versions of FreeBSD -
> http://forum.lighttpd.net/topic/171 - but no solutions. Does anyone know
> what could be the problem, or anything we could do about it?
>
> There are several other servers running earlier FreeBSDs on i386 which
> don't seem to have this problem, but I still haven't ruled out upstream
> hardware problems or Sandvine yet.
>
> On the server:
> -nginx's error log is full of "accept() failed (53: Software caused
> connection abort)", sometimes printing three or four at the same time.
>
> -messages is full of:
> Limiting open port RST response from 441 to 200 packets/sec
> Limiting open port RST response from 488 to 200 packets/sec
> Limiting open port RST response from 399 to 200 packets/sec
> Limiting open port RST response from 434 to 200 packets/sec
> Limiting open port RST response from 308 to 200 packets/sec
> I'm not sure if that's related or not.

It's almost certainly related. In addition to the tuning Istvan suggested,
set the net.inet.icmp.icmplim sysctl to something high, for example 2000 in
your case.

Actually, your sysctl.conf already sets it to 400, yet the log shows a limit
of 200. You do know you have to run "/etc/rc.d/sysctl restart" to reload
sysctl.conf?
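
The kernel messages quoted above carry two numbers each: the RST rate the
stack wanted to send and the limit it applied. A small Python sketch,
assuming the message format shown in this thread, pulls out the peak
attempted rate and the limit in effect. summarize_rst_limits is a
hypothetical helper name.

```python
import re

RST_LIMIT_RE = re.compile(
    r"Limiting open port RST response from (\d+) to (\d+) packets/sec"
)


def summarize_rst_limits(log_text):
    """Return (peak_attempted, limits_seen) for the rate-limit messages.

    peak_attempted is None and limits_seen is empty when nothing matches.
    """
    attempted = []
    limits = set()
    for match in RST_LIMIT_RE.finditer(log_text):
        attempted.append(int(match.group(1)))
        limits.add(int(match.group(2)))
    peak = max(attempted) if attempted else None
    return peak, limits


# The five messages from the original post.
SAMPLE_LOG = """\
Limiting open port RST response from 441 to 200 packets/sec
Limiting open port RST response from 488 to 200 packets/sec
Limiting open port RST response from 399 to 200 packets/sec
Limiting open port RST response from 434 to 200 packets/sec
Limiting open port RST response from 308 to 200 packets/sec
"""
```

For the sample, the peak is 488 and the only limit seen is 200, not the 400
written in sysctl.conf, which is consistent with the file not having been
reloaded. It also shows that 400 would still have been too low for the
observed peak.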


Re: Large number of http connections immediately dropped

Bugzilla from astrange@ithinksw.com

On Jul 17, 2008, at 12:44 PM, Sean Chittenden wrote:

>> -messages is full of:
>> Limiting open port RST response from 441 to 200 packets/sec
>> Limiting open port RST response from 488 to 200 packets/sec
>> Limiting open port RST response from 399 to 200 packets/sec
>> Limiting open port RST response from 434 to 200 packets/sec
>> Limiting open port RST response from 308 to 200 packets/sec
>> I'm not sure if that's related or not.
>
> Likely not, but you want to set net.inet.icmp.icmplim=2000 or  
> something much higher.  ICMP is a good thing and an important part  
> of TCP.  For that much traffic, you need more ICMP packets.  
> net.inet.tcp.recvspace seems high, you probably only want it to be  
> 4096 or maybe double that.... unless your traffic is all HTTP  
> posts.  Why don't you want to run with accept filters?  Any  
> firewalls or rate filters in the way?  -sc

The httpready filter was just off for debugging (in case it solved our  
problem) - it didn't seem to affect it, so it's back on now.

There are a lot of large HTTP posts happening, and we don't seem to be  
low on memory, so recvspace should be ok. somaxconn is also much  
higher than necessary, though, so maybe that could be a problem.

Anyway, raising icmplim has emptied the system log, but there are  
still several errors per minute. I don't think any of the netstat -s  
counters are going up at the same rate, but I'll keep looking at those.

And there's no firewalls or packet shapers in front of it.

Re: Large number of http connections immediately dropped

Ivan Voras
Alexander Strange wrote:

> And there's no firewalls or packet shapers in front of it.

How about on it? Do you run ipfw?


Re: Large number of http connections immediately dropped

Bugzilla from astrange@ithinksw.com

On Jul 21, 2008, at 3:53 PM, Ivan Voras wrote:

> Alexander Strange wrote:
>
>> And there's no firewalls or packet shapers in front of it.
>
> How about on it? Do you run ipfw?

No, I wouldn't answer a question so specifically like that.

We didn't see this problem after recompiling without SMP support and  
waiting for a day or two, but that immediately brought the load  
average up to around 50 and made it much slower, so that's clearly not  
a solution. It also really doesn't make me look forward to debugging  
it...

(Disabling net.isr.direct and some other things didn't seem to have  
any effect)


Re: Large number of http connections immediately dropped

Robert N. M. Watson
On Wed, 30 Jul 2008, Alexander Strange wrote:

> On Jul 21, 2008, at 3:53 PM, Ivan Voras wrote:
>
>> Alexander Strange wrote:
>>
>>> And there's no firewalls or packet shapers in front of it.
>>
>> How about on it? Do you run ipfw?
>
> No, I wouldn't answer a question so specifically like that.
>
> We didn't see this problem after recompiling without SMP support and waiting
> for a day or two, but that immediately brought the load average up to around
> 50 and made it much slower, so that's clearly not a solution. It also really
> doesn't make me look forward to debugging it...
>
> (Disabling net.isr.direct and some other things didn't seem to have any
> effect)

Turning off SMP is probably slowing the transaction rate down sufficiently
that you're not seeing the problem.  The reason to ask the firewall question
(ipfw, pf, etc) is that as the rate of TCP connections goes up, and if there
are a small number of addresses involved, the reuse rate for TCP/IP
port/address tuples becomes very high, which can cause connections to reuse
tuples too quickly.  Sometimes firewalls are more sensitive to this than the
stack -- especially if those firewalls are doing things like randomizing port
numbers, TCP sequence numbers, etc, so in the past there have been reports
(and bug fixes) along those lines.  I may have missed you answering this
already, but are there a large number of remote endpoints (unique IP
addresses) or a small one?  Such problems have come up in the past especially
when there is a load balancer or proxy in front, as that reduces what starts
out as a large number of hosts to a very small number (exactly one).
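
A rough worked example of the tuple-reuse argument, as a Python sketch. The
2000 connections/sec and net.inet.tcp.msl=7500 come from the original post.
The pool of 16384 source ports per client address (the IANA dynamic range
49152-65535) is an assumption, as is the even, round-robin use of ports.
Real stacks choose ports differently, so treat the result as an order of
magnitude only. All function names are hypothetical.

```python
def time_wait_seconds(msl_ms):
    """TIME_WAIT lasts 2 * MSL; net.inet.tcp.msl is given in milliseconds."""
    return 2 * msl_ms / 1000.0


def port_reuse_interval(conn_per_sec, client_addresses, ports_per_address):
    """Average seconds before one client address has to reuse a source port.

    Assumes connections are spread evenly over the client addresses and that
    each address cycles through its ports in turn.
    """
    per_address_rate = conn_per_sec / client_addresses
    return ports_per_address / per_address_rate


def reuses_within_time_wait(conn_per_sec, client_addresses,
                            ports_per_address, msl_ms):
    """True if a tuple comes around again before its TIME_WAIT has expired."""
    interval = port_reuse_interval(conn_per_sec, client_addresses,
                                   ports_per_address)
    return interval < time_wait_seconds(msl_ms)
```

With a single client address (a proxy or load balancer), 2000
connections/sec cycles through 16384 ports in about 8.2 seconds, which is
shorter than the 15-second TIME_WAIT implied by msl=7500. With a thousand
distinct client addresses the same load reuses a given tuple only every
couple of hours, so the question about the number of remote endpoints
decides whether this effect is plausible at all.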

Robert N M Watson
Computer Laboratory
University of Cambridge