ixgbe: Network performance tuning (#TCP connections)

ixgbe: Network performance tuning (#TCP connections)

Meyer, Wolfgang
Hello,

we are evaluating network performance on a Dell server (PowerEdge R930 with 4 sockets, hw.model: Intel(R) Xeon(R) CPU E7-8891 v3 @ 2.80GHz) with 10 GbE cards. We use programs in which the server side accepts connections on an IP address + port from the client side; after a connection is established, data is sent in turns between server and client in a predefined pattern (the server side sends more data than the client side) with sleeps in between the send phases. The test set-up is chosen in such a way that every client process initiates 500 connections handled in threads, and on the server side each process representing an IP/port pair also handles 500 connections in threads.

The number of connections is then increased and the overall network throughput is observed using nload. With FreeBSD on the server side, errors begin to occur at roughly 50,000 connections and the overall throughput won't increase further with more connections. With Linux on the server side it is possible to establish more than 120,000 connections, and at 50,000 connections the overall throughput is double that of FreeBSD with the same sending pattern. Furthermore, system load on FreeBSD is much higher, with 50 % system usage on each core and 80 % interrupt usage on the 8 cores handling the interrupt queues for the NIC. In comparison, Linux shows <10 % system usage, <10 % user usage and about 15 % interrupt usage on the 16 cores handling the network interrupts for 50,000 connections.

Varying the number of NIC interrupt queues doesn't change the performance (if anything it worsens the situation). Disabling Hyperthreading (utilising 40 cores) degrades the performance. Increasing MAXCPU to utilise all 80 cores brings no improvement over 64 cores; atkbd and uart had to be disabled to avoid kernel panics with the increased MAXCPU (thanks to Andre Oppermann for investigating this). Initially the tests were made on 10.2-RELEASE; later I switched to 10-STABLE (later with ixgbe driver version 3.1.0), but that didn't change the numbers.

Some sysctl configurables were modified along the lines of the network performance guidelines found on the net (e.g. https://calomel.org/freebsd_network_tuning.html, https://www.freebsd.org/doc/handbook/configtuning-kernel-limits.html, https://pleiades.ucsc.edu/hyades/FreeBSD_Network_Tuning), but most of them didn't have any measurable impact. The final sysctl.conf and loader.conf settings are listed below. Actually, the only tunables identified as providing any improvement were hw.ix.txd and hw.ix.rxd, which were reduced (!) to the minimum value of 64, and hw.ix.tx_process_limit and hw.ix.rx_process_limit, which were set to -1.

Any ideas which tunables might be changed to get a higher number of TCP connections (it's not a question of the overall throughput, as changing the sending pattern allows me to fully utilise the 10 Gb bandwidth)? How can I determine where the kernel is spending the time that causes the high CPU load? Any pointers are highly appreciated; I can't believe that there is such a blatant difference in network performance compared to Linux.

Regards,
Wolfgang

<loader.conf>:
cc_htcp_load="YES"
hw.ix.txd="64"
hw.ix.rxd="64"
hw.ix.tx_process_limit="-1"
hw.ix.rx_process_limit="-1"
hw.ix.num_queues="8"
#hw.ix.enable_aim="0"
#hw.ix.max_interrupt_rate="31250"

#net.isr.maxthreads="16"

<sysctl.conf>:
kern.ipc.soacceptqueue=1024

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216

net.inet.tcp.tso=0
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=1300

net.inet.tcp.nolocaltimewait=1
net.inet.tcp.syncache.rexmtlimit=0

#net.inet.tcp.syncookies=0
net.inet.tcp.drop_synfin=1
net.inet.tcp.fast_finwait2_recycle=1

net.inet.tcp.icmp_may_rst=0
net.inet.tcp.msl=5000
net.inet.tcp.path_mtu_discovery=0
net.inet.tcp.blackhole=1
net.inet.udp.blackhole=1

net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.cc.htcp.adaptive_backoff=1
net.inet.tcp.cc.htcp.rtt_scaling=1

net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
net.inet.ip.rtexpire=1
net.inet.ip.rtminexpire=1





Re: ixgbe: Network performance tuning (#TCP connections)

Allan Jude-9
On 2016-02-03 08:37, Meyer, Wolfgang wrote:

> [original message snipped]
>
I wonder if this might be NUMA related. Specifically, it might help to
make sure that the 8 CPU cores that the NIC queues are pinned to are on
the same CPU that is backing the PCI-E slot that the NIC is in.
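For reference, a rough sketch of how that could be checked and adjusted on FreeBSD
(the interface name ix0, the IRQ number 264 and the core list 8-15 are assumptions;
take the real MSI-X vector numbers from the vmstat output):

  pciconf -lv | grep -A3 ^ix0   # which PCI bus/slot the NIC sits on
  vmstat -ia | grep ix0         # MSI-X vectors assigned to the ix0 queues
  cpuset -g -x 264              # show which CPUs that vector may currently run on
  cpuset -l 8-15 -x 264         # re-bind the vector to cores on the NIC-local socket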


--
Allan Jude



Re: ixgbe: Network performance tuning (#TCP connections)

K. Macy
Also - check for txq overruns and rxq drops in sysctl. 64 is very low
on FreeBSD. You may also look into increasing the size of your pcb
hash table.
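For example (the exact sysctl names vary between driver versions, so treat this as
a sketch with assumed OID patterns):

  sysctl dev.ix.0 | egrep -i 'drop|discard|no_desc'   # per-queue drop/overrun counters
  netstat -s -p tcp | egrep -i 'drop|overflow'        # protocol-level drops, listen overflows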



On Wed, Feb 3, 2016 at 9:50 AM, Allan Jude <[hidden email]> wrote:

> On 2016-02-03 08:37, Meyer, Wolfgang wrote:
>> [original message snipped]
>>
>
> I wonder if this might be NUMA related. Specifically, it might help to
> make sure that the 8 CPU cores that the NIC queues are pinned to, are on
> the same CPU that is backing the PCI-E slot that the NIC is in.
>
>
> --
> Allan Jude
>

Re: ixgbe: Network performance tuning (#TCP connections)

Adrian Chadd-4
In reply to this post by Meyer, Wolfgang
hi,

can you share your testing program source?


-a


On 3 February 2016 at 05:37, Meyer, Wolfgang <[hidden email]> wrote:

> [original message and signature snipped]

Re: ixgbe: Network performance tuning (#TCP connections)

Allan Jude-9
On 2016-02-03 16:34, Adrian Chadd wrote:
> hi,
>
> can you share your testing program source?
>
>
> -a
>

I have a dual E5-2630 v3 (2x8 cores @ 2.40 GHz, +HT) with the Intel X540-AT2
that I can use to try to replicate this and help debug it.

--
Allan Jude



Re: ixgbe: Network performance tuning (#TCP connections)

Remy Nonnenmacher
In reply to this post by Meyer, Wolfgang


On 02/03/16 14:37, Meyer, Wolfgang wrote:

[SNIP]

Hi Wolfgang,

hwpmc is your friend here if you need to investigate where your processors are wasting their time.
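For example, something along these lines (a minimal sketch; unhalted-cycles is one of
the generic event aliases, the exact event names depend on your CPU and pmc(3)):

  kldload hwpmc                                        # unless compiled into the kernel
  pmcstat -S unhalted-cycles -T                        # top-like view of where CPU time goes
  pmcstat -S unhalted-cycles -O /tmp/samples.out sleep 30
  pmcstat -R /tmp/samples.out -G /tmp/callgraph.txt    # resolve the samples into a call graph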

Either you will find them contending for the network stack (probably the pcb hash table), or they will be fighting each other over the scheduler's lock(s) trying to steal jobs from the working cores.

Also check QPI link activity; it may reveal interesting facts about the geography of the PCI root complexes versus process locations and migration.

You have two options here: either you persist in using a 4x10-core machine, in which case you will spend a long time arranging the stickiness of processes and interrupts to specific cores/packages (driver, then isr rings, then userland) and policing the whole thing (read: peacekeeping the riot), or you go for the much simpler solution of a single-socket machine with the fastest available processor and a low core count (E5-1630 v2/v3 or 1650), which can handle 10G links hands down out of the box.

Also note that there are specific and interesting optimizations in the L2 generation on -head that you may want to try if the problem is stack-centered.

You may also have a threading problem (a userland one). In the domain of counting instructions per packet (you can practice that with netmap as a wonderful means of really 'sensing' what 40 Gbps is), threading is bad (and Hyperthreading is evil).

Thanks.


RE: ixgbe: Network performance tuning (#TCP connections)

Hongjiang Zhang
In reply to this post by Meyer, Wolfgang
Did you enable LRO on the FreeBSD side (check 'ifconfig' output)? Linux enables GRO by default (see the output of 'ethtool -k eth0').
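For example (assuming the interface is called ix0):

  ifconfig ix0 | grep options   # look for LRO (and TSO4) in the options list
  ifconfig ix0 lro              # enable LRO if it is missing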

Thanks
Hongjiang Zhang

-----Original Message-----
[quoted original message and signature snipped]

RE: ixgbe: Network performance tuning (#TCP connections)

Hongjiang Zhang
In reply to this post by Meyer, Wolfgang
Please check whether LRO is enabled on your FreeBSD server with "ifconfig". Linux enables GRO by default (see the output of 'ethtool -k eth0'), which covers the LRO optimization.

Thanks
Hongjiang Zhang

-----Original Message-----
[quoted original message and signature snipped]

RE: ixgbe: Network performance tuning (#TCP connections)

Meyer, Wolfgang
In reply to this post by K. Macy


> -----Original Message-----
> From: [hidden email] [mailto:owner-freebsd-
> [hidden email]] On Behalf Of K. Macy
> Sent: Mittwoch, 3. Februar 2016 20:31
> To: Allan Jude
> Cc: [hidden email]
> Subject: Re: ixgbe: Network performance tuning (#TCP connections)
>
> Also - check for txq overruns and rxq drops in sysctl. 64 is very low on
> FreeBSD. You may also look in to increasing the size of your pcb hash table.
>
>
>
> [remaining quotes snipped]


As I said in my original message, the rxd and txd values were more or less the only ones that changed my numbers for the better when reducing them. Not that I understood that behaviour, but a double-check now reveals that I stand corrected on this observation: raising the value (to 1024) did not degrade throughput back to my original bad numbers but, on the contrary, slightly improved it (though only barely measurably compared to the measurement variation). I don't know what cross-interaction led to my original observation.

Concerning the pcb hash table size, I only found net.inet.sctp.pcbhashsize, and that had no influence. I'm not sure whether SCTP plays any role in my problem at all.

Regards,
Wolfgang Meyer



RE: ixgbe: Network performance tuning (#TCP connections)

Meyer, Wolfgang
In reply to this post by Allan Jude-9


> -----Original Message-----
> From: [hidden email] [mailto:owner-freebsd-
> [hidden email]] On Behalf Of Allan Jude
> Sent: Mittwoch, 3. Februar 2016 22:50
> To: [hidden email]
> Subject: Re: ixgbe: Network performance tuning (#TCP connections)
>
> On 2016-02-03 16:34, Adrian Chadd wrote:
> > hi,
> >
> > can you share your testing program source?
> >
> >
> > -a
> >
>
> I have a Dual E5-2630 v3 (2x8x 2.40ghz (+HT)) with the Intel X540-AT2 that I
> can try to replicate this one to help debug it.
>
> --
> Allan Jude

I'll try to do some polishing and removal of cruft next week; then I hope I will feel comfortable releasing it to the public :-)

Not that they are overly sophisticated programs, just a test set-up created in the past that over time one gets used to using as a sort of private "benchmark".

Regards,
Wolfgang Meyer



Re: ixgbe: Network performance tuning (#TCP connections)

Allan Jude-9
In reply to this post by Meyer, Wolfgang
On 2016-02-05 13:05, Meyer, Wolfgang wrote:
>

>
> [...]
>
> Concerning pcb hash table size I only found net.inet.sctp.pcbhashsize and that had no influence. Not sure whether sctp plays a role at all in my problem.
>
> Regards,
> Wolfgang Meyer
>
>

I think the one you are looking for is: net.inet.tcp.tcbhashsize

See if doubling that makes a difference.
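As a sketch (the value 65536 is purely illustrative; the tunable is read at boot, so
it belongs in loader.conf rather than sysctl.conf):

  # /boot/loader.conf
  net.inet.tcp.tcbhashsize="65536"

  # after reboot, verify with:
  sysctl net.inet.tcp.tcbhashsize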

--
Allan Jude

RE: ixgbe: Network performance tuning (#TCP connections)

Meyer, Wolfgang


> -----Original Message-----
> From: [hidden email] [mailto:owner-freebsd-
> [hidden email]] On Behalf Of Allan Jude
> Sent: Freitag, 5. Februar 2016 19:19
> To: [hidden email]
> Subject: Re: ixgbe: Network performance tuning (#TCP connections)
>
> On 2016-02-05 13:05, Meyer, Wolfgang wrote:
> > [...]
>
> I think the one you are looking for is: net.inet.tcp.tcbhashsize
>
> See if doubling that makes a difference.
>
> --
> Allan Jude

Doesn't seem to make a difference. Anyway, as the system load on the cores handling the interrupt queues now seems to max out, I'll probably have to look into giving them some relief. On Linux, some manual setting of processor affinities for the interrupt queues was also necessary.
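In case it helps, a hypothetical sketch of the equivalent re-binding on FreeBSD (the
vector numbers 264-271 and the core list 20-39 are placeholders; take the real ones
from vmstat -ia):

  for irq in 264 265 266 267 268 269 270 271; do
      cpuset -l 20-39 -x $irq
  done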

Regards,
Wolfgang


Re: ixgbe: Network performance tuning (#TCP connections)

Adrian Chadd-4
In reply to this post by Meyer, Wolfgang
On 5 February 2016 at 10:15, Meyer, Wolfgang <[hidden email]> wrote:

>
>
>> [...]
>
> I'll try to do some polishing and removal of cruft next week than I hope I will feel comfortable putting it to the public :-)
>
> Not that they are overly sophisticated programs, just some test set-up created in the past and over time one gets used to use it as a sort of private "benchmark".

Please do - it'd be good to see what you're doing and figure out
what's causing the poor behaviour.

Also having more public benchmarks for testing and reproducibility is
always good. :)



-adrian


Re: ixgbe: Network performance tuning (#TCP connections)

Alexey Ivanov
On Linux/Intel I would use the following methodology for performance analysis:

Hardware:
* turbostat
  Look for C/P states for cores, frequencies, number of SMIs. [1]
* cpufreq-info
  Look for current driver, frequencies, and governor.
* atop
  Look for interrupt distribution across cores
  Look for context switches, interrupts.
* ethtool
  -S for stats, look for errors, drops, overruns, missed interrupts, etc
  -k for offloads, enable GRO/GSO, rss/rps/rfs/xps[0]
  -g for ring sizes, increase
  -c for interrupt coalescing

Kernel:
* /proc/net/softirq [2] and /proc/interrupts [3]
  Again, distribution, missed, delayed interrupts
  (optional) NUMA-affinity
* perf top
  Look where kernel spends its time.
* iptables
  Look if there are rules (if any) that may affect performance.
* netstat -s, netstat -m
  Look for error counters and buffer counts
* sysctl / grub
  So much to tweak here. Try increasing hashtable sizes, playing with memory buffers and other limits.

BSD has alternatives to most of these, e.g. perf -> hwpmc, ethtool -> ifconfig, softirq -> netisr, menu.lst -> loader.conf


In your case I suppose turning on interrupt coalescing along with enabling hardware offloads may help.
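A rough FreeBSD-side mapping of that checklist (ix0 is an assumed interface name;
pmcstat needs the hwpmc module loaded):

  pmcstat -S unhalted-cycles -T   # closest analogue of 'perf top'
  vmstat -ia                      # interrupt distribution across the queue vectors
  netstat -s ; netstat -m         # protocol error counters and mbuf usage
  sysctl dev.ix.0                 # per-queue driver counters
  ifconfig ix0 lro                # hardware offloads; interrupt moderation is steered via
                                  # the hw.ix.enable_aim / hw.ix.max_interrupt_rate tunables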


[0] Comparing mutiqueue support Linux vs FreeBSD
        https://wiki.freebsd.org/201305DevSummit/NetworkReceivePerformance/ComparingMutiqueueSupportLinuxvsFreeBSD
[1] You can pin your processor to a specific C-state:
        https://gist.github.com/SaveTheRbtz/f5e8d1ca7b55b6a7897b
[2] You can analyze that data with:
        https://gist.github.com/SaveTheRbtz/172b2e2eb3cbd96b598d
[3] You can set affinity with:
        https://gist.github.com/SaveTheRbtz/8875474

PS. Sorry for so many Linuxisms on freebsd-performance@


> [quoted thread snipped]

