performance issue within VNET jail

performance issue within VNET jail

Michael Grimm-4
Hi

[ I did recently migrate my servers from bare metal to cloud instances (OpenStack at OVH) ]
[ FreeBSD 11.1-STABLE #0 r327055                                                          ]


My setup is as follows and hasn't changed for the last couple of years:

        extIF0/pf/NAT <-> epairXa (bridge0) epairXb <-> jail

Downloading a file (with wget) at the host runs at around 30 MB/s, and an example tcpdump at extIF0 looks as follows:

19:32:10.711769 IP (tos 0x20, ttl 56, id 37539, offset 0, flags [DF], proto TCP (6), length 8680)
    remote.http > myhost.14367: Flags [.], cksum 0x64ed (incorrect -> 0x3223), seq 5753:14381, ack 146, win 235, options [nop,nop,TS val 1007145732 ecr 3995852], length 8628: HTTP
19:32:10.713851 IP (tos 0x20, ttl 56, id 37545, offset 0, flags [DF], proto TCP (6), length 1490)
    remote.http > myhost.14367: Flags [.], cksum 0x48d7 (incorrect -> 0x8d1e), seq 14381:15819, ack 146, win 235, options [nop,nop,TS val 1007145732 ecr 3995852], length 1438: HTTP
19:32:10.713899 IP (tos 0x20, ttl 56, id 37546, offset 0, flags [DF], proto TCP (6), length 1490)
    remote.http > myhost.14367: Flags [.], cksum 0x48d7 (incorrect -> 0x6ade), seq 15819:17257, ack 146, win 235, options [nop,nop,TS val 1007145732 ecr 3995852], length 1438: HTTP
19:32:10.713934 IP (tos 0x20, ttl 56, id 37547, offset 0, flags [DF], proto TCP (6), length 1490)
    remote.http > myhost.14367: Flags [.], cksum 0x48d7 (incorrect -> 0x1173), seq 17257:18695, ack 146, win 235, options [nop,nop,TS val 1007145732 ecr 3995852], length 1438: HTTP
19:32:10.713962 IP (tos 0x20, ttl 56, id 37548, offset 0, flags [DF], proto TCP (6), length 1490)
    remote.http > myhost.14367: Flags [.], cksum 0x48d7 (incorrect -> 0xcf7a), seq 18695:20133, ack 146, win 235, options [nop,nop,TS val 1007145732 ecr 3995852], length 1438: HTTP


When downloading the very same file within a VIMAGE jail the performance drops to around 80 KB/s, quite a dramatic loss. An example tcpdump at extIF0 looks as follows:

19:34:36.284175 IP (tos 0x0, ttl 56, id 28618, offset 0, flags [DF], proto TCP (6), length 2948)
    remote.http > myhost.63382: Flags [.], cksum 0x5df6 (incorrect -> 0x4478), seq 1449:4345, ack 146, win 235, options [nop,nop,TS val 1007182125 ecr 4141429], length 2896: HTTP
19:34:36.481904 IP (tos 0x0, ttl 56, id 28620, offset 0, flags [DF], proto TCP (6), length 1500)
    remote.http > myhost.63382: Flags [.], cksum 0xd11d (correct), seq 1449:2897, ack 146, win 235, options [nop,nop,TS val 1007182175 ecr 4141429], length 1448: HTTP
19:34:36.484109 IP (tos 0x0, ttl 56, id 28621, offset 0, flags [DF], proto TCP (6), length 2948)
    remote.http > myhost.63382: Flags [.], cksum 0x5df6 (incorrect -> 0x2e5b), seq 15929:18825, ack 146, win 235, options [nop,nop,TS val 1007182175 ecr 4141629], length 2896: HTTP
19:34:36.682006 IP (tos 0x0, ttl 56, id 28623, offset 0, flags [DF], proto TCP (6), length 1500)
    remote.http > myhost.63382: Flags [.], cksum 0x4ab6 (correct), seq 2897:4345, ack 146, win 235, options [nop,nop,TS val 1007182225 ecr 4141629], length 1448: HTTP
19:34:36.684159 IP (tos 0x0, ttl 56, id 28624, offset 0, flags [DF], proto TCP (6), length 2948)
    remote.http > myhost.63382: Flags [.], cksum 0x5df6 (incorrect -> 0xd7db), seq 18825:21721, ack 146, win 235, options [nop,nop,TS val 1007182225 ecr 4141829], length 2896: HTTP

A tcpdump at epairXa looks comparable.
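
One way to read a capture like the one above is to look at the time gaps between packets and at sequence ranges that show up twice. A minimal Python sketch, assuming the timestamps and seq ranges have been copied by hand into the abbreviated form below (the `analyse` helper and that input format are hypothetical, and the resend check is only a crude heuristic):

```python
import re
from datetime import datetime

# Timestamps and seq ranges copied from the jail capture above,
# reduced by hand to "time seq start:end" (an assumed input format).
SAMPLE = """\
19:34:36.284175 seq 1449:4345
19:34:36.481904 seq 1449:2897
19:34:36.484109 seq 15929:18825
19:34:36.682006 seq 2897:4345
19:34:36.684159 seq 18825:21721
"""

LINE = re.compile(r"^(\d\d:\d\d:\d\d\.\d{6}) seq (\d+):(\d+)$")


def analyse(text):
    """Return a list of (gap_ms, start, end, seen_before) tuples, one per segment."""
    rows = []
    prev_ts = None
    highest_end = 0
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        start, end = int(m.group(2)), int(m.group(3))
        # Gap to the previous packet in milliseconds (0 for the first one).
        gap_ms = 0.0 if prev_ts is None else (ts - prev_ts).total_seconds() * 1000
        # Heuristic: a segment ending at or below the highest byte seen so far
        # is treated as data that was already sent once.
        seen_before = end <= highest_end
        rows.append((round(gap_ms, 1), start, end, seen_before))
        highest_end = max(highest_end, end)
        prev_ts = ts
    return rows


if __name__ == "__main__":
    for gap_ms, start, end, seen_before in analyse(SAMPLE):
        note = "  <- range seen before" if seen_before else ""
        print(f"+{gap_ms:7.1f} ms  seq {start}:{end}{note}")
```

On this excerpt the two segments flagged as seen before (1449:2897 and 2897:4345) each arrive just under 200 ms after the preceding packet, which would be consistent with retransmissions, although five packets are too few to prove it.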

I reduced the MTU on all involved interfaces from the initial setting (1490) to an experimental setting of 1400, just to be on the safe side, to no avail. (FYI: I had to reduce it from 1500 to 1490 to please IPSec after migrating from bare metal to cloud infrastructure.)

Then I tested the following settings found on the net, to no avail either:

        sysctl net.inet.tcp.tso=0
        sysctl net.link.bridge.pfil_onlyip=0
        sysctl net.link.bridge.pfil_bridge=0
        sysctl net.link.bridge.pfil_member=0
        sysctl net.add_addr_allfibs=0


I do have to admit that I am lost here, and that I cannot think of what is going wrong. The last download I tried on my old servers was some weeks ago. Since then I have upgraded FreeBSD 11.1-STABLE and moved my infrastructure from bare metal to cloud, so I can no longer test whether my old servers would have shown this performance issue in the meantime.

Thus any feedback is highly appreciated!

Thanks in advance and regards,
Michael

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "[hidden email]"

Re: performance issue within VNET jail

Michael Grimm-4
Kristof Provost <[hidden email]> wrote:
>
> On 21 Dec 2017, at 21:24, Michael Grimm wrote:

>> I do have to admit that I am lost here, and that I cannot think of what is going wrong. The last download I tried on my old servers was some weeks ago. Since then I have upgraded FreeBSD 11.1-STABLE and moved my infrastructure from bare metal to cloud, so I can no longer test whether my old servers would have shown this performance issue in the meantime.
>>
>> Thus any feedback is highly appreciated!

> Can you try turning off TSO? (`ifconfig $ifname -tso`)
>
> There have been issues with pf and TSO checksums, which looked a lot like this (i.e. bad TCP performance). Those problems should be fixed, but this is easy to test.
>

I did try it, but without success.

This only worked for the external interface, though. Both epairX interfaces didn't accept that command:
        ifconfig: -tso: Invalid argument

I did mention that I previously tried "sysctl net.inet.tcp.tso=0". That should do the same, right?

Thanks and regards,
Michael


Re: performance issue within VNET jail

Michael Grimm-4
Kristof Provost <[hidden email]> wrote:
> On 21 Dec 2017, at 21:50, Michael Grimm wrote:
>> Kristof Provost <[hidden email]> wrote:

>>> Can you try turning off TSO? (`ifconfig $ifname -tso`)
>>>
>>> There have been issues with pf and TSO checksums, which looked a lot like this (i.e. bad TCP performance). Those problems should be fixed, but this is easy to test.

>> I did try it, but without success.

> Hmm. I’ve got no ideas at the moment. I run a very similar setup (although on CURRENT), and see no performance issues from my jails.
> Can you run a performance test without pf? Perhaps from the local LAN, for example? That should help narrow it down a bit, at least.

Well, I prepared one of my web servers running at hostB/jailX to serve a sample file for local download tests:

1) hostA wget from hostB/jailX sample file: about  30 MB/s
2) hostA/jailY wget from hostB/jailX sample file: about  30 MB/s
3) hostB wget from hostB/jailX sample file: about 190 MB/s
4) hostB/jailY wget from hostB/jailX sample file: about 190 MB/s

Hmm. At least tests 3) and 4) bypass the pf firewall. Tests 1) and 2) pass through two firewalls, one at each host. BUT: Both hosts are connected via an IPSec tunnel, and that's ESP, not TCP.

Can anyone draw conclusions from this test?
I cannot ;-)

Thanks and regards,
Michael


Re: performance issue within VNET jail

Eugene Grosbein-10
22.12.2017 4:42, Michael Grimm wrote:

> Well, I prepared one of my web servers running at hostB/jailX to serve a sample file for local download tests:
>
> 1) hostA wget from hostB/jailX sample file: about  30 MB/s
> 2) hostA/jailY wget from hostB/jailX sample file: about  30 MB/s
> 3) hostB wget from hostB/jailX sample file: about 190 MB/s
> 4) hostB/jailY wget from hostB/jailX sample file: about 190 MB/s
>
> Hmm. At least tests 3) and 4) bypass the pf firewall. Tests 1) and 2) pass through two firewalls, one at each host. BUT: Both hosts are connected via an IPSec tunnel, and that's ESP, not TCP.
>
> Can anyone draw conclusions from this test?
> I cannot ;-)

Make sure and double check that your ESP packets do not get fragmented.



Re: performance issue within VNET jail

Michael Grimm-4


> On 21. Dec 2017, at 22:48, Eugene Grosbein <[hidden email]> wrote:
>
> 22.12.2017 4:42, Michael Grimm wrote:
>
>> Well, I prepared one of my web servers running at hostB/jailX to serve a sample file for local download tests:
>>
>> 1) hostA wget from hostB/jailX sample file: about  30 MB/s
>> 2) hostA/jailY wget from hostB/jailX sample file: about  30 MB/s
>> 3) hostB wget from hostB/jailX sample file: about 190 MB/s
>> 4) hostB/jailY wget from hostB/jailX sample file: about 190 MB/s
>>
>> Hmm. At least tests 3) and 4) bypass the pf firewall. Tests 1) and 2) pass through two firewalls, one at each host. BUT: Both hosts are connected via an IPSec tunnel, and that's ESP, not TCP.
>>
>> Can anyone draw conclusions from this test?
>> I cannot ;-)
>
> Make sure and double check that your ESP packets do not get fragmented.


Hmm, I do not know how to check that. Do the following tcpdump excerpts answer your question, or do you want me to look somewhere else?

At hostA while downloading from hostB/jailX and "tcpdump -i extIF esp -vv"

22:52:42.341023 IP (tos 0x0, ttl 64, id 40481, offset 0, flags [none], proto ESP (50), length 140)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5fe699), length 120
22:52:42.341079 IP (tos 0x0, ttl 53, id 64310, offset 1480, flags [none], proto ESP (50), length 100)
    hostB > hostA: ip-proto-50
22:52:42.341151 IP (tos 0x0, ttl 64, id 40483, offset 0, flags [none], proto ESP (50), length 140)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5fe69a), length 120
22:52:42.341169 IP (tos 0x0, ttl 53, id 64312, offset 1480, flags [none], proto ESP (50), length 100)
    hostB > hostA: ip-proto-50
22:52:42.341238 IP (tos 0x0, ttl 53, id 64314, offset 1480, flags [none], proto ESP (50), length 100)
    hostB > hostA: ip-proto-50

At hostB the same dump looks like:

22:52:42.463511 IP (tos 0x0, ttl 53, id 41153, offset 0, flags [none], proto ESP (50), length 124)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5feaa8), length 104
22:52:42.463518 IP (tos 0x0, ttl 53, id 41155, offset 0, flags [none], proto ESP (50), length 124)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5feaa9), length 104
22:52:42.463593 IP (tos 0x0, ttl 53, id 41157, offset 0, flags [none], proto ESP (50), length 124)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5feaaa), length 104
22:52:42.463601 IP (tos 0x0, ttl 53, id 41159, offset 0, flags [none], proto ESP (50), length 124)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5feaab), length 104
22:52:42.463673 IP (tos 0x0, ttl 53, id 41161, offset 0, flags [none], proto ESP (50), length 124)
    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5feaac), length 104


Thanks and regards,
Michael






Re: performance issue within VNET jail

Eugene Grosbein-10
22.12.2017 4:59, Michael Grimm wrote:

>> Make sure and double check that your ESP packets do not get fragmented.
>
>
> Hmm, I do not know how to check that. Do the following tcpdump excerpts answer your question, or do you want me to look somewhere else?
>
> At hostA while downloading from hostB/jailX and "tcpdump -i extIF esp -vv"
>
> 22:52:42.341023 IP (tos 0x0, ttl 64, id 40481, offset 0, flags [none], proto ESP (50), length 140)
>     hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5fe699), length 120
> 22:52:42.341079 IP (tos 0x0, ttl 53, id 64310, offset 1480, flags [none], proto ESP (50), length 100)
>     hostB > hostA: ip-proto-50

It shows non-zero offsets, so your ESP packets *are* fragmented.
I guess this is the cause of your problems, as fragmented ESP packets are known to cause trouble
for various reasons. The simplest way to avoid such issues is to decrease the MTU of the IPSEC tunnel
and/or the TCP MSS so that encapsulated ESP packets do not get fragmented.
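
The same check can be scripted when a capture is too long to read by eye. A minimal Python sketch, assuming the "tcpdump -vv" IP header lines are available as text; the `fragments` helper is hypothetical, and it treats a non-zero offset or a "+" (more fragments) in the flags field as a sign of fragmentation:

```python
import re

# IP header lines copied from the hostA capture quoted above.
SAMPLE = """\
22:52:42.341023 IP (tos 0x0, ttl 64, id 40481, offset 0, flags [none], proto ESP (50), length 140)
22:52:42.341079 IP (tos 0x0, ttl 53, id 64310, offset 1480, flags [none], proto ESP (50), length 100)
22:52:42.341151 IP (tos 0x0, ttl 64, id 40483, offset 0, flags [none], proto ESP (50), length 140)
22:52:42.341169 IP (tos 0x0, ttl 53, id 64312, offset 1480, flags [none], proto ESP (50), length 100)
22:52:42.341238 IP (tos 0x0, ttl 53, id 64314, offset 1480, flags [none], proto ESP (50), length 100)
"""

HEADER = re.compile(r"id (\d+), offset (\d+), flags \[([^\]]*)\]")


def fragments(text):
    """Return (ip_id, offset) for every header line that is part of a fragmented datagram."""
    found = []
    for line in text.splitlines():
        m = HEADER.search(line)
        if not m:
            continue  # payload lines such as "hostB > hostA: ip-proto-50"
        ip_id, offset, flags = int(m.group(1)), int(m.group(2)), m.group(3)
        # Non-zero offset: a later fragment. "+" in flags: more fragments follow.
        if offset != 0 or "+" in flags:
            found.append((ip_id, offset))
    return found


if __name__ == "__main__":
    for ip_id, offset in fragments(SAMPLE):
        print(f"fragment: id {ip_id}, offset {offset}")
```

For the five lines above this reports the three packets from hostB with offset 1480 and none of the packets from hostA.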


Re: performance issue within VNET jail

Michael Grimm-4
Eugene Grosbein <[hidden email]> wrote:
> 22.12.2017 4:59, Michael Grimm wrote:

>>> Make sure and double check that your ESP packets do not get fragmented.
>>
>>
>> Hmm, I do not know how to check that. Do the following tcpdump excerpts answer your question, or do you want me to look somewhere else?
>>
>> At hostA while downloading from hostB/jailX and "tcpdump -i extIF esp -vv"
>>
>> 22:52:42.341023 IP (tos 0x0, ttl 64, id 40481, offset 0, flags [none], proto ESP (50), length 140)
>>    hostA > hostB: ESP(spi=0x01d9ec34,seq=0x5fe699), length 120
>> 22:52:42.341079 IP (tos 0x0, ttl 53, id 64310, offset 1480, flags [none], proto ESP (50), length 100)
>>    hostB > hostA: ip-proto-50
>
> It shows non-zero offsets, so your ESP packets *are* fragmented.
> I guess this is the cause of your problems, as fragmented ESP packets are known to cause trouble
> for various reasons. The simplest way to avoid such issues is to decrease the MTU of the IPSEC tunnel
> and/or the TCP MSS so that encapsulated ESP packets do not get fragmented.

Well, you already helped me out with IPSEC very recently, and I already decreased my MTU from 1500 to 1490. That alone increased my tunnel performance dramatically. Thanks, I will decrease the MTU further.

BUT: In this thread I reported that I had already decreased the MTU on all involved interfaces down to 1400 for testing purposes, to no avail, and that my performance issue concerns downloads within VNET jails using TCP, not ESP. The very same external interfaces show no performance drop when traffic goes through the ESP tunnel; the drop appears only when downloading files from the internet, and only when the download is started within a VNET jail. At the host, downloads are limited only by the bandwidth provided by the hosting company.

BUT: It might well be that I completely misunderstood your reply ;-)

Thanks and regards,
Michael


Re: performance issue within VNET jail

Michael Grimm-4
Kristof Provost <[hidden email]> wrote:

> I run a very similar setup (although on CURRENT), and see no performance issues from my jails.

In utter despair I did upgrade one server to CURRENT (#327076) today, but that hasn't been successful :-(

Ok, right now I do know:

(#) there is *no* performance loss (TCP) when:

        (-) fetching files from outside through PF/extIF to host
        (-) fetching files from partner server host via IPSEC tunnel bound to extIF (ESP) to host
        (-) fetching files from partner server host via IPSEC tunnel bound to extIF (ESP) to jail via bridge
        (-) fetching files from partner server jail via bridge and then via IPSEC tunnel bound to extIF (ESP) to host
        (-) fetching files from partner server jail via bridge and then via IPSEC tunnel bound to extIF (ESP) and then via bridge to jail

(#) there is a *dramatic* performance loss (TCP) when:

        (-) fetching files from outside through PF/extIF via bridge to jail

(#) I did try to tweak the following settings *without* success:

        (-) sysctl net.inet.tcp.tso=0
        (-) sysctl net.link.bridge.pfil_onlyip=0
        (-) sysctl net.link.bridge.pfil_bridge=0
        (-) sysctl net.link.bridge.pfil_member=0
        (-) reducing mtu to 1400 (1490 before) on all interfaces extIF, bridge, epairXs
        (-) deactivating "scrub in all" and "scrub out on $extIF all random-id" in /etc/pf.conf
        (-) setting "set require-order yes" and "set require-order no" in /etc/pf.conf [1]

[1] I do see a lot more out-of-order packets in "netstat -s -p tcp" within a jail after those slow downloads, but not after downloads via IPSEC tunnel from the partner host.

That leads me to the conclusions:

        (#) the bridge is not to blame
        (#) it's either the PF/NATing or something else, right?

Thanks for your suggestions so far, but I am lost here. Any ideas?

Regards,
Michael


Re: performance issue within VNET jail

Eugene Grosbein-10
23.12.2017 2:11, Michael Grimm wrote:

> Kristof Provost <[hidden email]> wrote:
>
>> I run a very similar setup (although on CURRENT), and see no performance issues from my jails.
>
> In utter despair I did upgrade one server to CURRENT (#327076) today, but that hasn't been successful :-(
>
> Ok, right now I do know:
>
> (#) there is *no* performance loss (TCP) when:
>
> (-) fetching files from outside through PF/extIF to host
> (-) fetching files from partner server host via IPSEC tunnel bound to extIF (ESP) to host
> (-) fetching files from partner server host via IPSEC tunnel bound to extIF (ESP) to jail via bridge
> (-) fetching files from partner server jail via bridge and then via IPSEC tunnel bound to extIF (ESP) to host
> (-) fetching files from partner server jail via bridge and then via IPSEC tunnel bound to extIF (ESP) and then via bridge to jail
>
> (#) there is a *dramatic* performance loss (TCP) when:
>
> (-) fetching files from outside through PF/extIF via bridge to jail
>
> (#) I did try to tweak the following settings *without* success:
>
> (-) sysctl net.inet.tcp.tso=0
> (-) sysctl net.link.bridge.pfil_onlyip=0
> (-) sysctl net.link.bridge.pfil_bridge=0
> (-) sysctl net.link.bridge.pfil_member=0
> (-) reducing mtu to 1400 (1490 before) on all interfaces extIF, bridge, epairXs
> (-) deactivating "scrub in all" and "scrub out on $extIF all random-id" in /etc/pf.conf
> (-) setting "set require-order yes" and "set require-order no" in /etc/pf.conf [1]
>
> [1] I do see a lot more out-of-order packets in "netstat -s -p tcp" within a jail after those slow downloads, but not after downloads via IPSEC tunnel from the partner host.
>
> That leads me to the conclusions:
>
> (#) the bridge is not to blame
> (#) it's either the PF/NATing or something else, right?
>
> Thanks for your suggestions so far, but I am lost here. Any ideas?

It looks to me like some kind of bug in PF.
I have never tried it myself; I use ipfw and it works just fine.

Maybe you should try switching to it too, at least as a test.
