iflib/bridge kernel panic

iflib/bridge kernel panic

Shawn Webb-3
From latest HEAD on a Dell Precision 7550 laptop:

https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2

The last working boot environment was 14 Aug 2020. If I get some time to
bisect commits, I'll try to figure out the culprit.

Thanks,

Shawn Webb

(Sorry for the brevity. Only partially working system due to above
breakage.)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"

Re: iflib/bridge kernel panic

Conrad Meyer-2
Hi Shawn,

Is it possible to reproduce the issue on FreeBSD?  The excerpt you've
linked to is not on FreeBSD.

Conrad

On Sun, Sep 20, 2020 at 5:53 PM Shawn Webb <[hidden email]> wrote:

>
> From latest HEAD on a Dell Precision 7550 laptop:
>
> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>
> The last working boot environment was 14 Aug 2020. If I get some time to
> bisect commits, I'll try to figure out the culprit.
>
> Thanks,
>
> Shawn Webb
>
> (Sorry for the brevity. Only partially working system due to above
> breakage.)

Re: iflib/bridge kernel panic

Kristof Provost
In reply to this post by Shawn Webb-3
On 21 Sep 2020, at 2:52, Shawn Webb wrote:
> From latest HEAD on a Dell Precision 7550 laptop:
>
> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>
> The last working boot environment was 14 Aug 2020. If I get some time to
> bisect commits, I'll try to figure out the culprit.
>
Try https://reviews.freebsd.org/D26418

Best regards,
Kristof

Re: iflib/bridge kernel panic

Shawn Webb-3
On Mon, Sep 21, 2020 at 09:57:40AM +0200, Kristof Provost wrote:
> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
> > From latest HEAD on a Dell Precision 7550 laptop:
> >
> > https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
> >
> > The last working boot environment was 14 Aug 2020. If I get some time to
> > bisect commits, I'll try to figure out the culprit.
> >
> Try https://reviews.freebsd.org/D26418

That seems to fix the kernel panic. dmesg gets spammed with a freak
ton of these LOR messages now:

==== BEGIN LOG 01 ====
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] Sleeping on "e1000_delay" with the following non-sleepable locks held:
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] exclusive sleep mutex if_bridge (if_bridge) r = 0 (0xfffff8001ea07218) locked @ /usr/src/sys/net/if_bridge.c:827
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] stack backtrace:
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #0 0xffffffff80c6c4a1 at witness_debugger+0x71
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #1 0xffffffff80c6d5bd at witness_warn+0x40d
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #2 0xffffffff80c09b8b at _sleep+0x5b
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #3 0xffffffff80c0a38e at pause_sbt+0xfe
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #4 0xffffffff80652b2d at e1000_write_phy_reg_mdic+0xed
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #5 0xffffffff80656bde at __e1000_write_phy_reg_hv+0x1ce
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #6 0xffffffff80640ea9 at e1000_lv_jumbo_workaround_ich8lan+0x799
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #7 0xffffffff8062329e at em_if_init+0x151e
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #8 0xffffffff80d347a9 at iflib_init_locked+0x2d9
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #9 0xffffffff80d36b08 at iflib_if_ioctl+0x1b8
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #10 0xffffffff83c582ac at bridge_set_ifcap+0x8c
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #11 0xffffffff83c544c8 at bridge_ioctl_add+0x4c8
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #12 0xffffffff83c560ff at bridge_ioctl+0x2df
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #13 0xffffffff80d9f1a1 at in_control+0x341
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #14 0xffffffff80d16266 at ifioctl+0x766
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #15 0xffffffff80c715a0 at kern_ioctl+0x290
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #16 0xffffffff80c71267 at sys_ioctl+0x127
Sep 21 08:08:28 hbsd-laptop-02 kernel: [25] #17 0xffffffff8122bf4c at amd64_syscall+0x14c
==== END LOG 01 ====

==== BEGIN LOG 02 ====
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] lock order reversal: (sleepable after non-sleepable)
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29]  1st 0xfffff800374616a0 ure0 (ure0, sleep mutex) @ /usr/src/sys/dev/usb/usb_request.c:714
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29]  2nd 0xffffffff81eb1ab8 sysctl lock (sysctl lock, sleepable rm) @ /usr/src/sys/kern/kern_sysctl.c:837
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] lock order ure0 -> sysctl lock attempted at:
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #0 0xffffffff80c6c1dc at witness_checkorder+0xdcc
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #1 0xffffffff80bf76bb at _rm_wlock_debug+0x6b
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #2 0xffffffff80c0c7a6 at sysctl_add_oid+0x46
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #3 0xffffffff83c64ea1 at ure_attach_post+0x1a91
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #4 0xffffffff83c6a1af at ue_attach_post_task+0x2f
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #5 0xffffffff80a2b749 at usb_process+0xf9
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #6 0xffffffff80bb9fe5 at fork_exit+0x85
Sep 21 08:08:28 hbsd-laptop-02 kernel: [29] #7 0xffffffff81200a9e at fork_trampoline+0xe
==== END LOG 02 ====
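
For readers unfamiliar with the first warning (LOG 01), here is a minimal userspace sketch of the kind of check that produces it. It is not WITNESS code: `model_lock`, `model_sleep_check` and `model_log01_scenario` are hypothetical names made up for illustration. The only facts taken from the log are that the `if_bridge` mutex was held and that the driver then slept in `pause_sbt()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified lock record; not the kernel's lock_object. */
struct model_lock {
	const char *name;
	bool sleepable;	/* false for a regular mutex such as if_bridge */
	bool held;
};

/*
 * Model of the check behind 'Sleeping on "..." with the following
 * non-sleepable locks held': count the held locks that forbid sleeping.
 * A non-zero result corresponds to the warning being printed.
 */
static int
model_sleep_check(const struct model_lock *locks, size_t n)
{
	int bad = 0;

	for (size_t i = 0; i < n; i++)
		if (locks[i].held && !locks[i].sleepable)
			bad++;
	return (bad);
}

/* Scenario from LOG 01: is the bridge mutex held when the driver pauses? */
static int
model_log01_scenario(bool bridge_lock_held)
{
	struct model_lock locks[] = {
		{ "if_bridge", false, bridge_lock_held },
		{ "sysctl lock", true, false },
	};

	return (model_sleep_check(locks, sizeof(locks) / sizeof(locks[0])));
}
```

In this model, `model_log01_scenario(true)` reports one offending lock and `model_log01_scenario(false)` reports none, which is why dropping or avoiding the bridge mutex around the member ioctl would silence the warning.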

At work, I have two Ethernet interfaces: the onboard em0 and a USB Ethernet dongle.

Thanks,

--
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

GPG Key ID:          0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
https://git-01.md.hardenedbsd.org/HardenedBSD/pubkeys/src/branch/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc


Re: iflib/bridge kernel panic

xtouqh-2
In reply to this post by Kristof Provost
Kristof Provost wrote:
> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>> From latest HEAD on a Dell Precision 7550 laptop:
>>
>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>
>> The last working boot environment was 14 Aug 2020. If I get some time to
>> bisect commits, I'll try to figure out the culprit.
>>
> Try https://reviews.freebsd.org/D26418

Anything stopping this from being integrated?

Re: iflib/bridge kernel panic

Kristof Provost
On 23 Sep 2020, at 19:37, [hidden email] wrote:

> Kristof Provost wrote:
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>> bisect commits, I'll try to figure out the culprit.
>>>
>> Try https://reviews.freebsd.org/D26418
>
> Anything stopping this from being integrated?

Yes, it’s not correct.

I’ve got this on my todo list. I think I know how to fix it better.

Best regards,
Kristof

Re: iflib/bridge kernel panic

Sergey V. Dyatko
In reply to this post by Kristof Provost
On Mon, 21 Sep 2020 09:57:40 +0200
"Kristof Provost" <[hidden email]> wrote:

> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
> > From latest HEAD on a Dell Precision 7550 laptop:
> >
> > https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
> >
> > The last working boot environment was 14 Aug 2020. If I get some time to
> > bisect commits, I'll try to figure out the culprit.
> >  
> Try https://reviews.freebsd.org/D26418
>
> Best regards,
> Kristof

I'm not sure, but doesn't this panic have the same root cause as mine?
Sorry, I don't have a text console and can only post screenshot[s]
from the IP-KVM:
https://gyazo.com/fee41c5267e9fc543d43901e498b7c94

rc.conf has something like:
cloned_interfaces="lagg0 vlan101"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 x.x.x.x/mask"
ifconfig_vlan101="vlan 101 vlandev lagg0 192.168.1.29/24"

Without the VLAN part, everything works fine.
Installed from FreeBSD-13.0-CURRENT-amd64-20200924-3c514403bef-disc1.iso




--
wbr, Sergey


Re: iflib/bridge kernel panic

xtouqh-2
Sergey V. Dyatko wrote:

> On Mon, 21 Sep 2020 09:57:40 +0200
> "Kristof Provost" <[hidden email]> wrote:
>
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>> bisect commits, I'll try to figure out the culprit.
>>>  
>> Try https://reviews.freebsd.org/D26418
>>
>> Best regards,
>> Kristof
>
> I'm not sure, but doesn't this panic have the same root as mine?
> Sorry, but I haven't text console and can post only screenshot[s]
>   from IP-KVM
> https://gyazo.com/fee41c5267e9fc543d43901e498b7c94
>
> rc.conf have something like:
> clonned_interfaces="lagg0 vlan101"
> ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 x.x.x.x/mask"
> ifconfig_vlan101="vlan 101 vlandev lagg0 192.168.1.29/24"
>
> without VLAN part all works fine.
> Installed from FreeBSD-13.0-CURRENT-amd64-20200924-3c514403bef-disc1.iso

Yes, same panic.

Re: iflib/bridge kernel panic

S.N. Trigub
Hi!

There is a serious issue in the kernel related to network interfaces.
See my message
"speedtest.net in multi connections mode causes the FreeBSD 13-CURRENT
router to crash"
from 28 Aug 2020.

I noticed that the kernel panics every time users on client computers run
speedtest.net in multi-connection mode in a browser.
If my external network interface uses a VLAN, the panic occurs when the
uplink speed is 100 Mbit/s.
Without a VLAN, the speedtest passes without any problems on a 100 Mbit/s
link, but the kernel panics every time on a 1 Gbit/s uplink.
During the crash, the console screen goes blank and the server (router) stops
responding to the keyboard.
Can anyone run this test on their machine?

Sergei.



Re: iflib/bridge kernel panic

Kristof Provost
In reply to this post by Shawn Webb-3
On 21 Sep 2020, at 14:16, Shawn Webb wrote:

> On Mon, Sep 21, 2020 at 09:57:40AM +0200, Kristof Provost wrote:
>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>
>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>
>>> The last working boot environment was 14 Aug 2020. If I get some
>>> time to
>>> bisect commits, I'll try to figure out the culprit.
>>>
>> Try https://reviews.freebsd.org/D26418
>
> That seems to fix the kernel panic. dmesg gets spammed with a freak
> ton of these LOR messages now:
>
Here’s an early version of a task queue based approach:
http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch

That still needs to be cleaned up, but this should resolve the sleep
issue and the LOR.
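
As a rough illustration of the general idea only (this is not the patch linked above, and every `model_*` name is hypothetical), a task-queue approach can be modelled in userspace as: record the work while the lock is held, then run the possibly sleeping ioctl later, from a context that no longer holds the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TASKS 8

typedef void (*model_task_fn)(void *arg);

struct model_task {
	model_task_fn fn;
	void *arg;
};

struct model_bridge {
	bool locked;			/* stands in for the bridge mutex */
	struct model_task queue[MODEL_MAX_TASKS];
	size_t ntasks;
	int sleeps_under_lock;		/* sleeping calls made while locked */
	int ioctls_done;
};

/* Stand-in for a member interface ioctl that may sleep. */
static void
model_member_ioctl(void *arg)
{
	struct model_bridge *b = arg;

	if (b->locked)
		b->sleeps_under_lock++;	/* the situation WITNESS warned about */
	b->ioctls_done++;
}

/* Direct approach: the ioctl runs while the lock is held. */
static void
model_add_member_direct(struct model_bridge *b)
{
	b->locked = true;
	model_member_ioctl(b);
	b->locked = false;
}

/* Deferred approach: only enqueue under the lock, do no work yet. */
static bool
model_add_member_deferred(struct model_bridge *b)
{
	bool queued = false;

	b->locked = true;
	if (b->ntasks < MODEL_MAX_TASKS) {
		b->queue[b->ntasks].fn = model_member_ioctl;
		b->queue[b->ntasks].arg = b;
		b->ntasks++;
		queued = true;
	}
	b->locked = false;
	return (queued);
}

/* Stand-in for the task queue thread: runs queued work unlocked. */
static void
model_run_tasks(struct model_bridge *b)
{
	for (size_t i = 0; i < b->ntasks; i++)
		b->queue[i].fn(b->queue[i].arg);
	b->ntasks = 0;
}
```

In this model the direct path records one sleep under the lock, while the deferred path records none. The cost is that the ioctl has not happened yet when `model_add_member_deferred()` returns, so its result is no longer available to the caller synchronously.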

Best regards,
Kristof

Re: iflib/bridge kernel panic

Alexander Leidinger

Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020  
17:51:32 +0200):

> On 21 Sep 2020, at 14:16, Shawn Webb wrote:
>> On Mon, Sep 21, 2020 at 09:57:40AM +0200, Kristof Provost wrote:
>>> On 21 Sep 2020, at 2:52, Shawn Webb wrote:
>>>> From latest HEAD on a Dell Precision 7550 laptop:
>>>>
>>>> https://gist.github.com/lattera/a0803f31f58bcf8ead51ac1ebbc447e2
>>>>
>>>> The last working boot environment was 14 Aug 2020. If I get some time to
>>>> bisect commits, I'll try to figure out the culprit.
>>>>
>>> Try https://reviews.freebsd.org/D26418
>>
>> That seems to fix the kernel panic. dmesg gets spammed with a freak
>> ton of these LOR messages now:
>>
> Here’s an early version of a task queue based approach:  
> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>
> That still needs to be cleaned up, but this should resolve the sleep  
> issue and the LOR.

There are some issues: from inside a jail, I can't ping systems
outside the host machine.

Bridge setup:
     - member jail A
     - member jail B
     - member external_if of host

If I ping the router from the host, it works. If I ping from one jail  
to another, it works. If I ping from the jail to the IP of the  
external_if, it works. If I ping from a jail to the router, I do not  
get a response.

Bye,
Alexander.

--
http://www.Leidinger.net [hidden email]: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    [hidden email]  : PGP 0x8F31830F9F2772BF


Re: iflib/bridge kernel panic

Kristof Provost
On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:

> Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020
> 17:51:32 +0200):
>> Here’s an early version of a task queue based approach:
>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>
>> That still needs to be cleaned up, but this should resolve the sleep
>> issue and the LOR.
>
> There are some issues... seems like inside a jail I can't ping systems
> outside of the hardware.
>
> Bridge setup:
>     - member jail A
>     - member jail B
>     - member external_if of host
>
> If I ping the router from the host, it works. If I ping from one jail
> to another, it works. If I ping from the jail to the IP of the
> external_if, it works. If I ping from a jail to the router, I do not
> get a response.
>
Can you check for 'failed ifpromisc' error messages in dmesg? And verify
that all bridge member interfaces are in promiscuous mode?

Kristof

Re: iflib/bridge kernel panic

Alexander Leidinger

Quoting Kristof Provost <[hidden email]> (from Mon, 28 Sep 2020  
13:53:16 +0200):

> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>> Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020  
>> 17:51:32 +0200):
>>> Here’s an early version of a task queue based approach:  
>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>
>>> That still needs to be cleaned up, but this should resolve the  
>>> sleep issue and the LOR.
>>
>> There are some issues... seems like inside a jail I can't ping  
>> systems outside of the hardware.
>>
>> Bridge setup:
>>    - member jail A
>>    - member jail B
>>    - member external_if of host
>>
>> If I ping the router from the host, it works. If I ping from one  
>> jail to another, it works. If I ping from the jail to the IP of the  
>> external_if, it works. If I ping from a jail to the router, I do  
>> not get a response.
>>
> Can you check for 'failed ifpromisc' error messages in dmesg? And  
> verify that all bridge member interfaces are in promiscuous mode?

I have a panic for you:
  - startup was still in progress (22 jails starting up); the panic
    happened somewhere after a few jails had started
  - tcpdump was running on the external interface
  - a ping to a jail IP from another system was running; the first
    ping went through, then it panicked

First, regarding your questions about promisc mode: no error, but
promisc mode is immediately disabled again on all interfaces.

Data (external_if = igb0, jail epairs are j_X_Yif with X the ID of the  
jail and Y either h like host-side or j like jail-side):
---snip---
Host:

# ifconfig -a
igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4a520b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,NOMAP>
         ether [...]:a4
         inet 192.168.1.x netmask 0xffffff00 broadcast 192.168.1.255
         inet6 fe80::[...]a4%igb0 prefixlen 64 scopeid 0x1
         inet6 fd73:[...] prefixlen 64
         inet6 2003:[...] prefixlen 64 autoconf
         inet6 fd73:[...] prefixlen 64 autoconf
         media: Ethernet autoselect (1000baseT <full-duplex>)
         status: active
         nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
igb1: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
         ether [...]:a5
         media: Ethernet autoselect
         status: no carrier
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
         inet 127.0.0.1 netmask 0xff000000
         groups: lo
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vswitch0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         ether [...]:a3
         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
         maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
         member: j_weather_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 9 priority 128 path cost 2000
         member: j_web_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 8 priority 128 path cost 2000
         member: j_commit_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 7 priority 128 path cost 2000
         member: j_video_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 6 priority 128 path cost 2000
         member: j_dns_hif flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 5 priority 128 path cost 2000
         member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                 ifmaxaddr 0 port 1 priority 128 path cost 20000
         groups: bridge
         nd6 options=9<PERFORMNUD,IFDISABLED>
j_dns_hif: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8<VLAN_MTU>
         ether [...]:0a
         hwaddr [...]:0a
         inet6 fe80::[...]0a%j_dns_hif prefixlen 64 scopeid 0x5
         groups: epair
         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
         status: active
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[... some more jail interfaces ...]

# dmesg | grep promis
igb0: promiscuous mode enabled
igb0: promiscuous mode disabled
j_dns_hif: promiscuous mode enabled
j_dns_hif: promiscuous mode disabled
[... some more like this ...]

# jexec 2 ifconfig -a
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
         options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
         inet6 ::1 prefixlen 128
         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
         inet 127.0.0.1 netmask 0xff000000
         groups: lo
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
j_dns_jif: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
         options=8<VLAN_MTU>
         ether [...]:0b
         hwaddr [...]:0b
         inet 192.168.1.y netmask 0xffffff00 broadcast 192.168.1.255
         inet6 fe80::[...]0b%j_dns_jif prefixlen 64 scopeid 0x2
         inet6 fd73:[...]:y prefixlen 64
         groups: epair
         media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
         status: active
         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
---snip---
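
The `flags=` values in the output above can also be checked numerically. The sketch below assumes the promiscuous bit is 0x100, the value of `IFF_PROMISC` in FreeBSD's `<net/if.h>`; it is redefined locally under a hypothetical name so the snippet builds anywhere, and `model_is_promisc` is likewise made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors IFF_PROMISC (0x100); redefined locally for portability. */
#define MODEL_IFF_PROMISC 0x100u

/* True if the hex value printed after "flags=" has the promisc bit set. */
static bool
model_is_promisc(unsigned int flags)
{
	return ((flags & MODEL_IFF_PROMISC) != 0);
}
```

With the values shown above, `model_is_promisc(0x8863)` (igb0) and `model_is_promisc(0x8843)` (vswitch0 and the epairs) are both false, which matches the absence of `PROMISC` in the decoded flag lists. An interface with flags 0x8963 would report true.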

And here the backtrace of the panic:
---snip---
panic: if_setflag: decrement non-positive refcount 0 for flag 256
cpuid = 4
time = 1601300532
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0378ea3920
vpanic() at vpanic+0x182/frame 0xfffffe0378ea3970
panic() at panic+0x43/frame 0xfffffe0378ea39d0
if_setflag() at if_setflag+0x137/frame 0xfffffe0378ea3a30
ifpromisc() at ifpromisc+0x2a/frame 0xfffffe0378ea3a60
bpf_detachd_locked() at bpf_detachd_locked+0x280/frame 0xfffffe0378ea3ab0
bpf_dtor() at bpf_dtor+0x87/frame 0xfffffe0378ea3ad0
devfs_destroy_cdevpriv() at devfs_destroy_cdevpriv+0xa1/frame 0xfffffe0378ea3af0
devfs_close_f() at devfs_close_f+0x6a/frame 0xfffffe0378ea3b20
_fdrop() at _fdrop+0x20/frame 0xfffffe0378ea3b40
closef() at closef+0x1ea/frame 0xfffffe0378ea3bd0
closefp() at closefp+0x90/frame 0xfffffe0378ea3c10
amd64_syscall() at amd64_syscall+0x13e/frame 0xfffffe0378ea3d30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0378ea3d30


__curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:394
#2  0xffffffff8051fb46 in kern_reboot (howto=260)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:481
#3  0xffffffff8051ff8a in vpanic (fmt=<optimized out>, ap=<optimized out>)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:913
#4  0xffffffff8051fcf3 in panic (fmt=<unavailable>)
     at /space/system/usr_src/sys/kern/kern_shutdown.c:839
#5  0xffffffff806321f7 in if_setflag (ifp=0xfffff800036cc000,
     flag=<unavailable>, pflag=<optimized out>, refcount=0xfffff800036cc3a8,
     onswitch=<unavailable>) at /space/system/usr_src/sys/net/if.c:3135
#6  0xffffffff8063206a in ifpromisc (ifp=0xfffff800036cc000,
     pswitch=<unavailable>) at /space/system/usr_src/sys/net/if.c:3196
#7  0xffffffff80626450 in bpf_detachd_locked (d=<optimized out>,
     detached_ifp=<optimized out>) at /space/system/usr_src/sys/net/bpf.c:882
#8  0xffffffff80629277 in bpf_detachd (d=0xfffff8074cf42800)
     at /space/system/usr_src/sys/net/bpf.c:836
#9  bpf_dtor (data=0xfffff8074cf42800)
     at /space/system/usr_src/sys/net/bpf.c:913
#10 0xffffffff80487531 in devfs_destroy_cdevpriv (p=0xfffff8074cf29c40)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:197
#11 0xffffffff8048b16a in devfs_fpdrop (fp=0xfffff8074cebaaf0)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:211
#12 devfs_close_f (fp=0xfffff8074cebaaf0, td=<optimized out>)
     at /space/system/usr_src/sys/fs/devfs/devfs_vnops.c:787
#13 0xffffffff804c4d70 in fo_close (fp=0xfffff8074cebaaf0, td=<unavailable>)
     at /space/system/usr_src/sys/sys/file.h:364
#14 _fdrop (fp=0xfffff8074cebaaf0, td=<unavailable>)
     at /space/system/usr_src/sys/kern/kern_descrip.c:3120
#15 0xffffffff804c7eca in closef (fp=0xfffff8074cebaaf0, td=0xfffffe0382567500)
     at /space/system/usr_src/sys/kern/kern_descrip.c:2606
#16 0xffffffff804c51e0 in closefp (fdp=0xfffffe0307cbd950, fd=3,
     fp=0xfffff8074cebaaf0, td=0xfffffe0382567500, holdleaders=<optimized out>)
     at /space/system/usr_src/sys/kern/kern_descrip.c:1263
#17 0xffffffff808000ae in syscallenter (td=<optimized out>)
     at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:162
---snip---
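
Flag 256 in the panic message is 0x100, i.e. the promiscuous flag, and the backtrace shows a bpf descriptor being detached (tcpdump closing) and calling ifpromisc() to drop its reference. The sketch below is a simplified userspace model of a reference-counted flag, with hypothetical `model_*` names. It only shows how an unbalanced "disable" trips such a check; it does not identify which caller was unbalanced in this case.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IFF_PROMISC 0x100u	/* 256, the flag named in the panic */

struct model_ifnet {
	unsigned int flags;
	int pcount;		/* promiscuous-mode reference count */
};

/*
 * Enable or disable promiscuous mode by reference count.
 * Returns 0 on success and -1 where the kernel would panic with
 * "decrement non-positive refcount".
 */
static int
model_ifpromisc(struct model_ifnet *ifp, bool on)
{
	if (on) {
		if (ifp->pcount++ == 0)
			ifp->flags |= MODEL_IFF_PROMISC;
		return (0);
	}
	if (ifp->pcount <= 0)
		return (-1);	/* more disables than enables */
	if (--ifp->pcount == 0)
		ifp->flags &= ~MODEL_IFF_PROMISC;
	return (0);
}
```

In this model, two enables (say, a bridge and a packet capture) followed by two disables leave the count at zero with the flag cleared. A third disable returns -1, which corresponds to the panic above.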

Bye,
Alexander.

--
http://www.Leidinger.net [hidden email]: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    [hidden email]  : PGP 0x8F31830F9F2772BF


Re: iflib/bridge kernel panic

Kristof Provost


On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:

> Quoting Kristof Provost <[hidden email]> (from Mon, 28 Sep 2020
> 13:53:16 +0200):
>
>> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>>> Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020
>>> 17:51:32 +0200):
>>>> Here’s an early version of a task queue based approach:
>>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>>
>>>> That still needs to be cleaned up, but this should resolve the
>>>> sleep issue and the LOR.
>>>
>>> There are some issues... seems like inside a jail I can't ping
>>> systems outside of the hardware.
>>>
>>> Bridge setup:
>>>    - member jail A
>>>    - member jail B
>>>    - member external_if of host
>>>
>>> If I ping the router from the host, it works. If I ping from one
>>> jail to another, it works. If I ping from the jail to the IP of the
>>> external_if, it works. If I ping from a jail to the router, I do not
>>> get a response.
>>>
>> Can you check for 'failed ifpromisc' error messages in dmesg? And
>> verify that all bridge member interfaces are in promiscuous mode?
>
> I have a panic for you...:
>  - startup still in progress = 22 jails in startup, somewhere after a
> few jails started the panic happened
>  - tcpdump was running on the external interface
>  - a ping to a jail IP from another system was running, the first ping
> went through, then it paniced
>
> First regarding your questions about promisc mode: no error, but the
> promisc mode is directly disabled again on all interfaces.
>
I think I see why you had issues with the promiscuous setting. I’ve
updated the patch to be even more horrific than it was before.

I can’t explain the panic, and the backtrace also doesn’t appear to
be directly related to this patch. Not sure what’s going on with that.

Kristof

Re: iflib/bridge kernel panic

Shawn Webb-3
On Tue, Sep 29, 2020 at 11:20:44PM +0200, Kristof Provost wrote:

>
>
> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>
> > Quoting Kristof Provost <[hidden email]> (from Mon, 28 Sep 2020 13:53:16
> > +0200):
> >
> > > On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
> > > > Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020
> > > > 17:51:32 +0200):
> > > > > Here's an early version of a task queue based approach: http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
> > > > >
> > > > > That still needs to be cleaned up, but this should resolve
> > > > > the sleep issue and the LOR.
> > > >
> > > > There are some issues... seems like inside a jail I can't ping
> > > > systems outside of the hardware.
> > > >
> > > > Bridge setup:
> > > >    - member jail A
> > > >    - member jail B
> > > >    - member external_if of host
> > > >
> > > > If I ping the router from the host, it works. If I ping from one
> > > > jail to another, it works. If I ping from the jail to the IP of
> > > > the external_if, it works. If I ping from a jail to the router,
> > > > I do not get a response.
> > > >
> > > Can you check for 'failed ifpromisc' error messages in dmesg? And
> > > verify that all bridge member interfaces are in promiscuous mode?
> >
> > I have a panic for you...:
> >  - startup still in progress = 22 jails in startup, somewhere after a
> > few jails started the panic happened
> >  - tcpdump was running on the external interface
> >  - a ping to a jail IP from another system was running, the first ping
> > went through, then it paniced
> >
> > First regarding your questions about promisc mode: no error, but the
> > promisc mode is directly disabled again on all interfaces.
> >
> I think I see why you had issues with the promiscuous setting. I've
> updated the patch to be even more horrific than it was before.
>
> I can't explain the panic, and the backtrace also doesn't appear to be
> directly related to this patch. Not sure what's going on with that.

I should have time to test the new patch this weekend. ${LIFE} has been
keeping me busy the past few weeks. I'm gonna add an event in my
calendar to remind me to test the patch. heh.

Thanks,

--
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

GPG Key ID:          0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
https://git-01.md.hardenedbsd.org/HardenedBSD/pubkeys/src/branch/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc


Re: iflib/bridge kernel panic

Alexander Leidinger
In reply to this post by Kristof Provost

Quoting Kristof Provost <[hidden email]> (from Tue, 29 Sep 2020  
23:20:44 +0200):

> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>
>> Quoting Kristof Provost <[hidden email]> (from Mon, 28 Sep 2020  
>> 13:53:16 +0200):
>>
>>> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>>>> Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020  
>>>> 17:51:32 +0200):
>>>>> Here’s an early version of a task queue based approach:  
>>>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>>>
>>>>> That still needs to be cleaned up, but this should resolve the  
>>>>> sleep issue and the LOR.
>>>>
>>>> There are some issues... seems like inside a jail I can't ping  
>>>> systems outside of the hardware.
>>>>
>>>> Bridge setup:
>>>>   - member jail A
>>>>   - member jail B
>>>>   - member external_if of host
>>>>
>>>> If I ping the router from the host, it works. If I ping from one  
>>>> jail to another, it works. If I ping from the jail to the IP of  
>>>> the external_if, it works. If I ping from a jail to the router, I  
>>>> do not get a response.
>>>>
>>> Can you check for 'failed ifpromisc' error messages in dmesg? And  
>>> verify that all bridge member interfaces are in promiscuous mode?
>>
>> I have a panic for you...:
>> - startup still in progress = 22 jails in startup, somewhere after  
>> a few jails started the panic happened
>> - tcpdump was running on the external interface
>> - a ping to a jail IP from another system was running, the first  
>> ping went through, then it panicked
>>
>> First regarding your questions about promisc mode: no error, but  
>> the promisc mode is directly disabled again on all interfaces.
>>
> I think I see why you had issues with the promiscuous setting. I’ve  
> updated the patch to be even more horrific than it was before.
Hmmm.... same behavior as before.
I haven't kept the old version of the patch, so I can't compare if I  
somehow downloaded the old version again, or if I got the updated one...

# md5 0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
MD5 (0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch) =  
9f107739e29fad5c9bb5e75e2dae7bcc

> I can’t explain the panic, and the backtrace also doesn’t appear to  
> be directly related to this patch. Not sure what’s going on with that.

Then let's hope for now it is some kind of defect which is not showing  
up when it works as it should... we can have a look at it again in  
case it reproduces with the final patch.

Bye,
Alexander.


--
http://www.Leidinger.net [hidden email]: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    [hidden email]  : PGP 0x8F31830F9F2772BF

Re: iflib/bridge kernel panic

Dustin Marquess
In reply to this post by Kristof Provost
On Tue, Sep 29, 2020 at 4:21 PM Kristof Provost <[hidden email]> wrote:

>
> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>
> > Quoting Kristof Provost <[hidden email]> (from Mon, 28 Sep 2020
> > 13:53:16 +0200):
> >
> >> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
> >>> Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020
> >>> 17:51:32 +0200):
> >>>> Here’s an early version of a task queue based approach:
> >>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
> >>>>
> >>>> That still needs to be cleaned up, but this should resolve the
> >>>> sleep issue and the LOR.
> >>>
> >>> There are some issues... seems like inside a jail I can't ping
> >>> systems outside of the hardware.

So similar to the others, kind of.  Using the original
https://reviews.freebsd.org/D26418 patch, everything seems to work
fine.  Using the newer
http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
patch, bhyve VMs on the bridge attached to the igb(4)/em(4) interfaces
don't pass traffic.  The bhyve VMs on the bridge attached to the
cxgbe(4) interfaces, however, work fine.

-Dustin

Re: iflib/bridge kernel panic

Kristof Provost
In reply to this post by Alexander Leidinger
On 30 Sep 2020, at 13:52, Alexander Leidinger wrote:

> Quoting Kristof Provost <[hidden email]> (from Tue, 29 Sep 2020
> 23:20:44 +0200):
>
>> On 28 Sep 2020, at 16:44, Alexander Leidinger wrote:
>>
>>> Quoting Kristof Provost <[hidden email]> (from Mon, 28 Sep 2020
>>> 13:53:16 +0200):
>>>
>>>> On 28 Sep 2020, at 12:45, Alexander Leidinger wrote:
>>>>> Quoting Kristof Provost <[hidden email]> (from Sun, 27 Sep 2020
>>>>> 17:51:32 +0200):
>>>>>> Here’s an early version of a task queue based approach:
>>>>>> http://people.freebsd.org/~kp/0001-bridge-Cope-with-if_ioctl-s-that-sleep.patch
>>>>>>
>>>>>> That still needs to be cleaned up, but this should resolve the
>>>>>> sleep issue and the LOR.
>>>>>
>>>>> There are some issues... seems like inside a jail I can't ping
>>>>> systems outside of the hardware.
>>>>>
>>>>> Bridge setup:
>>>>>   - member jail A
>>>>>   - member jail B
>>>>>   - member external_if of host
>>>>>
>>>>> If I ping the router from the host, it works. If I ping from one
>>>>> jail to another, it works. If I ping from the jail to the IP of
>>>>> the external_if, it works. If I ping from a jail to the router, I
>>>>> do not get a response.
>>>>>
>>>> Can you check for 'failed ifpromisc' error messages in dmesg? And
>>>> verify that all bridge member interfaces are in promiscuous mode?
>>>
>>> I have a panic for you...:
>>> - startup still in progress = 22 jails in startup, somewhere after a
>>> few jails started the panic happened
>>> - tcpdump was running on the external interface
>>> - a ping to a jail IP from another system was running, the first
>>> ping went through, then it panicked
>>>
>>> First regarding your questions about promisc mode: no error, but the
>>> promisc mode is directly disabled again on all interfaces.
>>>
>> I think I see why you had issues with the promiscuous setting. I’ve
>> updated the patch to be even more horrific than it was before.
>
> Hmmm.... same behavior as before.
> I haven't kept the old version of the patch, so I can't compare if I
> somehow downloaded the old version again, or if I got the updated
> one...
>
Okay, let’s abandon that patch. It’s ugly and it doesn’t work.

Here’s a different approach that I’m much happier with.
https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch

It passes the regression tests with WITNESS and INVARIANTS enabled, and
a hack in the epair ioctl() handler to make it sleep (to look a bit like
the Intel ioctl() handler that currently trips up if_bridge).

Best,
Kristof

Re: iflib/bridge kernel panic

Alexander Leidinger
Quoting Kristof Provost <[hidden email]> (from Sat, 03 Oct 2020  
16:06:43 +0200):

> Okay, let’s abandon that patch. It’s ugly and it doesn’t work.
>
> Here’s a different approach that I’m much happier with.
> https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch
>
> It passes the regression tests with WITNESS and INVARIANTS enabled,  
> and a hack in the epair ioctl() handler to make it sleep (to look a  
> bit like the Intel ioctl() handler that currently trips up if_bridge).

Works for me.

No crash, no LOR, promisc-mode stays enabled, jails are reachable.
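
For anyone repeating this check, here is a minimal sketch of how the
promiscuous flag on the bridge members could be verified. It assumes
FreeBSD-style ifconfig output with a "flags=...<...>" line. The helper
names has_promisc and report_promisc are made up for illustration and
are not part of any patch in this thread.

```sh
#!/bin/sh
# has_promisc (hypothetical helper): reads `ifconfig <ifname>` output on
# stdin and succeeds only if the flags list contains PROMISC as a whole
# flag name (so PPROMISC on its own does not count).
has_promisc() {
    grep -Eq 'flags=[0-9a-fA-F]+<([^>]*,)?PROMISC(,[^>]*)?>'
}

# report_promisc (hypothetical helper): prints one status line for each
# interface name passed as an argument, using the system ifconfig.
report_promisc() {
    for ifn in "$@"; do
        if ifconfig "$ifn" | has_promisc; then
            echo "$ifn: promisc"
        else
            echo "$ifn: NOT promisc"
        fi
    done
}
```

Usage might look like `report_promisc em0 epair0a` (placeholder
interface names), alongside `dmesg | grep 'failed ifpromisc'` for the
error message asked about earlier in the thread.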

Bye,
Alexander.

--
http://www.Leidinger.net [hidden email]: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    [hidden email]  : PGP 0x8F31830F9F2772BF

Re: iflib/bridge kernel panic

Felix Kronlage-2
Alexander Leidinger wrote on 03.10.20 17:37:

> Quoting Kristof Provost <[hidden email]> (from Sat, 03 Oct 2020 16:06:43
> +0200):

>> Okay, let’s abandon that patch. It’s ugly and it doesn’t work.
>>
>> Here’s a different approach that I’m much happier with.
>> https://people.freebsd.org/~kp/0001-bridge-Call-member-interface-ioctl-without-NET_EPOCH.patch
>>
>>
>> It passes the regression tests with WITNESS and INVARIANTS enabled,
>> and a hack in the epair ioctl() handler to make it sleep (to look a
>> bit like the Intel ioctl() handler that currently trips up if_bridge).
> Works for me.
> No crash, no LOR, promisc-mode stays enabled, jails are reachable.

Indeed! I can second that. It works nicely: my machine no longer
panics, and the machines (bhyve VMs) behind the bridge are reachable.


felix

--
GPG/PGP: 7A0B612C / 5F4D 9B06 C240 3250 35BF 66ED 1AD3 A9B8 7A0B 612C
https://hazardous.org/ - [hidden email] - fkr@irc - @felixkronlage