Quantcast

MFC VIMAGE fixes to 11-stable

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

MFC VIMAGE fixes to 11-stable

peter.blok
All,

I’m running jails and bhyve using netgraph bridge. The jails are using Devin Teske’s jng and I have adapted iohyve to use the same netgraph bridge.

I haven’t had any panic’s after applying revisions 306684, 312943, 315131, 315469, 307235, 313001, 315136 and 315741.

Can someone please MFC them to 11-stable?

I’m starting and stopping jails in a continous loop. I have added some extra code to track an occasional panic in pf_purge_expired_states, but so far no luck.

I also have a change in zone_release to fix another panic and leak in slab_free_item. The issue is that zone_release tries to release a keg that never belonged to the zone it is trying to release. With my limited knowledge, i think that should not happen.

--- vm/uma_core.c (revision 317156)
+++ vm/uma_core.c (working copy)
@@ -2846,7 +2846,8 @@
  KEG_LOCK(keg);
  }
  }
- slab_free_item(keg, slab, item);
+ if (keg == slab->us_keg)
+ slab_free_item(keg, slab, item);
  if (keg->uk_flags & UMA_ZFLAG_FULL) {
  if (keg->uk_pages < keg->uk_maxpages) {
  keg->uk_flags &= ~UMA_ZFLAG_FULL;


Peter



_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

Marko Zec-2
On Wed, 19 Apr 2017 21:31:50 +0200
<[hidden email]> wrote:
...

> I also have a change in zone_release to fix another panic and leak in
> slab_free_item. The issue is that zone_release tries to release a keg
> that never belonged to the zone it is trying to release. With my
> limited knowledge, i think that should not happen.
>
> --- vm/uma_core.c (revision 317156)
> +++ vm/uma_core.c (working copy)
> @@ -2846,7 +2846,8 @@
>   KEG_LOCK(keg);
>   }
>   }
> - slab_free_item(keg, slab, item);
> + if (keg == slab->us_keg)
> + slab_free_item(keg, slab, item);
>   if (keg->uk_flags & UMA_ZFLAG_FULL) {
>   if (keg->uk_pages < keg->uk_maxpages) {
>   keg->uk_flags &= ~UMA_ZFLAG_FULL;
>

This change only masks the cause of the panic while still continuing to
leak memory, and should never be commited.

The real culprit lies somewhere in PF code which operates on a wrong
vnet.  Without a backtrace it's difficult to guess, but a quick read
reveals that

pfi_initialize()

is called from the default vnet context, and subsequently registers
interface eventhandlers so that all interface attach, change and detach
events will be always executed in the default vnet, regardless of the
real vnet where the interfaces bound to the events actually reside.  In
other words,

pfi_attach_group_event()
pfi_change_group_event()
pfi_detach_group_event()

will operate fine only in the default vnet, but will wreak havoc
otherwise.  Hence, those handlers should be fixed first.

Marko
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

peter.blok
Hi Marko,

Thanks for the pointer. It was not my intention to have this committed, but it helped identify other problems. I have asked this before in -current, but got no answer so I posted it here to get an answer.

If you look inside slab_free_item there is a KASSERT for just this, so that’s why I tried it.

I have added debug information to print the zone’s and the keg’s and It all looked good. I was not able to find any place where we operated on the wrong context, but perhaps I missed one.

I’ll dig further.

Peter


> On 20 Apr 2017, at 12:42, Marko Zec <[hidden email]> wrote:
>
> On Wed, 19 Apr 2017 21:31:50 +0200
> <[hidden email]> wrote:
> ...
>> I also have a change in zone_release to fix another panic and leak in
>> slab_free_item. The issue is that zone_release tries to release a keg
>> that never belonged to the zone it is trying to release. With my
>> limited knowledge, i think that should not happen.
>>
>> --- vm/uma_core.c (revision 317156)
>> +++ vm/uma_core.c (working copy)
>> @@ -2846,7 +2846,8 @@
>> KEG_LOCK(keg);
>> }
>> }
>> - slab_free_item(keg, slab, item);
>> + if (keg == slab->us_keg)
>> + slab_free_item(keg, slab, item);
>> if (keg->uk_flags & UMA_ZFLAG_FULL) {
>> if (keg->uk_pages < keg->uk_maxpages) {
>> keg->uk_flags &= ~UMA_ZFLAG_FULL;
>>
>
> This change only masks the cause of the panic while still continuing to
> leak memory, and should never be commited.
>
> The real culprit lies somewhere in PF code which operates on a wrong
> vnet.  Without a backtrace it's difficult to guess, but a quick read
> reveals that
>
> pfi_initialize()
>
> is called from the default vnet context, and subsequently registers
> interface eventhandlers so that all interface attach, change and detach
> events will be always executed in the default vnet, regardless of the
> real vnet where the interfaces bound to the events actually reside.  In
> other words,
>
> pfi_attach_group_event()
> pfi_change_group_event()
> pfi_detach_group_event()
>
> will operate fine only in the default vnet, but will wreak havoc
> otherwise.  Hence, those handlers should be fixed first.
>
> Marko
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "[hidden email]"

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

Kristof Provost-3
In reply to this post by Marko Zec-2
On 20 Apr 2017, at 12:42, Marko Zec wrote:

> The real culprit lies somewhere in PF code which operates on a wrong
> vnet.  Without a backtrace it's difficult to guess, but a quick read
> reveals that
>
> pfi_initialize()
>
> is called from the default vnet context, and subsequently registers
> interface eventhandlers so that all interface attach, change and
> detach
> events will be always executed in the default vnet, regardless of the
> real vnet where the interfaces bound to the events actually reside.  
> In
> other words,
>
> pfi_attach_group_event()
> pfi_change_group_event()
> pfi_detach_group_event()
>
> will operate fine only in the default vnet, but will wreak havoc
> otherwise.  Hence, those handlers should be fixed first.
>
I don’t think that’s right.

The event handler doesn’t carry vnet information.  It’s just called
in whatever
vnet is active when it’s invoked.
There's no CURVNET_SET() in the EVENTHANDLER_INVOKE() macro.

That means that we end up in pf_attach_group_event() with CURVNET set to
the
relevant vnet, not to the default vnet.

There are certainly still issues with pf and vnets, but I don't think
this is
one.

Regards,
Kristof
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

Marko Zec-2
On Thu, 20 Apr 2017 15:13:42 +0200
Kristof Provost <[hidden email]> wrote:

> On 20 Apr 2017, at 12:42, Marko Zec wrote:
> > The real culprit lies somewhere in PF code which operates on a wrong
> > vnet.  Without a backtrace it's difficult to guess, but a quick read
> > reveals that
> >
> > pfi_initialize()
> >
> > is called from the default vnet context, and subsequently registers
> > interface eventhandlers so that all interface attach, change and
> > detach
> > events will be always executed in the default vnet, regardless of
> > the real vnet where the interfaces bound to the events actually
> > reside. In
> > other words,
> >
> > pfi_attach_group_event()
> > pfi_change_group_event()
> > pfi_detach_group_event()
> >
> > will operate fine only in the default vnet, but will wreak havoc
> > otherwise.  Hence, those handlers should be fixed first.
> >  
> I don’t think that’s right.
>
> The event handler doesn’t carry vnet information.  It’s just called
> in whatever vnet is active when it’s invoked.
> There's no CURVNET_SET() in the EVENTHANDLER_INVOKE() macro.
>
> That means that we end up in pf_attach_group_event() with CURVNET set
> to the relevant vnet, not to the default vnet.

Right.  But pfi_attach_group_event() and the other handlers cited above
_do_ in fact invoke CURVNET_SET(vnet0) on entry, overriding the proper
vnet choice from the caller.

Therefore the proper fix should be as simple as removing CURVNET_SET() /
CURVNET_RESTORE() macro pairs from the cited handlers.

Marko


>
> There are certainly still issues with pf and vnets, but I don't think
> this is
> one.
>
> Regards,
> Kristof

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

Kristof Provost-3
On 20 Apr 2017, at 15:28, Marko Zec wrote:
> Right.  But pfi_attach_group_event() and the other handlers cited
> above
> _do_ in fact invoke CURVNET_SET(vnet0) on entry, overriding the proper
> vnet choice from the caller.
>
Yes, that does look wrong.
I should have looked a bit further.

> Therefore the proper fix should be as simple as removing CURVNET_SET()
> /
> CURVNET_RESTORE() macro pairs from the cited handlers.
>
Hopefully, yes. I’ve still got some other pf/vnet issues on my todo
list, but
I’ve now added this too.  It might actually explain some other bug
report I’ve
seen (but not looked at in any depth).

Regards,
Kristof
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

peter.blok
I’ll test this today.

> On 20 Apr 2017, at 15:32, Kristof Provost <[hidden email]> wrote:
>
> On 20 Apr 2017, at 15:28, Marko Zec wrote:
>> Right.  But pfi_attach_group_event() and the other handlers cited above
>> _do_ in fact invoke CURVNET_SET(vnet0) on entry, overriding the proper
>> vnet choice from the caller.
>>
> Yes, that does look wrong.
> I should have looked a bit further.
>
>> Therefore the proper fix should be as simple as removing CURVNET_SET() /
>> CURVNET_RESTORE() macro pairs from the cited handlers.
>>
> Hopefully, yes. I’ve still got some other pf/vnet issues on my todo list, but
> I’ve now added this too.  It might actually explain some other bug report I’ve
> seen (but not looked at in any depth).
>
> Regards,
> Kristof
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "[hidden email]"

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

peter.blok
In reply to this post by Kristof Provost-3
It doesn’t solve my problem, but below is the stack back trace that leads to the problem that allocation doen for the default vnet are given back as part of the vnet destroy.

#0 0xffffffff807ff275 at pfr_destroy_kentry+0x35
#1 0xffffffff807fe47c at pfr_remove_kentries+0x1fc
#2 0xffffffff808053cd at pfr_setflags_ktable+0xcd
#3 0xffffffff80802108 at pfr_clr_tables+0x248
#4 0xffffffff807ecd75 at vnet_pf_uninit+0x4a5
#5 0xffffffff806a9d2c at vnet_destroy+0x13c
#6 0xffffffff8056cdcf at prison_deref+0x2af
#7 0xffffffff805ee287 at taskqueue_run_locked+0x127
#8 0xffffffff805ef428 at taskqueue_thread_loop+0xc8
#9 0xffffffff80565505 at fork_exit+0x85
#10 0xffffffff808d245e at fork_trampoline+0xe


> On 20 Apr 2017, at 15:32, Kristof Provost <[hidden email]> wrote:
>
> On 20 Apr 2017, at 15:28, Marko Zec wrote:
>> Right.  But pfi_attach_group_event() and the other handlers cited above
>> _do_ in fact invoke CURVNET_SET(vnet0) on entry, overriding the proper
>> vnet choice from the caller.
>>
> Yes, that does look wrong.
> I should have looked a bit further.
>
>> Therefore the proper fix should be as simple as removing CURVNET_SET() /
>> CURVNET_RESTORE() macro pairs from the cited handlers.
>>
> Hopefully, yes. I’ve still got some other pf/vnet issues on my todo list, but
> I’ve now added this too.  It might actually explain some other bug report I’ve
> seen (but not looked at in any depth).
>
> Regards,
> Kristof
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "[hidden email]"

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

Marko Zec-2
On Thu, 20 Apr 2017 21:24:33 +0200
<[hidden email]> wrote:

> It doesn’t solve my problem, but below is the stack back trace that
> leads to the problem that allocation doen for the default vnet are
> given back as part of the vnet destroy.
>
> #0 0xffffffff807ff275 at pfr_destroy_kentry+0x35
> #1 0xffffffff807fe47c at pfr_remove_kentries+0x1fc
> #2 0xffffffff808053cd at pfr_setflags_ktable+0xcd
> #3 0xffffffff80802108 at pfr_clr_tables+0x248
> #4 0xffffffff807ecd75 at vnet_pf_uninit+0x4a5
> #5 0xffffffff806a9d2c at vnet_destroy+0x13c
> #6 0xffffffff8056cdcf at prison_deref+0x2af
> #7 0xffffffff805ee287 at taskqueue_run_locked+0x127
> #8 0xffffffff805ef428 at taskqueue_thread_loop+0xc8
> #9 0xffffffff80565505 at fork_exit+0x85
> #10 0xffffffff808d245e at fork_trampoline+0xe

Having absolutely no clue what the PF code does or is supposed to do,
I'd bet that V_irtualizing pfr_ktablehead, and probably pfr_nulltable
and pfr_ktable_cnt as well, would help here.

Marko

>
>
> > On 20 Apr 2017, at 15:32, Kristof Provost <[hidden email]>
> > wrote:
> >
> > On 20 Apr 2017, at 15:28, Marko Zec wrote:  
> >> Right.  But pfi_attach_group_event() and the other handlers cited
> >> above _do_ in fact invoke CURVNET_SET(vnet0) on entry, overriding
> >> the proper vnet choice from the caller.
> >>  
> > Yes, that does look wrong.
> > I should have looked a bit further.
> >  
> >> Therefore the proper fix should be as simple as removing
> >> CURVNET_SET() / CURVNET_RESTORE() macro pairs from the cited
> >> handlers.
> > Hopefully, yes. I’ve still got some other pf/vnet issues on my todo
> > list, but I’ve now added this too.  It might actually explain some
> > other bug report I’ve seen (but not looked at in any depth).
> >
> > Regards,
> > Kristof
> > _______________________________________________
> > [hidden email] mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-net
> > To unsubscribe, send any mail to
> > "[hidden email]"  
>

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

Marko Zec-2
On Thu, 20 Apr 2017 21:41:28 +0200
Marko Zec <[hidden email]> wrote:

> [This sender failed our fraud detection checks and may not be who
> they appear to be. Learn about spoofing at
> http://aka.ms/LearnAboutSpoofing]
>
> On Thu, 20 Apr 2017 21:24:33 +0200
> <[hidden email]> wrote:
>
> > It doesn’t solve my problem, but below is the stack back trace that
> > leads to the problem that allocation doen for the default vnet are
> > given back as part of the vnet destroy.
> >
> > #0 0xffffffff807ff275 at pfr_destroy_kentry+0x35
> > #1 0xffffffff807fe47c at pfr_remove_kentries+0x1fc
> > #2 0xffffffff808053cd at pfr_setflags_ktable+0xcd
> > #3 0xffffffff80802108 at pfr_clr_tables+0x248
> > #4 0xffffffff807ecd75 at vnet_pf_uninit+0x4a5
> > #5 0xffffffff806a9d2c at vnet_destroy+0x13c
> > #6 0xffffffff8056cdcf at prison_deref+0x2af
> > #7 0xffffffff805ee287 at taskqueue_run_locked+0x127
> > #8 0xffffffff805ef428 at taskqueue_thread_loop+0xc8
> > #9 0xffffffff80565505 at fork_exit+0x85
> > #10 0xffffffff808d245e at fork_trampoline+0xe  
>
> Having absolutely no clue what the PF code does or is supposed to do,
> I'd bet that V_irtualizing pfr_ktablehead, and probably pfr_nulltable
> and pfr_ktable_cnt as well, would help here.

pf_main_anchor looks suspicious, too, as it is never dereferenced via
the VNET() accessor, so effectively it is still used as a single global
variable.

Marko
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: MFC VIMAGE fixes to 11-stable

peter.blok
In reply to this post by Marko Zec-2
Yeah, you are right. To keep the pf code as unchanged as possible, it is sometimes unclear whether something is virtualised or not.

The SLIST_HEAD and RB_HEAD in pfvar.h need virtualisation as well.


> On 20 Apr 2017, at 21:41, Marko Zec <[hidden email]> wrote:
>
> On Thu, 20 Apr 2017 21:24:33 +0200
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>> It doesn’t solve my problem, but below is the stack back trace that
>> leads to the problem that allocation doen for the default vnet are
>> given back as part of the vnet destroy.
>>
>> #0 0xffffffff807ff275 at pfr_destroy_kentry+0x35
>> #1 0xffffffff807fe47c at pfr_remove_kentries+0x1fc
>> #2 0xffffffff808053cd at pfr_setflags_ktable+0xcd
>> #3 0xffffffff80802108 at pfr_clr_tables+0x248
>> #4 0xffffffff807ecd75 at vnet_pf_uninit+0x4a5
>> #5 0xffffffff806a9d2c at vnet_destroy+0x13c
>> #6 0xffffffff8056cdcf at prison_deref+0x2af
>> #7 0xffffffff805ee287 at taskqueue_run_locked+0x127
>> #8 0xffffffff805ef428 at taskqueue_thread_loop+0xc8
>> #9 0xffffffff80565505 at fork_exit+0x85
>> #10 0xffffffff808d245e at fork_trampoline+0xe
>
> Having absolutely no clue what the PF code does or is supposed to do,
> I'd bet that V_irtualizing pfr_ktablehead, and probably pfr_nulltable
> and pfr_ktable_cnt as well, would help here.
>
> Marko
>
>>
>>
>>> On 20 Apr 2017, at 15:32, Kristof Provost <[hidden email]>
>>> wrote:
>>>
>>> On 20 Apr 2017, at 15:28, Marko Zec wrote:  
>>>> Right.  But pfi_attach_group_event() and the other handlers cited
>>>> above _do_ in fact invoke CURVNET_SET(vnet0) on entry, overriding
>>>> the proper vnet choice from the caller.
>>>>
>>> Yes, that does look wrong.
>>> I should have looked a bit further.
>>>
>>>> Therefore the proper fix should be as simple as removing
>>>> CURVNET_SET() / CURVNET_RESTORE() macro pairs from the cited
>>>> handlers.
>>> Hopefully, yes. I’ve still got some other pf/vnet issues on my todo
>>> list, but I’ve now added this too.  It might actually explain some
>>> other bug report I’ve seen (but not looked at in any depth).
>>>
>>> Regards,
>>> Kristof
>>> _______________________________________________
>>> [hidden email] mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to
>>> "[hidden email]"  

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[hidden email]"
Loading...