cpuset/jail creation

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

cpuset/jail creation

Kyle Evans-3
Hello!

I've done some work to try and make qemu-user-static honor cpuset and
advertise a fake hw.ncpu to emulated processes based on the number of
cpus actually available to it.

In doing so, I discovered that created jail's inherit the parent
jail's cpuset mask, rather than the creating process. This is OK for
persistent jails, as one can create the jail, cpuset(1), then jexec
whatever tasks they want to do; but jails that would otherwise not be
persistent have to either deal with racy cpuset(1) after creation or
settle for instead creating a persistent jail because they need a
specific cpuset.

I've got this patch that I'd like to propose[0], but it's unclear if
it's really OK to do or if anyone else cares about this. I can't see
any theoretical problem with it off-hand, as the creating thread
should be guaranteed to have a cpuset that's valid as a child of the
parent prison's cpuset.

The patch creates a new poorly-named cpuset_create_root_td KPI to
inherit the cpuset from the creating thread, and leaves the previous
KPI intact in case something else is using it or to leave the door
open to adding an option to go either way with this (inherit from
parent jail vs. inherit from thread).

From a MAC perspective, I think it makes a lot of sense to inherit
from the thread by default. e.g. a non-root user could be granted
PRIV_JAIL_SET, then they're freely able to create jails using the
parent jail's root cpuset even if they've been limited themselves via
login.conf(5) restriction.

For most existing use-cases, it should effectively be a nop unless
they were cpuset(1)ing a process not expecting the created jail to
inherit that.

Thoughts? Thanks,

Kyle Evans

[0] https://people.freebsd.org/~kevans/jail-cpuset.diff
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: cpuset/jail creation

Kyle Evans-3
On Wed, Nov 18, 2020 at 12:39 PM James Gritton <[hidden email]> wrote:

>
> On 2020-11-17 10:40, Kyle Evans wrote:
> > Hello!
> >
> > I've done some work to try and make qemu-user-static honor cpuset and
> > advertise a fake hw.ncpu to emulated processes based on the number of
> > cpus actually available to it.
> >
> > In doing so, I discovered that created jail's inherit the parent
> > jail's cpuset mask, rather than the creating process. This is OK for
> > persistent jails, as one can create the jail, cpuset(1), then jexec
> > whatever tasks they want to do; but jails that would otherwise not be
> > persistent have to either deal with racy cpuset(1) after creation or
> > settle for instead creating a persistent jail because they need a
> > specific cpuset.
> >
> > I've got this patch that I'd like to propose[0], but it's unclear if
> > it's really OK to do or if anyone else cares about this. I can't see
> > any theoretical problem with it off-hand, as the creating thread
> > should be guaranteed to have a cpuset that's valid as a child of the
> > parent prison's cpuset.
> >
> > The patch creates a new poorly-named cpuset_create_root_td KPI to
> > inherit the cpuset from the creating thread, and leaves the previous
> > KPI intact in case something else is using it or to leave the door
> > open to adding an option to go either way with this (inherit from
> > parent jail vs. inherit from thread).
> >
> > From a MAC perspective, I think it makes a lot of sense to inherit
> > from the thread by default. e.g. a non-root user could be granted
> > PRIV_JAIL_SET, then they're freely able to create jails using the
> > parent jail's root cpuset even if they've been limited themselves via
> > login.conf(5) restriction.
> >
> > For most existing use-cases, it should effectively be a nop unless
> > they were cpuset(1)ing a process not expecting the created jail to
> > inherit that.
> >
> > Thoughts? Thanks,
> >
> > Kyle Evans
> >
> > [0] https://people.freebsd.org/~kevans/jail-cpuset.diff
>
> There are two separate issues here: the cpuset of the jail, and the
> cpuset of the processes in the jail.
>
> For the first, I think the current code does the right thing,
> inheriting the cpuset from the parent jail.
>
> For the second, the current code appears to be *almost* correct.  When
> you attach to a jail, including a non-persistent one, you keep the
> thread's cpuset, but it is masked to the jail's cpuset.  Thus a jailed
> process is always:
> * at least as restricted as the jail's cpuset.
> * at least as restricted as the process' original cpuset.
>
> So a process that has a restricted cpuset may be able to create a jail
> that allows CPUs that the process lacks access to, but it still can't
> run on those CPUs.  This applies to non-persistent jails: the cpuset
> of the jail ends up being somewhat illusory, since the only thing
> running in the jail is still restricted to its original cpuset (or
> less, if the parent jail is also restricted).  True, if some other
> process later attaches to that jail it could use CPUs that the jail
> creator couldn't.  But IMHO that's as it should be: this is only if
> the later-attaching process had access to those CPUs anyway.
>

Sure, ok, that makes sense.

> Now I mention that the current behavior is almost correct.  In trying
> this out, I found that the attached process keeps its cpuset, with a
> mask applied from the jail cpuset.  I think it would be better the
> other way around: use the jail's cpuset, and mask it to the current
> thread mask.  Interestingly, if attaching to a jail from a normal
> (unrestricted) system thread, that's exactly what happens.  This is
> down to an implementation detail of cpuset_setproc_setthread that
> looks simple to change but that where I'm unsure of side effects
> without further exploration.  The only case where this matters is the
> "racy cpuset(1) after creation" you mentioned, where setting the
> jail's cpu list wouldn't change anything running in the non-persistent
> jail, because those are still in the process' cpuset (even if masked).
>
> So I don't think cpuset_create_root_td() is necessary, but I do think
> cpuset_setproc_sethread and cpuset_setproc_setthread_mask may need
> some tweaking
>

I can also agree with that, and that the changes are a little more
involved. cpuset_setproc w/ non-NULL set is also used for cpuset(2)
and cpuset_setid(2), so the current contract needs to be maintained
for those interfaces because that is pretty much the essence of what
they do.

I'm having a hard time reconciling this, though:

 * 1) Set is non-null.  This reparents all anonymous sets to the provided
 *    set and replaces all non-anonymous td_cpusets with the provided set

The verbiage would seem to indicate to me that the new parent provided
by cpuset_setproc_setthread() should be set, but it's not:

        return cpuset_shadow(tdset, nsetp, &mask, &domain, freelist,
            domainlist);

cpuset_shadow makes tdset the parent, unless it's anon -- then it uses
the base of tdset, which appears to be wrong. I think this really
wants to be passing 'set' as the first parameter to meet the promise
of cpuset_settproc.

I've mulled over the rest of it a little bit, and I think this is what
I came up with based on your assessment:
https://people.freebsd.org/~kevans/cpuset-reroot.diff

The basic premise of this patch is that we shouldn't use the set
passed to cpuset_setproc directly if it has a different root than
whatever root the process has [*]. More importantly, the process
should not lose the identity of its base in the move. Currently, in
the scenario that I'm going through, the process and its single thread
goes from having its own cpuset 5 that was just created with a
different affinity to an anonymous set hanging off of the jail's
cpuset.

We can make the obvious change and make that an anonymous set with the
correct mask, but I think that's actually not an ideal behavior; if
you inspect the process cpuset id from before and after the attach, it
goes from having a cpuset id that you can modify to one that you
cannot modify because the one thread has an anonymous cpuset with
parent set to the prison's cpuset.

On the other hand, let's go the other way with it: say I have a
process based on cpuset 1 and just a different affinity, so the one
thread has an anonymous cpuset based on the mother of all prison0
processes. With the above patch applied, it suddenly gets its own
numbered cpuset [**].

[*] This isn't quite right, though, because unjailed processes can
cpuset_setid(2) any cpuset on the system, jailed or not. We likely
need to have the caller indicate if this is a jail update for the
process cpuset.

[**] I think this particular issue can be easily addressed by walking
through all of the threads in the process one more time, just before
we allocate a new base cpuset for it if we're going to. If every
thread has an anonymous set, then we can simply avoid creating a new
base for it and just keep the existing behavior, which will do the
right thing -- create a new anonymous set hanging off of the jail's
root. The jail's root set can be modified by real root, so from that
perspective nothing has changed.

This should be safe, even, because we hold the proc lock and all of
the paths that could modify one of its thread cpusets to invalidate
the walk must also grab the proc lock.
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: cpuset/jail creation

Kyle Evans-3
On Thu, Nov 19, 2020 at 10:56 PM Kyle Evans <[hidden email]> wrote:

>
> [... snip ...]
>
> On the other hand, let's go the other way with it: say I have a
> process based on cpuset 1 and just a different affinity, so the one
> thread has an anonymous cpuset based on the mother of all prison0
> processes. With the above patch applied, it suddenly gets its own
> numbered cpuset [**].
>
> [... snip ...]
>
> [**] I think this particular issue can be easily addressed by walking
> through all of the threads in the process one more time, just before
> we allocate a new base cpuset for it if we're going to. If every
> thread has an anonymous set, then we can simply avoid creating a new
> base for it and just keep the existing behavior, which will do the
> right thing -- create a new anonymous set hanging off of the jail's
> root. The jail's root set can be modified by real root, so from that
> perspective nothing has changed.

It turns out this isn't quite right, and we're a tad inconsistent:
prison0 processes don't inherently get its root cpuset, while other
jails' processes do. The above should have been "every thread has
either an anonymous set or the root set", but we can't do that, so
instead: "anonymous set, the root set, or the default set (i.e.
thread0's)"
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: cpuset/jail creation

Kyle Evans-3
In reply to this post by Kyle Evans-3
On Tue, Nov 17, 2020 at 12:40 PM Kyle Evans <[hidden email]> wrote:

>
> Hello!
>
> I've done some work to try and make qemu-user-static honor cpuset and
> advertise a fake hw.ncpu to emulated processes based on the number of
> cpus actually available to it.
>
> In doing so, I discovered that created jail's inherit the parent
> jail's cpuset mask, rather than the creating process. This is OK for
> persistent jails, as one can create the jail, cpuset(1), then jexec
> whatever tasks they want to do; but jails that would otherwise not be
> persistent have to either deal with racy cpuset(1) after creation or
> settle for instead creating a persistent jail because they need a
> specific cpuset.
>
> I've got this patch that I'd like to propose[0], but it's unclear if
> it's really OK to do or if anyone else cares about this. I can't see
> any theoretical problem with it off-hand, as the creating thread
> should be guaranteed to have a cpuset that's valid as a child of the
> parent prison's cpuset.
>
> The patch creates a new poorly-named cpuset_create_root_td KPI to
> inherit the cpuset from the creating thread, and leaves the previous
> KPI intact in case something else is using it or to leave the door
> open to adding an option to go either way with this (inherit from
> parent jail vs. inherit from thread).
>
> From a MAC perspective, I think it makes a lot of sense to inherit
> from the thread by default. e.g. a non-root user could be granted
> PRIV_JAIL_SET, then they're freely able to create jails using the
> parent jail's root cpuset even if they've been limited themselves via
> login.conf(5) restriction.
>
> For most existing use-cases, it should effectively be a nop unless
> they were cpuset(1)ing a process not expecting the created jail to
> inherit that.
>
> Thoughts? Thanks,
>
> Kyle Evans
>

I've pushed a pair of reviews derived from my impression of Jamie's response:

https://reviews.freebsd.org/D27297
https://reviews.freebsd.org/D27298

The former supports the latter, the latter effectively appropriately
'rebases' a process cpuset into the jail when it attaches rather than
the administrative action of replacing a process cpuset.

Thanks,

Kyle Evans
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: cpuset/jail creation

Kyle Evans-3
In reply to this post by Kyle Evans-3
On Fri, Nov 20, 2020 at 3:09 PM James Gritton <[hidden email]> wrote:

>
> On 2020-11-19 20:56, Kyle Evans wrote:
> > The basic premise of this patch is that we shouldn't use the set
> > passed to cpuset_setproc directly if it has a different root than
> > whatever root the process has [*]. More importantly, the process
> > should not lose the identity of its base in the move. Currently, in
> > the scenario that I'm going through, the process and its single thread
> > goes from having its own cpuset 5 that was just created with a
> > different affinity to an anonymous set hanging off of the jail's
> > cpuset.
>
> It will always be the case when attaching to a jail, that the process'
> current cpuset base is different from the jails.
>
> Considering that the only processes that run in a jail are those that
> have attached, making sure all attached processes don't have the
> jail's cpuset base removes much of the jail cpuset's usefulness.
> I would expect that changing a jail cpuset's mask to apply to
> processes within that jail, but now it seems all processes in the
> jail are guaranteed *not* to be within the jail's cpuset.
>
> I'll admit that I hadn't considered affinities at all, only masks.
> It make sense to preserve the process' cpuset affinity.  I suppose
> theoretically that can be said of the jail's affinity as well, though
> a jail is less likely to have such a thing in its cpuset.
>
> Anyway, by the time I got around to replying, I see D27297 and D27298
> have shown up.  So I guess the conversation moves there.  And I'll
> have an actual look at the code before I muse anymore in a possibly
> irrelevant direction.
>

Heh, sorry. :-) With the latest patch, only those that deviated from
the jail they were in prior aren't using the new cpuset. If they used
the previous root set, they will simply accept the new root set.

Said differently, if you modified a process you can expect it to
continue being unique once attached. If a process was part of the
hivemind in its previous jail, it's part of the hivemind in its new
jail.

Thanks,

Kyle Evans
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-jail
To unsubscribe, send any mail to "[hidden email]"