unlinkfd

unlinkfd

Mariusz Zaborski
Hello,

Today I would like to propose a new syscall called unlinkfd(2) which came up
during a discussion with Ed Maste.

Currently in UNIX we can’t remove files safely. If we try to do so we
always end up in a race condition. For example, we open a file and check
it with fstat, etc., and then we want to unlink(2) it… but the file we are trying
to unlink could be a different one than the one we were fstat-ing just a moment ago.
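
A minimal sketch of the check-then-unlink pattern described above. The function names are hypothetical and only illustrate where the window sits; this is not code from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Check-then-unlink by path. Returns 0 if unlink(2) succeeded, -1 otherwise. */
int
racy_remove(const char *path)
{
	struct stat sb;
	int fd, error;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	/* The check is made against the object behind fd ... */
	if (fstat(fd, &sb) == -1 || !S_ISREG(sb.st_mode)) {
		close(fd);
		return (-1);
	}
	/*
	 * ... but unlink(2) resolves the path again. Between fstat(2) and
	 * unlink(2) another process may rename a different file over "path",
	 * so the name removed here need not refer to the file checked above.
	 */
	error = unlink(path);
	close(fd);
	return (error);
}

/* Exercise racy_remove() on a scratch file; returns 0 on success. */
int
racy_remove_demo(void)
{
	char path[] = "/tmp/racy_remove.XXXXXX";
	int fd;

	fd = mkstemp(path);
	assert(fd != -1);
	close(fd);
	if (racy_remove(path) != 0)
		return (-1);
	/* The name must be gone afterwards. */
	return (access(path, F_OK) == -1 ? 0 : -1);
}
```

With nothing else touching the file the call behaves as expected; the problem only shows up when another process changes the name between the two steps.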

Another reason for implementing unlinkfd(2) came to us when we were trying
to sandbox some applications like uudecode/b64decode or bspatch. It occurred
to us that we don’t have a good way of removing single files. Of course we can
try to determine which directory we are in, and then open this directory and
remove a single file.

It looks even more bizarre if we think about a program which operates on
multiple files. If we analyze a situation with two totally different
directories like `/tmp` and `/home/oshogbo`, we would end up either pre-opening
the root directory or keeping open as many directories as we are working in.
All of that effort only to remove two files. This makes it totally impractical!

I think that opening directories also presents a wider attack surface because
we are keeping a descriptor to a whole directory only to remove one file.
Unfortunately this means that an attacker who takes over the process can remove
all files in that directory.
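
To illustrate the point about directory descriptors, here is a small sketch using the standard unlinkat(2) call. The file names `scratch` and `precious` and the helper names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Create an empty file "name" relative to dirfd; returns 0 on success. */
static int
touch_at(int dirfd, const char *name)
{
	int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (fd == -1)
		return (-1);
	return (close(fd));
}

/* Returns how many names could be removed through one directory descriptor. */
int
dirfd_authority_demo(void)
{
	char dir[] = "/tmp/dirfd_demo.XXXXXX";
	char *made = mkdtemp(dir);
	int dirfd, removed = 0;

	assert(made != NULL);
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	assert(dirfd != -1);
	if (touch_at(dirfd, "scratch") != 0 ||	/* the file we mean to delete */
	    touch_at(dirfd, "precious") != 0)	/* an unrelated neighbour */
		return (-1);

	/* The program only needs this call ... */
	if (unlinkat(dirfd, "scratch", 0) == 0)
		removed++;
	/* ... but the same descriptor authorizes this one too. */
	if (unlinkat(dirfd, "precious", 0) == 0)
		removed++;

	close(dirfd);
	rmdir(dir);
	return (removed);
}
```

Both removals succeed, which is the concern: holding the directory descriptor is enough to delete any entry in it.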

I proposed this as well on the last Capsicum call. There was a suggestion that
instead of adding a syscall maybe we should have a Casper service that
will allow us to remove files. Another idea was that we should perhaps redesign
programs to create some subdirs, work in the subdirs, and then remove all files in
those subdirs. I don’t feel that creating a Casper service is a good idea because
we still have exactly the same race condition issue. In my opinion creating
subdirs is also a problem for us.

First, we would need to redesign some of our tools, and I think we should
simplify the capsicumization process instead of making it harder.

Secondly, we can create a temporary subdirectory, but what will remove it?
We are back to holding a fd to the directory in which we just created the subdir.
Another way would be to have a Casper service which would remove the directory,
but with the risk of a race condition.

In conclusion, I think we need a syscall like unlinkfd(2), which turns out to
be easy to implement. The only downside of this implementation is that we not
only need to provide a fd but also a file path. This is because neither inodes
nor vnodes contain filenames. We compare the vnodes of the fd and the given
path; if they are exactly the same we remove the file. In the syscall we are using
a fd, so there is no Ambient Authority because we are proving that we already
have access to that file. Thanks to that the syscall can be safely used with
Capsicum. I have already discussed this with some people and they said
`Hey I already had that idea a while ago…` so let’s do something with that idea!
If you are interested in the patch you can find it here:
https://reviews.freebsd.org/D14567
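
For readers who want to see the comparison step spelled out, below is a userspace approximation. It is only a sketch: the name `unlinkfd_approx`, its argument order, and the `EINVAL` error code are assumptions made for illustration and are not taken from the patch. Unlike a kernel implementation, the check and the removal are two separate steps here, so a window between them remains.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove dirfd/name only if it currently refers to the same file as fd. */
int
unlinkfd_approx(int fd, int dirfd, const char *name)
{
	struct stat by_fd, by_name;

	if (fstat(fd, &by_fd) == -1)
		return (-1);
	if (fstatat(dirfd, name, &by_name, AT_SYMLINK_NOFOLLOW) == -1)
		return (-1);
	if (by_fd.st_dev != by_name.st_dev || by_fd.st_ino != by_name.st_ino) {
		errno = EINVAL;	/* arbitrary choice for "not the same file" */
		return (-1);
	}
	/* Not atomic: the name can still be swapped before unlinkat(2) runs. */
	return (unlinkat(dirfd, name, 0));
}

/* Returns 0 if mismatched pairs are refused and matching pairs are removed. */
int
unlinkfd_approx_demo(void)
{
	char dir[] = "/tmp/unlinkfd_demo.XXXXXX";
	char *made = mkdtemp(dir);
	int dirfd, fd_a, fd_b, rc = -1;

	assert(made != NULL);
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	fd_a = openat(dirfd, "a", O_RDWR | O_CREAT | O_EXCL, 0600);
	fd_b = openat(dirfd, "b", O_RDWR | O_CREAT | O_EXCL, 0600);
	assert(dirfd != -1 && fd_a != -1 && fd_b != -1);

	/* fd_a does not refer to "b", so the first call must be refused. */
	if (unlinkfd_approx(fd_a, dirfd, "b") == -1 && errno == EINVAL &&
	    unlinkfd_approx(fd_a, dirfd, "a") == 0 &&
	    unlinkfd_approx(fd_b, dirfd, "b") == 0)
		rc = 0;

	close(fd_a);
	close(fd_b);
	close(dirfd);
	rmdir(dir);
	return (rc);
}
```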

Thanks,
--
Mariusz Zaborski
oshogbo//vx | http://oshogbo.vexillium.org
FreeBSD committer | https://freebsd.org
Software developer | http://wheelsystems.com
If it's not broken, let's fix it till it is!!1


Re: [capsicum] unlinkfd

Justin Cormack
I think it would make sense to have an unlinkfd() that unlinks the file from
everywhere, so it does not need a name to be specified. This might be
hard to implement.

For temporary files, I really like Linux memfd_create(2) that opens an anonymous
file without a name. These semantics are really useful. (Linux memfd also has
additional options for sealing the file to make it immutable which are very
useful for safely passing files between processes.) Having a way to make
unnamed temporary files solves a lot of deletion issues as the file
never needs to be unlinked.


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

Re: [capsicum] unlinkfd

Robert N. M. Watson
FWIW, this is part of why we introduced anonymous POSIX shared memory objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a SHM_ANON special name, which causes the creation of a swap-backed, mappable file-like object that can have I/O, memory mapping, etc., performed on it, but never has any persistent state across reboots, even in the event of a crash.

With Capsicum you can then refine a file descriptor to the otherwise writable object to be read-only for the purposes of delegation. There is not, however, a mechanism to "freeze" the state of the object, causing other outstanding writable descriptors to become read-only -- certainly something could be added, but some care regarding VM semantics would be required -- in particular, so that faults could not be experienced as a result of a memory store performed before the "freeze" but issued to VFS only later.

I certainly have no objection to an unlinkat(2) system call -- it's unfortunate that a full suite of the at(2) APIs wasn't introduced in the first place. It would be worth checking whether anyone else (e.g., Solaris, Mac OS X, Linux) has already added an unlinkat(2) whose API semantics we can match. I think I take the view that for truly anonymous objects, shm_open(2) without a name (or the Linux equiv) is the right thing -- and hence unlinkat(2) is for more conventional use cases where the final pathname element is known.

On directories: There, I find myself falling back on a Casper-like service, since GC'ing a single anonymous memory object is straightforward, but GC'ing a directory hierarchy is a more messy business.

Robert


Re: [capsicum] unlinkfd

David Gwynne
In reply to this post by Justin Cormack


> On 3 Mar 2018, at 7:53 pm, Justin Cormack <[hidden email]> wrote:
>
> I think it would make sense to have an unlinkfd() that unlinks the file from
> everywhere, so it does not need a name to be specified. This might be
> hard to implement.
>
> For temporary files, I really like Linux memfd_create(2) that opens an anonymous
> file without a name. These semantics are really useful. (Linux memfd also has
> additional options for sealing the file to make it immutable which are very
> useful for safely passing files between processes.) Having a way to make
> unnamed temporary files solves a lot of deletion issues as the file
> never needs to be unlinked.

Maybe you could get close enough to that with a new flag for open(2)/openat(2), e.g., open("/backing/mount/point/randomname", O_CREAT|O_UNLINK);
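
For comparison, the effect of such a flag can be approximated today with the usual create-then-unlink idiom. `O_UNLINK` above is the poster's hypothetical flag; the sketch below uses only mkstemp(3) and unlink(2), and the function names are made up. Unlike a single-call flag, it leaves a short window in which the name exists. (Linux offers `O_TMPFILE` for open(2), which creates an unnamed file in one step, but it is Linux-specific and not used here.)

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns a descriptor to a file whose name has already been removed,
 * or -1 on error. "tmpl" must end in XXXXXX and is modified by mkstemp(3).
 */
int
open_unlinked(char *tmpl)
{
	int fd = mkstemp(tmpl);

	if (fd == -1)
		return (-1);
	/* The name exists only between mkstemp(3) and this call. */
	if (unlink(tmpl) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/* Returns 0 if the nameless file is still usable for I/O. */
int
open_unlinked_demo(void)
{
	char tmpl[] = "/tmp/open_unlinked.XXXXXX";
	char buf[5];
	int fd, rc = -1;

	fd = open_unlinked(tmpl);
	assert(fd != -1);
	/* No directory entry is left, yet reads and writes still work. */
	if (access(tmpl, F_OK) == -1 &&
	    write(fd, "hello", 5) == 5 &&
	    pread(fd, buf, sizeof(buf), 0) == 5 &&
	    memcmp(buf, "hello", 5) == 0)
		rc = 0;
	close(fd);	/* the storage is released on the last close */
	return (rc);
}
```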


Re: [capsicum] unlinkfd

freebsd-hackers mailing list
In reply to this post by Mariusz Zaborski
On 2 March 2018 at 18:35, Mariusz Zaborski <[hidden email]> wrote:

> In conclusion, I think we need a syscall like unlinkfd(2), which turns out to
> be easy to implement. The only downside of this implementation is that we not
> only need to provide a fd but also a file path. This is because neither inodes
> nor vnodes contain filenames. We compare the vnodes of the fd and the given
> path; if they are exactly the same we remove the file. In the syscall we are
> using a fd, so there is no Ambient Authority because we are proving that we
> already have access to that file.


That seems incorrect. You are proving you have access to the inode, not the
directory entry. So, for example, I could create a link to a file I wanted
to remove, that I don't have permission to remove, then use this call to
unlink it.
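
The objection can be made concrete with a short sketch: a descriptor obtained through one hard link compares equal to every other name of the same inode. The names `original` and `alias` and the helper functions are hypothetical. The sketch only demonstrates the identity match; whether the proposed syscall applies additional permission checks on the directory is not stated in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd and dirfd/name are the same inode, 0 if not, -1 on error. */
int
fd_matches_name(int fd, int dirfd, const char *name)
{
	struct stat a, b;

	if (fstat(fd, &a) == -1 ||
	    fstatat(dirfd, name, &b, AT_SYMLINK_NOFOLLOW) == -1)
		return (-1);
	return (a.st_dev == b.st_dev && a.st_ino == b.st_ino);
}

int
hardlink_identity_demo(void)
{
	char dir[] = "/tmp/hardlink_demo.XXXXXX";
	char *made = mkdtemp(dir);
	int dirfd, fd, linked, alias_fd, matches;

	assert(made != NULL);
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	fd = openat(dirfd, "original", O_WRONLY | O_CREAT | O_EXCL, 0600);
	assert(dirfd != -1 && fd != -1);
	close(fd);

	/* Make a second name for the same inode and open the file through it. */
	linked = linkat(dirfd, "original", dirfd, "alias", 0);
	assert(linked == 0);
	alias_fd = openat(dirfd, "alias", O_RDONLY);
	assert(alias_fd != -1);

	/* The descriptor came from "alias", yet it also matches "original". */
	matches = fd_matches_name(alias_fd, dirfd, "original");

	close(alias_fd);
	unlinkat(dirfd, "alias", 0);
	unlinkat(dirfd, "original", 0);
	close(dirfd);
	rmdir(dir);
	return (matches);
}
```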



Re: [capsicum] unlinkfd

Robert N. M. Watson
In reply to this post by Robert N. M. Watson
In general, in UNIX, "unlink" is a namespace operation relative to a directory, and not an operation on a file, so I wouldn't expect to have a system call that searches a directory looking for a matching file, but rather always a call that specifies the specific segment to remove (as there may well be more than one of them).
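
A small sketch of what "namespace operation" means in practice: removing one name only lowers the link count, and the file stays reachable through its other name and through any open descriptor. The names `first` and `second` and the helper functions are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count of the file behind fd, or -1 on error. */
static long
links_of(int fd)
{
	struct stat sb;

	return (fstat(fd, &sb) == 0 ? (long)sb.st_nlink : -1);
}

/* Returns 0 if removing one of two names leaves the other name intact. */
int
unlink_namespace_demo(void)
{
	char dir[] = "/tmp/unlink_ns_demo.XXXXXX";
	char *made = mkdtemp(dir);
	int dirfd, fd, linked, still_there;
	long before, after;

	assert(made != NULL);
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	fd = openat(dirfd, "first", O_RDWR | O_CREAT | O_EXCL, 0600);
	assert(dirfd != -1 && fd != -1);
	linked = linkat(dirfd, "first", dirfd, "second", 0);
	assert(linked == 0);

	before = links_of(fd);		/* two directory entries */
	unlinkat(dirfd, "first", 0);	/* removes one name, not the file */
	after = links_of(fd);		/* one directory entry left */
	still_there = (faccessat(dirfd, "second", F_OK, 0) == 0);

	close(fd);
	unlinkat(dirfd, "second", 0);
	close(dirfd);
	rmdir(dir);
	return ((before == 2 && after == 1 && still_there) ? 0 : -1);
}
```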

It seems to me like there are a few different use cases:

(1) Just want some temporary non-persistent file-like storage please. Here, swap-backed anonymous objects are probably generally preferable, although if they will be huge, perhaps a filesystem is a better place to back them.

(2) Want a temporary (non-persistent) hierarchal namespace full of file-like things. This need is not well served, as you need to not only create this within a current filesystem, but garbage collection of the results is not reliable in the presence of crashes/etc.

(3) Want capability-based access to a persistent hierarchal namespace full of files. This is well served by the current at(2) system calls along with filesystems, although there are API gaps (e.g., a lack of unlinkat(2) in FreeBSD).

Because of the complexity of (2), a Casper service is likely the way to go. We should fill the API gaps on (3) through new POSIX-like at(2). For (1), the real issue is if the current swap-backed APIs are insufficient, in which case a Casper service might be the way to go.

Robert


Re: [capsicum] unlinkfd

Alexander Richardson
In reply to this post by Robert N. M. Watson
Linux has an unlinkat() system call (https://linux.die.net/man/2/unlinkat)
but it doesn't seem to have a flag that lets you unlink the fd itself.
Possibly pathname == NULL and AT_EMPTY_PATH could mean unlink the fd but I
haven't tried whether that works.
It also has an AT_REMOVEDIR flag to make it function as rmdirat().
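
A brief sketch of the AT_REMOVEDIR behaviour mentioned above, using a scratch directory. The function name and the `sub` directory are hypothetical. The AT_EMPTY_PATH idea is deliberately not exercised here, since the message says it is untested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if a subdirectory is removed only when AT_REMOVEDIR is given. */
int
removedir_demo(void)
{
	char dir[] = "/tmp/removedir_demo.XXXXXX";
	char *made = mkdtemp(dir);
	int dirfd, plain, with_flag;

	assert(made != NULL);
	dirfd = open(dir, O_RDONLY | O_DIRECTORY);
	assert(dirfd != -1);
	if (mkdirat(dirfd, "sub", 0700) == -1)
		return (-1);

	/* Without the flag unlinkat(2) acts like unlink(2) and refuses a directory. */
	plain = unlinkat(dirfd, "sub", 0);
	/* With AT_REMOVEDIR it acts like rmdir(2) relative to dirfd. */
	with_flag = unlinkat(dirfd, "sub", AT_REMOVEDIR);

	close(dirfd);
	rmdir(dir);
	return ((plain == -1 && with_flag == 0) ? 0 : -1);
}
```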


Re: [capsicum] unlinkfd

Alan Somers-2
In fact, FreeBSD has that same unlinkat(2) system call.  But it doesn't
solve Mariusz's problem.  He's concerned about race conditions.  With
either unlink(2) or unlinkat(2), there's no way to ensure that the
directory entry you remove is for the file you think it is, because after
reading/writing a file and before unlinking it, some other process
could've unlinked it and created a new one with the same name.  It's this
race condition that Mariusz seeks to solve with unlinkfd.
-Alan


Re: [capsicum] unlinkfd

Konstantin Belousov-3
In reply to this post by Robert N. M. Watson
On Sat, Mar 03, 2018 at 12:16:34PM +0000, Robert N. M. Watson wrote:
> (3) Want capability-based access to a persistent hierarchical namespace
> full of files. This is well served by the current at(2) system calls
> along with filesystems, although there are API gaps (e.g., a lack of
> unlinkat(2) in FreeBSD).

Why do you claim this?  We have unlinkat(2).


Re: [capsicum] unlinkfd

Robert N. M. Watson
In reply to this post by Alan Somers-2
New _check() variants of the unlinkat(2) and rmdirat(2) system calls might do the trick -- e.g.,

        int unlinkat_check(dirfd, name, checkfd);
        int rmdirat_check(dirfd, name, checkfd);

The calls would succeed only if 'name' refers to the filesystem object passed via checkfd. This would retain UNIX-style directory behaviour but allow an atomic check that the object is as expected.
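
To make the intended semantics concrete, here is a user-space approximation of the proposed check, assuming a POSIX.1-2008 system. The name unlinkat_check_approx and the choice of ESTALE for a mismatch are assumptions made for this sketch, not part of any proposal. Unlike the proposed system call it is not atomic; it only shows what "name refers to the object behind checkfd" would mean.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical user-space approximation of unlinkat_check(): remove `name`
 * (relative to dirfd) only if it currently refers to the same object as
 * checkfd. Returns 0 on success, -1 with errno set on failure.
 */
int unlinkat_check_approx(int dirfd, const char *name, int checkfd)
{
    struct stat by_name, by_fd;

    /* Do not follow a symlink in the final component: examine the entry itself. */
    if (fstatat(dirfd, name, &by_name, AT_SYMLINK_NOFOLLOW) == -1)
        return -1;
    if (fstat(checkfd, &by_fd) == -1)
        return -1;

    if (by_name.st_dev != by_fd.st_dev || by_name.st_ino != by_fd.st_ino) {
        errno = ESTALE; /* arbitrary error code chosen for this sketch */
        return -1;
    }

    /*
     * NOT atomic: the name can be re-bound between the comparison above and
     * this call. Closing that window requires doing both steps in the kernel.
     */
    return unlinkat(dirfd, name, 0);
}
```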

Of course, what you do about it if it turns out the check fails is another question... Better not to have a name at all, hence shm_open(SHM_ANON, ...) -- although just for file objects, and not directory hierarchies.

Robert



Re: [capsicum] unlinkfd

Mariusz Zaborski
I feel that there are two different things we can think about:
- What we would implement in the capability system if we were building it from
  scratch. Here shm_open(2) with SHM_ANON can be a solution to our problems.
- On the other hand, we have a working operating system and we can't expect that
  all our programs that are already implemented will fit those assumptions,
  nor ask developers to rewrite many existing programs.

On Sat, Mar 03, 2018 at 05:16:38PM +0000, Robert N. M. Watson wrote:
> New _check() variants of the unlinkat(2) and rmdirat(2) system calls might do the trick -- e.g.,
>
> int unlinkat_check(dirfd, name, checkfd);
> int rmdirat_check(dirfd, name, checkfd);
>
A similar API was proposed on the review. This solves the issue with the race
condition. Unfortunately it doesn't solve the problem of guessing which directory
we will work in.

When I think about sandboxing, for example, rm(1), we would need to preopen the root
directory, or preopen all the directories we will work in. Both solutions just don't
feel right.
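
A rough sketch of the preopening pattern described above, assuming a POSIX.1-2008 system. The struct preopened and both helper names are hypothetical, and the Capsicum calls that would normally follow (entering capability mode, limiting rights on the descriptor) are deliberately left out. It shows that every file to be removed later costs one descriptor for its parent directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical record: a parent directory descriptor plus the final path component. */
struct preopened {
    int  dirfd;
    char name[256];
};

/*
 * Before sandboxing: open the parent directory of `path` and remember the
 * last component. Returns 0 on success, -1 on failure.
 */
int preopen_for_unlink(const char *path, struct preopened *out)
{
    char dcopy[1024], bcopy[1024];
    const char *base;

    if (strlen(path) >= sizeof(dcopy))
        return -1;
    /* dirname() and basename() may modify their argument, so work on copies. */
    strcpy(dcopy, path);
    strcpy(bcopy, path);

    base = basename(bcopy);
    if (strlen(base) >= sizeof(out->name))
        return -1;
    strcpy(out->name, base);

    out->dirfd = open(dirname(dcopy), O_RDONLY | O_DIRECTORY);
    return out->dirfd == -1 ? -1 : 0;
}

/* After sandboxing: remove the remembered name using only the directory descriptor. */
int unlink_preopened(const struct preopened *p)
{
    return unlinkat(p->dirfd, p->name, 0);
}
```

The directory descriptor held here can name any other entry in the same directory as well, which is the wider authority the original proposal objects to.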

I'm not saying that unlinkfd is the right and only solution - I'm just trying
to solve a problem we identified while sandboxing apps. I'm glad we started this
discussion and I hope we will work out some compromise between all the presented challenges.

Thanks,
--
Mariusz Zaborski
oshogbo//vx | http://oshogbo.vexillium.org
FreeBSD committer | https://freebsd.org
Software developer | http://wheelsystems.com
If it's not broken, let's fix it till it is!!1

> The calls would succeed only if 'name' refers to the filesystem object passed via checkfd. This would retain UNIX-style directory behaviour but allows an atomic check that the object is as expected.
>
> Of course, what you do about it if it turns out the check fails is another question... Better not to have a name at all, hence shm_open(SHM_ANON, ...) -- although just for file objects, and not directory hierarchies.
>
> Robert
>
> > On 3 Mar 2018, at 15:29, Alan Somers <[hidden email]> wrote:
> >
> > In fact, FreeBSD has that same unlinkat(2) system call.  But it doesn't solve Mariusz's problem.  He's concerned about race conditions.  With either unlink(2) or unlinkat(2), there's no way to ensure that the directory entry you remove is for the file you think it is.  Because after reading/writing a file and before unlinking it, some other processes could've unlinked it and created a new one with the same name.  It's this race condition that Mariuz seeks to solve with unlinkfd.
> > -Alan
> >
> > On Sat, Mar 3, 2018 at 5:46 AM, Alexander Richardson <[hidden email] <mailto:[hidden email]>> wrote:
> > Linux has a unlinkat() system call (https://linux.die.net/man/2/unlinkat <https://linux.die.net/man/2/unlinkat>)
> > but it doesn't seem to have a flag that lets you unlink the fd itself.
> > Possibly pathname == NULL and AT_EMPTY_PATH could mean unlink the fd but I
> > haven't tried whether that works.
> > It also has a AT_REMOVEDIR flag to make it function as rmdirat().
> >
> > On 3 March 2018 at 10:41, Robert N. M. Watson <[hidden email] <mailto:[hidden email]>>
> > wrote:
> >
> > > FWIW, this is part of why we introduced anonymous POSIX shared memory
> > > objects with Capsicum in FreeBSD -- we allow shm_open(2) to be passed a
> > > SHM_ANON special name, which causes the creation of a swap-backed, mappable
> > > file-like object that can have I/O, memory mapping, etc, performed on it ..
> > > but never has any persistent state across reboots even in the event of a
> > > crash.
> > >
> > > With Capsicum you can then refine a file descriptor to the otherwise
> > > writable object to be read-only for the purposes of delegation. There is
> > > not, however, a mechanism to "freeze" the state of the object causing other
> > > outstanding writable descriptors to become read-only -- certainly something
> > > could be added, but some care regarding VM semantics would be required --
> > > in particular, so that faults could not be experienced as a result of an
> > > memory store performed before the "freeze" but issued to VFS only later.
> > >
> > > I certainly have no objection to an unlinkat(2) system call -- it's
> > > unfortunate that a full suite of the at(2) APIs wasn't introduced in the
> > > first place. It would be worth checking that no one else (e.g., Solaris,
> > > Mac OS X, Linux) hasn't already added an unlinkat(2) that we can match API
> > > semantics for. I think I take the view that for truly anonymous objects,
> > > shm_open(2) without a name (or the Linux equiv) is the right thing -- and
> > > hence unlinkat(2) is for more conventional use cases where the final
> > > pathname element is known.
> > >
> > > On directories: There, I find myself falling back on a Casper-like
> > > service, since GC'ing a single anonymous memory object is straightforward,
> > > but GC'ing a directory hierarchy is a more messy business.
> > >
> > > Robert
> > >
> > > > On 3 Mar 2018, at 09:53, Justin Cormack <[hidden email] <mailto:[hidden email]>>
> > > wrote:
> > > >
> > > > I think it would make sense to have an unlinkfd() that unlinks the file
> > > from
> > > > everywhere, so it does not need a name to be specified. This might be
> > > > hard to implement.
> > > >
> > > > For temporary files, I really like Linux memfd_create(2) that opens an
> > > anonymous
> > > > file without a name. This semantics is really useful. (Linux memfd also
> > > has
> > > > additional options for sealing the file fo make it immutable which are
> > > very
> > > > useful for safely passing files between processes.) Having a way to make
> > > > unnamed temporary files solves a lot of deletion issues as the file
> > > > never needs to
> > > > be unlinked.
> > > >
> > > >
> > > > On 2 March 2018 at 18:35, Mariusz Zaborski <[hidden email] <mailto:[hidden email]>> wrote:
> > > >> Hello,
> > > >>
> > > >> Today I would like to propose a new syscall called unlinkfd(2) which
> > > came up
> > > >> during a discussion with Ed Maste.
> > > >>
> > > >> Currently in UNIX we can’t remove files safely. If we will try to do so
> > > we
> > > >> always end up in a race condition. For example when we open a file, and
> > > check
> > > >> it with fstat, etc. then we want to unlink(2) it… but the file we are
> > > trying to
> > > >> unlink could be a different one than the one we were fstating just a
> > > moment ago.
> > > >>
> > > >> Another reason of implementing unlinkfd(2) came to us when we were
> > > trying
> > > >> to sandbox some applications like: uudecode/b64decode or bspatch. It
> > > occured
> > > >> to us that we don’t have a good way of removing single files. Of course
> > > we can
> > > >> try to determine in which directory we are in, and then open this
> > > directory and
> > > >> remove a single file.
> > > >>
> > > >> It looks even more bizarre if we would think about a program which
> > > operates on
> > > >> multiple files. If we would analyze a situation with two totally
> > > different
> > > >> directories like `/tmp` and `/home/oshogbo` we would end up with pre
> > > opening
> > > >> a root directory or keeping as many directories as we are working on
> > > open.
> > > >> All of that effort only to remove two files. This make it totally
> > > impractical!
> > > >>
> > > >> I think that opening directories also presents some wider attack vector
> > > because
> > > >> we are keeping a single descriptor to a directory only to remove one
> > > file.
> > > >> Unfortunately this means that an attacker can remove all files in that
> > > directory.
> > > >>
> > > >> I proposed this as well on the last Capsicum call. There was a
> > > suggestion that
> > > >> instead of doing a single syscall maybe we should have a Casper service
> > > that
> > > >> will allow us to remove files. Another idea was that we should perhaps
> > > redesign
> > > >> programs to create some subdirs work on the subdirs and then remove all
> > > files in
> > > >> this subdir. I don’t feel that creating a Casper service is a good idea
> > > because
> > > >> we still have exactly the same issue of race condition. In my opinion
> > > creating
> > > >> subdirs is also a problem for us.
> > > >>
> > > >> First we would need to redesign some of our tools and I think we should
> > > >> simplify capsicumization of the process instead of making it harder.
> > > >>
> > > >> Secondly we can create a temporary subdirectory but what will remove it?
> > > >> We are going back to having a fd to the directory in which we just created a subdir.
> > > >> Another way would be to have a Casper service which would remove a directory but
> > > >> with the risk of a race condition.
> > > >>
> > > >> In conclusion, I think we need a syscall like unlinkfd(2), which turns out
> > > >> to be easy to implement. The only downside of this implementation is that we not
> > > >> only need to provide a fd but also a file path. This is because neither inodes nor
> > > >> vnodes contain filenames. We compare the vnodes of the fd and the given
> > > >> path; if they are exactly the same we remove the file. In the syscall we are using
> > > >> a fd so there is no Ambient Authority because we are proving that we already
> > > >> have access to that file. Thanks to that the syscall can be safely used with
> > > >> Capsicum. I have already discussed this with some people and they said
> > > >> `Hey I already had that idea a while ago…` so let’s do something with that idea!
> > > >> If you are interested in the patch you can find it here:
> > > >> https://reviews.freebsd.org/D14567
> > > >>
> > > >> Thanks,
> > > >> --
> > > >> Mariusz Zaborski
> > > >> oshogbo//vx             | http://oshogbo.vexillium.org
> > > >> FreeBSD committer       | https://freebsd.org
> > > >> Software developer      | http://wheelsystems.com
> > > >> If it's not broken, let's fix it till it is!!1
> > > >
> > >
> > >
> > >
> > _______________________________________________
> > [hidden email] mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> > To unsubscribe, send any mail to "[hidden email]"
> >
>
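
For readers following the quoted proposal, the "compare the fd with the path, then remove" check can be approximated in userspace today. The sketch below is only an illustration and is not the code from D14567: the function name `unlink_if_same` and the choice of `ESTALE` for a mismatch are assumptions made here. It compares device and inode numbers instead of vnodes, and it is still racy, because the name can be replaced between the `fstatat()` and the `unlinkat()`. Doing the comparison and the removal together inside the kernel is exactly what the proposed syscall would add.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical userspace approximation of the unlinkfd(2) check:
 * remove `name` (looked up relative to `dirfd`) only if it still refers
 * to the same file as the already-open descriptor `fd`.
 *
 * Returns 0 on success and -1 on error with errno set. errno is ESTALE
 * (an arbitrary choice for this sketch) when the path and `fd` differ.
 *
 * NOT race-free: the directory entry can change between fstatat() and
 * unlinkat(). Only an in-kernel implementation can close that window.
 */
int
unlink_if_same(int fd, int dirfd, const char *name)
{
	struct stat by_fd, by_path;

	if (fstat(fd, &by_fd) == -1)
		return (-1);
	/* Do not follow symlinks: we care about the directory entry itself. */
	if (fstatat(dirfd, name, &by_path, AT_SYMLINK_NOFOLLOW) == -1)
		return (-1);
	/* Same device and same inode number means the same file. */
	if (by_fd.st_dev != by_path.st_dev || by_fd.st_ino != by_path.st_ino) {
		errno = ESTALE;
		return (-1);
	}
	return (unlinkat(dirfd, name, 0));
}
```

A caller that opened and validated a file would then call `unlink_if_same(fd, AT_FDCWD, path)` instead of `unlink(path)`. Note that this still needs the path and the authority to look it up, which is the part that does not fit a Capsicum sandbox.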


Re: [capsicum] unlinkfd

Christoph Hellwig
In reply to this post by Robert N. M. Watson
With my Linux hat on, I'd much prefer using the AT_EMPTY_PATH flag that
Linux already supports for a few *at calls to operate on the dirfd
itself.  But it seems like neither FreeBSD nor anyone else picked up that
flag, so it might be a bit of a hard sell.

FreeBSD bugzilla related to AT_EMPTY_PATH and the lack of it for
unlinkat even in Linux:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197778

Linux bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=93441

If you are interested in bringing this to FreeBSD that might be reason
enough for me to look into a Linux implementation as well.


Re: [capsicum] unlinkfd

Konstantin Belousov-3
On Mon, Mar 05, 2018 at 07:55:50AM -0800, Christoph Hellwig wrote:

> With my Linux hat on, I'd much prefer using the AT_EMPTY_PATH flag that
> Linux already supports for a few *at calls to operate on the dirfd
> itself.  But it seems like neither FreeBSD nor anyone else picked up that
> flag, so it might be a bit of a hard sell.
>
> FreeBSD bugzilla related to AT_EMPTY_PATH and the lack of it for
> unlinkat even in Linux:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197778
>
> Linux bugzilla:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=93441
>
> If you are interested in bringing this to FreeBSD that might be reason
> enough for me to look into a Linux implementation as well.

It is not clear from the FreeBSD PR how unlinkat() is supposed to work.
Do you mean that unlinkat(AT_EMPTY_PATH) removes the name entry that was
used for open(2) of the given fd? If yes, this is not possible to
implement in the current FreeBSD VFS.

Also, is linkat(AT_EMPTY_PATH) supposed to link the inode referenced
only by the file descriptor? I believe this feature is objected to.

Re: [capsicum] unlinkfd

Christoph Hellwig
On Mon, Mar 05, 2018 at 06:15:08PM +0200, Konstantin Belousov wrote:
> It is not clear from the FreeBSD PR how unlinkat() is supposed to work.
> Do you mean that unlinkat(AT_EMPTY_PATH) removes the name entry that was
> used for open(2) of the given fd?

It removes the current name for the open fd, including tracking renames
that might have happened since open.

> If yes, this is not possible to
> implement in the current FreeBSD VFS.
>
> Also, is linkat(AT_EMPTY_PATH) supposed to link the inode referenced
> only by the file descriptor? I believe this feature is objected to.

linkat(..., AT_EMPTY_PATH) in Linux creates a new link to the inode
referenced by the fd.  It is a privileged operation, and requires
the inode to already have a non-zero link count, or to have been opened
using the Linux-specific O_TMPFILE flag, which creates and opens an
unlinked file.
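
As a rough illustration of the Linux behaviour described here, the sketch below creates an unnamed file with O_TMPFILE and then gives it a name with linkat(). It is Linux-only and is not taken from this thread: the function name `publish_tmpfile` is made up for the example, and the `/proc/self/fd` fallback for callers without CAP_DAC_READ_SEARCH follows the pattern documented in the Linux open(2) manual page. It also assumes the filesystem holding `dir` supports O_TMPFILE, which not all do.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_TMPFILE and AT_EMPTY_PATH */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unnamed file in `dir` and link it into
 * the namespace as `newpath`. Returns 0 on success, -1 with errno set.
 */
int
publish_tmpfile(const char *dir, const char *newpath)
{
	char procpath[64];
	int fd, rc, saved;

	/* O_TMPFILE creates and opens an inode that has no name yet. */
	fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
	if (fd == -1)
		return (-1);

	/* Privileged variant: link the inode referenced by the fd itself. */
	rc = linkat(fd, "", AT_FDCWD, newpath, AT_EMPTY_PATH);
	if (rc == -1) {
		/* Unprivileged variant: go through the /proc magic symlink. */
		snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
		rc = linkat(AT_FDCWD, procpath, AT_FDCWD, newpath,
		    AT_SYMLINK_FOLLOW);
	}

	/* Preserve the linkat() errno across close(). */
	saved = errno;
	close(fd);
	errno = saved;
	return (rc);
}
```

A call such as `publish_tmpfile("/tmp", "/tmp/result.dat")` leaves an empty file at the second path on success. A real program would write its data through the descriptor before linking, so the file only becomes visible once it is complete.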