RFC: What to do about VOP_INACTIVE?

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

RFC: What to do about VOP_INACTIVE?

Conrad Meyer-2
Hello,

The nominal return type of the VOP_INACTIVE vnode method is 'int', but
in practice any error returned is silently discarded.

The only caller is vinactive(), which is also a void routine.
vinactive ignores the return value of VOP_INACTIVE().  (vinactive
tends to be called by other void routines, like vput(), so propagating
an error up the stack is non-trivial.)

In practice, most filesystems in the kernel unconditionally return
zero for INACTIVE.  The exceptions are: msdosfs, ext2fs, nandfs, and
(notably) ufs.

The problem (as I see it) is that the return type makes it appear that
INACTIVE is allowed to fail, but it is not.  One important
ramification of this is that interruptible sleeps in INACTIVE are
basically not permitted.

This seems problematic because INACTIVE is invoked as part of
close(2), and we can potentially block that user process indefinitely
when the kernel filesystem is stalled on a network resource, or
something like a FUSE userspace filesystem (which can also access
network resources).

Can we live with the current behavior (INACTIVE cannot fail)?  In that
case, I think we should change its return type to void to match.

Thoughts?

Thanks,
Conrad
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: What to do about VOP_INACTIVE?

Rick Macklem
Conrad Meyer wrote:

>The nominal return type of the VOP_INACTIVE vnode method is 'int', but
>in practice any error returned is silently discarded.
>
>The only caller is vinactive(), which is also a void routine.
>vinactive ignores the return value of VOP_INACTIVE().  (vinactive
>tends to be called by other void routines, like vput(), so propagating
>an error up the stack is non-trivial.)
>
>In practice, most filesystems in the kernel unconditionally return
>zero for INACTIVE.  The exceptions are: msdosfs, ext2fs, nandfs, and
>(notably) ufs.
>
>The problem (as I see it) is that the return type makes it appear that
>INACTIVE is allowed to fail, but it is not.  One important
>ramification of this is that interruptible sleeps in INACTIVE are
>basically not permitted.
>
>This seems problematic because INACTIVE is invoked as part of
>close(2), and we can potentially block that user process indefinitely
>when the kernel filesystem is stalled on a network resource, or
>something like a FUSE userspace filesystem (which can also access
>network resources).
>
>Can we live with the current behavior (INACTIVE cannot fail)?  In that
>case, I think we should change its return type to void to match.
>
>Thoughts?
Well Kostik is the expert, but my understanding is that a file system cannot
assume that VOP_INACTIVE() will actually be called for a vnode. As such, the
file system needs to do anything it does in its VOP_INACTIVE() in its VOP_RECLAIM()
as well, if it must be done before the vnode is reused.

As such, a failed VOP_INACTIVE() doesn't seem to be meaningful, since it might
not get called at all?

Having said that, making sure file system implementors know the above seems
more important than whether it returns int vs void.
(ie. Returning 0 seems harmless to me and may be useful in the future. Since
 changing the VFS/VOP interface can only be done for major releases and requires
 all file systems to be changed, I leave it, but this is not a strong opinion.)

rick

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: RFC: What to do about VOP_INACTIVE?

Konstantin Belousov
In reply to this post by Conrad Meyer-2
On Mon, Feb 11, 2019 at 04:52:00PM -0800, Conrad Meyer wrote:

> Hello,
>
> The nominal return type of the VOP_INACTIVE vnode method is 'int', but
> in practice any error returned is silently discarded.
>
> The only caller is vinactive(), which is also a void routine.
> vinactive ignores the return value of VOP_INACTIVE().  (vinactive
> tends to be called by other void routines, like vput(), so propagating
> an error up the stack is non-trivial.)
>
> In practice, most filesystems in the kernel unconditionally return
> zero for INACTIVE.  The exceptions are: msdosfs, ext2fs, nandfs, and
> (notably) ufs.
>
> The problem (as I see it) is that the return type makes it appear that
> INACTIVE is allowed to fail, but it is not.  One important
> ramification of this is that interruptible sleeps in INACTIVE are
> basically not permitted.
>
> This seems problematic because INACTIVE is invoked as part of
> close(2), and we can potentially block that user process indefinitely
> when the kernel filesystem is stalled on a network resource, or
> something like a FUSE userspace filesystem (which can also access
> network resources).
>
> Can we live with the current behavior (INACTIVE cannot fail)?  In that
> case, I think we should change its return type to void to match.
>
> Thoughts?

Our close(2) always removes the file descriptor from the process table,
regardless of the error returned, except for the EBADF situation.
Due to this, if some filesystem like FUSE have to stop executing its
VOP_INACTIVE due to signal, it does not change anything for the caller.

On the other hand, allowing unbound interruptible sleeps in the
implementation of inactive or reclaim is very dangerous practice, since
executing the VOPs on the vnode reclamation from VFS daemons would stop
free vnode supply to the system, effectively blocking it.  In less
dangerous situation, it would block unmount.

I do not think that efforts to change VOP_INACTIVE() return type to void
are worth the time.
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: RFC: What to do about VOP_INACTIVE?

Poul-Henning Kamp
In reply to this post by Conrad Meyer-2
--------
In message <CAG6CVpU8J=G8za8W2uan8SAGEbe4PD=SXwMow=[hidden email]>, Conrad Meyer writes:

>The nominal return type of the VOP_INACTIVE vnode method is 'int', but
>in practice any error returned is silently discarded.

The return type is an historical accident from John Heidemann's phd.

The way he implemented stackable filesystems cast all VOP's to:

        typedef int vop_t __P((void *));

So that filesystems could have a VOP methodvector which looked like:

        static struct vnodeopv_entry_desc fifo_vnodeop_entries[] = {
                { &vop_default_desc,            (vop_t *) vn_default_error },
                { &vop_abortop_desc,            (vop_t *) fifo_badop },
                { &vop_access_desc,             (vop_t *) fifo_ebadf },
                { &vop_advlock_desc,            (vop_t *) fifo_advlock },
                [...]

Then on runtime, he built the real vop dispatch tables.

Around 1997 I untangled that so that we got compiler type-checks
all the way though the VOP calls, and there were enough historical
cruft to clean up that I never got around to the return type of
the individual VOPs.

If you want to fix this, the right thing to do is to add return types
in vnode_if.src and teach the tools/vnode_if.awk script about them.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: RFC: What to do about VOP_INACTIVE?

Conrad Meyer-2
In reply to this post by Konstantin Belousov
Hi Konstantin,

On Mon, Feb 11, 2019 at 11:19 PM Konstantin Belousov
<[hidden email]> wrote:
> Our close(2) always removes the file descriptor from the process table,
> regardless of the error returned, except for the EBADF situation.
> Due to this, if some filesystem like FUSE have to stop executing its
> VOP_INACTIVE due to signal, it does not change anything for the caller.

Sure.  Does it violate any contract that the kernel relies upon?  For
example, vgonel() will issue a VOP_INACTIVE() followed by
VOP_RECLAIM(); I guess this means filesystems with interruptible
INACTIVE cannot rely upon RECLAIMed vnodes being inactivated first.

> On the other hand, allowing unbound interruptible sleeps in the
> implementation of inactive or reclaim is very dangerous practice, since
> executing the VOPs on the vnode reclamation from VFS daemons would stop
> free vnode supply to the system, effectively blocking it.  In less
> dangerous situation, it would block unmount.

This is a good concern.  Even bounding sleep to some plausible time
per individual vnode would drastically restrict VFS daemons' ability
to process many vnodes in a timely fashion.  And eliminating sleeps or
bounding them to very short times may be undesirable the majority of
the time (userspace close).

I don't know what the best way to fix this is.  We could plumb a flag
argument down to INACTIVE and RECLAIM methods.  On the other hand, we
already have the 'td' argument.  Maybe that is a sufficient for VOP
methods to determine whether the caller is userspace or a kernel
daemon.

Either way, vinactive() and callers will not make any use of an EAGAIN
signal, so why have the 'int' return type?  It is misleading.

> I do not think that efforts to change VOP_INACTIVE() return type to void
> are worth the time.

It's trivial; I'm happy to do it.

Thanks,
Conrad
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: RFC: What to do about VOP_INACTIVE?

Konstantin Belousov
On Thu, Feb 14, 2019 at 09:50:55AM -0800, Conrad Meyer wrote:

> Hi Konstantin,
>
> On Mon, Feb 11, 2019 at 11:19 PM Konstantin Belousov
> <[hidden email]> wrote:
> > Our close(2) always removes the file descriptor from the process table,
> > regardless of the error returned, except for the EBADF situation.
> > Due to this, if some filesystem like FUSE have to stop executing its
> > VOP_INACTIVE due to signal, it does not change anything for the caller.
>
> Sure.  Does it violate any contract that the kernel relies upon?  For
> example, vgonel() will issue a VOP_INACTIVE() followed by
> VOP_RECLAIM(); I guess this means filesystems with interruptible
> INACTIVE cannot rely upon RECLAIMed vnodes being inactivated first.
VFS does not guarantee that VOP_INACTIVE is always called when the
use count reaches zero.  Look e.g. at vputx() where we only call
the inactive method when we were able to upgrade to exclusive lock,
and look at vget() where we only call it when we locked exclusive.

In other words, the reclaim method of a filesystem must be prepared to
do the work, it is typically done by calling internal inactivation
function from VOP_RECLAIM.

>
> > On the other hand, allowing unbound interruptible sleeps in the
> > implementation of inactive or reclaim is very dangerous practice, since
> > executing the VOPs on the vnode reclamation from VFS daemons would stop
> > free vnode supply to the system, effectively blocking it.  In less
> > dangerous situation, it would block unmount.
>
> This is a good concern.  Even bounding sleep to some plausible time
> per individual vnode would drastically restrict VFS daemons' ability
> to process many vnodes in a timely fashion.  And eliminating sleeps or
> bounding them to very short times may be undesirable the majority of
> the time (userspace close).
>
> I don't know what the best way to fix this is.  We could plumb a flag
> argument down to INACTIVE and RECLAIM methods.  On the other hand, we
> already have the 'td' argument.  Maybe that is a sufficient for VOP
> methods to determine whether the caller is userspace or a kernel
> daemon.
>
> Either way, vinactive() and callers will not make any use of an EAGAIN
> signal, so why have the 'int' return type?  It is misleading.
>
> > I do not think that efforts to change VOP_INACTIVE() return type to void
> > are worth the time.
>
> It's trivial; I'm happy to do it.
>
> Thanks,
> Conrad
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"