Refactoring asynchronous I/O

Refactoring asynchronous I/O

John Baldwin
You may have noticed some small cleanups and fixes going into the AIO code
recently.  I have been working on a minor overhaul of the AIO code, and the
recent changes have been aimed at reducing the diff down to the truly
meaty changes so they are easier to review.  I think things are far enough
along to start on the meaty bits.

The current AIO code is a bit creaky and not very extensible.  It forces
all requests down via the existing fo_read/fo_write file ops so that all
requests are inherently synchronous even if the underlying file descriptor
could support async operation.  This also makes cancellation more fragile
as you can't cancel a job that is stuck sleeping in one of the AIO daemons.

The original motivation for my changes is to support efficient zero-copy
receive for TOE using Chelsio T4/T5 adapters.  However, read() is ill
suited to this type of workflow.  Various efforts in the past have tried
using page flipping (the old ZERO_COPY_SOCKETS code which required custom
ti(4) firmware) which only works if you can get things lined up "just right"
(e.g. page-aligned and sized buffers, custom firmware on your NIC, etc.) or
introducing new APIs that replace read/write (such as IO-Lite).  The primary
issue with read() of course is that the NIC DMAs data to one place and later
userland comes along and tells the kernel where it wants the data.  The issue
with introducing a new API like IO-Lite is convincing software to use it.
However, aio_read() is an existing API that can be used to queue user buffers
in advance.  In particular, you can use two buffers to ping-pong similar to
the BPF zero-copy code where you queue two buffers at the start and requeue
each completed buffer after consuming its data.  In theory the Chelsio driver
can "peek" into the request queue for a socket and schedule the pending
requests for zero copy receive.  However, doing that requires a way for
allowing the driver to "claim" certain requests and support cancelling them,
completing them, etc.

To facilitate this use case I decided to rework the AIO code to use a model
closer to the I/O Request Packets (Irps) that Windows drivers use.  In
particular, when a Windows driver decides to queue a request so that it can
be completed later, it has to install a cancel routine that is responsible
for cancelling a queued request.

To this end, I have reworked the AIO code as such:

1) File descriptor types are now the "drivers" for AIO requests rather than
   the AIO code.  When an AIO request for an fd is queued (via aio_read/write,
   etc.) a new file op (fo_aio_queue()) is called to queue or handle the
   request.  This method is responsible for completing the request or
   queueing it to be completed later.  Currently, a default implementation of
   this method which queues the job to the existing AIO daemon pool for
   fo_read/fo_write is provided, but file types can override that with more
   specific behavior if desired.

2) The AIO code is now a library of calls for manipulating AIO requests.
   Functions to manage cancel routines, mark AIO requests as cancelled or
   completed, and schedule handler functions to run in an AIO daemon context
   are provided.

3) Operations that choose to queue an AIO request while waiting for a
   suitable resource to service it (CPU time, data to arrive on a socket,
   etc.) are required to install a cancel routine to handle cancellation of
   a request due to aio_cancel() or the exit or exec of the owning process.
   This allows the type-specific queueing logic to be self-contained
   without the AIO code having to know about all the possible queue states
   of an AIO request.

In my current implementation I use the "default" fo_aio_queue method for most
file types.  However, sockets now use a private pool of AIO kprocs, and that
pool services sockets rather than individual jobs.  This means that when a
socket becomes ready for either read or write, it queues a task for that
socket buffer to the socket AIO daemon pool.  That task will complete as many
requests as possible for that socket buffer (ensuring that there are no
concurrent AIO operations on a given socket).  It is also able to use
MSG_NOWAIT to avoid blocking even for blocking sockets.

One thing I have not yet done is move the physio fast-path out of vfs_aio.c
and into the devfs-specific fileops, but that can easily be done with a
custom fo_aio_queue op for the devfs file ops.

I believe that this approach also permits other file types to provide more
appropriate AIO handling where it makes sense.

For the Chelsio use case I have added a protocol hook to allow a given
protocol to claim AIO requests instead of letting them be handled by the
generic socket AIO fileop.  This ends up being a very small change, and
the Chelsio-specific logic can live in the TOM module using the AIO library
calls to service the AIO requests.

My current WIP (not including the Chelsio bits, they need to be forward
ported from an earlier prototype) is available on the 'aio_rework' branch
of freebsd in my github space:

https://github.com/freebsd/freebsd/compare/master...bsdjhb:aio_rework

Note that binding the AIO support to a new fileop does mean that the AIO code
now becomes mandatory (rather than optional).  We could perhaps make the
system calls continue to be optional if people really need that, but the guts
of the code will now need to always be in the kernel.

I'd like to hear what people think of this design.  It needs some additional
cleanup before it is a commit candidate (and I'll see if I can't split it up
some more if we go this route).

--
John Baldwin
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"
Re: Refactoring asynchronous I/O

Adrian Chadd-4
On 26 January 2016 at 17:39, John Baldwin <[hidden email]> wrote:

[snip]

>
> Note that binding the AIO support to a new fileop does mean that the AIO code
> now becomes mandatory (rather than optional).  We could perhaps make the
> system calls continue to be optional if people really need that, but the guts
> of the code will now need to always be in the kernel.
>
> I'd like to hear what people think of this design.  It needs some additional
> cleanup before it is a commit candidate (and I'll see if I can't split it up
> some more if we go this route).

So this doesn't change the direct dispatch read/write to a block device, right?
That strategy path is pretty damned cheap.

Also, why's it become mandatory? I thought we had support for optional
fileops...


-a

Re: Refactoring asynchronous I/O

Slawa Olhovchenkov
In reply to this post by John Baldwin
On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote:

> The original motivation for my changes is to support efficient zero-copy
> receive for TOE using Chelsio T4/T5 adapters.  However, read() is ill

I understand this is not your work, but: what about (theoretical) async
open/close/unlink/etc.?

Re: Refactoring asynchronous I/O

John Baldwin
In reply to this post by Adrian Chadd-4
On Tuesday, January 26, 2016 07:17:33 PM Adrian Chadd wrote:

> [snip]
>
> So this doesn't change the direct dispatch read/write to a block device, right?
> That strategy path is pretty damned cheap.

No, that is the physio code, which could be moved to be devfs-specific.
However, exposing the AIO queueing point to the fileops directly allows
other fileops to implement direct dispatch without having to expose their
internals to the AIO code.

> Also, why's it become mandatory? I thought we had support for optional
> fileops...

Some fileops are optional in that not all file descriptors implement them
(e.g. not all file types can be monitored by kevent() or mmap()ed); however,
we do not support kldloading a fileop.  If you leave soo_aio_queue() in
sys_socket.c, then the kernel would not link without vfs_aio.c, since the
socket AIO routine needs things like aio_complete(), etc.

You could perhaps split the system calls themselves out of vfs_aio.c into a
separate sys_aio.c and make that loadable, but that would be fairly small.
It also seems fairly pointless.  The module is loadable because it was
experimental and unfinished when it was first implemented.  The current
version is certainly not perfect, but it is much further along than the
original code from 1997.  I know folks have used the current code in
production.

--
John Baldwin

Re: Refactoring asynchronous I/O

John Baldwin
In reply to this post by Slawa Olhovchenkov
On Wednesday, January 27, 2016 01:52:05 PM Slawa Olhovchenkov wrote:
> On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote:
>
> > The original motivation for my changes is to support efficient zero-copy
> > receive for TOE using Chelsio T4/T5 adapters.  However, read() is ill
>
> I understand this is not your work, but: what about (theoretical) async
> open/close/unlink/etc.?

Implementing more asynchronous operations is orthogonal to this.  It
would perhaps be a bit simpler to implement these in the new model
since most of the logic would live in a vnode-specific aio_queue
method in vfs_vnops.c.  However, the current AIO approach is to add a
new system call for each async system call (e.g. aio_open()).  You
would then create an internal LIO opcode (e.g. LIO_OPEN).  The vnode
aio hook would then have to support LIO_OPEN requests and return the
opened fd via aio_complete().  Async stat / open might be nice for
network filesystems in particular.  I've known of programs forking
separate threads just to do open/fstat of NFS files to achieve the
equivalent of aio_open() / aio_stat().

--
John Baldwin

Re: Refactoring asynchronous I/O

Slawa Olhovchenkov
On Wed, Jan 27, 2016 at 09:52:12AM -0800, John Baldwin wrote:

> [snip]
>
> Implementing more asynchronous operations is orthogonal to this.  It
> would perhaps be a bit simpler to implement these in the new model
> since most of the logic would live in a vnode-specific aio_queue
> method in vfs_vnops.c.  However, the current AIO approach is to add a
> new system call for each async system call (e.g. aio_open()).  You
> would then create an internal LIO opcode (e.g. LIO_OPEN).  The vnode
> aio hook would then have to support LIO_OPEN requests and return the
> opened fd via aio_complete().  Async stat / open might be nice for
> network filesystems in particular.  I've known of programs forking
> separate threads just to do open/fstat of NFS files to achieve the
> equivalent of aio_open() / aio_stat().

Some problems exist for open()/unlink()/rename()/etc. -- you can't use
fd-related semantics.

Re: Refactoring asynchronous I/O

John Baldwin
On Thursday, January 28, 2016 12:04:20 AM Slawa Olhovchenkov wrote:

> [snip]
>
> Some problems exist for open()/unlink()/rename()/etc. -- you can't use
> fd-related semantics.

Mmmm.  We have an aio_mlock().  aio_open() would require more of a special
case like aio_mlock().  It's still doable, but it would not go via the
fileop, yes.  fstat could go via the fileop, but a path-based stat would
be akin to aio_open().

--
John Baldwin

Re: Refactoring asynchronous I/O

Slawa Olhovchenkov
On Wed, Jan 27, 2016 at 01:16:37PM -0800, John Baldwin wrote:

> [snip]
>
> Mmmm.  We have an aio_mlock().  aio_open() would require more of a special
> case like aio_mlock().  It's still doable, but it would not go via the
> fileop, yes.  fstat could go via the fileop, but a path-based stat would
> be akin to aio_open().

aio_rename() would require yet more special handling.  As far as I can
see, this can't be packed into the current structures (aiocb and perhaps
sigevent).  I don't see space for multiple paths, and I don't see space
for returning an fd.

Some semantics would need to change (disallow some notifications; for
example, would only SIGEV_THREAD be allowed?  How would information about
the invoked aio operation be passed?).

Also, might there be problems inside the kernel for fd-less operations?

Re: Refactoring asynchronous I/O

John Baldwin
On Thursday, January 28, 2016 02:18:17 AM Slawa Olhovchenkov wrote:

> [snip]
>
> aio_rename() would require yet more special handling.  As far as I can
> see, this can't be packed into the current structures (aiocb and perhaps
> sigevent).  I don't see space for multiple paths, and I don't see space
> for returning an fd.
>
> Some semantics would need to change (disallow some notifications; for
> example, would only SIGEV_THREAD be allowed?  How would information about
> the invoked aio operation be passed?).
>
> Also, might there be problems inside the kernel for fd-less operations?

The kernel side of aiocb is free to hold additional state, and you could
always pass the extra information via arguments, e.g.

aio_rename(aiocb, from, to)

(Alternatively, you could define aio_buf as pointing to a structure
that holds the arguments.)

--
John Baldwin

Re: Refactoring asynchronous I/O

Jilles Tjoelker
In reply to this post by John Baldwin
On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote:
> Note that binding the AIO support to a new fileop does mean that the AIO code
> now becomes mandatory (rather than optional).  We could perhaps make the
> system calls continue to be optional if people really need that, but the guts
> of the code will now need to always be in the kernel.

Enabling this by default is OK with me as long as the easy ways to get a
stuck process are at least disabled by default. Currently, a process
gets stuck forever if it has an AIO request from or to a pipe that will
never complete. An AIO daemon should not be allowed to perform an
unbounded sleep such as for a pipe (NFS server should be OK).

--
Jilles Tjoelker

Re: Refactoring asynchronous I/O

John Baldwin
On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote:

> On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote:
> > Note that binding the AIO support to a new fileop does mean that the AIO code
> > now becomes mandatory (rather than optional).  We could perhaps make the
> > system calls continue to be optional if people really need that, but the guts
> > of the code will now need to always be in the kernel.
>
> Enabling this by default is OK with me as long as the easy ways to get a
> stuck process are at least disabled by default. Currently, a process
> gets stuck forever if it has an AIO request from or to a pipe that will
> never complete. An AIO daemon should not be allowed to perform an
> unbounded sleep such as for a pipe (NFS server should be OK).

Mmm, I don't currently fix this for pipes, but my changes do fix it for
sockets (right now if you queue multiple reads for a socket, both are woken
up when data arrives, and if the data only satisfies the first read request,
the second will block an AIO daemon forever).

However, having fo_aio_queue() should make this fixable for pipes as they
could use their own queueing logic and handling function.  It may be that
pipes need to work more like sockets (where the handling is object-centric
rather than request-centric, so a pipe would queue a task when it was ready
and would drain any requests queued to that pipe).  Pipes could either share
the same AIO daemon pool as sockets or use a private pool.  I'd probably like
to avoid an explosion of daemon pools though.

I considered having a single AIO pool, but it's kind of messy to keep the
per-process job limits in the request-centric pool while also permitting
object-centric handlers on the internal job queue.  OTOH, if enough backends
end up using object-centric handlers then the job limits might end up
meaningless and we could just drop them.

--
John Baldwin

Re: Refactoring asynchronous I/O

John Baldwin
In reply to this post by Jilles Tjoelker
On Monday, February 01, 2016 12:02:14 AM Jilles Tjoelker wrote:

> On Tue, Jan 26, 2016 at 05:39:03PM -0800, John Baldwin wrote:
> > Note that binding the AIO support to a new fileop does mean that the AIO code
> > now becomes mandatory (rather than optional).  We could perhaps make the
> > system calls continue to be optional if people really need that, but the guts
> > of the code will now need to always be in the kernel.
>
> Enabling this by default is OK with me as long as the easy ways to get a
> stuck process are at least disabled by default. Currently, a process
> gets stuck forever if it has an AIO request from or to a pipe that will
> never complete. An AIO daemon should not be allowed to perform an
> unbounded sleep such as for a pipe (NFS server should be OK).

One thing I could do is split vfs_aio.c into two files: kern_aio.c that
holds the "library" such as aio_aqueue() / aio_complete(), etc. and a
sys_aio.c that is just the system calls.  kern_aio.c would be standard,
but sys_aio.c could still be optional and aio.ko would contain it.
This would still make AIO optional, though aio.ko would be fairly small,
so not having it probably wouldn't save much in terms of size.

Does this seem reasonable or is a trivial aio.ko not worth it?

--
John Baldwin

Re: Refactoring asynchronous I/O

Jilles Tjoelker
On Fri, Feb 05, 2016 at 12:21:52PM -0800, John Baldwin wrote:
> [snip]

> One thing I could do is split vfs_aio.c into two files: kern_aio.c that
> holds the "library" such as aio_aqueue() / aio_complete(), etc. and a
> sys_aio.c that is just the system calls.  kern_aio.c would be standard,
> but sys_aio.c could still be optional and aio.ko would contain it.
> This would still make AIO optional, though aio.ko would be fairly small,
> so not having it probably wouldn't save much in terms of size.

> Does this seem reasonable or is a trivial aio.ko not worth it?

It is one possible option. Another option is to refuse AIO for kinds of
file that have not been vetted to avoid blocking an AIO daemon for too
long. "Kinds of file" is about the code that will be executed to do I/O,
so that, for example, /dev/klog and /dev/ttyv0 are different kinds.
Depending on whether this restriction breaks existing code, it may need
to be a sysctl.

I don't expect a full AIO implementation for all kinds of file any time
soon, so putting AIO syscalls into a kernel module will still leave the
standard system without AIO for a long time, while still paying for the
footprint of the AIO code.

All of this would not be a problem if PCATCH worked in AIO daemons
(treat aio_cancel like a pending signal in a user process) but the last
time I looked this appeared quite hard to fix.

--
Jilles Tjoelker

Re: Refactoring asynchronous I/O

John Baldwin
On Friday, February 05, 2016 10:32:12 PM Jilles Tjoelker wrote:

> [snip]
>
> It is one possible option. Another option is to refuse AIO for kinds of
> file that have not been vetted to avoid blocking an AIO daemon for too
> long. "Kinds of file" is about the code that will be executed to do I/O,
> so that, for example, /dev/klog and /dev/ttyv0 are different kinds.
> Depending on whether this restriction breaks existing code, it may need
> to be a sysctl.

This doesn't seem to be very easy to implement, especially if it should
vary by character device.  It sounds like what you would do today is
to allow mlock() and AIO on sockets and maybe the fast physio path
and disable everything else by default for now?  This would be slightly
more feasible than something more fine-grained.  A sysctl would allow
"full" AIO use that would disable these checks?

> All of this would not be a problem if PCATCH worked in AIO daemons
> (treat aio_cancel like a pending signal in a user process) but the last
> time I looked this appeared quite hard to fix.

The cancel routines approach is what I'm doing to try to resolve this,
but it means handling the case explicitly in the various backends.

--
John Baldwin

Re: Refactoring asynchronous I/O

Jilles Tjoelker
On Mon, Feb 08, 2016 at 11:22:25AM -0800, John Baldwin wrote:
> [snip]

> This doesn't seem to be very easy to implement, especially if it should
> vary by character device.  It sounds like what you would do today is
> to allow mlock() and AIO on sockets and maybe the fast physio path
> and disable everything else by default for now?  This would be slightly
> more feasible than something more fine-grained.  A sysctl would allow
> "full" AIO use that would disable these checks?

It does not seem that complicated to me. A check can be inserted before
placing a job into aio_jobs (in aio_queue_file() in the patched code).
If the code can detect some safe cases, they could be allowed;
otherwise, the AIO daemons will not be used. I'm assuming that the code
in the new fo_aio_queue file ops behaves properly and will not cause AIO
daemons to get stuck.

Note that this means there is no easy way to do kernel AIO on a kind of
file without specific support in the code for that kind of file. I think
the risk of a stuck process (that may even cause shutdown to hang
indefinitely) is bad enough to forbid it by default.

> > All of this would not be a problem if PCATCH worked in AIO daemons
> > (treat aio_cancel like a pending signal in a user process) but the last
> > time I looked this appeared quite hard to fix.

> The cancel routines approach is what I'm doing to try to resolve this,
> but it means handling the case explicitly in the various backends.

OK.

--
Jilles Tjoelker

Re: Refactoring asynchronous I/O

John Baldwin
On Wednesday, February 10, 2016 11:58:44 PM Jilles Tjoelker wrote:

> [...]
> Note that this means there is no easy way to do kernel AIO on a kind of
> file without specific support in the code for that kind of file. I think
> the risk of a stuck process (that may even cause shutdown to hang
> indefinitely) is bad enough to forbid it by default.

Ok.  One issue is that some software may assume that aio will "work" if
modfind("aio") works and might be surprised by it not working.  I'm not sure
how real that is.  I don't plan on merging this to 10 so we will have some
testing time in 11 to figure that out.  If it does prove problematic we can
revert to splitting the syscalls out into aio.ko.  For now I think the two
cases to enable by default would be sockets (which use a fo_aio_queue method)
and physio.  We could let the VFS_AIO option change the sysctl's default
value to allow all AIO for compat, though that seems a bit clumsy as the
name doesn't really make sense for that.

(I also still don't like the vfs_aio.c filename as aio is not VFS-specific.)

--
John Baldwin

Re: Refactoring asynchronous I/O

John Baldwin
On Wednesday, February 10, 2016 03:19:12 PM John Baldwin wrote:

> [...]
> Ok.  One issue is that some software may assume that aio will "work" if
> modfind("aio") works and might be surprised by it not working.  I'm not sure
> how real that is.  I don't plan on merging this to 10 so we will have some
> testing time in 11 to figure that out.  If it does prove problematic we can
> revert to splitting the syscalls out into aio.ko.  For now I think the two
> cases to enable by default would be sockets (which use a fo_aio_queue method)
> and physio.  We could let the VFS_AIO option change the sysctl's default
> value to allow all AIO for compat, though that seems a bit clumsy as the
> name doesn't really make sense for that.
>
> (I also still don't like the vfs_aio.c filename as aio is not VFS-specific.)

I've pushed a new version with a vfs.aio.enable_unsafe sysctl (defaults to
off).  If you think this is ok then I'll work on cleaning up some of the
module bits (removing the unload support mostly).  I figure marking the
syscalls as STD and removing all the helper stuff can be done as a followup
commit to reduce extra noise in this diff (which is large enough on its own).
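For anyone following along, the knob should behave like any other sysctl; this is an administrative sketch assuming the vfs.aio.enable_unsafe name survives review:

```shell
# Query the default (off): unvetted descriptor kinds are refused.
sysctl vfs.aio.enable_unsafe

# Opt in to AIO on unvetted kinds (pipes, ttys, ...) at runtime:
sysctl vfs.aio.enable_unsafe=1

# Or persist the setting across boots via /etc/sysctl.conf:
#   vfs.aio.enable_unsafe=1
```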

https://github.com/freebsd/freebsd/compare/master...bsdjhb:aio_rework

(Also, for the initial commit I will leave out the socket protocol hook.  That
can be added later.)

--
John Baldwin

Re: Refactoring asynchronous I/O

Jilles Tjoelker
On Wed, Feb 10, 2016 at 04:21:30PM -0800, John Baldwin wrote:
> I've pushed a new version with a vfs.aio.enable_unsafe sysctl
> (defaults to off).  If you think this is ok then I'll work on cleaning
> up some of the module bits (removing the unload support mostly).  I
> figure marking the syscalls as STD and removing all the helper stuff
> can be done as a followup commit to reduce extra noise in this diff
> (which is large enough on its own).

> https://github.com/freebsd/freebsd/compare/master...bsdjhb:aio_rework

> (Also, for the initial commit I will leave out the socket protocol
> hook.  That can be added later.)

Looks good to me, except that the error should be [ENOTSUP] instead of
the overloaded [EINVAL].

--
Jilles Tjoelker

Re: Refactoring asynchronous I/O

John Baldwin
On Thursday, February 11, 2016 11:32:38 PM Jilles Tjoelker wrote:

> [...]
> Looks good to me, except that the error should be [ENOTSUP] instead of
> the overloaded [EINVAL].

Ok.  I've done some further cleanups and posted a commit candidate at
https://reviews.freebsd.org/D5289 for anyone else who is interested.

--
John Baldwin