should a copy_file_range(2) syscall be interrupted via a signal

11 messages

should a copy_file_range(2) syscall be interrupted via a signal

Rick Macklem
Hi,

I have been working on a Linux compatible copy_file_range(2) syscall
(the current code can be found at https://reviews.freebsd.org/D20584).

One outstanding issue is how it should deal with signals.
Right now, I have vn_start_write() without PCATCH, so that it won't be
interrupted by a signal, but I notice that vn_write() (i.e., the write syscall) does
have PCATCH on vn_start_write() and so does vn_rdwr() when it is called
without IO_NODELOCKED.

I am thinking that copy_file_range(2) should do this also.
However, if it returns an error, it is impossible for the caller to know how much
of the data range got copied.

What do you think the copy_file_range(2) code should do?

Thanks, rick
ps: I've used FreeBSD-current@ this time, to see if I get more replies than I
      did using FreeBSD-fs@.
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"

Re: should a copy_file_range(2) syscall be interrupted via a signal

Hans Petter Selasky
On 2019-07-05 02:28, Rick Macklem wrote:
> I am thinking that copy_file_range(2) should do this also.
> However, if it returns an error, it is impossible for the caller to know how much
> of the data range got copied.

How can you kill a program stuck on copy_file_range(2) w/o catching signals?

--HPS

Re: should a copy_file_range(2) syscall be interrupted via a signal

Alan Somers
In reply to this post by Rick Macklem
On Thu, Jul 4, 2019 at 6:29 PM Rick Macklem <[hidden email]> wrote:

> I am thinking that copy_file_range(2) should do this also.
> However, if it returns an error, it is impossible for the caller to know how much
> of the data range got copied.

I thought copy_file_range(2) is allowed to return short. Why can't it
do that if it gets interrupted?

Re: should a copy_file_range(2) syscall be interrupted via a signal

Mark Johnston
In reply to this post by Rick Macklem
On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:

> I am thinking that copy_file_range(2) should do this also.
> However, if it returns an error, it is impossible for the caller to know how much
> of the data range got copied.

Couldn't copy_file_range() return the number of bytes copied in this
case?  (The Linux man page notes that short writes are possible.) I
would expect to see the same error handling that we have in
dofilewrite(), where certain errnos are squashed.
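For illustration, the dofilewrite()-style handling Mark describes could be sketched in userland C as below. This is not code from D20584: finish_copy() and SKETCH_ERESTART are hypothetical names, and because ERESTART is visible only to FreeBSD kernel code, a placeholder stands in for it. The sketch assumes the same set of errnos that dofilewrite() squashes (ERESTART, EINTR and EWOULDBLOCK): once some bytes have been copied they become a successful short count, and they are reported as errors only when nothing was copied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Placeholder for FreeBSD's ERESTART, which only kernel code can see
 * (FreeBSD's <sys/errno.h> defines it as -1 under _KERNEL). */
#define SKETCH_ERESTART (-1)

/*
 * Decide what a copy loop reports once it stops.
 *   error     - errno from the last step (0 if the loop finished cleanly)
 *   requested - bytes the caller asked for
 *   remaining - bytes not yet copied
 * Returns 0 and stores the byte count in *retval, or returns the errno
 * that should be reported to the caller (leaving *retval untouched).
 */
static int
finish_copy(int error, size_t requested, size_t remaining, ssize_t *retval)
{
	assert(remaining <= requested);
	/* Partial progress: turn "interrupted" errors into a short count. */
	if (error != 0 && remaining != requested &&
	    (error == SKETCH_ERESTART || error == EINTR || error == EWOULDBLOCK))
		error = 0;
	if (error == 0)
		*retval = (ssize_t)(requested - remaining);
	return (error);
}
```

With this policy the caller always learns how far the copy got, and a later call reports the error if it persists.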

> What do you think the copy_file_range(2) code should do?

I'd find it surprising if copy_file_range() isn't interruptible.

Re: should a copy_file_range(2) syscall be interrupted via a signal

Rick Macklem
Mark Johnston wrote:

>On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
>> I am thinking that copy_file_range(2) should do this also.
>> However, if it returns an error, it is impossible for the caller to know how much
>> of the data range got copied.
>
>Couldn't copy_file_range() return the number of bytes copied in this
>case?  (The Linux man page notes that short writes are possible.) I
>would expect to see the same error handling that we have in
>dofilewrite(), where certain errnos are squashed.
I think this would be a good approach for local file systems, since I believe that
the only place that EINTR can be generated is the vn_start_write() call:
vn_rdwr(IO_NODELOCKED) never returns it and always completes the I/O before
returning.

As such, the EINTR happens at a "well known" place in the copy and a return of
the bytes copied should be fine.

Now, for NFS, it gets a little weird...
- For NFSv3, many use the "intr" mount option, which means that a VOP_WRITE()
  can return EINTR and the caller doesn't know if the write succeeded on the NFS
  server or not.
  --> Returning "bytes copied" instead of an error for this case doesn't seem
       appropriate to me, since there is no way to know whether the last write happened.
However, "intr" is not recommended for NFSv4, and NFSv4.2 is the only version with
an operation (Copy) that does the copy on the server.

Maybe nfs_copy_file_range() shouldn't "hide" EINTR, although the local file
systems do so.

I think this sounds like a good approach.
What do others think?

>> What do you think the copy_file_range(2) code should do?
>
>I'd find it surprising if copy_file_range() isn't interruptible.
I'll admit I haven't tested on Linux, so I don't know what happens there.
The Linux man page doesn't mention EINTR, but I don't know what happens
for a Linux "intr" NFS mount. I do have a Linux system for testing, but it is the
same system I have been using to test this syscall on FreeBSD. Maybe I need to
boot/play around with it.

I do think returning "bytes copied" instead of EINTR is a good idea, where practical.

Thanks for the comments, rick

Re: should a copy_file_range(2) syscall be interrupted via a signal

Rick Macklem
In reply to this post by Hans Petter Selasky
Hans Petter Selasky wrote:
>On 2019-07-05 02:28, Rick Macklem wrote:
>> I am thinking that copy_file_range(2) should do this also.
>> However, if it returns an error, it is impossible for the caller to know how much
>> of the data range got copied.
>
>How can you kill a program stuck on copy_file_range(2) w/o catching signals?
Well, if "stuck" means sleeping somewhere inside the VOP_WRITE() call for
the file system, I think it is "stuck" forever, just like write(2), isn't it?

For NFS, the "intr" option might allow write(2) to return EINTR, but it often
takes a forced dismount (actually "umount -N") to get it "unstuck".

However, I think for the case where the signal is detected outside of
VOP_READ()/VOP_WRITE() in the copy loop, it does make sense to terminate
it and I think the suggestion of returning "bytes copied" instead of EINTR is
a good one.
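A userland analogue of this idea, where a pending signal is noticed only between chunks so that each read/write pair runs to completion, might look like the sketch below. pread(2)/pwrite(2) stand in for VOP_READ()/VOP_WRITE(); chunked_copy(), the flag sketch_interrupted (which a SIGINT handler would set) and the 4096-byte chunk size are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_CHUNK 4096	/* hypothetical chunk size */

/* Hypothetical flag that a SIGINT handler would set to 1. */
static volatile sig_atomic_t sketch_interrupted;

/*
 * Copy up to len bytes from infd at inoff to outfd at outoff in chunks.
 * The flag is only checked between chunks, so each pread/pwrite pair
 * (standing in for VOP_READ/VOP_WRITE) runs to completion.
 * Returns the bytes copied, or -1 with errno set if it fails before
 * copying anything.
 */
static ssize_t
chunked_copy(int infd, off_t inoff, int outfd, off_t outoff, size_t len)
{
	char buf[SKETCH_CHUNK];
	size_t done = 0;

	while (done < len) {
		if (sketch_interrupted) {
			if (done == 0) {
				errno = EINTR;	/* Nothing copied: safe to report. */
				return (-1);
			}
			break;			/* Report a short count instead. */
		}
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t r = pread(infd, buf, want, inoff + (off_t)done);
		if (r <= 0) {
			if (r < 0 && done == 0)
				return (-1);
			break;			/* EOF, or an error after progress. */
		}
		ssize_t w = pwrite(outfd, buf, (size_t)r, outoff + (off_t)done);
		if (w <= 0) {
			if (w < 0 && done == 0)
				return (-1);
			break;
		}
		done += (size_t)w;		/* A short pwrite is redone next pass. */
	}
	return ((ssize_t)done);
}
```

EINTR is reported only when the output file is untouched; otherwise the caller gets the byte count, as suggested above.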

rick

Re: should a copy_file_range(2) syscall be interrupted via a signal

Jilles Tjoelker
In reply to this post by Rick Macklem
On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
> I have been working on a Linux compatible copy_file_range(2) syscall
> (the current code can be found at https://reviews.freebsd.org/D20584).

> One outstanding issue is how it should deal with signals. Right now, I
> have vn_start_write() without PCATCH, so that it won't be interrupted
> by a signal, but I notice that vn_write() (i.e., the write syscall) does
> have PCATCH on vn_start_write() and so does vn_rdwr() when it is
> called without IO_NODELOCKED.

A regular write() is only interruptible when writing to a terminal,
pseudo-terminal master, pipe, socket, or, under certain conditions, a
file on an NFS intr mount. Therefore, applications may not have the code
to resume interrupted writes to regular files gracefully.

> I am thinking that copy_file_range(2) should do this also.
> However, if it returns an error, it is impossible for the caller to
> know how much of the data range got copied.

A regular write() returns partial success if interrupted by a signal
when it has already written something. Therefore, the application can
resume the operation by adjusting pointers and counts.

Something similar applies to "deterministic" errors like [EFBIG], where
the first call will successfully write as far as possible (if it can write
anything at all) and the next attempt will return the error.
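The "adjust pointers and counts" recovery Jilles describes is the usual write loop. A minimal C sketch follows; write_all() is a hypothetical helper name, not an existing libc function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd. A short count (for example, write(2)
 * interrupted by a signal after writing some data) is resumed by advancing
 * the pointer and shrinking the count. EINTR means nothing was written, so
 * the call is repeated unchanged. Any other error (such as EFBIG on the
 * attempt after a partial write) is returned as -1 with errno set.
 */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		assert((size_t)n <= len);
		p += n;			/* Adjust the pointer ... */
		len -= (size_t)n;	/* ... and the count. */
	}
	return (0);
}
```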

> What do you think the copy_file_range(2) code should do?

I'm not sure it should actually be done, but the need for adjusting
pointers and counts could be avoided with a little extra kernel and libc
code. The system call would receive an additional argument pointing to
an off_t that indicates how many bytes previous calls have already
written. A libc wrapper would initialize this to 0. With this, the
system call can be restarted automatically after a signal.
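A userland model of this proposal might look like the sketch below, assuming the offsets are passed by value and a hypothetical off_t *done argument carries progress between calls. sketch_copy_step() plays the system call and, to model a signal arriving mid-copy, deliberately returns EINTR after each 4096-byte step while work remains; sketch_copy_range() plays the libc wrapper that initializes *done to 0 and restarts the call with unchanged direct arguments. None of these names exist in FreeBSD or libc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_STEP 4096	/* hypothetical bytes copied per "call" */

/*
 * Model of the proposed system call. The direct arguments never change
 * between calls; progress lives only in *done. To model a signal arriving
 * mid-copy, each call copies at most one step and then returns EINTR
 * while work remains. Returns 0 when finished (or at EOF), else an errno.
 */
static int
sketch_copy_step(int infd, off_t inoff, int outfd, off_t outoff,
    size_t len, off_t *done)
{
	char buf[SKETCH_STEP];

	assert(*done >= 0 && (size_t)*done <= len);
	size_t left = len - (size_t)*done;
	if (left == 0)
		return (0);
	size_t want = left < sizeof(buf) ? left : sizeof(buf);
	ssize_t r = pread(infd, buf, want, inoff + *done);
	if (r < 0)
		return (errno);
	if (r == 0)
		return (0);			/* EOF on the input file. */
	ssize_t w = pwrite(outfd, buf, (size_t)r, outoff + *done);
	if (w < 0)
		return (errno);
	*done += w;				/* Record progress first ... */
	return ((size_t)*done < len ? EINTR : 0);	/* ... then "interrupt". */
}

/*
 * Model of the libc wrapper: start *done at 0 and restart the call with
 * the same direct arguments until it stops returning EINTR. An error
 * after some progress is reported as a short count, like write(2).
 */
static ssize_t
sketch_copy_range(int infd, off_t inoff, int outfd, off_t outoff, size_t len)
{
	off_t done = 0;
	int error;

	while ((error = sketch_copy_step(infd, inoff, outfd, outoff, len,
	    &done)) == EINTR)
		;
	if (error != 0 && done == 0) {
		errno = error;
		return (-1);
	}
	return ((ssize_t)done);
}
```

Because only *done changes between attempts, repeating the call with the same direct arguments is safe, which is the condition Jilles states next.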

In any case, [EINTR] and the internal ERESTART must not be returned
unless it is safe to repeat the call with the same (direct) arguments.

--
Jilles Tjoelker

Re: should a copy_file_range(2) syscall be interrupted via a signal

Konstantin Belousov
On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:

> On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
> > One outstanding issue is how it should deal with signals. Right now, I
> > have vn_start_write() without PCATCH, so that it won't be interrupted
> > by a signal, but I notice that vn_write() (i.e., the write syscall) does
> > have PCATCH on vn_start_write() and so does vn_rdwr() when it is
> > called without IO_NODELOCKED.
>
> A regular write() is only interruptible when writing to a terminal,
> pseudo-terminal master, pipe, socket, or, under certain conditions, a
> file on an NFS intr mount. Therefore, applications may not have the code
> to resume interrupted writes to regular files gracefully.
>
> In any case, [EINTR] and the internal ERESTART must not be returned
> unless it is safe to repeat the call with the same (direct) arguments.

BTW, if the syscall is made interruptible, should it also be made cancellable?

I think that PCATCH, as commonly used for vn_start_write(9), is not the best
decision.  It is safe in the sense explained by Jilles, since its interruption
only happens at the very beginning of the syscall, but it contradicts the
tradition of write(2) to a local fs not being interruptible.

I suggest not making the syscall interruptible by default, and perhaps
only allowing it with a flag.  Then you would need to explain that the
syscall is only interruptible between VOPs, and that it is up to the fs to
decide whether VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).

Re: should a copy_file_range(2) syscall be interrupted via a signal

Rick Macklem
Konstantin Belousov wrote:

>On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:
>> A regular write() is only interruptible when writing to a terminal,
>> pseudo-terminal master, pipe, socket, or, under certain conditions, a
>> file on an NFS intr mount. Therefore, applications may not have the code
>> to resume interrupted writes to regular files gracefully.
Yes, agreed. Since this syscall only works on VREG vnodes, the only weird cases
are NFS (and maybe fuse). I'll let asomers@ address the fuse situation.

>> A regular write() returns partial success if interrupted by a signal
>> when it has already written something. Therefore, the application can
>> resume the operation by adjusting pointers and counts.
>>
>> I'm not sure it should actually be done, but the need for adjusting
>> pointers and counts could be avoided with a little extra kernel and libc
>> code. The system call would receive an additional argument pointing to
>> an off_t that indicates how many bytes previous calls have already
>> written. A libc wrapper would initialize this to 0. With this, the
>> system call can be restarted automatically after a signal.
>>
>> In any case, [EINTR] and the internal ERESTART must not be returned
>> unless it is safe to repeat the call with the same (direct) arguments.
Well, since the copy_file_range(2) syscall is allowed to return fewer bytes copied
than requested and this doesn't mean EOF, it seems that doing that would
achieve the result of allowing an application to call it again.
(Basically, it must be used in a loop until the bytes of the range have been copied,
 since returning fewer bytes copied than requested is a normal outcome.)
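On the caller's side, that loop might be sketched as follows. copy_range_all() and pread_pwrite_range() are hypothetical helpers; the latter has the same shape as copy_file_range(2) but copies at most 4096 bytes per call, so it exercises the short-return path and keeps the sketch usable where copy_file_range(2) is unavailable. The sketch assumes, as proposed in this thread, that EINTR means an individual call copied nothing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Same shape as copy_file_range(2) on FreeBSD 13.0+ and 64-bit Linux. */
typedef ssize_t (*copy_fn)(int, off_t *, int, off_t *, size_t, unsigned int);

/*
 * Portable stand-in with the copy_file_range(2) shape (non-NULL offsets
 * only). It copies at most 4096 bytes per call, so short returns are common.
 */
static ssize_t
pread_pwrite_range(int infd, off_t *inoffp, int outfd, off_t *outoffp,
    size_t len, unsigned int flags)
{
	char buf[4096];

	(void)flags;
	ssize_t r = pread(infd, buf, len < sizeof(buf) ? len : sizeof(buf), *inoffp);
	if (r <= 0)
		return (r);			/* Error, or 0 at EOF. */
	ssize_t w = pwrite(outfd, buf, (size_t)r, *outoffp);
	if (w > 0) {
		*inoffp += w;			/* Advance both offsets, as */
		*outoffp += w;			/* copy_file_range(2) does. */
	}
	return (w);
}

/*
 * Copy len bytes, calling fn in a loop: a short return is a normal outcome
 * and does not mean EOF, while a return of 0 does.
 */
static ssize_t
copy_range_all(copy_fn fn, int infd, off_t *inoffp, int outfd,
    off_t *outoffp, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = fn(infd, inoffp, outfd, outoffp, len - done, 0);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* Assumed: this call copied nothing. */
			if (done > 0)
				break;		/* Report progress; a retry sees the error. */
			return (-1);
		}
		if (n == 0)
			break;			/* EOF on the input file. */
		done += (size_t)n;
	}
	return ((ssize_t)done);
}
```

Assuming a platform where copy_file_range(2) takes off_t * arguments (FreeBSD 13.0 or later, or 64-bit Linux with glibc 2.27 or later and _GNU_SOURCE defined instead of _POSIX_C_SOURCE), the real call would be copy_range_all(copy_file_range, infd, &inoff, outfd, &outoff, len).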

>BTW, if the syscall is made interruptible, should it also be made cancellable?
Not sure what you mean by "cancellable". If you mean "terminated by a signal
where there has been no change to the output file", then that could only easily be
done by returning EINTR before any data has been copied.
If you mean something else, then I'd need to know what that is.

>I think that PCATCH, as commonly used for vn_start_write(9), is not the best
>decision.  It is safe in the sense explained by Jilles, since its interruption
>only happens at the very beginning of the syscall, but it contradicts the
>tradition of write(2) to a local fs not being interruptible.
>
>I suggest not making the syscall interruptible by default, and perhaps
>only allowing it with a flag.  Then you would need to explain that the
>syscall is only interruptible between VOPs, and that it is up to the fs to
>decide whether VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).
This is how it is coded now. The one thing I have noticed is that a
copy_file_range() can take a long time (about 2 minutes for 2 Gbytes on the old
hardware I test on). That seems like a long delay for <ctrl>C when you do that to
an application copying a large file. ("cp" and "dd" also take 2 minutes for 2 Gbytes,
so it isn't a bug in copy_file_range(2); it just introduces a long delay in response
to <ctrl>C.)
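As a rough worked estimate, for illustration only: taking "about 2 minutes for 2 Gbytes" as 2048 MiB in 120 s, and assuming a hypothetical 1 MiB chunk between signal checks, the wait t_wait for <ctrl>C would be roughly the time needed to finish the chunk in progress.

```latex
\text{throughput} \approx \frac{2048\ \text{MiB}}{120\ \text{s}} \approx 17\ \text{MiB/s},
\qquad
t_{\text{wait}} \approx \frac{1\ \text{MiB}}{17\ \text{MiB/s}} \approx 0.06\ \text{s}
```

By this estimate, checking between chunks would bound the delay at well under a second rather than up to about 2 minutes, at the cost of making the syscall interruptible.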

rick


Re: should a copy_file_range(2) syscall be interrupted via a signal

Konstantin Belousov
On Fri, Jul 05, 2019 at 08:59:23PM +0000, Rick Macklem wrote:

> Konstantin Belousov wrote:
> >BTW, if the syscall is made interruptible, should it also be made cancellable?
> Not sure what you mean by "cancellable". If you mean "terminated by a signal
> where there has been no change to the output file", then that could only easily be
> done by returning EINTR before any data has been copied.
> If you mean something else, then I'd need to know what that is.
See pthread_setcancelstate(3) for a start, but section 2.9.5, Thread
Cancellation, of POSIX 1003.1-2017 is the definitive spec, including a quite
readable overview.

>
> >I think that PCATCH, as commonly used for vn_start_write(9), is not the best
> >decision.  It is safe in the sense explained by Jilles, since its interruption
> >only happens at the very beginning of the syscall, but it contradicts the
> >tradition of write(2) to a local fs not being interruptible.
> >
> >I suggest not making the syscall interruptible by default, and perhaps
> >only allowing it with a flag.  Then you would need to explain that the
> >syscall is only interruptible between VOPs, and that it is up to the fs to
> >decide whether VOP_READ/VOP_WRITE is interruptible (e.g. devfs and nfs).
> This is how it is coded now. The one thing I have noticed is that a
> copy_file_range() can take a long time (about 2 minutes for 2 Gbytes on the old
> hardware I test on). That seems like a long delay for <ctrl>C when you do that to
> an application copying a large file. ("cp" and "dd" also take 2 minutes for 2 Gbytes,
> so it isn't a bug in copy_file_range(2); it just introduces a long delay in response
> to <ctrl>C.)
That long delay is an inconvenience, but not something that we should spend
too much time trying to fix. We cause the same delay if a program does a
write(2) of several GB, or when a very large process like firefox dumps
core.

Re: should a copy_file_range(2) syscall be interrupted via a signal

Rick Macklem
Konstantin Belousov wrote:

>On Fri, Jul 05, 2019 at 08:59:23PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >BTW, if the syscall is made interruptible, should it also be made cancellable?
>> Not sure what you mean by "cancellable". If you mean "terminated by a signal
>> where there has been no change to the output file", then that could only easily be
>> done by returning EINTR before any data has been copied.
>> If you mean something else, then I'd need to know what that is.
>See pthread_setcancelstate(3) for a start, but section 2.9.5, Thread
>Cancellation, of POSIX 1003.1-2017 is the definitive spec, including a quite
>readable overview.
Ok, thanks. That explains why cancellation of NFSv4.2 Copy operations is defined
the way it is.

>That long delay is an inconvenience, but not something that we should spend
>too much time trying to fix. We cause the same delay if a program does a
>write(2) of several GB, or when a very large process like firefox dumps
>core.

Well, I am happy to leave the patch the way it is now, where EINTR/ERESTART is
returned only if the VOP_xxx() call for the underlying file system has returned
it (such as an NFS mount with the "intr" option).

Thanks, rick