Hole-punching, TRIM, etc

Alan Somers-2
Hole-punching has been discussed on these lists before[1].  It basically
means to turn a dense file into a sparse file by deallocating storage for
some of the blocks in the middle.  There's no standard API for it.  Linux
uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
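
For readers who have not used the Linux interface, here is a minimal sketch of hole-punching with fallocate(2). The helper names punch_hole_linux() and demo_punch_hole() are made up for illustration, and the fallback branch for non-Linux systems is an assumption of this sketch, not part of any real API. Filesystems that cannot punch holes are expected to fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE             /* exposes fallocate() and FALLOC_FL_* on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Deallocate `len` bytes starting at `offset` without changing the file size.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
punch_hole_linux(int fd, off_t offset, off_t len)
{
#if defined(__linux__) && defined(FALLOC_FL_PUNCH_HOLE)
    /* Linux requires PUNCH_HOLE to be combined with KEEP_SIZE. */
    return (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
        offset, len));
#else
    /* Illustrative fallback: no equivalent call is assumed here. */
    (void)fd; (void)offset; (void)len;
    errno = ENOSYS;
    return (-1);
#endif
}

/*
 * Write 64 KiB of 0xAA to a temporary file, punch out 16 KiB in the middle,
 * and check that the punched range reads back as zeros.
 * Returns 0 if verified, 1 if hole-punching is unsupported, -1 on error.
 */
int
demo_punch_hole(void)
{
    static char buf[65536];
    char path[] = "/tmp/punchXXXXXX";
    int fd, i;

    if ((fd = mkstemp(path)) < 0)
        return (-1);
    unlink(path);               /* the file disappears when fd is closed */
    memset(buf, 0xAA, sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        close(fd);
        return (-1);
    }
    if (punch_hole_linux(fd, 16384, 16384) != 0) {
        int unsupported = (errno == EOPNOTSUPP || errno == ENOSYS);
        close(fd);
        return (unsupported ? 1 : -1);
    }
    if (pread(fd, buf, 16384, 16384) != 16384) {
        close(fd);
        return (-1);
    }
    close(fd);
    for (i = 0; i < 16384; i++)
        if (buf[i] != 0)
            return (-1);
    return (0);
}
```

The file size is unchanged afterwards; only the backing storage for the punched range is released.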

A related concept is telling a block device that some blocks are no longer
used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
"Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
basically the same thing, and it's analogous to hole-punching for regular
files.  They are also all inaccessible from FreeBSD's userland except by
using pass(4), which is inconvenient and protocol-specific.

Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
but it's totally undocumented and doesn't work on regular files.

I propose adding support for all of these things using the fcntl(2) API.
Using the same syntax that Solaris defined, you would be able to punch a
hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
(though FreeBSD's port never did, and the code was deleted in r303763).
Here's what I would do:

1) Add the F_FREESP command to fcntl(2).
2) Add a .fo_space field for struct fileops
3) Add a devfs_space method that implements .fo_space
4) Add a .d_space field to struct cdevsw
5) Add a g_dev_space method for GEOM that implements .d_space using
BIO_DELETE.
6) Add a VOP_SPACE vop
7) Implement VOP_SPACE for tmpfs
8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
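
From userland, step 1 might look roughly like the sketch below. It assumes the Solaris convention of passing a struct flock to fcntl(2), with l_len == 0 meaning "to end of file" as I understand the Solaris man page. F_FREESP is not defined on FreeBSD today, so the wrapper name freesp() and the ENOTSUP fallback are purely illustrative.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Solaris-style request to free `len` bytes of storage at `offset`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
freesp(int fd, off_t offset, off_t len)
{
    struct flock fl;

    /* The range is described with the same fields used for record locks. */
    memset(&fl, 0, sizeof(fl));
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;
#ifdef F_FREESP
    return (fcntl(fd, F_FREESP, &fl));
#else
    /* F_FREESP is only proposed here; report that it is unavailable. */
    (void)fd;
    (void)fl;
    errno = ENOTSUP;
    return (-1);
#endif
}
```

Under the proposal the same call would work on a regular file (punching a hole) and on a device node (issuing BIO_DELETE).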

The greatest beneficiaries of this work would be type 2 hypervisors like
QEMU and VirtualBox with guests that use TRIM, and userland filesystems
such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
using SPDK would also benefit.  The last item, aio_freesp(2), may seem
unnecessary but it would really benefit my application.

Questions, objections, flames?

-Alan

[1] https://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010881.html
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"

Re: Hole-punching, TRIM, etc

Warner Losh
On Tue, Nov 13, 2018 at 3:10 PM Alan Somers <[hidden email]> wrote:

> Hole-punching has been discussed on these lists before[1].  It basically
> means to turn a dense file into a sparse file by deallocating storage for
> some of the blocks in the middle.  There's no standard API for it.  Linux
> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>
> A related concept is telling a block device that some blocks are no longer
> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
> basically the same thing, and it's analogous to hole-punching for regular
> files.  They are also all inaccessible from FreeBSD's userland except by
> using pass(4), which is inconvenient and protocol-specific.
>
> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
> but it's totally undocumented and doesn't work on regular files.
>
> I propose adding support for all of these things using the fcntl(2) API.
> Using the same syntax that Solaris defined, you would be able to punch a
> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
> (though FreeBSD's port never did, and the code was deleted in r303763).
> Here's what I would do:
>
> 1) Add the F_FREESP command to fcntl(2).
> 2) Add a .fo_space field for struct fileops
> 3) Add a devfs_space method that implements .fo_space
> 4) Add a .d_space field to struct cdevsw
> 5) Add a g_dev_space method for GEOM that implements .d_space using
> BIO_DELETE.
> 6) Add a VOP_SPACE vop
> 7) Implement VOP_SPACE for tmpfs
> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>
> The greatest beneficiaries of this work would be type 2 hypervisors like
> QEMU and VirtualBox with guests that use TRIM, and userland filesystems
> such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
> using SPDK would also benefit.  The last item, aio_freesp(2), may seem
> unnecessary but it would really benefit my application.
>
> Questions, objections, flames?
>

So the fcntl would deallocate blocks from a filesystem only. The filesystem
may issue BIO_DELETE as a result, but that's up to the filesystem, correct?

On a raw device it would be translated into a BIO_DELETE command directly,
correct?

Warner

Re: Hole-punching, TRIM, etc

Conrad Meyer-2
In reply to this post by Alan Somers-2
Hi Alan,

On Tue, Nov 13, 2018 at 2:10 PM Alan Somers <[hidden email]> wrote:

>
> Hole-punching has been discussed on these lists before[1].  It basically
> means to turn a dense file into a sparse file by deallocating storage for
> some of the blocks in the middle.  There's no standard API for it.  Linux
> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>
> A related concept is telling a block device that some blocks are no longer
> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
> basically the same thing, and it's analogous to hole-punching for regular
> files.  They are also all inaccessible from FreeBSD's userland except by
> using pass(4), which is inconvenient and protocol-specific.

Geom devices have the DIOCGDELETE ioctl, which translates into
BIO_DELETE (which is TRIM, as I understand it).  It's available in
libgeom as g_delete() and used by hastd, newfs_nandfs, and nandtool.
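
A minimal sketch of the raw ioctl, assuming the off_t[2] argument layout (offset, then length) used in <sys/disk.h>. The wrapper name geom_delete() is made up; on FreeBSD you would normally just call g_delete() from libgeom. The non-FreeBSD fallback exists only so the sketch compiles elsewhere. As far as I know, both values must be multiples of the provider's sector size.

```c
#include <errno.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/ioctl.h>
#include <sys/disk.h>
#endif

/*
 * Ask a GEOM provider to discard `length` bytes starting at `offset`.
 * `fd` must refer to an open disk device node such as /dev/ada0.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
geom_delete(int fd, off_t offset, off_t length)
{
#if defined(__FreeBSD__) && defined(DIOCGDELETE)
    off_t arg[2];

    arg[0] = offset;            /* first byte to delete */
    arg[1] = length;            /* number of bytes to delete */
    return (ioctl(fd, DIOCGDELETE, arg));
#else
    /* Illustrative fallback for systems without DIOCGDELETE. */
    (void)fd; (void)offset; (void)length;
    errno = ENOTSUP;
    return (-1);
#endif
}
```

This discards data on the device, so only try it on a scratch disk or an md(4) device.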

> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
> but it's totally undocumented and doesn't work on regular files.
>
> I propose adding support for all of these things using the fcntl(2) API.
> Using the same syntax that Solaris defined, you would be able to punch a
> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
> (though FreeBSD's port never did, and the code was deleted in r303763).
> Here's what I would do:
>
> 1) Add the F_FREESP command to fcntl(2).
> 2) Add a .fo_space field for struct fileops
> 3) Add a devfs_space method that implements .fo_space
> 4) Add a .d_space field to struct cdevsw
> 5) Add a g_dev_space method for GEOM that implements .d_space using
> BIO_DELETE.
> 6) Add a VOP_SPACE vop
> 7) Implement VOP_SPACE for tmpfs
> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).

Why not just add DIOCGDELETE support to various VOP_IOCTL
implementations?  The file objects forward correctly through vn_ioctl
to VOP_IOCTL for both regular files and devfs VCHR nodes.

We can emulate the Linux API if we want to be compatible there, but I
wouldn't bother with Solaris.

Best,
Conrad

Re: Hole-punching, TRIM, etc

Alan Somers-2
In reply to this post by Warner Losh
On Tue, Nov 13, 2018 at 3:51 PM Warner Losh <[hidden email]> wrote:

>
>
> On Tue, Nov 13, 2018 at 3:10 PM Alan Somers <[hidden email]> wrote:
>
>> Hole-punching has been discussed on these lists before[1].  It basically
>> means to turn a dense file into a sparse file by deallocating storage for
>> some of the blocks in the middle.  There's no standard API for it.  Linux
>> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>>
>> A related concept is telling a block device that some blocks are no longer
>> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
>> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
>> basically the same thing, and it's analogous to hole-punching for regular
>> files.  They are also all inaccessible from FreeBSD's userland except by
>> using pass(4), which is inconvenient and protocol-specific.
>>
>> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
>> but it's totally undocumented and doesn't work on regular files.
>>
>> I propose adding support for all of these things using the fcntl(2) API.
>> Using the same syntax that Solaris defined, you would be able to punch a
>> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
>> (though FreeBSD's port never did, and the code was deleted in r303763).
>> Here's what I would do:
>>
>> 1) Add the F_FREESP command to fcntl(2).
>> 2) Add a .fo_space field for struct fileops
>> 3) Add a devfs_space method that implements .fo_space
>> 4) Add a .d_space field to struct cdevsw
>> 5) Add a g_dev_space method for GEOM that implements .d_space using
>> BIO_DELETE.
>> 6) Add a VOP_SPACE vop
>> 7) Implement VOP_SPACE for tmpfs
>> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>>
>> The greatest beneficiaries of this work would be type 2 hypervisors like
>> QEMU and VirtualBox with guests that use TRIM, and userland filesystems
>> such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
>> using SPDK would also benefit.  The last item, aio_freesp(2), may seem
>> unnecessary but it would really benefit my application.
>>
>> Questions, objections, flames?
>>
>
> So the fcntl would deallocate blocks from a filesystem only. The
> filesystem may issue BIO_DELETE as a result, but that's up to the
> filesystem, correct?
>

Correct.


>
> On a raw device it would be translated into a BIO_DELETE command directly,
> correct?
>

Correct, modulo edge cases.


>
> Warner
>

Re: Hole-punching, TRIM, etc

Alan Somers-2
In reply to this post by Conrad Meyer-2
On Tue, Nov 13, 2018 at 3:51 PM Conrad Meyer <[hidden email]> wrote:

> Hi Alan,
>
> On Tue, Nov 13, 2018 at 2:10 PM Alan Somers <[hidden email]> wrote:
> >
> > Hole-punching has been discussed on these lists before[1].  It basically
> > means to turn a dense file into a sparse file by deallocating storage for
> > some of the blocks in the middle.  There's no standard API for it.  Linux
> > uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
> >
> > A related concept is telling a block device that some blocks are no longer
> > used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
> > "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
> > basically the same thing, and it's analogous to hole-punching for regular
> > files.  They are also all inaccessible from FreeBSD's userland except by
> > using pass(4), which is inconvenient and protocol-specific.
>
> Geom devices have the DIOCGDELETE ioctl, which translates into
> BIO_DELETE (which is TRIM, as I understand it).  It's available in
> libgeom as g_delete() and used by hastd, newfs_nandfs, and nandtool.
>

Ahh, I thought there must be such a thing, but I couldn't find it.


>
> > Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
> > but it's totally undocumented and doesn't work on regular files.
> >
> > I propose adding support for all of these things using the fcntl(2) API.
> > Using the same syntax that Solaris defined, you would be able to punch a
> > hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
> > (though FreeBSD's port never did, and the code was deleted in r303763).
> > Here's what I would do:
> >
> > 1) Add the F_FREESP command to fcntl(2).
> > 2) Add a .fo_space field for struct fileops
> > 3) Add a devfs_space method that implements .fo_space
> > 4) Add a .d_space field to struct cdevsw
> > 5) Add a g_dev_space method for GEOM that implements .d_space using
> > BIO_DELETE.
> > 6) Add a VOP_SPACE vop
> > 7) Implement VOP_SPACE for tmpfs
> > 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>
> Why not just add DIOCGDELETE support to various VOP_IOCTL
> implementations?  The file objects forward correctly through vn_ioctl
> to VOP_IOCTL for both regular files and devfs VCHR nodes.
>
> We can emulate the Linux API if we want to be compatible there, but I
> wouldn't bother with Solaris.
>

The only reason that I prefer the Solaris API is because it doesn't require
adding another syscall, and because Linux's fallocate(2) does a whole bunch
of other things besides hole-punching.

What about an asynchronous version?  ioctl(2) is still synchronous.  Do you
see any better way to hole-punch/TRIM asynchronously than with aio?


>
> Best,
> Conrad
>

Re: Hole-punching, TRIM, etc

Poul-Henning Kamp
In reply to this post by Warner Losh
In message <[hidden email]>, Warner Losh writes:

>On a raw device it would be translated into a BIO_DELETE command directly,
>correct?

We already have ioctl(DIOCGDELETE) for that.  newfs(8) uses it.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Re: Hole-punching, TRIM, etc

Conrad Meyer-2
In reply to this post by Alan Somers-2
On Tue, Nov 13, 2018 at 2:59 PM Alan Somers <[hidden email]> wrote:

>
> On Tue, Nov 13, 2018 at 3:51 PM Conrad Meyer <[hidden email]> wrote:
>>
>> On Tue, Nov 13, 2018 at 2:10 PM Alan Somers <[hidden email]> wrote:
>> > ...
>> > 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>>
>> Why not just add DIOCGDELETE support to various VOP_IOCTL
>> implementations?  The file objects forward correctly through vn_ioctl
>> to VOP_IOCTL for both regular files and devfs VCHR nodes.
>>
>> We can emulate the Linux API if we want to be compatible there, but I
>> wouldn't bother with Solaris.
>
> The only reason that I prefer the Solaris API is because it doesn't require adding another syscall, and because Linux's fallocate(2) does a whole bunch of other things besides hole-punching.

I am imagining that if we went this route, we would implement Linux
fallocate as a library shim around the native FreeBSD ioctl (or
whatever) rather than an independent system call.  This would be for
API compatibility, not ABI compatibility.  But Linux compat can be set
aside for now, I think — it's a secondary concern.
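
To make the shim idea concrete, here is a rough sketch. Everything in it is hypothetical: the COMPAT_* names (which reuse Linux's flag values), the native_punch_fn callback standing in for whatever native primitive we end up with, and the record_punch() stub that exists only to exercise the shim. Only the hole-punching mode is handled; everything else is rejected.

```c
#include <errno.h>
#include <sys/types.h>

/* Same values as Linux's FALLOC_FL_* flags; the COMPAT_ names are made up. */
#define COMPAT_FALLOC_FL_KEEP_SIZE   0x01
#define COMPAT_FALLOC_FL_PUNCH_HOLE  0x02

/* Hypothetical native primitive: deallocate `len` bytes at `offset`. */
typedef int (*native_punch_fn)(int fd, off_t offset, off_t len);

/*
 * Library-level stand-in for Linux fallocate(2), limited to hole-punching.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
compat_fallocate(native_punch_fn punch, int fd, int mode, off_t offset,
    off_t len)
{
    if (offset < 0 || len <= 0) {
        errno = EINVAL;
        return (-1);
    }
    /* Linux requires PUNCH_HOLE to be ORed with KEEP_SIZE. */
    if (mode == (COMPAT_FALLOC_FL_PUNCH_HOLE | COMPAT_FALLOC_FL_KEEP_SIZE))
        return (punch(fd, offset, len));
    /* Preallocation and the other modes are out of scope for this sketch. */
    errno = EOPNOTSUPP;
    return (-1);
}

/* Stub primitive that records its arguments instead of touching a file. */
off_t last_offset, last_len;

int
record_punch(int fd, off_t offset, off_t len)
{
    (void)fd;
    last_offset = offset;
    last_len = len;
    return (0);
}
```

Because the shim lives in a library, it gives source compatibility only; Linux binaries would still need the Linuxulator to translate the syscall.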

> What about an asynchronous version?  ioctl(2) is still synchronous.  Do you see any better way to hole-punch/TRIM asynchronously than with aio?

Yeah, this is a good consideration.  No, I don't have any better
suggestion for an asynchronous API.  In general our VOPs tend to be
synchronous.  Aio does seem like the logical home for a new
asynchronous API.

Best regards,
Conrad