Re: ZSTD Project Weekly Status Update

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: ZSTD Project Weekly Status Update

Allan Jude-9
In my continued effort to finish the integration of ZSTD into ZFS, here
is my second weekly status report:

There is still a number of existing feedback items and known issues that
need to be addressed. I am trying to work through those now.

https://github.com/openzfs/zfs/commit/622b7e1e50ab667a6af1685245a2f5a8d5e9bff3
- Addressed an issue where the user could manually set the hidden
compress_level property, causing incorrect operation. The property is
not marked with the PROP_READONLY flag because it requires PROP_INHERIT

- It has been pointed out again recently that setting the compress=zstd
property on a dataset, but not actually writing any blocks, does not set
the zstd feature flag to 'active'. If this pool is then exported, and
imported using an older version of ZFS that does not know of zstd, it
will trigger an ASSERT() when the value of the compression property enum
is out-of-range. The plan is to fully activate the feature when the
property is set, but the details of how (and where) to do still still
need to be worked out.


- I am still working on the issue of inheritance for both the compress
and the hidden compress_level properties. If you create a child dataset
with the compress property set to zstd-13, it works as expected. But if
you `zfs inherit compress` on that dataset, the output of `zfs get
compress,compress_level` changes from:

zof/inherit/zstd-10/zstd-13       compression     zstd-13
            local
zof/inherit/zstd-10/zstd-13       compress_level  13
            local

to:

zof/inherit/zstd-10/zstd-13       compression     zstd-13
            inherited from zof/inherit/zstd-10
zof/inherit/zstd-10/zstd-13       compress_level  13
            local

This is due to the fact that both the parent, and the child, actually
have compress=16 (zstd), and the zstd-10 or zstd-13 is determined by
combining the hidden compress_level property.

The expected behaviour in this case would be that the compression type
(and therefore the compress_level) would get reset to the value from the
parent.

There is a related problem when the child's compression setting is set
to lz4 (or any other type that doesn't use a level).

This project was sponsored by the FreeBSD Foundation.

--
Allan Jude




signature.asc (851 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ZSTD Project Weekly Status Update

Allan Jude-9
In my continuing effort to complete the integration of ZSTD into
OpenZFS, here is my third weekly status report:

https://github.com/allanjude/zfs/commit/87bb538bdbc4bb8848ed5791b7b0de84a026ebbe
- Completed the rewrite of the way the compression property is handled,
moving away from the initial approach of storing the compression
property (enum zio_compress) and the level (uint64_t) separately.

Previously we exposed the list of compression algorithms and levels to
userland as the corresponding value from the enum in the lower 7 bits,
and the level in the remaining upper bits. Then, as part of the property
GET and SET IOCTLs, we read the separate compression= and
compress_level= properties from the ZAP and returned the combined value,
or split the combined value into those two separate properties. This
worked but caused a lot of headache around property inheritance.

Instead I've changed to doing the combine/split when reading/writing
from the dataset properties ZAP, via the compression_changed_cb()
function. So the properties ZAP contains the combined value (lower 7
bits are the compression algorithm, as defined in the enum zio_compress,
and the upper bits are the compression level). Elsewhere in ZFS we keep
the two values separate (os_compress and os_complevel, and related
variables in all of the different parts of ZFS).

So now, inheritance of the property is handled correctly, and avoids
issues where a dataset with compression=zstd-12, would say 'inherited
from' a dataset with zstd at some other compression level (since both
actually just had compression=zstd, but different compress_level= values).


I have also further extended zdb to inspect the compression settings
when looking at an object:
https://github.com/allanjude/zfs/commit/3fef3c83b8ce90149110ed989bd9fd3e289798e0


I am still working on a solution for setting the zstd feature flag to
'active' as soon as it is set, rather than only once a block is born.


Additionally, I am investigating how to best handle the fact that
embedded block pointers compressed with ZSTD will make 'zfs send -e'
streams backwards incompatible, without a way for the user to opt-out of
sending a stream that contains zstd compressed blocks that the receiving
side may not be able to read. The same can be said for 'zfs send -c' as
well. I am open to ideas on how best to handle this. I have thought
about only sending ZSTD compressed blocks if the user specifies the -Z
(--zstd) flag, but this can lead to confusion where using -c without -Z
would have to either error out, or send the ZSTD compressed blocks
uncompressed.


This project is sponsored by the FreeBSD Foundation.

--
Allan Jude


signature.asc (851 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ZSTD Project Weekly Status Update

Allan Jude-9
In my continuing effort to complete the integration of ZSTD into
OpenZFS, here is my fourth weekly status report:

https://github.com/allanjude/zfs/commit/b0b1270d4e7835ecff413208301375e3de2a4153
- Create a new test case to make sure that the ZSTD header we write
along with the data is correct. Verify that the physical size of the
compressed data is less than the psize for the block pointer, and verify
that the level matches. It uses a random level between 1 and 19 and then
verifies with zdb that the block was compressed with that level.

I am still working on a solution for setting the zstd feature flag to
'active' as soon as it is set, rather than only once a block is born. As
well as fixing up compatibility around zfs send/recv with the embedded
block points flag.

This project is sponsored by the FreeBSD Foundation.


On 2020-07-06 20:07, Allan Jude wrote:

> In my continuing effort to complete the integration of ZSTD into
> OpenZFS, here is my third weekly status report:
>
> https://github.com/allanjude/zfs/commit/87bb538bdbc4bb8848ed5791b7b0de84a026ebbe
> - Completed the rewrite of the way the compression property is handled,
> moving away from the initial approach of storing the compression
> property (enum zio_compress) and the level (uint64_t) separately.
>
> Previously we exposed the list of compression algorithms and levels to
> userland as the corresponding value from the enum in the lower 7 bits,
> and the level in the remaining upper bits. Then, as part of the property
> GET and SET IOCTLs, we read the separate compression= and
> compress_level= properties from the ZAP and returned the combined value,
> or split the combined value into those two separate properties. This
> worked but caused a lot of headache around property inheritance.
>
> Instead I've changed to doing the combine/split when reading/writing
> from the dataset properties ZAP, via the compression_changed_cb()
> function. So the properties ZAP contains the combined value (lower 7
> bits are the compression algorithm, as defined in the enum zio_compress,
> and the upper bits are the compression level). Elsewhere in ZFS we keep
> the two values separate (os_compress and os_complevel, and related
> variables in all of the different parts of ZFS).
>
> So now, inheritance of the property is handled correctly, and avoids
> issues where a dataset with compression=zstd-12, would say 'inherited
> from' a dataset with zstd at some other compression level (since both
> actually just had compression=zstd, but different compress_level= values).
>
>
> I have also further extended zdb to inspect the compression settings
> when looking at an object:
> https://github.com/allanjude/zfs/commit/3fef3c83b8ce90149110ed989bd9fd3e289798e0
>
>
> I am still working on a solution for setting the zstd feature flag to
> 'active' as soon as it is set, rather than only once a block is born.
>
>
> Additionally, I am investigating how to best handle the fact that
> embedded block pointers compressed with ZSTD will make 'zfs send -e'
> streams backwards incompatible, without a way for the user to opt-out of
> sending a stream that contains zstd compressed blocks that the receiving
> side may not be able to read. The same can be said for 'zfs send -c' as
> well. I am open to ideas on how best to handle this. I have thought
> about only sending ZSTD compressed blocks if the user specifies the -Z
> (--zstd) flag, but this can lead to confusion where using -c without -Z
> would have to either error out, or send the ZSTD compressed blocks
> uncompressed.
>
>
> This project is sponsored by the FreeBSD Foundation.
>

--
Allan Jude


signature.asc (851 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ZSTD Project Weekly Status Update

Allan Jude-9
This is the fifth weekly status report on the project to integrate ZSTD
into OpenZFS.

https://github.com/c0d3z3r0/zfs/pull/14/commits/9807c99169e5931a754bb0df68267ffa2f289474
- Created a new test case to ensure that ZSTD compressed blocks survive
replication with the -c flag. We wanted to make sure the on-disk
compression header survived the trip.

https://github.com/c0d3z3r0/zfs/pull/14/commits/94bef464fc304e9d6f5850391e41720c3955af11
- I split the zstd.c file into OS specific bits
(module/os/{linux,freebsd}/zstd_os.c) and also split the .h file into
zstd.h and zstd_impl.h. This was done so that FreeBSD can use the
existing kmem_cache mechanism, while Linux can use the vmem_alloc pool
created in the earlier versions of this patch. I significantly changed
the FreeBSD implementation from my earlier work, to reuse the power of 2
zio_data_buf_cache[]'s that already exist, only adding a few additional
kmem_caches for large blocks with high compression levels. This should
avoid creating as many unnecessary kmem caches.

https://github.com/c0d3z3r0/zfs/pull/14/commits/3d48243b77e6c8c3bf562c7a2315dd6cc571f28c
- Lastly, in my testing I was seeing a lot of hits on the new
compression failure kstat I added. This was caused by the ZFS "early
abort" feature, where we give the compressor an output buffer that is
smaller than the input, so it will fail if the block will not compress
enough to be worth it. This helps avoid wasting CPU on uncompressible
blocks. However, it seems the 'one-file' version of zstd we are using
does not expose the ZSTD_ErrorCode enum. This needs to be investigated
further to avoid issues if the value changes (although it is apparently
stable after version 1.3.1).

I am still working on a solution for zfs send stream compatibility. I am
leaning towards creating a new flag, --zstd, to enable ZSTD compressed
output. If the -e or -c flag are used without the --zstd flag, and the
dataset has the zstd feature active, the idea would be to emit a warning
but send the blocks uncompressed, so that the stream remains compatible
with older versions of ZFS. I will be discussing this on the OpenZFS
Leadership call tomorrow, and am open to suggestions on how to best
handle this.


On 2020-07-14 22:26, Allan Jude wrote:

> In my continuing effort to complete the integration of ZSTD into
> OpenZFS, here is my fourth weekly status report:
>
> https://github.com/allanjude/zfs/commit/b0b1270d4e7835ecff413208301375e3de2a4153
> - Create a new test case to make sure that the ZSTD header we write
> along with the data is correct. Verify that the physical size of the
> compressed data is less than the psize for the block pointer, and verify
> that the level matches. It uses a random level between 1 and 19 and then
> verifies with zdb that the block was compressed with that level.
>
> I am still working on a solution for setting the zstd feature flag to
> 'active' as soon as it is set, rather than only once a block is born. As
> well as fixing up compatibility around zfs send/recv with the embedded
> block points flag.
>
> This project is sponsored by the FreeBSD Foundation.
>
>
--
Allan Jude


signature.asc (851 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ZSTD Project Weekly Status Update

Allan Jude-9
This is the sixth weekly status report on the project to integrate ZSTD
into OpenZFS.

https://github.com/openzfs/zfs/pull/10631 - Improved the `zfs recv`
error handling when it receives an out-of-bounds property value.
Specifically, if a zfs send stream is created that supports a newer
compression or checksum type, the property will fail to be set on the
receiving system. This is fine, but `zfs recv` would abort() and create
a core file, rather than reporting the error, because it did not
understand the EINVAL being returned for that property. In the case
where the property is outside the accepted range, we now return the new
ZFS_ERR_BADPROP value, and the correct message is displayed to the user.
I opted not to use ERANGE because that is used for 'this property value
should not be used on a root pool'. The idea is to get this fix merged
into the 0.8.x branch for the next point release, to improve
compatibility with streams generated by OpenZFS 2.0


https://github.com/openzfs/zfs/pull/10632 - General improvement to error
handling when the error code is EZFS_UNKNOWN.


https://github.com/allanjude/zfs/commit/8f37c1ad8edaff20a550b3df07995dab80c06492
- ZFS replication compatibility improvements. As discussed on the
leadership call earlier this month, keep the compatibility simple. If
the -c flag is given, send blocks compressed with any compression
algorithm. The improved error handling will let the user know if their
system can't handle ZSTD.


https://github.com/allanjude/zfs/commit/0ffd80e281f79652973378599cd0332172f365bd
- per-dataset feature activation. This switches the ZSTD feature flag
from 'enabled' to 'active' as soon as the property is set, instead of
when the first block is written. This ensures that the pool can't be
imported on a system that does not support ZSTD that will cause the ZFS
cli tools to panic.


I will be working on adding some tests for the feature activation.

I've been looking at ways to add tests for the replication changes, but
it doesn't seem to be easy to test the results of a 'zfs recv' that does
not know about ZSTD (where the values are outside of the valid range for
the enum). If anyone has any ideas here, I'd be very interested.


On 2020-07-20 23:40, Allan Jude wrote:

> This is the fifth weekly status report on the project to integrate ZSTD
> into OpenZFS.
>
> https://github.com/c0d3z3r0/zfs/pull/14/commits/9807c99169e5931a754bb0df68267ffa2f289474
> - Created a new test case to ensure that ZSTD compressed blocks survive
> replication with the -c flag. We wanted to make sure the on-disk
> compression header survived the trip.
>
> https://github.com/c0d3z3r0/zfs/pull/14/commits/94bef464fc304e9d6f5850391e41720c3955af11
> - I split the zstd.c file into OS specific bits
> (module/os/{linux,freebsd}/zstd_os.c) and also split the .h file into
> zstd.h and zstd_impl.h. This was done so that FreeBSD can use the
> existing kmem_cache mechanism, while Linux can use the vmem_alloc pool
> created in the earlier versions of this patch. I significantly changed
> the FreeBSD implementation from my earlier work, to reuse the power of 2
> zio_data_buf_cache[]'s that already exist, only adding a few additional
> kmem_caches for large blocks with high compression levels. This should
> avoid creating as many unnecessary kmem caches.
>
> https://github.com/c0d3z3r0/zfs/pull/14/commits/3d48243b77e6c8c3bf562c7a2315dd6cc571f28c
> - Lastly, in my testing I was seeing a lot of hits on the new
> compression failure kstat I added. This was caused by the ZFS "early
> abort" feature, where we give the compressor an output buffer that is
> smaller than the input, so it will fail if the block will not compress
> enough to be worth it. This helps avoid wasting CPU on uncompressible
> blocks. However, it seems the 'one-file' version of zstd we are using
> does not expose the ZSTD_ErrorCode enum. This needs to be investigated
> further to avoid issues if the value changes (although it is apparently
> stable after version 1.3.1).
>
> I am still working on a solution for zfs send stream compatibility. I am
> leaning towards creating a new flag, --zstd, to enable ZSTD compressed
> output. If the -e or -c flag are used without the --zstd flag, and the
> dataset has the zstd feature active, the idea would be to emit a warning
> but send the blocks uncompressed, so that the stream remains compatible
> with older versions of ZFS. I will be discussing this on the OpenZFS
> Leadership call tomorrow, and am open to suggestions on how to best
> handle this.
>
>
> On 2020-07-14 22:26, Allan Jude wrote:
>> In my continuing effort to complete the integration of ZSTD into
>> OpenZFS, here is my fourth weekly status report:
>>
>> https://github.com/allanjude/zfs/commit/b0b1270d4e7835ecff413208301375e3de2a4153
>> - Create a new test case to make sure that the ZSTD header we write
>> along with the data is correct. Verify that the physical size of the
>> compressed data is less than the psize for the block pointer, and verify
>> that the level matches. It uses a random level between 1 and 19 and then
>> verifies with zdb that the block was compressed with that level.
>>
>> I am still working on a solution for setting the zstd feature flag to
>> 'active' as soon as it is set, rather than only once a block is born. As
>> well as fixing up compatibility around zfs send/recv with the embedded
>> block points flag.
>>
>> This project is sponsored by the FreeBSD Foundation.
>>
>>
>

--
Allan Jude


signature.asc (851 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ZSTD Project Weekly Status Update

Allan Jude-9
This is the seventh weekly status report on the project to integrate
ZSTD into OpenZFS.

The compatibility related changes I created last week were refined and
marged into the mainline branch.

Thanks to Brian Behlendorf for reviewing my proposed change for the zstd
feature flag activation, and pointing out a better approach. I have
reworked the patch based on his suggestion and prototype:

https://github.com/allanjude/zfs/commit/2508dafcec0a05d61afc5fbd5da356e201afbe97
- Activate the per-dataset ZSTD feature flag as soon as the property is
set to ZSTD. Before, simply doing `zfs set compression=zstd dataset`
would not activate the feature flag. The feature flag would be activated
when the first block that used ZSTD compression was written (see
dsl_dataset_block_born()). This meant that if you set the property,
exported the pool, the pool would import on systems with older versions
of ZFS that did not support ZSTD, but would crash their userspace tools,
because the property value was out of bounds.


https://github.com/allanjude/zfs/commit/b8bec3fd2a8feb3a4de572eb15515d3764f92a35
- I created a test that ensures that the feature flag is activated by
`zfs set compression=zstd` and also ensures that the feature flag
reverts to the 'enabled' state once the last dataset using zstd is
destroyed.


The next step is ensuring that ZSTD compression inter-operates properly
with the L2ARC and Encryption etc.

I've also been discussing ideas with Brian about future-proofing, to
handle the case where a newer version of ZSTD might compression the same
input differently (better ratio), and how that would impact L2ARC,
nop-write, etc. One idea (originally from Pawel Dawidek) is to do
something similar to what encryption does, and split the checksum field.
Using half to checksum the original data, and half the compressed
version. This would allow ZFS to detect when the same content compressed
differently (combined with the ZSTD version header in the compressed
data), giving better compatibility as we upgrade ZSTD.


This project is sponsored by the FreeBSD Foundation.



On 2020-07-29 21:10, Allan Jude wrote:

> This is the sixth weekly status report on the project to integrate ZSTD
> into OpenZFS.
>
> https://github.com/openzfs/zfs/pull/10631 - Improved the `zfs recv`
> error handling when it receives an out-of-bounds property value.
> Specifically, if a zfs send stream is created that supports a newer
> compression or checksum type, the property will fail to be set on the
> receiving system. This is fine, but `zfs recv` would abort() and create
> a core file, rather than reporting the error, because it did not
> understand the EINVAL being returned for that property. In the case
> where the property is outside the accepted range, we now return the new
> ZFS_ERR_BADPROP value, and the correct message is displayed to the user.
> I opted not to use ERANGE because that is used for 'this property value
> should not be used on a root pool'. The idea is to get this fix merged
> into the 0.8.x branch for the next point release, to improve
> compatibility with streams generated by OpenZFS 2.0
>
>
> https://github.com/openzfs/zfs/pull/10632 - General improvement to error
> handling when the error code is EZFS_UNKNOWN.
>
>
> https://github.com/allanjude/zfs/commit/8f37c1ad8edaff20a550b3df07995dab80c06492
> - ZFS replication compatibility improvements. As discussed on the
> leadership call earlier this month, keep the compatibility simple. If
> the -c flag is given, send blocks compressed with any compression
> algorithm. The improved error handling will let the user know if their
> system can't handle ZSTD.
>
>
> https://github.com/allanjude/zfs/commit/0ffd80e281f79652973378599cd0332172f365bd
> - per-dataset feature activation. This switches the ZSTD feature flag
> from 'enabled' to 'active' as soon as the property is set, instead of
> when the first block is written. This ensures that the pool can't be
> imported on a system that does not support ZSTD that will cause the ZFS
> cli tools to panic.
>
>
> I will be working on adding some tests for the feature activation.
>
> I've been looking at ways to add tests for the replication changes, but
> it doesn't seem to be easy to test the results of a 'zfs recv' that does
> not know about ZSTD (where the values are outside of the valid range for
> the enum). If anyone has any ideas here, I'd be very interested.
>
>
> On 2020-07-20 23:40, Allan Jude wrote:
>> This is the fifth weekly status report on the project to integrate ZSTD
>> into OpenZFS.
>>
>> https://github.com/c0d3z3r0/zfs/pull/14/commits/9807c99169e5931a754bb0df68267ffa2f289474
>> - Created a new test case to ensure that ZSTD compressed blocks survive
>> replication with the -c flag. We wanted to make sure the on-disk
>> compression header survived the trip.
>>
>> https://github.com/c0d3z3r0/zfs/pull/14/commits/94bef464fc304e9d6f5850391e41720c3955af11
>> - I split the zstd.c file into OS specific bits
>> (module/os/{linux,freebsd}/zstd_os.c) and also split the .h file into
>> zstd.h and zstd_impl.h. This was done so that FreeBSD can use the
>> existing kmem_cache mechanism, while Linux can use the vmem_alloc pool
>> created in the earlier versions of this patch. I significantly changed
>> the FreeBSD implementation from my earlier work, to reuse the power of 2
>> zio_data_buf_cache[]'s that already exist, only adding a few additional
>> kmem_caches for large blocks with high compression levels. This should
>> avoid creating as many unnecessary kmem caches.
>>
>> https://github.com/c0d3z3r0/zfs/pull/14/commits/3d48243b77e6c8c3bf562c7a2315dd6cc571f28c
>> - Lastly, in my testing I was seeing a lot of hits on the new
>> compression failure kstat I added. This was caused by the ZFS "early
>> abort" feature, where we give the compressor an output buffer that is
>> smaller than the input, so it will fail if the block will not compress
>> enough to be worth it. This helps avoid wasting CPU on uncompressible
>> blocks. However, it seems the 'one-file' version of zstd we are using
>> does not expose the ZSTD_ErrorCode enum. This needs to be investigated
>> further to avoid issues if the value changes (although it is apparently
>> stable after version 1.3.1).
>>
>> I am still working on a solution for zfs send stream compatibility. I am
>> leaning towards creating a new flag, --zstd, to enable ZSTD compressed
>> output. If the -e or -c flag are used without the --zstd flag, and the
>> dataset has the zstd feature active, the idea would be to emit a warning
>> but send the blocks uncompressed, so that the stream remains compatible
>> with older versions of ZFS. I will be discussing this on the OpenZFS
>> Leadership call tomorrow, and am open to suggestions on how to best
>> handle this.
>>
>>
>> On 2020-07-14 22:26, Allan Jude wrote:
>>> In my continuing effort to complete the integration of ZSTD into
>>> OpenZFS, here is my fourth weekly status report:
>>>
>>> https://github.com/allanjude/zfs/commit/b0b1270d4e7835ecff413208301375e3de2a4153
>>> - Create a new test case to make sure that the ZSTD header we write
>>> along with the data is correct. Verify that the physical size of the
>>> compressed data is less than the psize for the block pointer, and verify
>>> that the level matches. It uses a random level between 1 and 19 and then
>>> verifies with zdb that the block was compressed with that level.
>>>
>>> I am still working on a solution for setting the zstd feature flag to
>>> 'active' as soon as it is set, rather than only once a block is born. As
>>> well as fixing up compatibility around zfs send/recv with the embedded
>>> block points flag.
>>>
>>> This project is sponsored by the FreeBSD Foundation.
>>>
>>>
>>
>
>

--
Allan Jude


signature.asc (851 bytes) Download Attachment