Currently, every buffer cached in the L2ARC is accompanied by a 240-byte
header in memory, leading to very high memory consumption when using very
large cache devices. These changes significantly reduce this overhead.
L1-only | 176 B | 176 B | (same)
L1 & L2 | 240 B | 208 B | (saved 32 bytes)
L2-only | 240 B | 128 B | (saved 116 bytes)
For an average blocksize of 8KB, this means that for the L2ARC, the ratio
of metadata to data has gone down from about 2.92% to 1.56%. For a
'storage optimized' EC2 instance with 1600GB of SSD and 60GB of RAM, this
means that we expect a completely full L2ARC to use (1600 GB * 0.0156) /
60GB = 41% of the available memory, down from 78%.
288546 by mav:
MFC r286556: Avoid 128K kmem allocations in mzap_upgrade()
A ZFS feature flags (large blocks) tracks its refcounts as the number of
datasets that have ever used the feature. Several features of this type
are planned to be added (new checksum functions). This code should be made
common infrastructure rather than duplicating the code for each feature.
288571 by mav:
MFC r286705: 5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
While running 'zfs recv' we noticed that every 128th 8K block required a
read. We were seeing that restore_write() was calling dmu_tx_hold_write()
and the indirect block was not cached. We should prefetch upcoming indirect
blocks to avoid having to go to disk and blocking the restore_write().
Allow an incremental send stream to be received as a clone, even if the
stream does not mark it as a clone.
288570 by mav:
MFC r286689: 5981 Deadlock in dmu_objset_find_dp
https://www.illumos.org/issues/5981 When dmu_objset_find_dp gets called with a read lock held, it fans out
the work to the task queue. Each task in turn acquires its own read
lock before calling the callback. If during this process anyone tries
to a acquire a write lock, it will stall all read lock requests.Thus
the tasks will never finish, the read lock of the caller will never
get freed and the write lock never acquired. deadlock.
https://www.illumos.org/issues/5269 When importing a pool (at boot or with zpool import) with many
filesystem, the process can take minutes. It doesn't matter whether
the pool has been exported cleanly or uncleanly. The problem is that
each dataset has its own log chain. On import, all datasets have to be
checked if there are logs to replay. The idea is to speed up this
process by paralellizing it.
https://www.illumos.org/issues/5695 In dmu_sync_ready(), a hole block pointer will have it's logical size
explicitly set as it's necessary for replay purposes. To "undo" this,
dmu_sync_done() will zero out any hole that it finds. This becomes a
problem when using the "hole_birth" feature, as this will also wipe out
any birth time that might have happened to be set on the hole.
As a fix, the logic to zero out a hole is only applied to old style
holes with a birth time of zero. Holes created with the "hole_birth"
feature enabled will have a non-zero birth time, and will be skipped
(thus preserving the ltime, type, and level information as well).
In addition, zdb was updated to also print the ltime, type, and level
information for these new style holes. Previously, only the logical
birth time would be printed.
288557 by mav:
MFC r286598: 5701 zpool list reports incorrect "alloc" value for cache devices
288556 by alc:
The conversion of kmem_alloc_attr() from operating on a vm map to a vmem
arena in r254025 introduced a bug in the case when an allocation is only
partially successful. Specifically, the vm object lock was not being
acquired before freeing the allocated pages. To address this bug, replace
the existing code by a call to kmem_unback().
Change the type of a variable in kmem_alloc_attr() so that an allocation
of two or more gigabytes won't fail.
Replace the error handling code in kmem_back() by a call to kmem_unback().