MAXPHYS bump for FreeBSD 13

Warner Losh
Greetings,

We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
we normally use (though there are exceptions).

I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
DFLTPHYS to 1MB.
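
(For reference, the constants in question look roughly like this in
sys/sys/param.h today; the #ifndef guards are what let an individual kernel
config override them:)

    #ifndef MAXPHYS
    #define MAXPHYS         (128 * 1024)    /* max raw I/O transfer size */
    #endif
    #ifndef DFLTPHYS
    #define DFLTPHYS        (64 * 1024)     /* default max raw I/O transfer size */
    #endif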

128k was good back in the 90s/2000s when memory was smaller and drives did
smaller I/Os. Now, however, it doesn't make much sense. Modern I/O
devices can easily do 1MB or more, and there are performance benefits to
scheduling larger I/Os.

Bumping this will mean a larger struct buf and struct bio. Without some
concerted effort, it's hard to make this a sysctl tunable. While that's
desirable, perhaps, it shouldn't gate this bump. The increase in size for
1MB is modest enough.

The NVMe driver is currently limited to 1MB transfers due to limitations in
the NVMe scatter/gather lists and a desire to preallocate as much as
possible up front. Most NVMe drives have maximum transfer sizes between
128k and 1MB, with larger being the trend.

The mp[rs] drivers can use a larger MAXPHYS, though resource limitations on
some cards hamper bumping it beyond about 2MB.

The AHCI driver is happy with 1MB and larger sizes.

Netflix has run a MAXPHYS of 8MB for years, though that's likely 2x larger
than even we need: limiting factors in the upper layers make it hard to
schedule I/Os larger than 3-4MB reliably.

So this should be relatively low risk and high benefit.

I don't think other kernel tunables need to change, but I always run into
trouble with runningbufs :)

Comments? Anything I forgot?

Warner

Re: MAXPHYS bump for FreeBSD 13

Gary Jennejohn
On Fri, 13 Nov 2020 11:33:30 -0700
Warner Losh <[hidden email]> wrote:

> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more and there's performance benefits from
> scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this be a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.
>
> The NVMe driver currently is limited to 1MB transfers due to limitations in
> the NVMe scatter gather lists and a desire to preallocate as much as
> possible up front. Most NVMe drivers have maximum transfer sizes between
> 128k and 1MB, with larger being the trend.
>
> The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
>
> Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> even for our needs due to limiting factors in the upper layers making it
> hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be a relatively low risk, and high benefit.
>
> I don't think other kernel tunables need to change, but I always run into
> trouble with runningbufs :)
>
> Comments? Anything I forgot?
>

Seems like a good idea to me.  I tried 1MB a few months ago and saw
no problems, although that change had little effect on transfers to
my SSDs.  Still, it could be useful for spinning rust.

--
Gary Jennejohn

Re: MAXPHYS bump for FreeBSD 13

Konstantin Belousov
In reply to this post by Warner Losh
On Fri, Nov 13, 2020 at 11:33:30AM -0700, Warner Losh wrote:

> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more and there's performance benefits from
> scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this be a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.
To put specific numbers on it: for struct buf this means an increase of 1792
bytes. struct bio does not grow, because it does not embed a vm_page_t[] in
the structure.

Worse, the typical struct buf's added space for the extra vm_page pointers is
going to sit unused, because the normal UFS block size is 32K.  It would only
be used by clusters and physbufs.
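
(As a back-of-the-envelope check of those numbers, assuming 4K pages and
64-bit pointers:)

    /*
     * b_pages[] currently holds btoc(MAXPHYS) vm_page_t pointers:
     *   today:      128K / 4K =  32 pointers ->  32 * 8 =  256 bytes
     *   with 1MB:     1M / 4K = 256 pointers -> 256 * 8 = 2048 bytes
     *   growth per struct buf:               2048 - 256 = 1792 bytes
     * An aligned 32K UFS block only ever touches 8 of those pointers.
     */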

So I object to bumping this value without reworking the buffers' handling
of b_pages[].  The most straightforward approach is to stop using MAXPHYS to
size this array and to use an external array for clusters.  Pbufs can embed
the large array.

>
> The NVMe driver currently is limited to 1MB transfers due to limitations in
> the NVMe scatter gather lists and a desire to preallocate as much as
> possible up front. Most NVMe drivers have maximum transfer sizes between
> 128k and 1MB, with larger being the trend.
>
> The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
>
> Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> even for our needs due to limiting factors in the upper layers making it
> hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be a relatively low risk, and high benefit.
>
> I don't think other kernel tunables need to change, but I always run into
> trouble with runningbufs :)
>
> Comments? Anything I forgot?
>
> Warner

Re: MAXPHYS bump for FreeBSD 13

Greg 'groggy' Lehey
In reply to this post by Warner Losh
On Friday, 13 November 2020 at 11:33:30 -0700, Warner Losh wrote:
> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.

Sounds long overdue to me.

Greg
--
Sent from my desktop computer.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

Re: MAXPHYS bump for FreeBSD 13

Ian Lepore
In reply to this post by Warner Losh
On Fri, 2020-11-13 at 11:33 -0700, Warner Losh wrote:

> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more and there's performance benefits from
> scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this be a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.
>
> The NVMe driver currently is limited to 1MB transfers due to limitations in
> the NVMe scatter gather lists and a desire to preallocate as much as
> possible up front. Most NVMe drivers have maximum transfer sizes between
> 128k and 1MB, with larger being the trend.
>
> The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
>
> Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> even for our needs due to limiting factors in the upper layers making it
> hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be a relatively low risk, and high benefit.
>
> I don't think other kernel tunables need to change, but I always run into
> trouble with runningbufs :)
>
> Comments? Anything I forgot?
>
> Warner
>

Will this have any negative implications for embedded systems running
slow storage such as SD cards?

-- Ian

Re: MAXPHYS bump for FreeBSD 13

Warner Losh
On Fri, Nov 13, 2020 at 4:06 PM Ian Lepore <[hidden email]> wrote:

> On Fri, 2020-11-13 at 11:33 -0700, Warner Losh wrote:
> > Greetings,
> >
> > We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
> that
> > we normally use (though there are exceptions).
> >
> > I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> > DFLTPHYS to 1MB.
> >
> > 128k was good back in the 90s/2000s when memory was smaller, drives did
> > smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> > devices can easily do 1MB or more and there's performance benefits from
> > scheduling larger I/Os.
> >
> > Bumping this will mean larger struct buf and struct bio. Without some
> > concerted effort, it's hard to make this be a sysctl tunable. While
> that's
> > desirable, perhaps, it shouldn't gate this bump. The increase in size for
> > 1MB is modest enough.
> >
> > The NVMe driver currently is limited to 1MB transfers due to limitations
> in
> > the NVMe scatter gather lists and a desire to preallocate as much as
> > possible up front. Most NVMe drivers have maximum transfer sizes between
> > 128k and 1MB, with larger being the trend.
> >
> > The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> > some cards hamper bumping it beyond about 2MB.
> >
> > The AHCI driver is happy with 1MB and larger sizes.
> >
> > Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too
> large
> > even for our needs due to limiting factors in the upper layers making it
> > hard to schedule I/Os larger than 3-4MB reliably.
> >
> > So this should be a relatively low risk, and high benefit.
> >
> > I don't think other kernel tunables need to change, but I always run into
> > trouble with runningbufs :)
> >
> > Comments? Anything I forgot?
> >
> > Warner
> >
>
> Will this have any negative implications for embedded systems running
> slow storage such as sdcard?
>

It will work. If you are under memory pressure, you may need to compile with
a smaller MAXPHYS; the savings are about 1700 bytes per struct buf.
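
(A sketch of that knob, using the kernel-config spelling commonly seen in
tuning guides; depending on the branch, config(8) may warn about the option,
but the value still ends up overriding the #ifndef defaults:)

    # Keep the historic limits on a memory-constrained board
    options         MAXPHYS=(128*1024)
    options         DFLTPHYS=(64*1024)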

Warner

Re: MAXPHYS bump for FreeBSD 13

Scott Long
In reply to this post by Warner Losh
I have mixed feelings on this.  The Netflix workload isn’t typical, and this
change represents a fairly substantial increase in memory usage for
bufs.  It’s also a config tunable, so it’s not like this represents a meaningful
diff reduction for Netflix.

The upside is that it will likely help benchmarks out of the box.  Is that
enough of an upside for the downsides of memory pressure on small memory
and high iops systems?  I’m not convinced.  I really would like to see the
years of talk about fixing this correctly put into action.

Scott


> On Nov 13, 2020, at 11:33 AM, Warner Losh <[hidden email]> wrote:
>
> Greetings,
>
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os that
> we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.
>
> 128k was good back in the 90s/2000s when memory was smaller, drives did
> smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> devices can easily do 1MB or more and there's performance benefits from
> scheduling larger I/Os.
>
> Bumping this will mean larger struct buf and struct bio. Without some
> concerted effort, it's hard to make this be a sysctl tunable. While that's
> desirable, perhaps, it shouldn't gate this bump. The increase in size for
> 1MB is modest enough.
>
> The NVMe driver currently is limited to 1MB transfers due to limitations in
> the NVMe scatter gather lists and a desire to preallocate as much as
> possible up front. Most NVMe drivers have maximum transfer sizes between
> 128k and 1MB, with larger being the trend.
>
> The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> some cards hamper bumping it beyond about 2MB.
>
> The AHCI driver is happy with 1MB and larger sizes.
>
> Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too large
> even for our needs due to limiting factors in the upper layers making it
> hard to schedule I/Os larger than 3-4MB reliably.
>
> So this should be a relatively low risk, and high benefit.
>
> I don't think other kernel tunables need to change, but I always run into
> trouble with runningbufs :)
>
> Comments? Anything I forgot?
>
> Warner

Re: MAXPHYS bump for FreeBSD 13

Warner Losh
On Fri, Nov 13, 2020 at 6:23 PM Scott Long <[hidden email]> wrote:

> I have mixed feelings on this.  The Netflix workload isn’t typical, and
> this
> change represents a fairly substantial increase in memory usage for
> bufs.  It’s also a config tunable, so it’s not like this represents a
> meaningful
> diff reduction for Netflix.
>

This isn't motivated by Netflix's workload at all, nor by any need to
minimize diffs. In fact, Netflix had nothing to do with the proposal
apart from me writing it up.

This is motivated more by the need of many people to do I/Os larger than
128k, though maybe 1MB is too large. Alexander Motin proposed it today
during the Vendor Summit and I wrote up the idea for arch@.

> The upside is that it will likely help benchmarks out of the box.  Is that
> enough of an upside for the downsides of memory pressure on small memory
> and high iops systems?  I’m not convinced.  I really would like to see the
> years of talk about fixing this correctly put into action.
>

I'd love years of inaction to end too. I'd also like FreeBSD to perform a
bit better out of the box. Would your calculation have changed had the size
been 256k or 512k? Both those options use/waste substantially fewer bytes
per I/O than 1MB.

Warner


> Scott
>
>
> > On Nov 13, 2020, at 11:33 AM, Warner Losh <[hidden email]> wrote:
> >
> > Greetings,
> >
> > We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
> that
> > we normally use (though there are exceptions).
> >
> > I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> > DFLTPHYS to 1MB.
> >
> > 128k was good back in the 90s/2000s when memory was smaller, drives did
> > smaller I/Os, etc. Now, however, it doesn't make much sense. Modern I/O
> > devices can easily do 1MB or more and there's performance benefits from
> > scheduling larger I/Os.
> >
> > Bumping this will mean larger struct buf and struct bio. Without some
> > concerted effort, it's hard to make this be a sysctl tunable. While
> that's
> > desirable, perhaps, it shouldn't gate this bump. The increase in size for
> > 1MB is modest enough.
> >
> > The NVMe driver currently is limited to 1MB transfers due to limitations
> in
> > the NVMe scatter gather lists and a desire to preallocate as much as
> > possible up front. Most NVMe drivers have maximum transfer sizes between
> > 128k and 1MB, with larger being the trend.
> >
> > The mp[rs] drivers can use larger MAXPHYS, though resource limitations on
> > some cards hamper bumping it beyond about 2MB.
> >
> > The AHCI driver is happy with 1MB and larger sizes.
> >
> > Netflix has run MAXPHYS of 8MB for years, though that's likely 2x too
> large
> > even for our needs due to limiting factors in the upper layers making it
> > hard to schedule I/Os larger than 3-4MB reliably.
> >
> > So this should be a relatively low risk, and high benefit.
> >
> > I don't think other kernel tunables need to change, but I always run into
> > trouble with runningbufs :)
> >
> > Comments? Anything I forgot?
> >
> > Warner

Re: MAXPHYS bump for FreeBSD 13

Alexander Motin
In reply to this post by Warner Losh
> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
> that we normally use (though there are exceptions).
>
> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
> DFLTPHYS to 1MB.

I am all for the MAXPHYS change; as Warner said, it was my proposal in a
chat.  ZFS already uses blocks and aggregates I/O up to 1MB, and could
potentially do more, and having the I/O size limit lower than that just
overflows disk queues, increases processing overhead, complicates scheduling
and in some cases causes starvation.

I'd just like to note that DFLTPHYS should probably not be changed as
directly (if at all), since it is used as a fallback for legacy code.
If it is used for anything else, that usage should be reviewed and probably
migrated to some other constant(s).

--
Alexander Motin

Re: MAXPHYS bump for FreeBSD 13

Hans Petter Selasky
On 11/14/20 5:14 AM, Alexander Motin wrote:

>> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
>> that we normally use (though there are exceptions).
>>
>> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
>> DFLTPHYS to 1MB.
>
> I am all for the MAXPHYS change, as Warner told it was my proposition on
> a chat.  ZFS uses blocks and aggregates I/O up to 1MB already and can
> more potentially, and having I/O size lower then this just overflows
> disk queues, increases processing overheads, complicates scheduling and
> in some cases causes starvation.
>
> I'd just like to note that DFLTPHYS should probably not be changed that
> straight (if at all), since it is used as a fallback for legacy code.
> If it is used for anything else -- that should be reviewed and probably
> migrated to some other constant(s).
>

Beware that many USB 2.0 devices will break if you try to transfer more
than 64K. Buggy SCSI implementations!

--HPS

Re: MAXPHYS bump for FreeBSD 13

Alexander Motin
On 14.11.2020 06:22, Hans Petter Selasky wrote:

> On 11/14/20 5:14 AM, Alexander Motin wrote:
>>> We currently have a MAXPHYS of 128k. This is the maximum size of I/Os
>>> that we normally use (though there are exceptions).
>>>
>>> I'd like to propose that we bump MAXPHYS to 1MB, as well as bumping
>>> DFLTPHYS to 1MB.
>>
>> I am all for the MAXPHYS change, as Warner told it was my proposition on
>> a chat.  ZFS uses blocks and aggregates I/O up to 1MB already and can
>> more potentially, and having I/O size lower then this just overflows
>> disk queues, increases processing overheads, complicates scheduling and
>> in some cases causes starvation.
>>
>> I'd just like to note that DFLTPHYS should probably not be changed that
>> straight (if at all), since it is used as a fallback for legacy code.
>> If it is used for anything else -- that should be reviewed and probably
>> migrated to some other constant(s).
>
> Beware that many USB 2.0 devices will break if you try to transfer more
> than 64K. Buggy SCSI implementations!

Yes, thanks, I remember.  The code reports MAXPHYS only for
USB_SPEED_SUPER devices, relying on the DFLTPHYS fallback in CAM otherwise.
I think the slower ones could just be hardcoded to 64KB to be certain.
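
(A sketch of that policy; cpi->maxio is the CAM XPT_PATH_INQ field involved,
but the exact umass(4)/driver logic is paraphrased here, not quoted:)

    if (udev->speed >= USB_SPEED_SUPER)
            cpi->maxio = MAXPHYS;       /* USB 3.x: advertise the big limit */
    else
            cpi->maxio = 64 * 1024;     /* USB 2.0 bridges: stay at 64KB */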

--
Alexander Motin

Re: MAXPHYS bump for FreeBSD 13

Alexander Motin
In reply to this post by Warner Losh
On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:

> To put the specific numbers, for struct buf it means increase by 1792
> bytes. For bio it does not, because it does not embed vm_page_t[] into
> the structure.
>
> Worse, typical struct buf addend for excess vm_page pointers is going
> to be unused, because normal size of the UFS block is 32K.  It is
> going to be only used by clusters and physbufs.
>
> So I object against bumping this value without reworking buffers
> handling of b_pages[].  Most straightforward approach is stop using
> MAXPHYS to size this array, and use external array for clusters.
> Pbufs can embed large array.

I am not very familiar with struct buf usage, so I'd appreciate some
help there.

Quickly looking at pbuf, it seems trivial to allocate an external b_pages
array of any size in pbuf_init(), which should easily satisfy all of the pbuf
descendants.  The cluster and vnode/swap pager code are pbuf descendants
too.  The vnode pager, I guess, may only need a replacement for
nitems(bp->b_pages) in a few places.

Could you or somebody help with the vfs/ffs code, where I suppose the
smaller page lists would be used?

--
Alexander Motin

Re: MAXPHYS bump for FreeBSD 13

Konstantin Belousov
On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:

> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > To put the specific numbers, for struct buf it means increase by 1792
> > bytes. For bio it does not, because it does not embed vm_page_t[] into
> > the structure.
> >
> > Worse, typical struct buf addend for excess vm_page pointers is going
> > to be unused, because normal size of the UFS block is 32K.  It is
> > going to be only used by clusters and physbufs.
> >
> > So I object against bumping this value without reworking buffers
> > handling of b_pages[].  Most straightforward approach is stop using
> > MAXPHYS to size this array, and use external array for clusters.
> > Pbufs can embed large array.
>
> I am not very familiar with struct buf usage, so I'd appreciate some
> help there.
>
> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> array of any size in pbuf_init, that should easily satisfy all of pbuf
> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
> also.  Vnode pager I guess may only need replacement for
> nitems(bp->b_pages) in few places.
I planned to look at making MAXPHYS a tunable.

You are right, we would need:
1. Move b_pages to the end of struct buf and declare it as a flexible array
(see the sketch after this list).  This would make the KBI worse because
struct buf depends on some debugging options, and then the b_pages offset
depends on the config.

Another option could be to change b_pages to a pointer, if we are fine with
one more indirection.  But in my plan the real array is always allocated past
struct buf, so a flexible array is actually more correct.

2. Preallocate both normal bufs and pbufs together with their arrays.

3. I considered adding a B_SMALLPAGES flag to b_flags and using it to
indicate that a buffer has the 'small' b_pages.  All buffers rotated through
getnewbuf()/buf_alloc() should have it set.

4. There could be some places which either malloc() struct buf or allocate it
on the stack (I tend to believe I converted all of the latter to the former).
They would need special handling.

md(4) uses pbufs.

5. My larger concern is, in fact, CAM and the drivers.
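
(A sketch of the two shapes from point 1 above; field names are illustrative
and the real struct buf has many more members:)

    /*
     * Option A: flexible array member, with the real storage co-allocated
     * just past the struct -- small for buf_zone buffers, MAXPHYS-sized for
     * pbufs.  The B_SMALLPAGES b_flags bit (point 3) would mark the former.
     */
    struct buf {
            /* ... existing fields ... */
            int             b_npages;
            vm_page_t       b_pages[];
    };

    /*
     * Option B: keep the struct fixed-size and pay one more indirection,
     * i.e. a "vm_page_t *b_pages" member whose storage is still placed
     * immediately after the struct at allocation time.
     */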

>
> Could you or somebody help with vfs/ffs code, where I suppose the
> smaller page lists are used?
Do you plan to work on this?  I can help, sure.

Still, I wanted to make MAXPHYS (and a 'small' MAXPHYS, which is not the same
as DFLTPHYS) a tunable, in the scope of this work.
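
(A sketch of the tunable shape; the names and flags are assumptions for
illustration, and the real work would also have to size the zones and KVA
reservations from the tunable at boot:)

    u_long maxphys = MAXPHYS;
    SYSCTL_ULONG(_kern, OID_AUTO, maxphys, CTLFLAG_RDTUN, &maxphys, 0,
        "Maximum raw I/O transfer size");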

Re: MAXPHYS bump for FreeBSD 13

Scott Long


> On Nov 14, 2020, at 11:37 AM, Konstantin Belousov <[hidden email]> wrote:
>
> On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
>> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
>>> To put the specific numbers, for struct buf it means increase by 1792
>>> bytes. For bio it does not, because it does not embed vm_page_t[] into
>>> the structure.
>>>
>>> Worse, typical struct buf addend for excess vm_page pointers is going
>>> to be unused, because normal size of the UFS block is 32K.  It is
>>> going to be only used by clusters and physbufs.
>>>
>>> So I object against bumping this value without reworking buffers
>>> handling of b_pages[].  Most straightforward approach is stop using
>>> MAXPHYS to size this array, and use external array for clusters.
>>> Pbufs can embed large array.
>>
>> I am not very familiar with struct buf usage, so I'd appreciate some
>> help there.
>>
>> Quickly looking on pbuf, it seems trivial to allocate external b_pages
>> array of any size in pbuf_init, that should easily satisfy all of pbuf
>> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
>> also.  Vnode pager I guess may only need replacement for
>> nitems(bp->b_pages) in few places.
> I planned to look at making MAXPHYS a tunable.
>
> You are right, we would need:
> 1. move b_pages to the end of struct buf and declaring it as flexible.
> This would make KBI worse because struct buf depends on some debugging
> options, and than b_pages offset depends on config.
>
> Another option could be to change b_pages to pointer, if we are fine with
> one more indirection.  But in my plan, real array is always allocated past
> struct buf, so flexible array is more correct even.
>

I like this, and I was in the middle of writing up an email that described it.
There could be multiple malloc types or UMA zones of different sizes,
depending on the intended I/O size, or just a runtime change to the size of
a single allocation.

> 2. Preallocating both normal bufs and pbufs together with the arrays.
>
> 3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate
> that buffer has 'small' b_pages.  All buffers rotated through getnewbuf()/
> buf_alloc() should have it set.
>

This would work nicely with a variable sized allocator, yes.

> 4. There could be some places which either malloc() or allocate struct buf
> on stack (I tend to believe that I converted all later places to formed).
> They would need to get special handling.
>

I couldn’t find any places that allocated a buf on the stack or embedded it
into another structure.

> md(4) uses pbufs.
>
> 4. My larger concern is, in fact, cam and drivers.
>

Can you describe your concern?

>>
>> Could you or somebody help with vfs/ffs code, where I suppose the
>> smaller page lists are used?
> Do you plan to work on this ?  I can help, sure.
>
> Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
> DFLPHYS), a tunable, in the scope of this work.

Sounds great, thank you for looking at it.

Scott

Re: MAXPHYS bump for FreeBSD 13

John-Mark Gurney
In reply to this post by Warner Losh
Warner Losh wrote this message on Fri, Nov 13, 2020 at 19:16 -0700:

> On Fri, Nov 13, 2020 at 6:23 PM Scott Long <[hidden email]> wrote:
>
> > I have mixed feelings on this.  The Netflix workload isn't typical, and
> > this
> > change represents a fairly substantial increase in memory usage for
> > bufs.  It's also a config tunable, so it's not like this represents a
> > meaningful
> > diff reduction for Netflix.
> >
>
> This isn't motivated at all by Netflix's work load nor any needs to
> minimize diffs at all. In fact, Netflix had nothing to do with the proposal
> apart from me writing it up.
>
> This is motivated more by the needs of more people to do larger I/Os than
> 128k, though maybe 1MB is too large. Alexander Motin proposed it today
> during the Vendor Summit and I wrote up the idea for arch@.

I ran into this problem recently w/ my work on ggate.  I was doing testing
using dd bs=1m.  Because of MAXPHYS, the physio for devices breaks down the
request into 128kB segments, which are scheduled serially...  This means
that if there is request latency, it is multiplied 8x because of the smaller
requests...
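
(To spell out the arithmetic behind that 8x:)

    /*
     * With MAXPHYS = 128K, a single dd bs=1m request is carved by physio(9)
     * into 1M / 128K = 8 consecutive transfers, so any fixed per-request
     * latency -- a network round trip for ggate, say -- is paid 8 times.
     */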

Also, some file systems, like ZFS, ignore the MAXPHYS limit and pass down
larger I/Os anyway, which clearly works well enough that no one complains
about ZFS not working on their devices...

I talked briefly w/ Warner about increasing MAXPHYS not too long ago.

> > The upside is that it will likely help benchmarks out of the box.  Is that
> > enough of an upside for the downsides of memory pressure on small memory
> > and high iops systems?  I'm not convinced.  I really would like to see the
> > years of talk about fixing this correctly put into action.
>
> I'd love years of inaction to end too. I'd also like FreeBSD to perform a
> bit better out of the box. Would your calculation have changed had the size
> been 256k or 512k? Both those options use/waste substantially fewer bytes
> per I/O than 1MB.

--
  John-Mark Gurney Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

Re: MAXPHYS bump for FreeBSD 13

Konstantin Belousov
In reply to this post by Scott Long-2
On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:

>
>
> > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov <[hidden email]> wrote:
> >
> > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> >>> To put the specific numbers, for struct buf it means increase by 1792
> >>> bytes. For bio it does not, because it does not embed vm_page_t[] into
> >>> the structure.
> >>>
> >>> Worse, typical struct buf addend for excess vm_page pointers is going
> >>> to be unused, because normal size of the UFS block is 32K.  It is
> >>> going to be only used by clusters and physbufs.
> >>>
> >>> So I object against bumping this value without reworking buffers
> >>> handling of b_pages[].  Most straightforward approach is stop using
> >>> MAXPHYS to size this array, and use external array for clusters.
> >>> Pbufs can embed large array.
> >>
> >> I am not very familiar with struct buf usage, so I'd appreciate some
> >> help there.
> >>
> >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> >> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
> >> also.  Vnode pager I guess may only need replacement for
> >> nitems(bp->b_pages) in few places.
> > I planned to look at making MAXPHYS a tunable.
> >
> > You are right, we would need:
> > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > This would make KBI worse because struct buf depends on some debugging
> > options, and than b_pages offset depends on config.
> >
> > Another option could be to change b_pages to pointer, if we are fine with
> > one more indirection.  But in my plan, real array is always allocated past
> > struct buf, so flexible array is more correct even.
> >
>
> I like this, and I was in the middle of writing up an email that described it.
> There could be multiple malloc types or UMA zones of different sizes,
> depending on the intended i/o size, or just a runtime change to the size of
> a single allocation size.
I do not think we need new/many zones.

Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
from pbuf_zone.  Those should be fixed allocation sizes, with a small
b_pages[] for buf_zone and a large (MAXPHYS-sized) one for pbuf_zone.

Everything else, if any, would need to pre-calculate the malloc size.
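
(A sketch of that split -- the uma_zcreate() calls are illustrative, and the
maxbcachebuf bound for the small array is an assumption:)

    buf_zone = uma_zcreate("buf",
        sizeof(struct buf) + atop(maxbcachebuf) * sizeof(vm_page_t),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
    pbuf_zone = uma_zcreate("pbuf",
        sizeof(struct buf) + atop(MAXPHYS) * sizeof(vm_page_t),
        NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);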

>
> > 2. Preallocating both normal bufs and pbufs together with the arrays.
> >
> > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to indicate
> > that buffer has 'small' b_pages.  All buffers rotated through getnewbuf()/
> > buf_alloc() should have it set.
> >
>
> This would work nicely with a variable sized allocator, yes.
>
> > 4. There could be some places which either malloc() or allocate struct buf
> > on stack (I tend to believe that I converted all later places to formed).
> > They would need to get special handling.
> >
>
> I couldn’t find any places that allocated a buf on the stack or embedded it
> into another structure.
As I said, I did a pass to eliminate stack allocations of bufs.
As a result, flushbufqueues(), for instance, mallocs a struct buf, but it does
not use the b_pages[] of the allocated sentinel.

>
> > md(4) uses pbufs.
> >
> > 4. My larger concern is, in fact, cam and drivers.
> >
>
> Can you describe your concern?
My opinion is that during this work all uses of MAXPHYS should be reviewed,
and there are a lot of drivers that reference the constant.  From past
experience, I expect some evil, ingenious (ab)use.

The same goes for bufs, but apart from the use of pbufs in cam_periph.c, I do
not think drivers make much use of them.

>
> >>
> >> Could you or somebody help with vfs/ffs code, where I suppose the
> >> smaller page lists are used?
> > Do you plan to work on this ?  I can help, sure.
> >
> > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
> > DFLPHYS), a tunable, in the scope of this work.
>
> Sounds great, thank you for looking at it.
>
> Scott
>

Re: MAXPHYS bump for FreeBSD 13

Warner Losh
On Sat, Nov 14, 2020, 12:43 PM Konstantin Belousov <[hidden email]>
wrote:

> On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:
> >
> >
> > > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov <[hidden email]>
> wrote:
> > >
> > > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> > >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > >>> To put the specific numbers, for struct buf it means increase by 1792
> > >>> bytes. For bio it does not, because it does not embed vm_page_t[]
> into
> > >>> the structure.
> > >>>
> > >>> Worse, typical struct buf addend for excess vm_page pointers is going
> > >>> to be unused, because normal size of the UFS block is 32K.  It is
> > >>> going to be only used by clusters and physbufs.
> > >>>
> > >>> So I object against bumping this value without reworking buffers
> > >>> handling of b_pages[].  Most straightforward approach is stop using
> > >>> MAXPHYS to size this array, and use external array for clusters.
> > >>> Pbufs can embed large array.
> > >>
> > >> I am not very familiar with struct buf usage, so I'd appreciate some
> > >> help there.
> > >>
> > >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> > >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> > >> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
> > >> also.  Vnode pager I guess may only need replacement for
> > >> nitems(bp->b_pages) in few places.
> > > I planned to look at making MAXPHYS a tunable.
> > >
> > > You are right, we would need:
> > > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > > This would make KBI worse because struct buf depends on some debugging
> > > options, and than b_pages offset depends on config.
> > >
> > > Another option could be to change b_pages to pointer, if we are fine
> with
> > > one more indirection.  But in my plan, real array is always allocated
> past
> > > struct buf, so flexible array is more correct even.
> > >
> >
> > I like this, and I was in the middle of writing up an email that
> described it.
> > There could be multiple malloc types or UMA zones of different sizes,
> > depending on the intended i/o size, or just a runtime change to the size
> of
> > a single allocation size.
> I do not think we need new/many zones.
>
> Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
> from pbuf_zone. That should be fixed alloc size, with small b_pages[]
> for buf_zone, and large (MAXPHYS) for pbuf.
>
> Everything else, if any, would need to pre-calculate malloc size.
>

How will this affect clustered reads for things like read ahead?

>
> > > 2. Preallocating both normal bufs and pbufs together with the arrays.
> > >
> > > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to
> indicate
> > > that buffer has 'small' b_pages.  All buffers rotated through
> getnewbuf()/
> > > buf_alloc() should have it set.
> > >
> >
> > This would work nicely with a variable sized allocator, yes.
> >
> > > 4. There could be some places which either malloc() or allocate struct
> buf
> > > on stack (I tend to believe that I converted all later places to
> formed).
> > > They would need to get special handling.
> > >
> >
> > I couldn’t find any places that allocated a buf on the stack or embedded
> it
> > into another structure.
> As I said, I did a pass to eliminate stack allocations for bufs.
> As result, for instance flushbufqueues() mallocs struct buf, but it does
> not use b_pages[] of the allocated sentinel.
>

Yea. I recall both the pass and looking for them later and not finding any
either...

>
> > > md(4) uses pbufs.
> > >
> > > 4. My larger concern is, in fact, cam and drivers.
> > >
> >
> > Can you describe your concern?
> My opinion is that during this work all uses of MAXPHYS should be reviewed,
> and there are a lot of drivers that reference the constant.  From the past
> experience, I expect some evil ingenious (ab)use.
>
> Same for bufs, but apart from application of pbufs in cam_periph.c, I do
> not
> think drivers have much use of it.
>

Do you have precise definitions for DFLTPHYS and MAXPHYS? That might help
ferret out the differences between the two. I have seen several places that
use one or the other of these that seem incorrect, but that I can't quite
articulate precisely why... having a good definition articulated would
help. There are some places that likely want a fixed constant to reflect
hardware, not a FreeBSD tuning parameter.

As an aside, there are times I want to do transfers of arbitrary sizes for
certain pass-through commands that are vendor-specific and that have no way
to read the results in chunks. Thankfully most newer drives don't have this
restriction, but it still comes up. But that's way below the buf layer and
handled today by cam_periph and the pbufs there. These types of operations
are rare and typically happen when the system is mostly idle, so low-memory
situations can be ignored beyond error handling and retry in the user
program. Would this work make those possible? Or would MAXPHYS, however
set, still limit them?

Warner

>
> > >>
> > >> Could you or somebody help with vfs/ffs code, where I suppose the
> > >> smaller page lists are used?
> > > Do you plan to work on this ?  I can help, sure.
> > >
> > > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same
> as
> > > DFLPHYS), a tunable, in the scope of this work.
> >
> > Sounds great, thank you for looking at it.
> >
> > Scott
> >

Re: MAXPHYS bump for FreeBSD 13

Konstantin Belousov
On Sat, Nov 14, 2020 at 01:39:32PM -0700, Warner Losh wrote:

> On Sat, Nov 14, 2020, 12:43 PM Konstantin Belousov <[hidden email]>
> wrote:
>
> > On Sat, Nov 14, 2020 at 11:48:34AM -0700, Scott Long wrote:
> > >
> > >
> > > > On Nov 14, 2020, at 11:37 AM, Konstantin Belousov <[hidden email]>
> > wrote:
> > > >
> > > > On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> > > >> On Fri, 13 Nov 2020 21:09:37 +0200 Konstantin Belousov wrote:
> > > >>> To put the specific numbers, for struct buf it means increase by 1792
> > > >>> bytes. For bio it does not, because it does not embed vm_page_t[]
> > into
> > > >>> the structure.
> > > >>>
> > > >>> Worse, typical struct buf addend for excess vm_page pointers is going
> > > >>> to be unused, because normal size of the UFS block is 32K.  It is
> > > >>> going to be only used by clusters and physbufs.
> > > >>>
> > > >>> So I object against bumping this value without reworking buffers
> > > >>> handling of b_pages[].  Most straightforward approach is stop using
> > > >>> MAXPHYS to size this array, and use external array for clusters.
> > > >>> Pbufs can embed large array.
> > > >>
> > > >> I am not very familiar with struct buf usage, so I'd appreciate some
> > > >> help there.
> > > >>
> > > >> Quickly looking on pbuf, it seems trivial to allocate external b_pages
> > > >> array of any size in pbuf_init, that should easily satisfy all of pbuf
> > > >> descendants.  Cluster and vnode/swap pagers code are pbuf descendants
> > > >> also.  Vnode pager I guess may only need replacement for
> > > >> nitems(bp->b_pages) in few places.
> > > > I planned to look at making MAXPHYS a tunable.
> > > >
> > > > You are right, we would need:
> > > > 1. move b_pages to the end of struct buf and declaring it as flexible.
> > > > This would make KBI worse because struct buf depends on some debugging
> > > > options, and than b_pages offset depends on config.
> > > >
> > > > Another option could be to change b_pages to pointer, if we are fine
> > with
> > > > one more indirection.  But in my plan, real array is always allocated
> > past
> > > > struct buf, so flexible array is more correct even.
> > > >
> > >
> > > I like this, and I was in the middle of writing up an email that
> > described it.
> > > There could be multiple malloc types or UMA zones of different sizes,
> > > depending on the intended i/o size, or just a runtime change to the size
> > of
> > > a single allocation size.
> > I do not think we need new/many zones.
> >
> > Queued (getnewbuf()) bufs come from buf_zone, and pbufs are allocated
> > from pbuf_zone. That should be fixed alloc size, with small b_pages[]
> > for buf_zone, and large (MAXPHYS) for pbuf.
> >
> > Everything else, if any, would need to pre-calculate malloc size.
> >
>
> How will this affect clustered reads for things like read ahead?
kern/vfs_cluster.c uses pbufs to create a temporary buf by combining pages
from the constituent normal (queued) buffers.  According to the discussion,
pbufs would have b_pages[]/KVA reserved at MAXPHYS.  This allows the cluster
code to fill a large request for read-ahead or background write.
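
(Roughly the cluster_rbuild()/cluster_wbuild() pattern being described --
simplified, without the contiguity and buffer-state checks of the real code:)

    int i;
    vm_page_t m;

    /* bp is the transient pbuf; tbp is each member buffer being absorbed. */
    for (i = 0; i < tbp->b_npages; i++) {
            m = tbp->b_pages[i];
            /* Neighbouring member buffers may share a boundary page. */
            if (bp->b_npages == 0 || bp->b_pages[bp->b_npages - 1] != m)
                    bp->b_pages[bp->b_npages++] = m;
    }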

>
> >
> > > > 2. Preallocating both normal bufs and pbufs together with the arrays.
> > > >
> > > > 3. I considered adding B_SMALLPAGES flag to b_flags and use it to
> > indicate
> > > > that buffer has 'small' b_pages.  All buffers rotated through
> > getnewbuf()/
> > > > buf_alloc() should have it set.
> > > >
> > >
> > > This would work nicely with a variable sized allocator, yes.
> > >
> > > > 4. There could be some places which either malloc() or allocate struct
> > buf
> > > > on stack (I tend to believe that I converted all later places to
> > formed).
> > > > They would need to get special handling.
> > > >
> > >
> > > I couldn’t find any places that allocated a buf on the stack or embedded
> > it
> > > into another structure.
> > As I said, I did a pass to eliminate stack allocations for bufs.
> > As result, for instance flushbufqueues() mallocs struct buf, but it does
> > not use b_pages[] of the allocated sentinel.
> >
>
> Yea. I recall both the pass and looking for them later and not finding any
> either...
>
> >
> > > > md(4) uses pbufs.
> > > >
> > > > 4. My larger concern is, in fact, cam and drivers.
> > > >
> > >
> > > Can you describe your concern?
> > My opinion is that during this work all uses of MAXPHYS should be reviewed,
> > and there are a lot of drivers that reference the constant.  From the past
> > experience, I expect some evil ingenious (ab)use.
> >
> > Same for bufs, but apart from application of pbufs in cam_periph.c, I do
> > not
> > think drivers have much use of it.
> >
>
> Do you have precise definitions for DFLTPHYS and MAXPHYS? That might help
> ferret out the differences between the two. I have seen several places that
> use one or the other of these that seem incorrect, but that I can't quite
> articulate precisely why... having a good definition articulated would
> help. There are some places that likely want a fixed constant to reflect
> hardware, not a FreeBSD tuning parameter.
Right now VFS guarantees that it never creates an I/O request (bio?) larger
than MAXPHYS.  In fact, VMIO buffers simply cannot express such a request
because there is no place to put more pages.

DFLTPHYS seems to be used only by drivers (and some geoms), and the typical
driver usage of it is to clamp the maximum I/O request to something smaller
than MAXPHYS.  I see that the dump code tries not to write more than DFLTPHYS
at a time, to ease the life of drivers, and physio() sanitizes maxio at
DFLTPHYS, but this is for really broken drivers.
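
(From memory, the physio() sanity check referred to looks something like this
in sys/kern/kern_physio.c:)

    if (dev->si_iosize_max == 0) {
            printf("WARNING: %s si_iosize_max=0!\n", devtoname(dev));
            dev->si_iosize_max = DFLTPHYS;
    }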

>
> As an aside, there are times I want to do transfers of arbitrary sizes for
> certain pass through commands that are vendor specific and that have no way
> to read the results in chunks. Thankfully most newer drives don't have this
> restriction, but it still comes up. But that's way below the buf layer and
> handled today by cam_periph and the pbufs there. These types of operations
> are rare and typically when the system is mostly idle, so low memory
> situations can be ignored beyond error handling and retry in the user
> program. Would this work make those possible? Or would MAXPHYS, however
> set, still limit them?
MAXPHYS would still limit them, at least in the scope of work we are
discussing.

>
> Warner
>
> >
> > > >>
> > > >> Could you or somebody help with vfs/ffs code, where I suppose the
> > > >> smaller page lists are used?
> > > > Do you plan to work on this ?  I can help, sure.
> > > >
> > > > Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same
> > as
> > > > DFLPHYS), a tunable, in the scope of this work.
> > >
> > > Sounds great, thank you for looking at it.
> > >
> > > Scott
> > >

Re: MAXPHYS bump for FreeBSD 13

Poul-Henning Kamp
--------
Konstantin Belousov writes:

> DFLTPHYS seems to be only used by drivers (and some geoms), and typical
> driver' usage of it is to clamp the max io request more than MAXPHYS.
> I see that dump code tries to not write more than DFLTPHYS one time, to
> ease life of drivers, and physio() sanitize maxio at DFLTPHYS, but this
> is for really broken drivers.

DFLTPHYS is the antique version of g_provider->stripesize, and
should be replaced by it throughout.

The history behind DFLTPHYS is that tape drives were limited to
MAXPHYS-sized tape blocks, so you wanted MAXPHYS large.

For performance reasons disk operations should not span cylinders,
a topic I'm sure Kirk can elaborate on if provoked, so DFLTPHYS was
introduced to reduce them to a tunable size.

Peak performance came when fs blocks divided DFLTPHYS and DFLTPHYS
divided the cylinder size of the disk.

The Seagate ST82500[1] with standard formatting had 0x616 sectors per
cylinder (19 heads, 82 sectors each).  Formatting with a generous
22 spare sectors per cylinder brought the "usable" cylinder size
down to precisely 0x600 sectors, which resulted in around 5-10%
higher overall system performance on a heavily loaded Tahoe.

Poul-Henning

[1] The STI82500 "Sabre" is amusingly available for order in certain
web-shops but, alas, "not currently in stock".

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[hidden email]         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Re: MAXPHYS bump for FreeBSD 13

Alexander Motin
In reply to this post by Konstantin Belousov
On 14.11.2020 13:37, Konstantin Belousov wrote:
> On Sat, Nov 14, 2020 at 10:01:05AM -0500, Alexander Motin wrote:
> 4. My larger concern is, in fact, cam and drivers.

I am actually the least concerned about this part.  I've already
reviewed/cleaned it once, and can do it again if needed.  We have some
drivers unaware of MAXPHYS, and they should be safely limited to
DFLTPHYS; the others should adapt properly.  And if you'd like to make
MAXPHYS tunable, I'd be happy to take that part.

>> Could you or somebody help with vfs/ffs code, where I suppose the
>> smaller page lists are used?
> Do you plan to work on this ?  I can help, sure.

Honestly, I hadn't planned on it.  But if that is the price to finally close
this topic forever, I could probably figure something out.  Otherwise I
was mostly looking for somebody to take this part of the project into
capable hands.

> Still, I wanted to make MAXPHYS (and 'small' MAXPHYS, this is not same as
> DFLPHYS), a tunable, in the scope of this work.

+1

--
Alexander Motin