What value HZ?

Warner Losh
I'm top posting here, since the thread quoted below, from a recent commit,
gives the context.

I'm proposing basically the following patch:

diff --git a/sys/kern/subr_param.c b/sys/kern/subr_param.c
index c0025c07eed..bb92afb6449 100644
--- a/sys/kern/subr_param.c
+++ b/sys/kern/subr_param.c
@@ -61,11 +61,7 @@ __FBSDID("$FreeBSD$");
  */

 #ifndef HZ
-#  if defined(__mips__) || defined(__arm__) || defined(__riscv)
-#    define    HZ 100
-#  else
-#    define    HZ 1000
-#  endif
+#  define      HZ 1000
 #  ifndef HZ_VM
 #    define    HZ_VM 100
 #  endif

This would go along with removing HZ from almost all the arm and mips
kernel config files where it is already set to 1000. I'm agnostic about
riscv, so I would also be open to removing only the first two clauses (mips
and arm) from the #if that the diff shows me removing.
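
A minimal sketch of that alternative, assuming only the mips and arm
clauses are dropped and riscv keeps its lower default. It shows the
resulting preprocessor logic rather than a tested patch; default_hz() is a
hypothetical helper that is not part of subr_param.c and exists only so the
fragment can be compiled and checked on its own.

```c
/* Sketch: default selection if only the mips and arm clauses are removed. */
#ifndef HZ
#  if defined(__riscv)
#    define HZ 100      /* riscv would keep the low default */
#  else
#    define HZ 1000     /* everything else, now including arm and mips */
#  endif
#endif

/* Hypothetical helper (illustration only): report the compiled-in default. */
static int
default_hz(void)
{
	return (HZ);
}
```

A kernel config that sets HZ explicitly still wins, because the whole block
is skipped when HZ is already defined.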

So on arm, only some of the armv5 ports have an HZ of 100: DB-78XX,
DB-88F6XXX (but not DB-88F5XXX), DOCKSTAR, DREAMPLUG (but not SHEVAPLUG)
and RT1310. All the armv6 and armv7 configs have HZ=1000. Since armv5 is
slated to go away before 13 branches, we should just change it now. All the
Marvell parts should likely be able to cope with 1000HZ anyway, and only the
RT1310 is slow enough to maybe need HZ=100. I can't say for sure, though,
since I can't get mine to work. armv[67] is 99%+ of the current install
base, due to FreeBSD's need for more memory than most of the old *PLUG
computers have anyway. So the case for arm seems fairly straightforward:
bump it to 1000 and maybe add an option line for RT1310.

For mips, the situation is similar. All the Atheros boards run at 1000Hz.
BERI is the only one that sets something specific (either 100 or 200) and
can remain. The JZ4780 stuff uses the default of 100Hz, and is likely in
the range of machines that are neither helped nor hurt by 1000HZ.  MALTA*
also uses the default of 100HZ, but it's 100% emulation these days (or near
enough), so we should likely keep it there. The Mediatek stuff also uses
the default of 100HZ. It's more likely to benefit from 1000HZ than JZ4780,
but not by much. The XLP stuff is 1000HZ. Octeon is 100HZ, but is plenty
fast for 1000HZ and would likely benefit from the change. So the proposal
for that lot is to change the default to 1000HZ, leave BERI as is, add
HZ=100 to MALTA*, and let the rest tick over to 1000HZ by default. Should
any problems arise, we can bump those back down to something more sane. I
suspect changing to 1000 won't matter at all given the current mix of
systems that are supported, with the possible exception of MALTA* (I'll
defer to more recent users of that, though, since it has been a while for
me).

For riscv, which kicked all this off, I'd be inclined to leave it at 1000.
But I don't know that market segment well enough to have an educated
opinion.

Comments?

Warner

On Fri, Sep 6, 2019 at 10:23 PM Philip Paeps <[hidden email]> wrote:

> On 2019-09-07 12:06:32 (+0800), Warner Losh wrote:
> > On Fri, Sep 6, 2019 at 9:54 PM Philip Paeps <[hidden email]>
> > wrote:
> >> On 2019-09-06 22:18:36 (+0800), Ian Lepore wrote:
> >>> On Fri, 2019-09-06 at 12:15 +0800, Philip Paeps wrote:
> >>>> On 2019-09-06 11:15:12 (+0800), Ian Lepore wrote:
> >>>>> On Fri, 2019-09-06 at 01:19 +0000, Philip Paeps wrote:
> >>>>>> Log:
> >>>>>>   riscv: default to HZ=100
> >>>>>
> >>>>> This seems like a bad idea.  I've run a 90 MHz armv4 chip with
> >>>>> HZ=1000 and didn't notice any performance hit from doing so.
> >>>>> Almost all arm kernel config files set HZ as an option, so that
> >>>>> define doesn't do much for arm these days.  It probably does still
> >>>>> set HZ for various mips platforms.
> >>>>>
> >>>>> I would think 1000 is appropriate for anything modern running at
> >>>>> 200 MHz or more.
> >>>>>
> >>>>> Setting it to 100 has the bad side effect of making things like
> >>>>> msleep(), tsleep(), and pause() (which show up in plenty of
> >>>>> drivers) all have a minimum timeout of 10ms, which is a long long
> >>>>> time on modern hardware.
> >>>>>
> >>>>> What benefit do you think you'll get from the lower number?
> >>>>
> >>>> On systems running at 10s of MHz (or slower, ick), with HZ=1000 you
> >>>> spend an awful lot of time servicing the timer interrupt and not
> >>>> very much time doing anything else.
> >>>>
> >>>> My rationale was that most RISC-V systems (including emulation and
> >>>> FPGA prototypes) I've encountered are running slower than the
> >>>> tipping point where HZ=1000 makes sense.  With the default of
> >>>> HZ=100, faster exceptions can still set HZ=1000 in their individual
> >>>> configs.
> >>>>
> >>>> When the RISC-V world evolves to having more actual silicon and
> >>>> fewer slow prototypes, I definitely agree this default should be
> >>>> flipped again for HZ=1000 by default and HZ=100 in the config files
> >>>> for the exceptions.
> >>>
> >>> Wait a second... are you saying that the riscv implementation
> >>> doesn't support event timers and uses an old-style periodic tick
> >>> based on HZ?
> >>
> >> Depending on the hardware, there may not be an event timer (yet)...
> >>
> >> As I wrote: I would be more than happy to revert this change when
> >> more silicon becomes available.  Presently, there is exactly one
> >> silicon RISC-V implementation commercially available (HiFive FU540)
> >> and even that one is kind of difficult to source.  Most people
> >> running RISC-V are doing so in emulation or on FPGAs.
> >>
> >> Given how long these things take to boot to userland (where you
> >> really notice how slow things are), HZ=100 feels like a more sensible
> >> default than HZ=1000.
> >
> > I think it shows more that the defaults are bad for MIPS and ARM. All
> > the MIPS files, except BERI/CHERI are 1000Hz. Well, Octeon is also
> > 100Hz, due to the defaults, but it will be fine at 1000Hz, so maybe we
> > need to attend to this as well. Arm !=v5 is also 1000Hz, so it should
> > be changed...
> >
> >> I don't feel terribly strongly about this though.  I've just been
> >> bitten several times in the last week on a <15MHz FPGA forgetting to
> >> set HZ=100 in config and figured I'd save others the trouble. ;-)
> >
> > 15MHz FPGA? FreeBSD 1.0 barely ran on 25MHz i386 machines of the
> > time....  How common are these beasts and how well does FreeBSD do on
> > them? I assume these are early prototypes?
>
> These are early prototypes indeed.
>
> FreeBSD runs remarkably well on them.  Slowly of course.  Booting takes
> several minutes and running anything non-trivial can be frustrating.
>
> > I have no strong opinion on riscv, but do think mips and arm should
> > change.
>
> I will revert r351918 and r351919 since there is clearly no consensus.
>
> Let's take this discussion to arch@?
>
> Philip
>
> --
> Philip Paeps
> Senior Reality Engineer
> Alternative Enterprises
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"

Re: What value HZ?

Constantine A. Murenin
For the sake of discussion, I recall that there was a measurable power
consumption difference between FreeBSD (HZ=1000) and OpenBSD (HZ=100) when
I played with a Kill-A-Watt a number of years ago.

Would it perhaps be prudent to try to qualify whether, and by how much,
this change affects power consumption on MIPS/ARM/RISCV, not just whether
each arch could cope with the change?

P.S. For folks who skipped the original message, this doesn't necessarily
change HZ for actual builds, as many individual conf files already define
it at 1000.  http://bxr.su/f/s?q=%22options+HZ%3D1000%22

Cheers,
Constantine.SU.

On Fri, 6 Sep 2019 at 22:46, Warner Losh <[hidden email]> wrote:

> [...]

Re: What value HZ?

Conrad Meyer
Hi Constantine,

On Sat, Sep 7, 2019 at 3:05 PM Constantine A. Murenin
<[hidden email]> wrote:
>
> For the sake of discussion, I recall that there was a measurable power
> consumption difference between FreeBSD (HZ=1000) and OpenBSD (HZ=100) when
> I played with a Kill-A-Watt a number of years ago.

Unfortunately, this is an apples to oranges comparison, and perhaps
worse, vague and dated.  It would be more interesting to compare
HZ=1000 to HZ=100 with an otherwise identical CURRENT FreeBSD system.

As far as I can tell, FreeBSD grew "tickless" timer support in r212541
in 2010 (thanks mav@).  This is easily observed on my idle HZ=1000
amd64 system with 'vmstat -i': "cpu0:timer" is firing at an average
rate of 21 Hz — not 1000.  And that is the most frequent interrupt.

> Would it perhaps be prudent to try to qualify whether, and by how much,
> this change affects power consumption on MIPS/ARM/RISCV, not just whether
> each arch could cope with the change?

If someone has the time and inclination, then of course, it could be
an interesting test to run.

Best,
Conrad

Re: What value HZ?

John Baldwin
In reply to this post by Warner Losh
On 9/6/19 9:46 PM, Warner Losh wrote:

> I'm top posting here, since this thread from a recent commit gives the
> context
>
> I'm proposing basically the following patch:
>
> diff --git a/sys/kern/subr_param.c b/sys/kern/subr_param.c
> index c0025c07eed..bb92afb6449 100644
> --- a/sys/kern/subr_param.c
> +++ b/sys/kern/subr_param.c
> @@ -61,11 +61,7 @@ __FBSDID("$FreeBSD$");
>   */
>
>  #ifndef HZ
> -#  if defined(__mips__) || defined(__arm__) || defined(__riscv)
> -#    define    HZ 100
> -#  else
> -#    define    HZ 1000
> -#  endif
> +#  define      HZ 1000
>  #  ifndef HZ_VM
>  #    define    HZ_VM 100
>  #  endif
>
> Along with removing HZ from almost all the kernel config files in arm and
> mips where it is already 1000. I'm agnostic about riscv, so would also be
> open to just removing the first two clauses from the #if the diff shows me
> removing.

I think this sounds fine.  On x86 we use hz=100 instead of 1000 in VMs via
a runtime test.  I suspect if anything we might want to take that same factor
into account here.  Thus, it makes sense for mips MALTA configs (most often
run under qemu) to use 100, and/or to add a runtime test for qemu that switches
from 1000 to 100.  Similarly, for riscv we probably want to use 100 under
qemu and spike, but 1000 on actual hardware via either kernel config options
or runtime checks.
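
A minimal sketch of the kind of runtime check described above, assuming the
kernel already knows at boot whether it is running under a hypervisor or
emulator. The function choose_hz() and its parameters are hypothetical and
chosen for illustration; the HZ and HZ_VM values mirror the defaults in the
patch.

```c
#include <stdbool.h>

#define HZ      1000    /* default for real hardware, as in the patch */
#define HZ_VM    100    /* lower default when running virtualized */

/*
 * Hypothetical helper: pick the tick rate at boot.
 * A positive 'tunable' is an explicit administrator setting and always
 * wins; any other value means "unset", so fall back to a default that
 * depends on whether a guest environment was detected.
 */
static int
choose_hz(int tunable, bool is_guest)
{
	if (tunable > 0)
		return (tunable);
	return (is_guest ? HZ_VM : HZ);
}
```

How a mips or riscv kernel would detect qemu or spike is left open here;
that detection is the part that would need real work.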

--
John Baldwin

Re: What value HZ?

Ian Lepore
In reply to this post by Conrad Meyer
On Sat, 2019-09-07 at 18:58 -0700, Conrad Meyer wrote:

> Hi Constantine,
>
> On Sat, Sep 7, 2019 at 3:05 PM Constantine A. Murenin
> <[hidden email]> wrote:
> >
> > For the sake of discussion, I recall that there was a measurable power
> > consumption difference between FreeBSD (HZ=1000) and OpenBSD (HZ=100) when
> > I played with a Kill-A-Watt a number of years ago.
>
> Unfortunately, this is an apples to oranges comparison, and perhaps
> worse, vague and dated.  It would be more interesting to compare
> HZ=1000 to HZ=100 with an otherwise identical CURRENT FreeBSD system.
>
> As far as I can tell, FreeBSD grew "tickless" timer support in r212541
> in 2010 (thanks mav@).  This is easily observed on my idle HZ=1000
> amd64 system with 'vmstat -i': "cpu0:timer" is firing at an average
> rate of 21 Hz — not 1000.  And that is the most frequent interrupt.
>
> > Would it perhaps be prudent to try to qualify whether, and by how much,
> > this change affects power consumption on MIPS/ARM/RISCV, not just whether
> > each arch could cope with the change?
>
> If someone has the time and inclination, then of course, it could be
> an interesting test to run.

In this mail, when I say HZ, I also mean 'hz', the global variable that's
tunable...

When I saw this conversation moved to arch@ my first thought was "Good,
we can finally have the conversation I've been trying to have with
people for years."  But, alas, the focus even here is still just "100
vs 1000", not the base question I really want to see answered.  I'm
replying to this message in the thread because it comes closest to
touching on that base question, which is:

   What, exactly, does HZ control and affect these days?

It no longer controls how often periodic timer interrupts happen,
except on a few rare systems that don't support one-shot style event
timers.  I'm not sure any such systems even still exist, now that we've
nuked most of the old armv4 support.  If they do exist, I'm pretty sure
it's only in the MIPS arch.

It no longer controls the scheduling quantum for a process or thread.
Or so I've been told; I'm pretty ignorant about the workings of both
schedulers.

It no longer controls the minimum amount of time a userland process can
sleep when calling the usleep() or nanosleep() functions.

The one thing I know for sure that HZ affects is whatever code still
exists that schedules timeouts in terms of 'ticks' rather than using
the newer SBT flavor of sleeping functions.  Old-school pause(9) and
msleep(9) and others that take a timeout in 'ticks' will have the
timeout granularity implied by HZ, and there is still such code around
in various device drivers and other places.
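
To make that granularity concrete, here is a small sketch of how a
millisecond timeout rounds up to whole ticks for a given hz. The helper
names are hypothetical, and the plain round-up rule is a simplification
assumed for illustration; the kernel's own conversion routines may behave
differently in detail.

```c
/* Hypothetical helper: convert a timeout in ms to whole ticks, rounding up. */
static int
ms_to_ticks(int ms, int hz)
{
	return ((int)(((long long)ms * hz + 999) / 1000));
}

/*
 * Hypothetical helper: the duration in ms that the rounded tick count
 * represents. The result is exact when hz divides 1000 evenly.
 */
static int
effective_ms(int ms, int hz)
{
	return ((int)((long long)ms_to_ticks(ms, hz) * 1000 / hz));
}
```

Under these assumptions a 1 ms request becomes one 10 ms tick at hz=100 and
stays at 1 ms at hz=1000; a 25 ms request becomes 30 ms at hz=100.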

Aside from old-school tick-based sleep functions, does HZ do anything
anymore?

-- Ian


Re: What value HZ?

Ian Lepore
In reply to this post by John Baldwin
On Mon, 2019-09-09 at 13:38 -0400, John Baldwin wrote:

> On 9/6/19 9:46 PM, Warner Losh wrote:
> > I'm top posting here, since this thread from a recent commit gives the
> > context
> >
> > I'm proposing basically the following patch:
> >
> > diff --git a/sys/kern/subr_param.c b/sys/kern/subr_param.c
> > index c0025c07eed..bb92afb6449 100644
> > --- a/sys/kern/subr_param.c
> > +++ b/sys/kern/subr_param.c
> > @@ -61,11 +61,7 @@ __FBSDID("$FreeBSD$");
> >   */
> >
> >  #ifndef HZ
> > -#  if defined(__mips__) || defined(__arm__) || defined(__riscv)
> > -#    define    HZ 100
> > -#  else
> > -#    define    HZ 1000
> > -#  endif
> > +#  define      HZ 1000
> >  #  ifndef HZ_VM
> >  #    define    HZ_VM 100
> >  #  endif
> >
> > Along with removing HZ from almost all the kernel config files in arm and
> > mips where it is already 1000. I'm agnostic about riscv, so would also be
> > open to just removing the first two clauses from the #if the diff shows me
> > removing.
>
> I think this sounds fine.  On x86 we use hz=100 instead of 1000 in VMs via
> a runtime test.

Why?  Was this done before or after the advent of event timers and the
tickless kernel?  If done before, does it still make sense today?

>   I suspect if anything we might want to take that same factor
> into account here.  Thus, it makes sense for mips MALTA configs (most often
> run under qemu) to use 100, and/or to add a runtime test for qemu that switches
> from 1000 to 100.  Similarly, for riscv we probably want to use 100 under
> qemu and spike, but 1000 on actual hardware via either kernel config options
> or runtime checks.

Why?  I dunno about MALTA, but for riscv, when I look at the code I see
a timer.c that implements one-shot event timers (and actually doesn't
support periodic mode event timers at all).  The timer.c code is
standard, not optional, so one-shot event timers are always available,
and the kernel should always run in tickless mode.  That being the
case, what advantage is there to a lower HZ?

-- Ian

