Another ZFS ARC memory question

Another ZFS ARC memory question

Luke Marsden-2
Hi all,

Just wanted to get your opinion on best practices for ZFS.

We're running 8.2-RELEASE (ZFS v15) in production on 24GB RAM amd64
machines, but have been having trouble with short spikes in application
memory usage resulting in huge amounts of swapping, bringing the whole
machine to its knees and crashing it hard.  I suspect this is because,
when there is a sudden spike in memory usage, the ZFS ARC reclaim thread
is unable to free system memory fast enough.

This most recently happened yesterday as you can see from the following
munin graphs:

E.g. http://hybrid-logic.co.uk/memory-day.png
     http://hybrid-logic.co.uk/swap-day.png

Our response has been to start limiting the ZFS ARC cache to 4GB on our
production machines - trading performance for stability is fine with me
(and we have L2ARC on SSD so we still get good levels of caching).
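
For reference, the cap is just a loader tunable; a minimal sketch of the
/boot/loader.conf entry (the 4G figure is what we settled on, not a
general recommendation):

    # /boot/loader.conf - cap the ARC; takes effect at next boot
    vfs.zfs.arc_max="4294967296"    # 4 GiB

    # after booting, compare the limit with the ARC's actual size:
    #   sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size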

My questions are:

      * is this a known problem?
      * what is the community's advice for production machines running
        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
        that there's enough actually free memory to handle a spike in
        application memory usage) the best solution to this
        spike-in-memory-means-crash problem?
      * has FreeBSD 9.0 / ZFS v28 solved this problem?
      * rather than setting a hard limit on the ARC cache size, is it
        possible to adjust the auto-tuning variables to leave more free
        memory for spiky memory situations?  e.g. set the auto-tuning to
        make arc eat 80% of memory instead of ~95% like it is at
        present?
      * could the arc reclaim thread be made to drop ARC pages with
        higher priority before the system starts swapping out
        application pages?

Thank you for any/all answers, and thank you for making FreeBSD
awesome :-)

Best Regards,
Luke Marsden

--
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com




Re: Another ZFS ARC memory question

Tom Evans-3
On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden
<[hidden email]> wrote:

> We're running 8.2-RELEASE (ZFS v15) in production on 24GB RAM amd64
> machines, but have been having trouble with short spikes in application
> memory usage resulting in huge amounts of swapping, bringing the whole
> machine to its knees and crashing it hard.
> [...]
> Thank you for any/all answers, and thank you for making FreeBSD
> awesome :-)

It's not a problem, it's a feature!

By default the ARC will attempt to cache as much as it can - it
assumes the box is a ZFS filer and doesn't need RAM for applications.
The solution, as you've found out, is to limit how much memory the ARC
can take.

In practice, you should be doing this anyway. You should know, or have
an idea of, how much RAM is required for the applications on that box,
and you need to limit ZFS so it doesn't eat into that required RAM.
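
As a rough sketch of that arithmetic (the application and kernel figures
below are made-up placeholders, not measurements from this thread):

    # derive a candidate arc_max from installed RAM minus everything else
    phys=$(sysctl -n hw.physmem)           # usable RAM in bytes
    app=$((8 * 1024 * 1024 * 1024))        # assumed application working set
    kern=$((2 * 1024 * 1024 * 1024))       # headroom for other kernel/wired memory
    echo "vfs.zfs.arc_max=\"$((phys - app - kern))\""   # paste into /boot/loader.conf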

Cheers

Tom

Re: Another ZFS ARC memory question

Luke Marsden-2
On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote:

> It's not a problem, it's a feature!
>
> By default the ARC will attempt to cache as much as it can - it
> assumes the box is a ZFS filer and doesn't need RAM for applications.
> The solution, as you've found out, is to limit how much memory the ARC
> can take.
>
> In practice, you should be doing this anyway. You should know, or have
> an idea of, how much RAM is required for the applications on that box,
> and you need to limit ZFS so it doesn't eat into that required RAM.

Thanks for your reply, Tom!  I agree that the ARC cache is a great
feature, but for a general purpose filesystem it does seem like a
reasonable expectation that filesystem cache will be evicted before
application data is swapped, even if the spike in memory usage is rather
aggressive.  A complete server crash in this scenario is rather
unfortunate.

My question stands - is this an area which has been improved in the
ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be
standard practice to guess how much memory the applications running on
the server might need and set the vfs.zfs.arc_max tunable in
/boot/loader.conf appropriately?  This is reasonably tricky when
providing general purpose web application hosting, so we'll often end up
erring on the side of caution and leaving lots of RAM free "just in
case".

If the latter is indeed the case in the latest stable releases then I
would like to update http://wiki.freebsd.org/ZFSTuningGuide which
currently states:

        FreeBSD 7.2+ has improved kernel memory allocation strategy and
        no tuning may be necessary on systems with more than 2 GB of
        RAM.

Thank you!

Best Regards,
Luke Marsden

--
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com



Re: Another ZFS ARC memory question

Tom Evans-3
On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden
<[hidden email]> wrote:

> Thanks for your reply, Tom!  I agree that the ARC cache is a great
> feature, but for a general purpose filesystem it does seem like a
> reasonable expectation that filesystem cache will be evicted before
> application data is swapped, even if the spike in memory usage is rather
> aggressive.  A complete server crash in this scenario is rather
> unfortunate.
>
> My question stands - is this an area which has been improved in the
> ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be
> standard practice to guess how much memory the applications running on
> the server might need and set the vfs.zfs.arc_max tunable in
> /boot/loader.conf appropriately?  This is reasonably tricky when
> providing general purpose web application hosting, so we'll often end up
> erring on the side of caution and leaving lots of RAM free "just in
> case".
>
> If the latter is indeed the case in the latest stable releases then I
> would like to update http://wiki.freebsd.org/ZFSTuningGuide which
> currently states:
>
>        FreeBSD 7.2+ has improved kernel memory allocation strategy and
>        no tuning may be necessary on systems with more than 2 GB of
>        RAM.

Hmm. That comment really means that you no longer need to tune
vm.kmem_size.

I take your point that applications suddenly using a lot of RAM should
not cause the server to fall over. Do you know why it fell over? I.e.
was it a panic, a deadlock, etc.?

FreeBSD does not cope well when you have used up all RAM and swap
(well, what does?), and from your graphs it does look like the ARC is
not super massive when you had the problem - around 30-40% of RAM?

Cheers

Tom

Re: Another ZFS ARC memory question

Luke Marsden-2
On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote:

> Hmm. That comment really means that you no longer need to tune
> vm.kmem_size.

http://wiki.freebsd.org/ZFSTuningGuide

"No tuning may be necessary" seems to indicate that no changes need to
be made to boot.loader.  I'm happy to provide a patch for the wiki which
makes it clearer that for servers which may experience sudden spikes in
application memory usage (i.e. all servers running user-supplied
applications), the speed of ARC eviction is insufficient to ensure
stability and arc_max should be tuned downwards.

> I take your point that applications suddenly using a lot of RAM should
> not cause the server to fall over. Do you know why it fell over? I.e.
> was it a panic, a deadlock, etc.?

If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can
see a huge spike in swap at the point where the last line of pixels at
http://hybrid-logic.co.uk/memory-day.png shows the sudden increase in
memory usage (about 3GB of extra active memory if you look closely).
The graph stops at that point because the server became completely
unresponsive, including to munin probe requests.  I did manage to log in
just before it went down, but by then incoming requests weren't being
serviced fast enough due to the excessive swapping, and eventually even
'top' output froze and never came back.  It continued to respond to
pings, though, and might eventually have recovered if I had disabled
inbound network traffic.  I don't have any evidence of a panic or
deadlock; we just hard-rebooted the machine about 15 minutes later after
it failed to recover from the swap-storm.

> FreeBSD does not cope well when you have used up all RAM and swap
> (well, what does?), and from your graphs it does look like the ARC is
> not super massive when you had the problem - around 30-40% of RAM?

The last munin sample indicates roughly 8.5GB of ARC out of 24GB, so
yes, around 35%.  I guess what I'd like is for FreeBSD to detect an
emergency out-of-memory condition and aggressively drop much or all of
the ARC cache *before* swapping out application memory, which is what
causes the system to grind to a halt.

Is this a reasonable request, and is there anything I can do to help
implement it?

If not, can we update the wiki to make it clearer that ARC limiting is
necessary, even on high-RAM boxes, to ensure stability under spiky
memory conditions?

Thanks!

Best Regards,
Luke Marsden

--
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com


Re: Another ZFS ARC memory question

Ian Downes-2
On Fri, Feb 24, 2012 at 01:42:14PM +0000, Luke Marsden wrote:

> The last munin sample indicates roughly 8.5GB of ARC out of 24GB, so
> yes, around 35%.  I guess what I'd like is for FreeBSD to detect an
> emergency out-of-memory condition and aggressively drop much or all of
> the ARC cache *before* swapping out application memory, which is what
> causes the system to grind to a halt.
>
> Is this a reasonable request, and is there anything I can do to help
> implement it?
>
> If not, can we update the wiki to make it clearer that ARC limiting is
> necessary, even on high-RAM boxes, to ensure stability under spiky
> memory conditions?
>

Are you sure that it is the ARC data that is causing the issue? I've got
boxes where the ARC *meta* skyrockets and consumes all RAM, greatly
exceeding the arc_meta_limit. E.g. on a very unresponsive local box:

vfs.zfs.arc_meta_limit: 1610612736
vfs.zfs.arc_meta_used: 12183379056

Setting arc_max helps (and seems to be respected), but I don't know why
arc_meta_used exceeds arc_meta_limit.
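
A quick sketch of how to watch the two counters, and - on releases where
it is honoured as a loader tunable - how to set the metadata ceiling
explicitly (the 512M value is purely illustrative):

    # metadata actually cached vs. the configured ceiling, in bytes:
    sysctl vfs.zfs.arc_meta_used vfs.zfs.arc_meta_limit

    # /boot/loader.conf
    vfs.zfs.arc_meta_limit="536870912"    # 512 MiB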

Ian

Re: Another ZFS ARC memory question

Peter Jeremy-6
On 2012-Feb-24 11:06:52 +0000, Luke Marsden <[hidden email]> wrote:
>We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines
>but have been having trouble with short spikes in application memory
>usage resulting in huge amounts of swapping, bringing the whole machine
>to its knees and crashing it hard.  I suspect this is because when there
>is a sudden spike in memory usage the zfs arc reclaim thread is unable
>to free system memory fast enough.

A large number of fairly serious ZFS bugs have been fixed since
8.2-RELEASE and I would suggest you look at upgrading.  That said, I
haven't seen the specific problem you are reporting.

>      * is this a known problem?

I'm unaware of it specifically as it relates to ZFS.  You don't mention
how big the memory usage spike is, but unless there is sufficient
free+cache memory available to absorb it, you will have problems whether
it's UFS or ZFS (though it's possibly worse with ZFS).  FreeBSD is known
not to cope well with running out of memory.

>      * what is the community's advice for production machines running
>        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
>        that there's enough actually free memory to handle a spike in
>        application memory usage) the best solution to this
>        spike-in-memory-means-crash problem?

Are you swapping onto a ZFS vdev?  If so, change back to a raw (or
geom) device - swapping to ZFS is known to be problematic.  If you
have very spiky memory requirements, increasing vm.v_cache_min and/or
vm.v_free_reserved might give you better results.
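
A sketch of how to inspect those thresholds and the free+cache cushion
they protect (the value in the last line is illustrative only, and
whether these are writable at runtime depends on the release):

    # current thresholds, in pages:
    sysctl vm.v_free_reserved vm.v_free_min vm.v_free_target vm.v_cache_min

    # free + cache pages available right now, and the page size:
    sysctl vm.stats.vm.v_free_count vm.stats.vm.v_cache_count hw.pagesize

    # e.g. raise vm.v_cache_min (value in pages):
    sysctl vm.v_cache_min=65536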

>      * has FreeBSD 9.0 / ZFS v28 solved this problem?

The ZFS code is the same in 9.0 and 8.3.  Since 8.3 is less of a jump,
I'd recommend that you try 8.3-prerelease in a test box and see how
it handles your load.  Note that there's no need to upgrade your pools
from v15 to v28 unless you want the ZFS features - the actual ZFS
code is independent of pool version.
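
For example (pool name hypothetical), you can check what version a pool
is at without upgrading it:

    zpool get version tank    # stays at 15 unless you explicitly run zpool upgrade
    zpool upgrade -v          # list the pool versions this ZFS code supports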

>      * rather than setting a hard limit on the ARC cache size, is it
>        possible to adjust the auto-tuning variables to leave more free
>        memory for spiky memory situations?  e.g. set the auto-tuning to
>        make arc eat 80% of memory instead of ~95% like it is at
>        present?

Memory spikes are absorbed by vm.v_cache_min and vm.v_free_reserved in
the first instance.  The current vfs.zfs.arc_max default may be a bit
high for some workloads but at this point in time, you will need to
tune it manually.

--
Peter Jeremy


Re: Another ZFS ARC memory question

Slawa Olhovchenkov
On Tue, Feb 28, 2012 at 05:14:37AM +1100, Peter Jeremy wrote:

> >      * what is the community's advice for production machines running
> >        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
> >        that there's enough actually free memory to handle a spike in
> >        application memory usage) the best solution to this
> >        spike-in-memory-means-crash problem?
>
> Are you swapping onto a ZFS vdev?  If so, change back to a raw (or
> geom) device - swapping to ZFS is known to be problematic.  If you

I have seen the kernel get stuck when swapping to ZFS.  Is this the only known problem?

Re: Another ZFS ARC memory question

Alexander Leidinger
Quoting Slawa Olhovchenkov <[hidden email]> (from Thu, 1 Mar 2012  
18:28:26 +0400):

> On Tue, Feb 28, 2012 at 05:14:37AM +1100, Peter Jeremy wrote:
>
>> >      * what is the community's advice for production machines running
>> >        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
>> >        that there's enough actually free memory to handle a spike in
>> >        application memory usage) the best solution to this
>> >        spike-in-memory-means-crash problem?
>>
>> Are you swapping onto a ZFS vdev?  If so, change back to a raw (or
>> geom) device - swapping to ZFS is known to be problematic.  If you
>
> I have seen the kernel get stuck when swapping to ZFS.  Is this the only known problem?

This is a known problem.  Don't use swap on a zpool.  If you want fault
tolerance, use gmirror for the swap partitions instead (make sure the
swap partition ends _before_ the last sector of the disk in this case).
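
A minimal sketch of that setup (device names, partition numbers and the
label are examples only):

    # /boot/loader.conf - load the mirror class at boot
    geom_mirror_load="YES"

    # mirror the two swap partitions and swap on the mirror, not on ZFS
    gmirror label -b prefer swap /dev/ada0p3 /dev/ada1p3

    # /etc/fstab
    /dev/mirror/swap  none  swap  sw  0  0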

Bye,
Alexander.

--
As of next Thursday, UNIX will be flushed in favor of TOPS-10.
Please update your programs.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137


Re: Another ZFS ARC memory question

Luke Marsden-2
On Fri, 2012-03-02 at 10:25 +0100, Alexander Leidinger wrote:

> > On Tue, Feb 28, 2012 at 05:14:37AM +1100, Peter Jeremy wrote:
> >>
> >> Are you swapping onto a ZFS vdev?

We are not swapping onto a ZFS vdev (we've been down that road and know
it's a bad idea).  Our issue is primarily with ARC cache eviction not
happening fast enough or at all when there is a spike in memory usage,
causing machines to hang.

We are presently working around it by limiting arc_max to 4G on our 24G
RAM production boxes (which seems like a massive waste of performance)
and by doing very careful/aggressive application level management of
memory usage to ensure stability (limits.conf didn't work for us, so we
rolled our own).  A better solution would be welcome, though, so that we
can utilise all the free memory we're presently keeping around as a
safety margin - ideally it would be used as ARC cache.

Two more questions, again wrt 8.2-RELEASE:

1.  Is it expected that, with arc_max limited to 4G, we should see
wired memory usage around 7-8G?  I understand that the kernel has to use
some memory, but really 3-4G of non-ARC data?

2.  We have some development machines with only 3G of RAM.  Previously
they had no arc_max set and were left to tune themselves.  They were
quite unstable.  Now we've set arc_max to 256M but things have got
worse: we've seen a big disk I/O performance hit (untarring a ports
tarball now takes 20 minutes), wired memory usage is up around 2.5GB,
the machines are swapping a lot, and they are crashing more frequently.
Below is arc_summary.pl output from one of the troubled dev machines,
showing the ARC at over 500% of its target size; uname output follows as
well.  My second question is: have there been fixes between 8.2-RELEASE
and 8.3-BETA1 or 9.0-RELEASE which solve this ARC over-usage problem?

hybrid@node5:~$ ./arc_summary.pl

------------------------------------------------------------------------
ZFS Subsystem Report                            Fri Mar  2 09:55:00 2012
------------------------------------------------------------------------

System Memory:

        8.92%   264.89  MiB Active,     6.43%   190.75  MiB Inact
        80.91%  2.35    GiB Wired,      1.97%   58.46   MiB Cache
        1.74%   51.70   MiB Free,       0.03%   864.00  KiB Gap

        Real Installed:                         3.00    GiB
        Real Available:                 99.56%  2.99    GiB
        Real Managed:                   97.04%  2.90    GiB

        Logical Total:                          3.00    GiB
        Logical Used:                   90.20%  2.71    GiB
        Logical Free:                   9.80%   300.91  MiB

Kernel Memory:                                  1.08    GiB
        Data:                           98.75%  1.06    GiB
        Text:                           1.25%   13.76   MiB

Kernel Memory Map:                              2.83    GiB
        Size:                           26.80%  775.56  MiB
        Free:                           73.20%  2.07    GiB
                                                                Page:  1
------------------------------------------------------------------------

ARC Summary: (THROTTLED)
        Storage pool Version:                   15
        Filesystem Version:                     4
        Memory Throttle Count:                  53.77m

ARC Misc:
        Deleted:                                1.99m
        Recycle Misses:                         6.84m
        Mutex Misses:                           6.96k
        Evict Skips:                            6.96k

ARC Size:                               552.16% 1.38    GiB
        Target Size: (Adaptive)         100.00% 256.00  MiB
        Min Size (Hard Limit):          36.23%  92.75   MiB
        Max Size (High Water):          2:1     256.00  MiB

ARC Size Breakdown:
        Recently Used Cache Size:       16.97%  239.90  MiB
        Frequently Used Cache Size:     83.03%  1.15    GiB

ARC Hash Breakdown:
        Elements Max:                           83.19k
        Elements Current:               84.72%  70.48k
        Collisions:                             2.53m
        Chain Max:                              9
        Chains:                                 18.94k
                                                                Page:  2
------------------------------------------------------------------------

ARC Efficiency:                                 126.65m
        Cache Hit Ratio:                95.07%  120.41m
        Cache Miss Ratio:               4.93%   6.24m
        Actual Hit Ratio:               95.07%  120.41m

        Data Demand Efficiency:         99.45%  111.87m
        Data Prefetch Efficiency:       0.00%   235.34k

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           4.14%   4.99m
          Most Frequently Used:         95.85%  115.42m
          Most Recently Used Ghost:     0.24%   292.53k
          Most Frequently Used Ghost:   3.73%   4.50m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  92.40%  111.26m
          Prefetch Data:                0.00%   0
          Demand Metadata:              7.60%   9.15m
          Prefetch Metadata:            0.00%   2.73k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  9.79%   610.82k
          Prefetch Data:                3.77%   235.34k
          Demand Metadata:              85.67%  5.35m
          Prefetch Metadata:            0.78%   48.47k
                                                                Page:  3
------------------------------------------------------------------------

VDEV Cache Summary:                             5.33m
        Hit Ratio:                      91.14%  4.86m
        Miss Ratio:                     8.59%   458.07k
        Delegations:                    0.27%   14.34k
                                                                Page:  6
------------------------------------------------------------------------

ZFS Tunable (sysctl):
        kern.maxusers                           384
        vm.kmem_size                            3112275968
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        329853485875
        vfs.zfs.l2c_only_size                   0
        vfs.zfs.mfu_ghost_data_lsize            4866048
        vfs.zfs.mfu_ghost_metadata_lsize        185315328
        vfs.zfs.mfu_ghost_size                  190181376
        vfs.zfs.mfu_data_lsize                  4608
        vfs.zfs.mfu_metadata_lsize              3072
        vfs.zfs.mfu_size                        254041600
        vfs.zfs.mru_ghost_data_lsize            0
        vfs.zfs.mru_ghost_metadata_lsize        0
        vfs.zfs.mru_ghost_size                  0
        vfs.zfs.mru_data_lsize                  0
        vfs.zfs.mru_metadata_lsize              0
        vfs.zfs.mru_size                        520685568
        vfs.zfs.anon_data_lsize                 0
        vfs.zfs.anon_metadata_lsize             0
        vfs.zfs.anon_size                       20846592
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                0
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               8388608
        vfs.zfs.l2arc_write_max                 8388608
        vfs.zfs.arc_meta_limit                  67108864
        vfs.zfs.arc_meta_used                   1479184192
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.arc_min                         97258624
        vfs.zfs.arc_max                         268435456
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.block_cap                256
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                1
        vfs.zfs.check_hostid                    1
        vfs.zfs.recover                         0
        vfs.zfs.txg.write_limit_override        0
        vfs.zfs.txg.synctime                    5
        vfs.zfs.txg.timeout                     30
        vfs.zfs.scrub_limit                     10
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 10485760
        vfs.zfs.vdev.cache.max                  16384
        vfs.zfs.vdev.aggregation_limit          131072
        vfs.zfs.vdev.ramp_rate                  2
        vfs.zfs.vdev.time_shift                 6
        vfs.zfs.vdev.min_pending                4
        vfs.zfs.vdev.max_pending                10
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.zil_disable                     0
        vfs.zfs.zio.use_uma                     0
        vfs.zfs.version.zpl                     4
        vfs.zfs.version.spa                     15
        vfs.zfs.version.dmu_backup_stream       1
        vfs.zfs.version.dmu_backup_header       2
        vfs.zfs.version.acl                     1
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
                                                                Page:  7
------------------------------------------------------------------------

hybrid@node5:~$ uname -a
FreeBSD node5.hybridlogiclabs.com 8.2-RELEASE FreeBSD 8.2-RELEASE #0:
Thu Feb 17 02:41:51 UTC 2011
[hidden email]:/usr/obj/usr/src/sys/GENERIC  amd64

Thanks!
Luke Marsden

--
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com



Re: Another ZFS ARC memory question

Peter Jeremy-6
[Cc list pruned]

On 2012-Mar-02 10:16:06 +0000, Luke Marsden <[hidden email]> wrote:
>We are presently working around it by limiting arc_max to 4G on our 24G
>RAM production boxes (which seems like a massive waste of performance)
>and by doing very careful/aggressive application level management of
>memory usage to ensure stability (limits.conf didn't work for us, so we
>rolled our own).  A better solution would be welcome, though, so that we
>can utilise all the free memory we're presently keeping around as a
>safety margin - ideally it would be used as ARC cache.

Have you tried increasing vm.v_cache_min to cover your spikes?

>1.  Is it expected that, with arc_max limited to 4G, we should see
>wired memory usage around 7-8G?  I understand that the kernel has to use
>some memory, but really 3-4G of non-ARC data?

Yes, that sounds possible.
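
If you want to see where the non-ARC wired memory is going, a rough
sketch (the field layout of vmstat -z varies a little between releases):

    # the ARC's own accounting, in bytes:
    sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_meta_used

    # largest UMA zones by bytes in use (SIZE * USED), i.e. other wired consumers:
    vmstat -z | awk -F'[:,]' 'NR>2 && $2*$4 > 0 {printf "%12d  %s\n", $2*$4, $1}' | sort -rn | head -20

    # malloc(9)-backed kernel allocations by type:
    vmstat -m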

>2.  We have some development machines with only 3G of RAM.  Previously
>they had no arc_max set and were left to tune themselves.  They were
>quite unstable.  Now we've set arc_max to 256M but things have got
>worse: we've seen a big disk I/O performance hit (untarring a ports
>tarball now takes 20 minutes), wired memory usage is up around 2.5GB,
>the machines are swapping a lot, and they are crashing more frequently.

That's stress-testing ZFS more than anything else.  You definitely
can't use those results as a guide to tune your production boxes
(other than what not to do).  That said, I have 3.5GB in my $work
desktop (running 8.2-stable from about a month ago) and don't have
any stability issues with either it or a buildbox with 2GB RAM.

>Below is arc_summary.pl output from one of the troubled dev machines,
>showing the ARC at over 500% of its target size; uname output follows as
>well.  My second question is: have there been fixes between 8.2-RELEASE
>and 8.3-BETA1 or 9.0-RELEASE which solve this ARC over-usage problem?

There definitely have been some commits to ensure that arc_max is
treated much more as a hard limit but I can't quickly find them so
I'm not sure if they pre- or post-date 8.2-RELEASE.

--
Peter Jeremy
