Hello folks,
I'm trying to discover whether ZFS under FreeBSD will automatically pull
in a hot spare when one is required.

The issue was raised back in March 2010, and refers to a PR opened in
May 2009:

* http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007943.html
* http://www.freebsd.org/cgi/query-pr.cgi?pr=134491

In turn, the PR refers to this March 2010 post about using devd to
accomplish the task:

http://lists.freebsd.org/pipermail/freebsd-stable/2010-March/055686.html

Does the above represent the current state?

I ask because I just ordered two more HDDs to use as spares. Whether
they sit on the shelf or in the box is open to discussion.

--
Dan Langille - http://langille.org/

_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[hidden email]"
On 04/01/2011 03:08, Dan Langille wrote:
> Hello folks,
>
> I'm trying to discover if ZFS under FreeBSD will automatically pull in a
> hot spare if one is required.
[...]
> I ask because I just ordered two more HDD to use as spares. Whether they
> sit on the shelf or in the box is open to discussion.

As far as our testing could discover, it's not automatic.

I wrote some Ugly Perl that's called by devd when it spots a drive-fail
event, which seemed to DTRT when simulating a failure by pulling a drive.

--
JH-R
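The devd side of such a hook might look something like the sketch below. This is not JH-R's actual configuration -- the file name and script path are placeholders -- but the "system"/"type" match keys and the $pool/$vdev_path variables are the ones devd exposes for ZFS events:

```
# Hypothetical /usr/local/etc/devd/zfs-spare.conf: call a handler script
# when ZFS reports a vdev failure. The script path is a placeholder.
notify 10 {
    match "system" "ZFS";
    match "type"   "vdev";
    action "/usr/local/sbin/zfs-spare.pl $pool $vdev_path";
};
```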
On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:
> On 04/01/2011 03:08, Dan Langille wrote:
>> I'm trying to discover if ZFS under FreeBSD will automatically pull in a
>> hot spare if one is required.
[...]
> As far as our testing could discover, it's not automatic.
>
> I wrote some Ugly Perl that's called by devd when it spots a drive-fail
> event, which seemed to DTRT when simulating a failure by pulling a drive.

Without such a script, what is the value in creating hot spares?

--
Dan Langille - http://langille.org/
Dan Langille wrote:
> On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:
[...]
>> As far as our testing could discover, it's not automatic.
>>
>> I wrote some Ugly Perl that's called by devd when it spots a drive-fail
>> event, which seemed to DTRT when simulating a failure by pulling a drive.
>
> Without such a script, what is the value in creating hot spares?

IMHO, hot spares are totally useless in their current state in FreeBSD.
I think there should be a strong warning somewhere (in man zpool?);
users can be misled otherwise.

Miroslav Lachman
On 11/01/2011 03:38, Dan Langille wrote:
> On 1/4/2011 11:52 AM, John Hawkes-Reed wrote:
>> As far as our testing could discover, it's not automatic.
>>
>> I wrote some Ugly Perl that's called by devd when it spots a drive-fail
>> event, which seemed to DTRT when simulating a failure by pulling a drive.
>
> Without such a script, what is the value in creating hot spares?

We went through that loop in the office.

We're used to the way the NetApps work here, where often one's first
notice of a failed disk is a visit from the courier with a replacement.
(I'm only half joking.)

In the end, writing enough Perl to swap in the spare disk made much more
sense than paging the relevant admin on disk-fail and expecting them to
be able to type straight at 4 AM.

Our thinking is that having a hot spare allows us to do the physical
disk swap during office hours rather than (for instance) running in a
degraded state over a long weekend.

If it's of interest, I'll see if I can share the code.

--
JH-R
Interesting. I was just testing Solaris 11 Express's ability to handle a
pulled drive today. It handles it quite well. However, my Areca 1880
driver (arcmsr0) crashes when you reinsert the drive... but that's
another topic, and an issue for Areca tech support.

Back to the point: Solaris runs a separate process called the Fault
Management Daemon (fmd) that appears to handle this logic. This means
it's really not inside the ZFS code, and FreeBSD would need something
similar, hopefully less kludgy than a user script.

I wonder if anyone has been eyeing the fma code in the CDDL tree with a
thought to porting it. It looks to be a really neat bit of code; I'm
still quite new to it, having only been working with Solaris for the
last few months.

Here are two links with a bit of info on the Solaris daemon:

http://www.princeton.edu/~unix/Solaris/troubleshoot/fm.html
http://hub.opensolaris.org/bin/view/Community+Group+fm/

Here's my log of the event in Solaris 11 Express:

Jan 12 21:28:47 solaris fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jan 12 21:28:47 solaris EVENT-TIME: Wed Jan 12 21:28:47 UTC 2011
Jan 12 21:28:47 solaris PLATFORM: PowerEdge-T710, CSN: 39SLQN1, HOSTNAME: solaris
Jan 12 21:28:47 solaris SOURCE: zfs-diagnosis, REV: 1.0
Jan 12 21:28:47 solaris EVENT-ID: ccfa7a23-838b-ebc8-decf-c2607afb390d
Jan 12 21:28:47 solaris DESC: The number of I/O errors associated with a ZFS device exceeded
Jan 12 21:28:47 solaris acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jan 12 21:28:47 solaris AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Jan 12 21:28:47 solaris will be made to activate a hot spare if available.
Jan 12 21:28:47 solaris IMPACT: Fault tolerance of the pool may be compromised.
Jan 12 21:28:47 solaris REC-ACTION: Run 'zpool status -x' and replace the bad device.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of John Hawkes-Reed
Sent: Tuesday, January 11, 2011 12:11 PM
To: Dan Langille
Cc: freebsd-stable
Subject: Re: ZFS - hot spares : automatic or not?

On 11/01/2011 03:38, Dan Langille wrote:
> Without such a script, what is the value in creating hot spares?

We went through that loop in the office.
[...]
If it's of interest, I'll see if I can share the code.

--
JH-R
On 01/12/11 19:32, Chris Forgeron wrote:
> Solaris runs a separate process called Fault Management Daemon (fmd)
> that looks to handle this logic - This means that it's really not
> inside the ZFS code to handle this, and FreeBSD would need something
> similar, hopefully less kludgy than a user script.
[...]
After a cursory glance at their fault-management infrastructure, I
noticed that it also deals with other kinds of things, such as CPU and
memory problems, which might make a port painful or impractical.

Would the people with custom hot-spare scripts, or with nothing
automated at all, be content if the sysutils/geomWatch program grew
support for hot spares in a future version? I already became somewhat
familiar with the userland ZFS API when I added ZFS support to it.

-Boris
I think we'd be happy with whatever solution someone was kind enough to donate the time towards.
Although stripping the Solaris FMD stuff down to just the ZFS parts
would help keep the Solaris and FreeBSD ZFS implementations a bit
closer, which is of arguable importance, I do like standardization.

Eventually porting more of the FMD may be really useful; Solaris has a
lot of very handy things in it that impress me.

..then again, I'm not volunteering the time to do it, so I don't have
much say. :-)

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Boris Kochergin
Sent: Wednesday, January 12, 2011 8:51 PM
To: Chris Forgeron
Cc: freebsd-stable
Subject: Re: ZFS - hot spares : automatic or not?

> After a cursory glance at their fault-management infrastructure, I
> noticed that it also deals with other kinds of stuff like CPU and memory
> problems, which might make a port painful or impractical. Would the
> people with custom hot-spare scripts, or nothing automated at all, be
> content if the sysutils/geomWatch program grew support for hot spares in
> a future version? I already became somewhat familiar with the userland
> ZFS API when I added ZFS support to it.
>
> -Boris
Quoting Boris Kochergin <[hidden email]> (from Wed, 12 Jan 2011 19:50:41 -0500):

> On 01/12/11 19:32, Chris Forgeron wrote:
>> Solaris runs a separate process called Fault Management Daemon
>> (fmd) that looks to handle this logic - This means that it's really
>> not inside the ZFS code to handle this, and FreeBSD would need
>> something similar, hopefully less kludgy than a user script.
>>
>> I wonder if anyone has been eyeing the fma code in the cddl with a
>> thought for porting it - It looks to be a really neat bit of code -
>> I'm still quite new with it, having only been working with Solaris
>> the last few months.

It depends upon a lot of standardized kernel notifications. Basically
(big-picture view) it is the same as our devd (reacting to events) with
some logic for what to do with them (which we can do with our devd too).

> Would the people with custom hot-spare scripts, or nothing automated
> at all, be content if the sysutils/geomWatch program grew support
> for hot spares in a future version? I already became somewhat
> familiar with the userland ZFS API when I added ZFS support to it.

I had a look at geomWatch and it seems it is polling based. For
something like ZFS hot-spare replacement, you normally want the
reaction to be event based (= devd). I would even go further and say
that the things geomWatch is doing should be done with devd (be it
directly, or by delegating some events via a not-yet-existing interface
(which could even be script driven) to another daemon). This may
require some more events to be produced by the various GEOM parts.

IMO it would be great if those people with hot-spare scripts would
publish them. That way a joint effort could be initiated to come up
with a generic way of handling this which could be included in the base
system.

Bye,
Alexander.

--
Whatever creates the greatest inconvenience for the largest number must happen.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org      netchild @ FreeBSD.org   : PGP ID = 72077137
On 01/13/11 09:42, Alexander Leidinger wrote:
> It depends upon a lot of standardized kernel notifications. Basically
> (big picture view) it is the same as our devd (reacting to events)
> with some logic what to do with it.
[...]
> IMO it would be great if those people with hotspare-scripts would
> publish them. This way a joined effort could be initiated to come up
> with some generic way of handling this which could be included in the
> base system.

Did a little research. In at least the ZFS case, it appears that events
are available through devctl(4) and are therefore accessible through
devd:

http://2007.asiabsdcon.org/papers/P16-paper.pdf - section 3.7

-Boris
On Thu, Jan 13, 2011 at 7:07 AM, Boris Kochergin <[hidden email]> wrote:
> Did a little research. In at least the ZFS case, it appears that events
> are available through devctl(4) and are therefore accessible through
> devd:
>
> http://2007.asiabsdcon.org/papers/P16-paper.pdf - section 3.7

PC-BSD has the following additions to their /etc/devd.conf file:

# Sample ZFS problem reports handling.
notify 10 {
        match "system"          "ZFS";
        match "type"            "zpool";
        action "logger -p kern.err 'ZFS: failed to load zpool $pool'";
};

notify 10 {
        match "system"          "ZFS";
        match "type"            "vdev";
        action "logger -p kern.err 'ZFS: vdev failure, zpool=$pool type=$type'";
};

notify 10 {
        match "system"          "ZFS";
        match "type"            "data";
        action "logger -p kern.warn 'ZFS: zpool I/O failure, zpool=$pool error=$zio_err'";
};

notify 10 {
        match "system"          "ZFS";
        match "type"            "io";
        action "logger -p kern.warn 'ZFS: vdev I/O failure, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size error=$zio_err'";
};

notify 10 {
        match "system"          "ZFS";
        match "type"            "checksum";
        action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
};

So it's very (relatively) easy to configure devd to do this. We just
need some scripts to plug into the action lines above. :)

--
Freddie Cash
[hidden email]
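A minimal sketch of the kind of script that could sit behind those action lines, assuming the spare is picked by scanning zpool status for a device marked AVAIL. Everything here is hypothetical: the pool and device names (tank, da3, da9) are placeholders, and the DRY_RUN switch exists only so the sketch can be exercised on a machine with no pool. Treat it as a starting point, not a tested tool:

```shell
#!/bin/sh
# Hypothetical hot-spare handler for a devd ZFS vdev-failure event.
# All names here (pool "tank", devices da3/da9) are illustrative only.

# run CMD...: execute CMD, or just echo it when DRY_RUN=1, so the
# sketch can be exercised without touching a real pool.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# first_avail_spare POOL: name of the first spare that "zpool status"
# lists as AVAIL (a canned answer in dry-run mode).
first_avail_spare() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "da9"
    else
        zpool status "$1" | awk '/AVAIL/ { print $1; exit }'
    fi
}

# attach_spare POOL FAILED_VDEV: swap the first available spare in for
# the failed device and leave a note in the system log.
attach_spare() {
    pool="$1"
    failed="$2"
    spare=$(first_avail_spare "$pool")
    if [ -z "$spare" ]; then
        run logger -p kern.err "zfs-spare: no spare available in $pool"
        return 1
    fi
    run zpool replace "$pool" "$failed" "$spare"
    run logger -p kern.notice "zfs-spare: replaced $failed with $spare in $pool"
}

# Dry-run demonstration: show what would happen if da3 failed in "tank".
DRY_RUN=1 attach_spare tank da3
```

A matching devd action would then be something like: action "/usr/local/sbin/zfs-spare.sh $pool $vdev_path"; reusing the $pool/$vdev_path variables from the snippets above. Again, this is untested; anyone wiring it up should simulate a failure first, as John did.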
On 1/11/2011 11:10 AM, John Hawkes-Reed wrote:
[...]
> Our thinking is that having a hot spare allows us to do the physical
> disk-swap in office hours, rather than (for instance) running in a
> degraded state over a long weekend.
>
> If it's of interest, I'll see if I can share the code.

I think this is very much of interest. :)

--
Dan Langille - http://langille.org/