Storage 'failover' largely kills FreeBSD 10.x under XenServer?


Karl Pielorz-2

Hi All,

We recently experienced an unplanned storage failover on our XenServer
pool. The pool is 7.1 based (on certified HP kit), and runs a mix of
FreeBSD VMs (all 10.3 based except for a legacy 9.x VM) and a few Windows
VMs - storage is provided by two Citrix-certified Synology storage boxes.

During the failover, Xen sees the storage paths go down and come back up
(re-attaching when they are available again). Timing this, it takes around
a minute, worst case.

The process killed 99% of our FreeBSD VMs :(

The earlier 9.x FreeBSD box survived, and all the Windows VMs survived.

Is there some tunable we can set to make the 10.3 boxes more tolerant of
the I/O delays that occur during a storage failover?

I've enclosed some of the errors we observed below. I realise a full storage
failover is a 'stressful time' for VMs - but the Windows VMs, and the
earlier FreeBSD version, survived without issue. All the 10.3 boxes logged
I/O errors, and then panicked / rebooted.

We've set up a test lab with the same kit - and can now replicate this at
will (every time, most to all of the FreeBSD 10.x boxes panic and reboot,
but Windows prevails) - so we can test any potential fixes.

So if anyone can suggest anything we can tweak to minimize the chances of
this happening (i.e. make I/O more timeout-tolerant, or set larger
timeouts?), that'd be great.

Thanks,

-Karl


Errors we observed:

ada0: disk error cmd=write 11339752-11339767 status: ffffffff
ada0: disk error cmd=write
g_vfs_done():11340544-11340607gpt/root[WRITE(offset=4731097088,
length=8192)] status: ffffffff error = 5
(repeated a couple of times with different values)

Machine then goes on to panic:

g_vfs_done():panic: softdep_setup_freeblocks: inode busy
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8098e810 at kdb_backtrace+0x60
#1 0xffffffff809514e6 at vpanic+0x126
#2 0xffffffff809513b3 at panic+0x43
#3 0xffffffff80b9c685 at softdep_setup_freeblocks+0xaf5
#4 0xffffffff80b86bae at ffs_truncate+0x44e
#5 0xffffffff80bbec49 at ufs_setattr+0x769
#6 0xffffffff80e81891 at VOP_SETATTR_APV+0xa1
#7 0xffffffff80a053c5 at vn_truncate+0x165
#8 0xffffffff809ff236 at kern_openat+0x326
#9 0xffffffff80d56e6f at amd64_syscall+0x40f
#10 0xffffffff80d3c0cb at Xfast_syscall+0xfb


Another box also logged:

ada0: disk error cmd=read 9970080-9970082 status: ffffffff
g_vfs_done():gpt/root[READ(offset=4029825024, length=1536)]error = 5
vnode_pager_getpages: I/O read error
vm_fault: pager read error, pid 24219 (make)

And again, went on to panic shortly thereafter.

I had to hand-transcribe the above from screenshots / video, so apologies
if any errors crept in.

I'm hoping there's just a magic sysctl / kernel option we can set to up the
timeouts? (if it is as simple as timeouts killing things)

Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Roger Pau Monné
On Wed, Sep 20, 2017 at 11:35:26AM +0100, Karl Pielorz wrote:

> [...]
>
> During the failover, Xen sees the storage paths go down and come back up
> (re-attaching when they are available again). Timing this, it takes around
> a minute, worst case.
>
> The process killed 99% of our FreeBSD VMs :(
>
> The earlier 9.x FreeBSD box survived, and all the Windows VMs survived.
>
> Is there some tunable we can set to make the 10.3 boxes more tolerant of
> the I/O delays that occur during a storage failover?

Do you know whether the VMs saw the disks disconnecting and then
connecting again?

> [...]
>
> So if anyone can suggest anything we can tweak to minimize the chances of
> this happening (i.e. make I/O more timeout-tolerant, or set larger
> timeouts?), that'd be great.

Hm, I have the feeling that part of the problem is that in-flight
requests are basically lost when a disconnect/reconnect happens.

Thanks, Roger.

Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Karl Pielorz-2


--On 20 September 2017 at 12:44:18 +0100 Roger Pau Monné
<[hidden email]> wrote:

>> Is there some tunable we can set to make the 10.3 boxes more tolerant
>> of the I/O delays that occur during a storage failover?
>
> Do you know whether the VMs saw the disks disconnecting and then
> connecting again?

I can't see any evidence the drives actually get 'disconnected' from the
VMs' point of view. Plenty of I/O errors - but no "device destroyed" type
stuff.

I have seen that kind of error logged on our test kit - when we
deliberately failed non-HA storage - but I don't see it this time.
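
(For anyone checking the same thing, this is roughly what I'm grepping for
in the guest - treat it as a sketch, as the exact strings vary by release:)

  # Detach/destroy events would indicate a real disconnect; all we see are
  # plain I/O errors from GEOM:
  grep -iE 'detached|destroyed' /var/log/messages
  grep -E 'g_vfs_done|disk error' /var/log/messages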

> Hm, I have the feeling that part of the problem is that in-flight
> requests are basically lost when a disconnect/reconnect happens.

So if a disconnect doesn't happen (as it appears it doesn't) - is there any
tunable to set the I/O timeout?

'sysctl -a | grep timeout' finds things like:

  kern.cam.ada.default_timeout=30

I might see if that has any effect (from memory - as I'm out of the office
now - it did seem to be about 30 seconds before the VMs started logging
I/O-related errors to the console).

As it's a pure test setup - I can try adjusting this without fear of
breaking anything :)
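
For the record, this is the sort of tweak I'm trying - the value of 120 is
an arbitrary pick for the test, not a recommendation:

  # Bump the ada timeout at runtime (in seconds):
  sysctl kern.cam.ada.default_timeout=120

  # Or set it from boot via a loader tunable:
  echo 'kern.cam.ada.default_timeout="120"' >> /boot/loader.conf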

Though I'm open to other suggestions...

fwiw - whose responsibility is it to re-send lost "in flight" data? E.g. if
a write is 'in flight' when an I/O error occurs in the lower layers of
XenServer, is it XenServer's responsibility to retry it before giving up -
or does it just push the error straight back to the VM, expecting the VM
to retry it? [Or a bit of both?] - just curious.

-Karl



Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Miroslav Lachman
Karl Pielorz wrote on 2017/09/20 16:54:

> [...]
>
> So if a disconnect doesn't happen (as it appears it doesn't) - is there
> any tunable to set the I/O timeout?
>
> 'sysctl -a | grep timeout' finds things like:
>
>   kern.cam.ada.default_timeout=30


Yes, you can try setting kern.cam.ada.default_timeout to 60 or more, but
it can have downsides too.

Miroslav Lachman


Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Rodney W. Grimes-4
In reply to this post by Karl Pielorz-2
> [...]
>
> So if anyone can suggest anything we can tweak to minimize the chances of
> this happening (i.e. make I/O more timeout-tolerant, or set larger
> timeouts?), that'd be great.

As you found one of these let me point out the pair of them:
kern.cam.ada.default_timeout: 30
kern.cam.ada.retry_count: 4

Rather than increasing default_timeout you might try increasing
retry_count.  Though it would seem that the default settings should
have allowed for a 2 minute failure window, it may be that these
are not working as I expect in this situation.
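
For reference you can read the pair back in one go - the arithmetic being
the point (4 retries at 30 seconds each should cover roughly a 2 minute
outage):

  sysctl kern.cam.ada.default_timeout kern.cam.ada.retry_count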

...
>
> Errors we observed:
>
> ada0: disk error cmd=write 11339752-11339767 status: ffffffff
> ada0: disk error cmd=write
Did you actually get this 4 times before it fell through to
the next error?  There should be some retry counts in here
someplace, counting up to 4, after which cam/ada should give up
and pass the error up the stack.

> g_vfs_done():11340544-11340607gpt/root[WRITE(offset=4731097088,
> length=8192)] status: ffffffff error = 5
> (repeated a couple of times with different values)
>
> Machine then goes on to panic:

Ah, okay, so it is repeating... these messages should be
30 seconds apart, and there should be exactly 4 of them
before you get the panic.  If that is the case, try cranking
kern.cam.ada.retry_count up and see if that resolves your
issue.

> g_vfs_done():panic: softdep_setup_freeblocks: inode busy
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8098e810 at kdb_backtrace+0x60
> #1 0xffffffff809514e6 at vpanic+0x126
> #2 0xffffffff809513b3 at panic+0x43
> #3 0xffffffff80b9c685 at softdep_setup_freeblocks+0xaf5
> #4 0xffffffff80b86bae at ffs_truncate+0x44e
> #5 0xffffffff80bbec49 at ufs_setattr+0x769
> #6 0xffffffff80e81891 at VOP_SETATTR_APV+0xa1
> #7 0xffffffff80a053c5 at vn_truncate+0x165
> #8 0xffffffff809ff236 at kern_openat+0x326
> #9 0xffffffff80d56e6f at amd64_syscall+0x40f
> #10 0xffffffff80d3c0cb at Xfast_syscall+0xfb
>
>
> Another box also logged:
>
> ada0: disk error cmd=read 9970080-9970082 status: ffffffff
> g_vfs_done():gpt/root[READ(offset=4029825024, length=1536)]error = 5
> vnode_pager_getpages: I/O read error
> vm_fault: pager read error, pid 24219 (make)
>
> And again, went on to panic shortly thereafter.
>
> I had to hand-transcribe the above from screenshots / video, so apologies
> if any errors crept in.
>
> I'm hoping there's just a magic sysctl / kernel option we can set to up the
> timeouts? (if it is as simple as timeouts killing things)

Yes, FreeBSD does not live long when its disk drive goes away... 2.5 minutes
to panic in almost all cases of a drive failure.

--
Rod Grimes                                                 [hidden email]

Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Karl Pielorz-2

--On 20 September 2017 11:15 -0700 "Rodney W. Grimes"
<[hidden email]> wrote:

> As you found one of these let me point out the pair of them:
> kern.cam.ada.default_timeout: 30
> kern.cam.ada.retry_count: 4

Adjusting these doesn't seem to make any difference at all.

All the VMs - the control one (running defaults) and the 3 others (one
running longer timeouts, one running more retries, and one running both
longer timeouts and retries) - start throwing I/O errors on the console
at the same time. Regardless of whether the timeout is set to the default
30 seconds or extended out to 120, the I/O errors pop up at the same time.

It looks like they're either ignored, or 'not applicable', in this scenario.
Of particular concern: if I adjust the 'timeout' value from 30 to 100
seconds and 'time' how long the first I/O error takes to appear on the
console, it's still ~30 seconds.

I'm going to re-set up the test VMs - with a 'stable' boot disk (one that
won't go away during the switch) to give me something to log to - so I
should be able to work out the timings involved, and see if the first I/O
error really does surface after 30 seconds, or not.
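
Something like this crude probe is what I have in mind - an untested
sketch, and the device name / log path will differ on the real VMs:

  #!/bin/sh
  # Read one random sector off the test disk every second, logging to a
  # disk that survives the failover, so errors line up with timestamps.
  while sleep 1; do
      printf '%s ' "$(date '+%T')"
      dd if=/dev/ada0 of=/dev/null bs=512 count=1 \
          skip="$(jot -r 1 0 1000000)" 2>&1 | tail -1
  done >> /mnt/stable/io-probe.log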

If the timeout is set to, say, 100 seconds - I shouldn't see any console
errors until then, should I? Unless some other part of the storage stack
is still timing out first at 30 seconds?

-Karl

Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Rainer Duffner
On 2017-09-21 13:33, Karl Pielorz wrote:
> --On 20 September 2017 11:15 -0700 "Rodney W. Grimes"
> <[hidden email]> wrote:
>
>> As you found one of these let me point out the pair of them:
>> kern.cam.ada.default_timeout: 30
>> kern.cam.ada.retry_count: 4
>
> Adjusting these doesn't seem to make any difference at all.



I've already asked myself whether the disks from Xen(Server) are really
CAM disks.

They certainly don't show up in camcontrol devlist.

If they don't show up there, why should any CAM timeouts apply?
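
A quick way to check - a sketch, as the exact dmesg strings differ by
version:

  camcontrol devlist     # empty output => the disks aren't CAM-managed
  dmesg | grep -i xbd    # Xen PV disks attach via blkfront as xbd devices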

BTW: storage failures also kill various Linux hosts.
They usually turn their filesystem read-only, and then you've
got to reboot anyway.


Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Karl Pielorz-2


--On 21 September 2017 14:04 +0200 [hidden email] wrote:

> I've already asked myself whether the disks from Xen(Server) are really CAM disks.
>
> They certainly don't show up in camcontrol devlist.

So they don't. I presumed they were CAM, as they're presented as 'ada0'.

> If they don't show up there, why should any CAM timeouts apply?

It appears they don't :) (at least, so far).

> BTW: storage failures also kill various Linux hosts.
> They usually turn their filesystem read-only, and then you've
> got to reboot anyway.

Yes, I know - it's a bit of an upheaval to cope with a storage failover -
annoyingly, the Windows boxes (though they go 'comatose' while it's
happening) all seem to survive.

I could cope with a few VMs rebooting - but to see so many just fold and
panic adds a lot of "insult to injury" at failover time :(

[And I know, if I/O is unavailable you're going to be queuing up a whole
'world of pain' anyway for when it returns - listen queues, pending
disk I/O, hung processes waiting for I/O, etc. - but to have a
fighting chance of unwinding it all when I/O recovers would be good.]

-Karl





Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Rodney W. Grimes-4
> --On 21 September 2017 14:04 +0200 [hidden email] wrote:
>
> > I've already asked myself whether the disks from Xen(Server) are really CAM disks.
> >
> > They certainly don't show up in camcontrol devlist.
>
> So they don't. I presumed they were CAM, as they're presented as 'ada0'.

Ok, we need to sort that out for certain to get much further.

What are you running for Dom0?
Did you do the sysctls in Dom0 or in the guest?
To be effective I would think they would need to be set
in the guest, but if Dom0 is timing out and returning
an I/O error then they will have to be adjusted there first.

> > If they don't show up there, why should any CAM timeouts apply?
>
> It appears they don't :) (at least, so far).

Are these timeouts coming from Dom0 or from a VM in a DomU?

> > BTW: storage failures also kill various Linux hosts.
> > They usually turn their filesystem read-only, and then you've
> > got to reboot anyway.
>
> Yes, I know - it's a bit of an upheaval to cope with a storage failover -
> annoyingly, the Windows boxes (though they go 'comatose' while it's
> happening) all seem to survive.

Windows has horribly long timeouts and large retry counts, and worse,
it doesn't warn the user that it is having issues other than via event
logs - things usually get to the state of catastrophic drive failure
before the user ever sees an error.


--
Rod Grimes                                                 [hidden email]

Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Karl Pielorz-2


--On 21 September 2017 07:23 -0700 "Rodney W. Grimes"
<[hidden email]> wrote:

>> So they don't. I presumed they were CAM, as they're presented as 'ada0'.
>
> Ok, we need to sort that out for certain to get much further.
>
> What are you running for Dom0?

XenServer 7.1 - i.e. the official ISO distribution from www.xenserver.org

> Did you do the sysctls in Dom0 or in the guest?

In the guest. I don't have access to the equivalent in Dom0 (or rather,
shouldn't have - as it's the official installation, i.e. black-boxed).

> To be effective I would think they would need to be set
> in the guest, but if Dom0 is timing out and returning
> an I/O error then they will have to be adjusted there first.

dom0 (i.e. XenServer) grumbles about paths going down, and shows some I/O
errors - which get retried - but it doesn't invalidate the storage.

As soon as the paths are available again - you can see it re-attach to them.
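
(For reference, this is what I'm watching on the Dom0 side - the paths are
as found on our XenServer 7.1 install, so treat it as a sketch:)

  # In the XenServer control domain, while the failover is in progress:
  multipath -ll              # per-LUN path states as they drop and return
  tail -f /var/log/SMlog     # storage-manager activity while paths bounce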

> Are these timeouts coming from Dom0 or from a VM in a DomU?

domU - as above, dom0 grumbles but generally seems OK about things. dom0
doesn't do anything silly like invalidate the VMs' disks or anything.

> Windows has horribly long timeouts and large retry counts, and worse,
> it doesn't warn the user that it is having issues other than via event
> logs - things usually get to the state of catastrophic drive failure
> before the user ever sees an error.

I can believe that - I've seen the figure of 60 seconds bandied around (as
opposed to 30 seconds for Linux / FreeBSD).

Sadly, making FreeBSD have a similar timeout (at least just to test) may
fix the issue.

-Karl

Re: Storage 'failover' largely kills FreeBSD 10.x under XenServer?

Karl Pielorz-2

--On 21 September 2017 15:49 +0100 Karl Pielorz <[hidden email]>
wrote:

>> Are these timeouts coming from Dom0 or from a VM in a DomU?
>
> domU - as above, dom0 grumbles but generally seems OK about things. dom0
> doesn't do anything silly like invalidate the VMs' disks or anything.

I've chased this down in the code - having briefly looked at
blkfront/blkback, I can see all the mechanisms in place for performing I/O
- but I cannot see any timeouts set anywhere (in that code).

I can see the callback that fires when the I/O fails.

It looks like, for the purposes of xbd, I/O requests are just gathered up,
processed - and then fired off to XenServer (i.e. upstream). If they fail,
callbacks are fired - and action taken.

But nowhere can I see any timeouts either specified, or specifiable by
FreeBSD - nor can I see (certainly at that level) any I/O retries in that
code.

So,

  - Timeouts may be set by Xen (i.e. outside of FreeBSD's scope)
  - I/O may be retried by 'higher levels' than blkfront/blkback - but I
can't see where.

It may simply be that I/O from FreeBSD through XenServer is a 'fire and
forget' process, where FreeBSD has no control over timeouts, and currently
has no code (at that level) to perform retries.
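
One way to sanity-check this, assuming a stock /usr/src source tree in the
guest:

  # Look for any timeout/retry handling in the blkfront driver itself:
  grep -Ein 'timeout|retry' /usr/src/sys/dev/xen/blkfront/blkfront.c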

I'd need to figure out what sits above 'blkfront/blkback' - and whether
that's likely to do any retries.

It's certainly not CAM running the storage - so those timeout/retry sysctl
values are completely irrelevant.

More study, and maybe a quick post to -hackers to see what lies above
blkfront/back etc.

-Kp