zfs corruption (again) due to interrupted resilver and power faults.

Michelle Sullivan
So lost my pool again...  this time looks more hopeless, but for simpler
reasons.

Basically a significant power issue occurred during a resilver; I am
lucky no-one was killed - though if I find the idiot
there is still time...

anyhoo, to the issue at hand...

FreeBSD 12 (memstick boot)

Can't import the pool 'storage'

All devices are now online (one wasn't but that issue has been resolved
by byte copying the drive to a new drive and inserting)

It's a raidz2 set... only the one drive had failed and was in the middle
of the resilver when the power issue occurred.

Pool refuses to import (even with -FfX)

zdb reports 4 metadata errors and no (zero) data errors... see:
http://flashback.sorbs.net/packages/zfs/image4.jpeg

'zdb -bcd' asserts out in space_map_load (123 of 163) see:
http://flashback.sorbs.net/packages/zfs/image5.jpeg

Thoughts on how I might recover this?

(I really need to, as I was in the middle of blowing away the original
(backup) server for a rebuild and new backup.. so I have quite literally
no backups, and every photo I have taken is in that zvol on the pool -
there should be no reason why it's corrupt as, with the exception of the
resilver, there was no active writing to the pool.)

Yours hopefully,

Michelle

--
Michelle Sullivan
http://www.mhix.org/

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"

Re: zfs corruption (again) due to interrupted resilver and power faults.

Xin LI-5
0x7a is 122, which is ECKSUM; it seems that your space map is corrupt.

Have you tried setting vfs.zfs.recover=1 in the loader and importing
the pool with -o readonly=on?
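
The two steps suggested here can be sketched in shell as follows. This is only an illustration under assumptions: the function names `hex_to_dec` and `readonly_import` are made up for this example, the mapping of 122 to ECKSUM is taken from the statement above, and `-f` is carried over from the earlier import attempts in this thread. The import helper is defined but never run by the snippet itself.

```shell
#!/bin/sh
# Convert a hexadecimal status code (as shown in the zdb assertion) to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x7a   # prints 122, the value identified above as ECKSUM

# Hypothetical helper, defined only and not called here.
# Assumes vfs.zfs.recover=1 has already been set, either in /boot/loader.conf
# or with "set vfs.zfs.recover=1" at the loader prompt before booting.
readonly_import() {
    # A read-only import avoids writing to the damaged pool.
    zpool import -o readonly=on -f "$1"
}
```

On the affected machine the import attempt would then be `readonly_import storage`.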

Cheers,

On Tue, Mar 19, 2019 at 6:15 AM Michelle Sullivan <[hidden email]>
wrote:

> So lost my pool again...  this time looks more hopeless, but for simpler
> reasons.

Re: zfs corruption (again) due to interrupted resilver and power faults.

Michelle Sullivan
Trying now, thanks (and no, I hadn’t - I wasn’t aware of the sysctl)

Michelle Sullivan
http://www.mhix.org/
Sent from my iPad

> On 20 Mar 2019, at 05:21, Xin LI <[hidden email]> wrote:
>
> 0x7a is 122, which is ECKSUM, it seems that your space map is corrupt.
>
> Have you tried importing with vfs.zfs.recover=1 set in loader and import the pool with -o readonly?

Re: zfs corruption (again) due to interrupted resilver and power faults.

Michelle Sullivan
Michelle Sullivan wrote:
> Trying now thanks (and no I hadn’t - wasn’t aware of the sysctl)

Failed with the same old...

http://flashback.sorbs.net/packages/zfs/image6.jpeg


Michelle

--
Michelle Sullivan
http://www.mhix.org/

Re: zfs corruption (again) due to interrupted resilver and power faults.

Michelle Sullivan
Stefan Esser wrote:

> On 20.03.19 at 08:15, Michelle Sullivan wrote:
>> Michelle Sullivan wrote:
>>> Trying now thanks (and no I hadn’t - wasn’t aware of the sysctl)
>> Failed with the same old...
>>
>> http://flashback.sorbs.net/packages/zfs/image6.jpeg
> Hi Michelle,
>
> when I was in a somewhat similar situation, I recovered my pool
> (at least to copy it to new disk drives) by patching the ZFS code
> to ignore certain error aborts.
>
> Testing is possible with zdb, since it uses the same source files
> as the kernel module for all ZFS accesses.
>
> I identified the test that failed and made it non-fatal (issue a
> warning but continue). This led to inconsistent checksums, since
> they were not correctly updated in the failure case. I had to make
> these checksum checks non-fatal, too.
>
> All testing can be done by issuing zdb commands, but I do not
> remember the exact options. Option -AAA is at least required, to
> make most checks non-fatal, but it was not sufficient.
>
> I cannot offer any more specific help, I'm afraid.
>
> Good luck in recovering your pool!
>
> Regards, Stefan

Finally made progress..

Booted 12-STABLE on a USB key - installed to a USB external drive and
booted that.

Built a debug kernel, installed and booted it, then installed mdb...
after playing with it and getting 'no symbol' errors, I finally worked it
out...  This worked:

root@colossus:/usr/src # mdb -Mkwe "spa_load_verify_metadata/W 0"
Preloading module symbols: [ kernel uhid.ko ums.ko mac_ntpd.ko zfs.ko opensolaris.ko ]
zfs.ko`spa_load_verify_metadata:0x1             =       0x0
Segmentation fault (core dumped)
root@colossus:/usr/src #

(I had already worked out with mdb that spa_load_verify_metadata=0 causes a
'LOADED' state)...

Then I was able to run the following. I had already identified that
transaction 24628146 was the latest, but the latest that was 'complete'
(committed/uncorrupted) was 24628138, so...

root@colossus:/usr/src # zpool import -fT 24628138 storage
cannot mount 'storage': Input/output error
Unsupported share protocol: 1.
root@colossus:/usr/src # zpool status -v
   pool: storage
  state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
     continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
   scan: resilver in progress since Thu Mar  7 19:06:14 2019
     14.9T scanned at 2.06G/s, 13.4T issued at 615M/s, 28.8T total
     863G resilvered, 46.39% done, 0 days 07:19:25 to go
config:

     NAME                    STATE     READ WRITE CKSUM
     storage                 ONLINE       0     0     2
       raidz2-0              ONLINE       0     0     8
         mfid8               ONLINE       0     0     0
         mfid7               ONLINE       0     0     0
         mfid12              ONLINE       0     0     0
         mfid11              ONLINE       0     0     0
         mfid0               ONLINE       0     0     0
         mfid5               ONLINE       0     0     0
         mfid4               ONLINE       0     0     0
         mfid3               ONLINE       0     0     0
         mfid2               ONLINE       0     0     0
         spare-9             ONLINE       0     0 4.38K
           mfid14            ONLINE       0     0     0
           mfid15            ONLINE       0     0     0
         mfid10              ONLINE       0     0     0
         mfid6               ONLINE       0     0     0
         mfid13              ONLINE       0     0     0
         mfid9               ONLINE       0     0     0
         mfid1               ONLINE       0     0     0
     spares
       12144659313369122799  INUSE     was /dev/mfid15

errors: Permanent errors have been detected in the following files:

         <metadata>:<0x5d>
         storage:<0x0>
root@colossus:/usr/src #

So currently it appears imported but not mounted (don't care) and it's
resilvering.  When that completes I intend to scrub, export and
reimport, which will hopefully resolve the issues... will let you
all know... but for the forums and archives....

This is a God-send:
https://www.delphix.com/blog/openzfs-pool-import-recovery

To get mdb working you *must* currently use -M to preload the modules.
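
The follow-up plan described above (let the resilver finish, then scrub, export and reimport) could be scripted roughly as below. This is a sketch, not something that was run in this thread: the function names are hypothetical, the 10-minute polling interval is arbitrary, and the check simply looks for the "resilver in progress" / "scrub in progress" text that `zpool status` prints on its scan line. Nothing touches the pool unless `finish_recovery` is called explicitly.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` text on stdin and succeeds
# (exit status 0) while a resilver or scrub is still reported as running.
scan_in_progress() {
    grep -Eq '(resilver|scrub) in progress'
}

# Hypothetical helper: block until the given pool reports no running scan.
wait_for_scan() {
    while zpool status "$1" | scan_in_progress; do
        sleep 600   # arbitrary polling interval
    done
}

# Sketch of the plan; defined only, never called automatically.
finish_recovery() {
    pool="$1"
    wait_for_scan "$pool"   # let the resilver finish
    zpool scrub "$pool"     # the scrub runs in the background
    wait_for_scan "$pool"   # so wait for it as well
    zpool export "$pool"
    zpool import "$pool"
}
```

With the pool from this thread, the call would be `finish_recovery storage`.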

Regards,

Michelle

--
Michelle Sullivan
http://www.mhix.org/
