Serious ZFS Bootcode Problem (GPT NON-UEFI)

Serious ZFS Bootcode Problem (GPT NON-UEFI)

Karl Denninger
FreeBSD 12.0-STABLE r343809

After upgrading to this (without material incident) zfs was telling me
that the pools could be upgraded (this machine was running 11.1, then 11.2.)

I did so, and put the new bootcode on with "gpart bootcode -b /boot/pmbr
-p /boot/gptzfsboot -i .... da..." on both of the candidate (mirrored
ZFS boot disk) devices, in the correct partition.

Then I rebooted to test and..... /could not find the zsboot pool
containing the kernel./

I booted the rescue image off my SD and checked -- the copy of
gptzfsboot that I put on the boot partition is exactly identical to the
one on the rescue image SD.
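
A minimal sketch of that kind of comparison, assuming both copies are reachable as ordinary files. The helper name same_bootcode and the /mnt mount point in the usage comment are hypothetical, not taken from the post:

```sh
#!/bin/sh
# Report whether two copies of a boot file are byte-for-byte identical.
# Both paths are supplied by the caller.
same_bootcode() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Hypothetical usage: /mnt is assumed to be where the rescue image is mounted.
# same_bootcode /boot/gptzfsboot /mnt/boot/gptzfsboot
```

This compares the files only; it says nothing about what is actually written in the freebsd-boot partition, nor about the other files under /boot.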

Then, to be /absolutely sure/ I wasn't going insane I grabbed the
mini-memstick img for 12-RELEASE and tried THAT copy of gptzfsboot.

/Nope; that won't boot either!/

Fortunately I had a spare drive slot so I stuck in a piece of spinning
rust, gpart'ed THAT with an old-style UFS boot filesystem, wrote
bootcode on that, mounted the ZFS "zsboot" filesystem and copied it
over.  That boots fine (of course) and mounts the root pool, and off it
goes.
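
The post does not give the exact commands used for the rescue disk. As a rough sketch only, the steps described might look like the dry run below. The disk name da3, the /zsboot mount point, the 512k boot partition size and the assumption that the boot filesystem holds a boot/ directory are all hypothetical; the function prints the commands rather than running them:

```sh
#!/bin/sh
# Print (do not run) the commands for a small UFS rescue boot disk.
# $1 = spare disk (hypothetical, e.g. da3)
# $2 = where the ZFS boot filesystem is mounted (hypothetical, e.g. /zsboot)
rescue_disk_plan() {
    disk=$1; src=$2
    cat <<EOF
gpart create -s gpt $disk
gpart add -t freebsd-boot -s 512k $disk
gpart add -t freebsd-ufs $disk
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 $disk
newfs -U /dev/${disk}p2
mount /dev/${disk}p2 /mnt
cp -Rp $src/boot /mnt/
EOF
}

rescue_disk_plan da3 /zsboot
```

Note that a UFS boot partition takes /boot/gptboot, not /boot/gptzfsboot. Review every line against the real hardware before running any of it, since these commands are destructive.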

I'm going to blow away the entire /usr/obj tree and rebuild the kernel
to see if that gets me anything that's more-sane, but right now this
looks pretty bad.

BTW just to be absolutely sure I blew away the entire /usr/obj directory
and rebuilt -- same size and checksum on the binary that I have
installed, so.....

Not sure what's going on here -- did something get moved?

--
Karl Denninger
[hidden email]
/The Market Ticker/
/[S/MIME encrypted email preferred]/


Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)

Karl Denninger
On 2/10/2019 09:28, Allan Jude wrote:
> Are you sure it is non-UEFI? As the instructions you followed,
> overwriting da0p1 with gptzfsboot, will make quite a mess if that
> happens to be the EFI system partition, rather than the freebsd-boot
> partition.

Absolutely certain.  The system board in this machine (and in a bunch I
have in the field) is a SuperMicro X8DTL-IF, which does not support UEFI
at all (there is no EFI-capable BIOS available for it.)

They have encrypted root pools, but because gptzfsboot cannot read them
they also have a small freebsd-zfs partition.  On each upgrade I copy
/boot/* to it after the kernel upgrade is done but before the reboot.
That partition is not mounted during normal operation; its only purpose
is to load the kernel (and pre-boot .kos such as geli.)

> Can you show 'gpart show' output?
[karl@NewFS ~]$ gpart show da1
=>       34  468862061  da1  GPT  (224G)
         34       2014       - free -  (1.0M)
       2048       1024    1  freebsd-boot  (512K)
       3072       1024       - free -  (512K)
       4096   20971520    2  freebsd-zfs  [bootme]  (10G)
   20975616  134217728    3  freebsd-swap  (64G)
  155193344  313667584    4  freebsd-zfs  (150G)
  468860928       1167       - free -  (584K)

Partition "2" is the one that should boot.

There is also a da2 that has an identical layout (mirrored; the drives
are 240GB Intel 730 SSDs.)
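
With this layout the freebsd-boot partition is index 1 on both da1 and da2, so the bootcode step from the original post can be sketched as the dry run below. The helper name bootcode_cmd is hypothetical, and the index should be checked against "gpart show" on the real machine before anything is executed:

```sh
#!/bin/sh
# Print the bootcode command for one disk.
# $1 = disk, $2 = index of its freebsd-boot partition.
bootcode_cmd() {
    printf 'gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i %s %s\n' "$2" "$1"
}

# Dry run for both mirror members; index 1 is the freebsd-boot
# partition in the "gpart show da1" output above.
for disk in da1 da2; do
    bootcode_cmd "$disk" 1
done
```

Printing first and executing second is deliberate: as noted earlier in the thread, writing gptzfsboot to the wrong partition index makes quite a mess.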

> What is the actual boot error?

It says it can't load the kernel and gives me a prompt.  "lsdev" shows
all the disks; only the two (zfs mirror) that have the "bootme"
partition on them show up as zfs pools.  The rest don't (they're
geli-encrypted, so that's not unexpected.)  I don't believe the loader
ever actually gets loaded.

An attempt to use "ls" from the bootloader to look inside that "bootme"
partition fails; gptzfsboot cannot get it open.

My belief was that I screwed up and wrote the old 11.1 gptzfsboot to the
freebsd-boot partition originally -- but that is clearly not the case.

Late last night I took my "rescue media" (which is a "make memstick"
from the build of -STABLE), booted that on my sandbox machine, stuck two
disks in there and made a base system -- which booted.  Thus whatever is
going on here it is not as simple as it first appears as that system had
the spacemap_v2 flag on and active once it came up.

This may be my own foot-shooting since I was able to make a bootable
system on my sandbox using the same media (a clone hardware-wise so also
no EFI) -- there may have been some part of the /boot hierarchy that
didn't get copied over, and if so that would explain it.

Update: Indeed that appears to be what it was -- a couple of the *other*
files in the boot partition didn't get copied from the -STABLE build
(although the entire kernel directory did)....  I need to look at why
that happened as the update process is my own due to the dual-partition
requirement for booting with non-EFI but that's not your problem -- it's
mine.

Sorry about this one; turns out to be something in my update scripts
that failed to move over some of the files to the non-encrypted /boot....
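
The update script itself is not shown in the thread. One way to catch this class of mistake is to check the copy before rebooting, as in the sketch below. The function name missing_boot_files and the /zsboot/boot path in the usage comment are hypothetical:

```sh
#!/bin/sh
# List files under $1 that are missing from, or differ in, $2.
# Paths are printed relative to $1; no output means the copy is complete.
missing_boot_files() {
    src=$1; dst=$2
    ( cd "$src" && find . -type f ) | sort | while IFS= read -r f; do
        # cmp fails both when the files differ and when $dst/$f is absent.
        cmp -s "$src/$f" "$dst/$f" || printf '%s\n' "${f#./}"
    done
}

# Hypothetical usage, with the unencrypted boot pool mounted at /zsboot:
# missing_boot_files /boot /zsboot/boot
```

An update script could refuse to reboot while this prints anything.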

BTW am I correct that gptzfsboot did *not* get the ability to read
geli-encrypted pools in 12.0?  The UEFI loader does know how (which I'm
using on my laptop) but I was under the impression that for non-UEFI
systems you still needed the unencrypted boot partition from which to
load the kernel.

--
Karl Denninger
[hidden email]
/The Market Ticker/
/[S/MIME encrypted email preferred]/


Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)

Ian Lepore
On Sun, 2019-02-10 at 11:37 -0600, Karl Denninger wrote:

> On 2/10/2019 09:28, Allan Jude wrote:
> > Are you sure it is non-UEFI? As the instructions you followed,
> > overwriting da0p1 with gptzfsboot, will make quite a mess if that
> > happens to be the EFI system partition, rather than the freebsd-
> > boot
> > partition.
>
> [...]
>
> BTW am I correct that gptzfsboot did *not* get the ability to read
> geli-encrypted pools in 12.0?  The UEFI loader does know how (which I'm
> using on my laptop) but I was under the impression that for non-UEFI
> systems you still needed the unencrypted boot partition from which to
> load the kernel.
>

Nope, that's not correct. GELI support was added to the boot and loader
programs for both ufs and zfs in freebsd 12. You must set the geli '-g'
option to be prompted for the passphrase while booting (this is
separate from the '-b' flag that enables mounting the encrypted
partition as the rootfs). You can use "geli configure -g" to turn on
the flag on any existing geli partition.

-- Ian

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[hidden email]"

Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)

Karl Denninger
On 2/10/2019 11:50, Ian Lepore wrote:

> Nope, that's not correct. GELI support was added to the boot and loader
> programs for both ufs and zfs in freebsd 12. You must set the geli '-g'
> option to be prompted for the passphrase while booting (this is
> separate from the '-b' flag that enables mounting the encrypted
> partition as the rootfs). You can use "geli configure -g" to turn on
> the flag on any existing geli partition.
>
> -- Ian
Excellent - this will eliminate the need for me to run down the
foot-shooting that occurred in my update script since the unencrypted
kernel partition is no longer needed at all.  That also significantly
reduces the attack surface on such a machine (although you could still
tamper with the contents of freebsd-boot of course.)

The "-g" flag I knew about from experience in putting 12 on my X1 Carbon
(which works really well incidentally; the only issue I'm aware of is
that there's no 5GHz WiFi support.)

--
Karl Denninger
[hidden email]
/The Market Ticker/
/[S/MIME encrypted email preferred]/


Re: Fwd: Serious ZFS Bootcode Problem (GPT NON-UEFI)

Ian Lepore
On Sun, 2019-02-10 at 11:54 -0600, Karl Denninger wrote:

> Excellent - this will eliminate the need for me to run down the
> foot-shooting that occurred in my update script since the unencrypted
> kernel partition is no longer needed at all.  That also significantly
> reduces the attack surface on such a machine (although you could
> still
> tamper with the contents of freebsd-boot of course.)
>
> The "-g" flag I knew about from experience in putting 12 on my X1
> Carbon
> (which works really well incidentally; the only issue I'm aware of is
> that there's no 5Ghz WiFi support.)
>

One thing that is rather unfortunate... if you have multiple geli
encrypted partitions that all have the same passphrase, you will be
required to enter that passphrase twice while booting -- once in
gpt[zfs]boot, then again during kernel startup when the rest of the
drives/partitions get tasted by geom. This is because APIs within the
boot process got changed to pass keys instead of the passphrase itself
from one stage of booting to the next, and the fallout of that is the
key for the rootfs is available to the kernel for mountroot, but the
passphrase is not available to the system when geom is probing all the
devices, so you get prompted for it again.

-- Ian


Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)

Karl Denninger
On 2/10/2019 12:01, Ian Lepore wrote:

> One thing that is rather unfortunate... if you have multiple geli
> encrypted partitions that all have the same passphrase, you will be
> required to enter that passphrase twice while booting -- once in
> gpt[zfs]boot, then again during kernel startup when the rest of the
> drives/partitions get tasted by geom. This is because APIs within the
> boot process got changed to pass keys instead of the passphrase itself
> from one stage of booting to the next, and the fallout of that is the
> key for the rootfs is available to the kernel for mountroot, but the
> passphrase is not available to the system when geom is probing all the
> devices, so you get prompted for it again.
>
> -- Ian
Let me see if I understand this before I do it then... :-)

I have the following layout:

1. Two SSDs that contain the OS as a two-provider ZFS pool, which has
"-b" set on both members; I get the "GELI Passphrase:" prompt from the
loader and those two providers (along with encrypted swap) attach early
in the boot process.  The same SSDs contain a mirrored non-encrypted
pool that has /boot (and only /boot) on it because previously you
couldn't boot from a geli-encrypted pool at all.

Thus:

[\u@NewFS /root]# gpart show da1
=>       34  468862061  da1  GPT  (224G)
         34       2014       - free -  (1.0M)
       2048       1024    1  freebsd-boot  (512K)
       3072       1024       - free -  (512K)
       4096   20971520    2  freebsd-zfs  [bootme]  (10G)
   20975616  134217728    3  freebsd-swap  (64G)
  155193344  313667584    4  freebsd-zfs  (150G)
  468860928       1167       - free -  (584K)

There is of course a "da2" that is identical.  The actual encrypted root
pool is on partition 4 with "-b" set at present.  I get prompted from
loader as a result after the unencrypted partition (#2) boots.

2. Multiple additional "user space" pools on a bunch of other disks.

Right now #2 is using geli groups.  Prior to 12.0 they were handled
using a custom /etc/rc.d script I wrote that did basically the same
thing that geli groups does because all use the same passphrase and
entering the same thing over and over on a boot was a pain in the butt. 
It prompted cleanly with no echo, took a password and then iterated over
a list of devices attaching them one at a time.  That requirement is now
gone with geli groups, which is nice since mergemaster always complained
about it being a "non-standard" thing; it *had* to go in /etc/rc.d and
not in /usr/local/etc/rc.d else I couldn't get it to run early enough --
unfortunately.

So if I remove the non-encrypted freebsd-zfs mirror that the system
boots from in favor of setting "-g" on the root pool (both providers)
gptzfsboot will find and prompt for the password to boot before loader
gets invoked at all, much like the EFI loader does.  That's good.  (My
assumption is that the "-g" is sufficient; I don't need (or want)
"bootme" set -- correct?)
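
For the layout above, setting the flag on both root pool providers could be sketched as the dry run below. The helper name geli_g_cmd is hypothetical; da1p4 and da2p4 are partition 4 on each mirror member as shown in the gpart output:

```sh
#!/bin/sh
# Print the command that sets the GELI "-g" flag on the given providers.
geli_g_cmd() {
    printf 'geli configure -g'
    for prov in "$@"; do
        printf ' %s' "$prov"
    done
    printf '\n'
}

# Both members of the encrypted root mirror in the layout above.
geli_g_cmd da1p4 da2p4
```

As explained earlier in the thread, "-g" is separate from "-b", so this does not change the existing "-b" setting on those providers.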

/However,/ once the kernel boots somewhere in the mishmash of boot-time
messages, and probably not where it's instantly obvious nor where it
will halt the cascade display on the console, I'm going to get asked for
that passphrase again?  I assume I want to remove
'geom_eli_passphrase_prompt="YES"' from loader.conf as well -- or would
leaving it in there save me from the prompt that's hard to find in the
cascade?

Or, even better, would that situation of a double-prompt only apply if I
had "-b" set on something /other than/ the boot device pool vdevs?  (I
don't -- those are handled by #2 for this exact reason.)

--
Karl Denninger
[hidden email]
/The Market Ticker/
/[S/MIME encrypted email preferred]/


Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)

Ian Lepore
On Sun, 2019-02-10 at 12:35 -0600, Karl Denninger wrote:

> Let me see if I understand this before I do it then... :-)
>
> I have the following layout:
>
> 1. Two SSDs that contain the OS as a two-provider ZFS pool, which has
> "-b" set on both members; I get the "GELI Passphrase:" prompt from
> the
> loader and those two providers (along with encrypted swap) attach
> early
> in the boot process.  The same SSDs contain a mirrored non-encrypted
> pool that has /boot (and only /boot) on it because previously you
> couldn't boot from an EFI-encrypted pool at all.
>
> Thus:
>
> [\u@NewFS /root]# gpart show da1
> =>       34  468862061  da1  GPT  (224G)
>          34       2014       - free -  (1.0M)
>        2048       1024    1  freebsd-boot  (512K)
>        3072       1024       - free -  (512K)
>        4096   20971520    2  freebsd-zfs  [bootme]  (10G)
>    20975616  134217728    3  freebsd-swap  (64G)
>   155193344  313667584    4  freebsd-zfs  (150G)
>   468860928       1167       - free -  (584K)
>
> There is of course a "da2" that is identical.  The actual encrypted
> root
> pool is on partition 4 with "-b" set at present.  I get prompted from
> loader as a result after the unencrypted partition (#2) boots.
>
> 2. Multiple additional "user space" pools on a bunch of other disks.
>
> Right now #2 is using geli groups.  Prior to 12.0 they were handled
> using a custom /etc/rc.d script I wrote that did basically the same
> thing that geli groups does because all use the same passphrase and
> entering the same thing over and over on a boot was a pain in the
> butt.
> It prompted cleanly with no echo, took a password and then iterated
> over
> a list of devices attaching them one at a time.  That requirement is
> now
> gone with geli groups, which is nice since mergemaster always
> complained
> about it being a "non-standard" thing; it *had* to go in /etc/rc.d
> and
> not in /usr/etc/rc.d else I couldn't get it to run early enough --
> unfortunately.
>
> So if I remove the non-encrypted freebsd-zfs mirror that the system
> boots from in favor of setting "-g" on the root pool (both providers)
> gptzfsboot will find and prompt for the password to boot before
> loader
> gets invoked at all, much like the EFI loader does.  That's good.
> (My
> assumption is that the "-g" is sufficient; I don't need (or want)
> "bootme" set -- correct?)
>
> /However, /once the kernel boots somewhere in the mishmash of boot-
> time
> messages, and probably not where it's instantly obvious nor where it
> will halt the cascade display on the console, I'm going to get asked
> for
> that passphrase again?  I assume I want to remove
> 'geom_eli_passphrase_prompt="YES"' from loader.conf as well -- or
> would
> leaving it in there save me from the prompt that's hard to find in
> the
> cascade?
>
> Or, even better, would that situation of a double-prompt only apply
> if I
> had "-b" set on something /other than /the boot device pool vdevs (I
> don't -- those are handled by #2 for this exact reason.)
>

I think at this point I have to ease out of the conversation, because I
know almost nothing about zfs, despite having somehow managed to add
geli support to the zfs code in loader. I did so without understanding
zfs in any way, because I added the support at a more generic "disk
drive support" layer in loader, and did all my testing using automated
scripts Allan and Warner created to test zfs booting using qemu.

-- Ian


Re: Geli prompts on gptzfsboot (Was:: Serious ZFS Bootcode Problem (GPT NON-UEFI -- RESOLVED)

Karl Denninger

On 2/10/2019 12:40, Ian Lepore wrote:

> I think at this point I have to ease out of the conversation, because I
> know almost nothing about zfs, despite having somehow managed to add
> geli support to the zfs code in loader. I did so without understanding
> zfs in any way, because I added the support at a more generic "disk
> drive support" layer in loader, and did all my testing using automated
> scripts Alan and Warner created to test zfs booting using qemu.
>
> -- Ian
I can confirm that this boots and comes up cleanly without re-prompting
for the boot pool password.

The machines I have in the field in this config, during the next upgrade
cycle, are going to get set up this way.  When it makes sense to replace
these with UEFI boards (likely when Coffee Lake Xeons and Mobos that can
handle them get a bit more reasonable and start showing up with IPMI/kvm
ports) I'll likely start getting rid of these older devices simply on
the performance-for-power equation, but these are likely to be out there
for me, anyway, for the next few years.

In short very nice work -- and thank you!

--
Karl Denninger
[hidden email]
/The Market Ticker/
/[S/MIME encrypted email preferred]/


Re: Serious ZFS Bootcode Problem (GPT NON-UEFI)

Mel Pilgrim
On 02/09/2019 14:30, Karl Denninger wrote:

> FreeBSD 12.0-STABLE r343809
>
> After upgrading to this (without material incident) zfs was telling me
> that the pools could be upgraded (this machine was running 11.1, then 11.2.)
>
> I did so, /and put the new bootcode on with gpart bootcode -b /boot/pmbr
> -p /boot/gptzfsboot -i .... da... /on both of the candidate (mirrored
> ZFS boot disk) devices, in the correct partition.
>
> Then I rebooted to test and..... /could not find the zsboot pool
> containing the kernel./
>
> I booted the rescue image off my SD and checked -- the copy of
> gptzfsboot that I put on the boot partition is exactly identical to the
> one on the rescue image SD.
>
> Then, to be /absolutely sure /I wasn't going insane I grabbed the
> mini-memstick img for 12-RELEASE and tried THAT copy of gptzfsboot.
>
> /Nope; that won't boot either!/
>
> Fortunately I had a spare drive slot so I stuck in a piece of spinning
> rust, gpart'ed THAT with an old-style UFS boot filesystem, wrote
> bootcode on that, mounted the ZFS "zsboot" filesystem and copied it
> over.  That boots fine (of course) and mounts the root pool, and off it
> goes.
>
> I'm going to blow away the entire /usr/obj tree and rebuild the kernel
> to see if that gets me anything that's more-sane, but right now this
> looks pretty bad.
>
> BTW just to be absolutely sure I blew away the entire /usr/obj directory
> and rebuilt -- same size and checksum on the binary that I have
> installed, so.....
>
> Not sure what's going on here -- did something get moved?

I smashed my head against the wall for days with a very similar-sounding
problem: pure ZFS with a GELI root and separate /boot pool that would
not import the /boot pool at boot, resulting in the kernel not having
the keys to attach the GELI+ZFS root.

That configuration needs some extra bits in loader.conf so that
zpool.cache and the GELI keys get loaded for the kernel by the loader.

This loads the zpool.cache into the kernel so it imports everything
before /etc/rc.d/zfs can run (the case where you have a ZFS /boot that
isn't imported after a reboot):

zpool_cache_load="YES"
zpool_cache_name="/boot/zfs/zpool.cache"
zpool_cache_type="/boot/zfs/zpool.cache"

Run geli init with -b so the providers are flagged for attachment at
boot (instead of by /etc/rc.d/geli), then add this for every GELI
provider you want the kernel to attach before starting the userland:

geli_FOO_keyfile0_load="YES"
geli_FOO_keyfile0_name="/boot/path/to/key"
geli_FOO_keyfile0_type="devicename:geli_keyfile0"

FOO can be any alphanumeric string, and needs to be consistent for all
three lines and unique per device.  The "devicename" is gpt/BAR for a
device with a GPT label of BAR.  It can also be the unlabeled device
(e.g., da0p3), but using GPT labels is recommended because it makes the
keys follow a device renumber.

For example, my GELI+ZFS root is a mirror of partitions with nvmezfs0
and nvmezfs1 GPT labels, so I have in my loader.conf:

geli_nvmezfs0_keyfile0_load="YES"
geli_nvmezfs0_keyfile0_name="/boot/gelikeys/nvmezfs0.key"
geli_nvmezfs0_keyfile0_type="gpt/nvmezfs0:geli_keyfile0"
geli_nvmezfs1_keyfile0_load="YES"
geli_nvmezfs1_keyfile0_name="/boot/gelikeys/nvmezfs1.key"
geli_nvmezfs1_keyfile0_type="gpt/nvmezfs1:geli_keyfile0"
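
Because the three lines follow a fixed pattern per device, they can be generated rather than typed. A small sketch, assuming the GPT label is also used as the FOO string (as in the example above); the function name geli_keyfile_conf is hypothetical:

```sh
#!/bin/sh
# Emit the three loader.conf lines for one GELI provider.
# $1 = GPT label (also used as the FOO string), $2 = path to the key file.
geli_keyfile_conf() {
    label=$1; keyfile=$2
    printf 'geli_%s_keyfile0_load="YES"\n' "$label"
    printf 'geli_%s_keyfile0_name="%s"\n' "$label" "$keyfile"
    printf 'geli_%s_keyfile0_type="gpt/%s:geli_keyfile0"\n' "$label" "$label"
}

# Reproduces the two-device example above.
for n in 0 1; do
    geli_keyfile_conf "nvmezfs$n" "/boot/gelikeys/nvmezfs$n.key"
done
```

The output is meant to be reviewed and then appended to /boot/loader.conf by hand. The label must be alphanumeric, as noted above, since it becomes part of the variable name.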

If you use GPT labels, you can safely ignore the "GEOM_ELI: Found no key
files in loader.conf for DEVICE" messages where DEVICE is the unlabeled
device--the GELI module doesn't currently recognize that the unlabeled
and labeled devices are the same provider.

This doesn't appear to be documented in the Handbook or any man pages
that I could find.  The zpool_cache_load trick is mentioned in a FreeBSD
wiki page[1], and the geli_* config is pulled from the zfsboot script
used by bsdinstall to install a pure-ZFS system with GELI root.

I'm not sure if this is exactly your problem, but maybe it helps?

1: https://wiki.freebsd.org/MasonLoringBliss/UEFIandZFSandGELIbyHAND