Fatal trap 18 on boot after OpenZFS import

Fatal trap 18 on boot after OpenZFS import

Tomoaki AOKI
Hi.

I'm encountering a boot failure with fatal trap 18,
happening (maybe) just before init() starts, possibly on the
root remount by the kernel or on zpool import by the rc.d script.
The last revision tried is r365316 (r364788 is the last one tried
with a clean rebuild).

The last healthy revision is r364744, just before the actual switch
to OpenZFS. amd64 on ThinkPad P52 (Core i7-8750H) w/discrete nvidia GPU.

r364751 with the diff of r364777 and r364788 applied (so that it
builds successfully without changes unrelated to OpenZFS) also fails.

Any suggestions and fixes are appreciated.


The trap screen is something like below (text attached). It was
typed up from a relatively clear photo, so there could be some typos.

This is shown just after the usual kernel startup output.
boot1.efi (as EFI/bootx64.efi on the ESP) starts /boot/loader.efi
properly, and loader.efi seems to boot the kernel properly.

As even the single-user shell selection doesn't appear, loader.efi
is that of r364744. But they work even if I go through an irregular
procedure:

  1)Update src tree
  2)Clean obj tree
  3)buildworld
  4)etcupdate -p
  5)buildkernel
  6)installkernel
  7)shutdown to single user WITHOUT reboot  <- Irregular!
  8)installworld
  9)etcupdate
 10)rebuild src/sys-dependent ports (kmods, nvidia-driver, ...)
 11)reboot
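
For readers who want to reproduce the sequence, the steps above could be
sketched roughly as below. This is only a sketch under assumptions, not
the exact commands used: SRCDIR, the DRY_RUN switch and the helper names
(run, phase_multiuser, phase_singleuser) are made up for illustration, and
a custom kernel config is assumed to be selected via KERNCONF in make.conf
or src.conf. Steps 1, 2, 10 and 11 are left as comments.

```sh
#!/bin/sh
# Sketch of the update sequence, split at the drop to single-user mode.
# Assumptions (not from the post): SRCDIR, DRY_RUN and the function names.

SRCDIR="${SRCDIR:-/usr/src}"
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3-6: run in multi-user mode, after updating the src tree (step 1)
# and cleaning the obj tree (step 2) by whatever means you normally use.
phase_multiuser() {
    run make -C "$SRCDIR" buildworld &&
    run etcupdate -p &&
    run make -C "$SRCDIR" buildkernel &&
    run make -C "$SRCDIR" installkernel
}

# Steps 8-9: run by hand from the single-user shell reached with
# "shutdown now" (step 7), i.e. without rebooting into the new kernel.
phase_singleuser() {
    run make -C "$SRCDIR" installworld &&
    run etcupdate
}

# Step 10 (rebuilding kmod ports) and step 11 (reboot) follow manually.
```

With DRY_RUN left at 1, calling phase_multiuser or phase_singleuser only
prints the commands, so the order can be checked before anything is
installed. The usual procedure would reboot into the new kernel between
installkernel and installworld; the sequence here skips that reboot.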

loader.efi appears to do its job, and the panic happens after kernel
startup ends. Needless to say, rolling back to the r364744 state from
stable/12 on nvd0 fixes the issue.

Regards.

=====

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff82bfa320
stack pointer           = 0x28:0xfffffe00e20c6900
frame pointer           = 0x28:0xfffffe00e20c6960
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 27 (vdev_open)
trap number             = 18
panic: integer divide fault
cpuid = 2
time = 16
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e20c6610
vpanic() at vpanic+0x182/frame fffffe00e20c6660
panic() at panic+0x43/frame fffffe00e20c66c0
trap_fatal() at trap_fatal+0x387/frame fffffe00e20c6720
trap() at trap+0x8e/frame fffffe00e20c6830
calltrap() at calltrap+0x8/frame fffffe00e20c6830
--- trap 0x12, rip = 0xffffffff82bfa320, rsp = 0xfffffe00e20c6900, rbp = 0xfffffe00e20c6960 ---
zio_wait() at zio_wait+0x60/frame 0xfffffe00e20c6960
vdev_open() at vdev_open+0x74d/frame 0xfffffe00e20c69c0
vdev_open_child() at vdev_open_child+0x1e/frame 0xfffffe00e20c69e0
taskq_run() at taskq_run+0x1f/frame 0xfffffe00e20c6a00
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e20c6a80
taskqueue_thread_loop() at taskqueue_thread_loop+0x118/frame 0xfffffe00e20c6ab0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00e20c6af0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e20c6af0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 27 tid 100570 ]
Stopped at      kdb_enter+0x37: movq    $0,0x1091556(%rip)
db>

=====

Additional info:
 *A clean build with CPUTYPE removed from the command line and
  make.conf (so it should be equivalent to nocona) didn't help.

 *A clean build with the WITH_KERNEL_RETPOLINE and WITH_RETPOLINE
  lines commented out in src.conf didn't help.

 *The combination of the above two didn't help either (at r364788).

 *There are two root pools on different physical drives:
  stable/12 on nvd0 (primary) and head on ada0 (secondary).

 *The kernel is based on GENERIC-NODEBUG (with options
  CAM_IOSCHED_DYNAMIC added).

--
Tomoaki AOKI    <[hidden email]>

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"

Fatal_trap_18_on_head_after_r364744.log (2K) Download Attachment

Re: Fatal trap 18 on boot after OpenZFS import

Tomoaki AOKI
Filed PR.
Bug 249147 - [ZFS][Panic]Fatal trap 18 on boot after OpenZFS import

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249147


On Fri, 4 Sep 2020 22:03:01 +0900
Tomoaki AOKI <[hidden email]> wrote:

> Hi.
>
> Encountering boot failure with fatal trap 18 on boot,
> happening at (maybe) just before init() starts. Possibly on
> root remount by kernel or zpool import by rc.d script.
> The last revision tried is r365316 (r364788 is the last tried
> clean rebuild).

--
Tomoaki AOKI    <[hidden email]>

Re: Fatal trap 18 on boot after OpenZFS import

Tomoaki AOKI
Forgot to mention here.

As I already mentioned on bugzilla, this problem is fixed at r365894.

Thanks again, Ryan and Matthew!


On Sun, 6 Sep 2020 18:02:40 +0900
Tomoaki AOKI <[hidden email]> wrote:

> Filed PR.
> Bug 249147 - [ZFS][Panic]Fatal trap 18 on boot after OpenZFS import
>
>  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249147

--
Tomoaki AOKI    <[hidden email]>