amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list
This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
Threadripper 1950X). No other Hyper-V use.

This happened during:

# ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils
Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20
>>> Checking for old files


(Hand typed from pictures of a window's content taken
at slightly different times; expect typos:)

KDB: enter: panic
[thread pid 42170 tid 100752 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> call doadump
Dumping 4825 out of 110559 MB: ... (omitted) ...
Dump complete
= 0


(The "pid 42170" identification as the process reporting the
issue does not seem to appear in the core.txt.0 file.)


# ls -lTdt /var/crash/*
-rw-r--r--  1 root  wheel      100792 Feb 17 16:09:18 2018 /var/crash/core.txt.0
lrwxr-xr-x  1 root  wheel           8 Feb 17 16:09:08 2018 /var/crash/vmcore.last -> vmcore.0
lrwxr-xr-x  1 root  wheel           6 Feb 17 16:09:08 2018 /var/crash/info.last -> info.0
-rw-------  1 root  wheel  5060296704 Feb 17 16:09:08 2018 /var/crash/vmcore.0
-rw-------  1 root  wheel         392 Feb 17 16:08:59 2018 /var/crash/info.0
-rw-r--r--  1 root  wheel           2 Feb 17 16:08:59 2018 /var/crash/bounds
-rw-r--r--  1 root  wheel           5 Nov 22 04:34:36 2017 /var/crash/minfree

From /var/crash/core.txt.0 :

Unread portion of the kernel message buffer:
spin lock 0xffffffff81b2dcc0 (sched lock 5) held by 0xfffff8011d936560 (tid 100691) too long
panic: spin lock held too long
cpuid = 5
time = 1518911834
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f10094d0
vpanic() at vpanic+0x18d/frame 0xfffffe00f1009530
panic() at panic+0x43/frame 0xfffffe00f1009590
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f10095a0
thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1009610
statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1009650
handleevents() at handleevents+0x113/frame 0xfffffe00f10096a0
timercb() at timercb+0xa9/frame 0xfffffe00f10096f0
lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1009730
timerint_u() at timerint_u+0x96/frame 0xfffffe00f1009810
thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1009880
fork1() at fork1+0x1b9f/frame 0xfffffe00f1009930
sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1009980
amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1009ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffcfc0
KDB: enter: panic

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) #0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=-2122191464) at /usr/src/sys/kern/kern_shutdown.c:347
#2  0xffffffff8040b42c in db_fncall_generic (addr=<optimized out>,
   rv=<optimized out>, nargs=<optimized out>, args=<optimized out>)
   at /usr/src/sys/ddb/db_command.c:609
#3  db_fncall (dummy1=<optimized out>, dummy2=<optimized out>,
   dummy3=<optimized out>, dummy4=<optimized out>)
   at /usr/src/sys/ddb/db_command.c:657
#4  0xffffffff8040af79 in db_command (last_cmdp=<optimized out>,
   cmd_table=<optimized out>, dopager=<optimized out>)
   at /usr/src/sys/ddb/db_command.c:481
#5  0xffffffff8040acf4 in db_command_loop ()
   at /usr/src/sys/ddb/db_command.c:534
#6  0xffffffff8040df9f in db_trap (type=<optimized out>, code=<optimized out>)
   at /usr/src/sys/ddb/db_main.c:250
#7  0xffffffff80b370e3 in kdb_trap (type=3, code=-61456, tf=<optimized out>)
   at /usr/src/sys/kern/subr_kdb.c:697
#8  0xffffffff80fa2c5c in trap (frame=0xfffffe00f1009400)
   at /usr/src/sys/amd64/amd64/trap.c:547
#9  <signal handler called>
#10 kdb_enter (why=0xffffffff811f280b "panic", msg=<optimized out>)
   at /usr/src/sys/kern/subr_kdb.c:479
#11 0xffffffff80aef17a in vpanic (fmt=<optimized out>, ap=0xfffffe00f1009570)
   at /usr/src/sys/kern/kern_shutdown.c:801
#12 0xffffffff80aeefc3 in panic (fmt=0x0)
   at /usr/src/sys/kern/kern_shutdown.c:739
#13 0xffffffff80acfa31 in _mtx_lock_indefinite_check (m=<optimized out>,
   ldap=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1224
#14 0xffffffff80acfb9b in thread_lock_flags_ (td=0xfffff818719d1000,
   opts=<optimized out>, file=<optimized out>, line=<optimized out>)
   at /usr/src/sys/kern/kern_mutex.c:913
#15 0xffffffff80a89d6c in statclock_cnt (cnt=1, usermode=<optimized out>)
   at /usr/src/sys/kern/kern_clock.c:768
#16 0xffffffff810d0003 in handleevents (now=43230207690178, fake=0)
   at /usr/src/sys/kern/kern_clocksource.c:196
#17 0xffffffff810d0709 in timercb (et=0xffffffff81c528e8 <lapic_et>,
   arg=<optimized out>) at /usr/src/sys/kern/kern_clocksource.c:353
#18 0xffffffff8110dad7 in lapic_handle_timer (frame=0xfffffe00f1009740)
   at /usr/src/sys/x86/x86/local_apic.c:1305
#19 0xffffffff80f849d0 in timerint_u ()
   at /usr/src/sys/amd64/amd64/apic_vector.S:132
#20 0xfffffe00f1009828 in ?? ()
#21 0x000000000000b6b1 in ?? ()
#22 0x0000000000002000 in ?? ()
#23 0x00000000ffffdfff in ?? ()
#24 0x00c11c08e43e7fd5 in ?? ()
#25 0x00000000000003e8 in ?? ()
#26 0x00000000fffff1eb in ?? ()
#27 0xfffffe00f1009828 in ?? ()
#28 0xfffffe00f1009810 in ?? ()
#29 0x00000000800e6d01 in ?? ()
#30 0x0000000000000064 in ?? ()
#31 0xfffff8011d936560 in ?? ()
#32 0xfffffe00f1009828 in ?? ()
#33 0xffffffff81771014 in mtx_delay ()
#34 0x0000000000000000 in ?? ()
(kgdb)

. . .
UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN   STAT TT     TIME COMMAND
. . .
  0  1110  1102   0  33  0 12024  3076 ttyin    D+    -  0:00.00 [sh]
  0  1120  1044   0  20  0 18572  7936 select   Ds    -  0:00.00 [sshd]
1001  1123  1120   0  20  0 18936  8044 select   D     -  0:00.00 [sshd]
1001  1124  1123   0  34  0 12120  3196 wait     Ds    -  0:00.00 [sh]
  0  1134  1124   0  22  0 12060  3148 wait     D     -  0:00.00 [su]
  0  1135  1134   0  20  0 12312  3244 wait     D     -  0:00.00 [sh]
  0 42072  1135   0  25  0 11464  3060 wait     D+    -  0:00.00 [sh]
  0 42072  1135   0  25  0 11464  3060 wait     D+    -  0:00.00 [sh]
  0 42075 42072   0  20  0 10928  2480 select   D+    -  0:00.00 [script]
  0 42076 42075   0  52  0 10160  1396 wait     Ds+   -  0:00.00 [make]
  0 42108 42076   0  52  0 12236  3224 wait     D+    -  0:00.00 [make]
  0 42168 42108   0  52  0 11496  3068 wait     D+    -  0:00.00 [sh]
  0 42169 42168   0  52  0 12608  3568 pipewr   D+    -  0:00.00 [make]
  0 42170 42168   0  72  0 10956  2284 -        R+    -  0:00.00 [xargs]
  0 42171 42168   0  35  0 11500  3064 piperd   D+    -  0:00.00 [sh]
  0 46769 42170   0  72  0 10956  2284 -        ?VL+  -  0:00.00 [xargs]
. . .
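
The process listing above can be read as a parent chain leading from the vfork child (pid 46769) back up through xargs, sh and make. As a rough sketch only, and assuming the column layout shown in the listing (PID in field 2, PPID in field 3, command last), a small helper like the following walks that chain. The function name `ancestry` is made up for this illustration; it is not part of any FreeBSD tool.

```shell
# Print the chain of ancestors for one PID from saved "ps"-style lines.
# Assumption: PID is field 2 and PPID is field 3, as in the listing above.
ancestry() {
    # $1 = starting PID; stdin = ps lines (header and ". . ." lines are skipped)
    awk -v start="$1" '
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { ppid[$2] = $3; cmd[$2] = $NF }
        END {
            pid = start
            # The hop limit guards against a malformed listing that loops.
            for (hops = 0; (pid in ppid) && hops < 64; hops++) {
                print pid, cmd[pid]
                pid = ppid[pid]
            }
        }'
}

# Four lines copied from the listing; the walk stops at the first PID
# whose parent is not present in the input.
ancestry 46769 <<'EOF'
  0 42108 42076   0  52  0 12236  3224 wait     D+    -  0:00.00 [make]
  0 42168 42108   0  52  0 11496  3068 wait     D+    -  0:00.00 [sh]
  0 42170 42168   0  72  0 10956  2284 -        R+    -  0:00.00 [xargs]
  0 46769 42170   0  72  0 10956  2284 -        ?VL+  -  0:00.00 [xargs]
EOF
```

Fed the whole listing instead of this excerpt, the same walk would continue up through script, su and sshd until it reaches a parent that is not listed.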




Context details:

# uname -apKU
FreeBSD FBSDFSSD 12.0-CURRENT FreeBSD 12.0-CURRENT  r329465M  amd64 amd64 1200058 1200058

# svnlite status /usr/src | sort
?       /usr/src/sys/amd64/conf/GENERIC-DBG
?       /usr/src/sys/amd64/conf/GENERIC-NODBG
?       /usr/src/sys/arm/conf/GENERIC-DBG
?       /usr/src/sys/arm/conf/GENERIC-NODBG
?       /usr/src/sys/arm64/conf/GENERIC-DBG
?       /usr/src/sys/arm64/conf/GENERIC-NODBG
?       /usr/src/sys/powerpc/conf/GENERIC64vtsc-DBG
?       /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
?       /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
?       /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
M       /usr/src/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
M       /usr/src/contrib/llvm/tools/lld/ELF/Arch/PPC64.cpp
M       /usr/src/crypto/openssl/crypto/armcap.c
M       /usr/src/lib/libkvm/kvm_powerpc.c
M       /usr/src/lib/libkvm/kvm_private.c
M       /usr/src/stand/defs.mk
M       /usr/src/stand/powerpc/boot1.chrp/Makefile
M       /usr/src/stand/powerpc/kboot/Makefile
M       /usr/src/sys/arm64/arm64/identcpu.c
M       /usr/src/sys/conf/kmod.mk
M       /usr/src/sys/conf/ldscript.powerpc
M       /usr/src/sys/kern/subr_pcpu.c
M       /usr/src/sys/powerpc/aim/mmu_oea64.c
M       /usr/src/sys/powerpc/ofw/ofw_machdep.c
M       /usr/src/sys/powerpc/powerpc/interrupt.c
M       /usr/src/sys/powerpc/powerpc/mp_machdep.c
M       /usr/src/sys/powerpc/powerpc/trap.c
M       /usr/src/sys/vm/uma_core.c
M       /usr/src/usr.bin/top/machine.c

The GENERIC* files include GENERIC and then set explicit
debug status choices of mine. Most of the rest is tied to
historical powerpc and powerpc64 investigations. I also
have top report the maximum swap-in-use figure that it
sees during its run.

# svnlite diff /usr/src/sys/vm/uma_core.c
Index: /usr/src/sys/vm/uma_core.c
===================================================================
--- /usr/src/sys/vm/uma_core.c (revision 329465)
+++ /usr/src/sys/vm/uma_core.c (working copy)
@@ -3428,7 +3428,7 @@
void
uma_reclaim_wakeup(void)
{
-
+printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed);
        if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
                wakeup(uma_reclaim);
}



Side note: It took the automatic fsck and 2 manual fsck
runs to get back to a clean status (before I could get to
multi-user).

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list
[Some more information added, from /usr/libexec/kgdb use.]

On 2018-Feb-17, at 5:39 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:

> This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
> The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
> 29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
> Threadripper 1950X). No other Hyper-V use.

/usr/local/bin/kgdb did not seem to work. But
/usr/libexec/kgdb seems to. (This is likely the first time
that I've used a normal vmcore.* file on any architecture.
On powerpc I had to hack things into a non-default way of
working. On powerpc64 dumps used to fail.)

# /usr/libexec/kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
. . .

Here is what "info threads" reports for thread 100691.
(Thread 100752 shows "doadump", but it was of course doing
a fork at the time of the problem.)

(kgdb) info threads
  558 Thread 100691 (PID=46769: xargs)  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840
. . .
* 556 Thread 100752 (PID=42170: xargs)  doadump (textdump=0x8181ed98) at pcpu.h:230
. . .

The bt for 558 (a.k.a. 100691) does not seem to have much
information:

(kgdb) thread 558
[Switching to thread 558 (Thread 100691)]#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840
840 movq %r12,%rdi /* function */
Current language:  auto; currently asm
(kgdb) bt
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840
#1  0xffffffff80f8398d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:480
#2  0x0000000800c43000 in ?? ()
#3  0x0000000000203353 in ?? ()
#4  0x0000000000000001 in .rodata ()
#5  0x0000000800c43008 in ?? ()
#6  0x00c11c08e43e7fd5 in ?? ()
#7  0x0000000000000000 in ?? ()
(kgdb)

But the kernel message buffer material reported:
(repeated from the earlier Email)

spin lock 0xffffffff81b2dcc0 (sched lock 5) held by 0xfffff8011d936560 (tid 100691) too long
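
The lookup done by hand above (take the tid from the "held ... too long" message, then find that tid in the "info threads" output) can be sketched as two small filters. This is only an illustration: the function names `holder_tid` and `find_thread` are invented here, and the sketch assumes the message and the thread list have been saved as plain text with each entry on one line.

```shell
# Extract the tid of the lock holder from a "held by ... too long" message.
holder_tid() {
    # stdin: kernel message buffer text; prints the first holder tid found
    sed -n 's/.*held by 0x[0-9a-f]* (tid \([0-9]*\)) too long.*/\1/p' | head -n 1
}

# Print the "info threads" lines that belong to a given tid.
find_thread() {
    # $1 = tid; stdin = saved "info threads" output
    grep -E "Thread $1 \(PID="
}

msg='spin lock 0xffffffff81b2dcc0 (sched lock 5) held by 0xfffff8011d936560 (tid 100691) too long'
tid=$(printf '%s\n' "$msg" | holder_tid)

# Two lines copied from the "info threads" output above.
printf '%s\n' \
    '  558 Thread 100691 (PID=46769: xargs)  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840' \
    '* 556 Thread 100752 (PID=42170: xargs)  doadump (textdump=0x8181ed98) at pcpu.h:230' |
    find_thread "$tid"
```

With the sample input this selects the line for kgdb thread 558, the vfork child sitting in fork_trampoline.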


I'll note that I know of no way to reproduce this on demand. I had been doing
the same sort of build activity for some time when it happened.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

Mateusz Guzik
Can you please bisect this? There is another report stating that r329418
works fine.

On Sun, Feb 18, 2018 at 6:35 PM, Mark Millard <[hidden email]>
wrote:

>
> On 2018-Feb-17, at 6:10 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
>
> > [Some more information added, from /usr/libexec/kgdb use.]
> >
> > On 2018-Feb-17, at 5:39 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
> >
> >> This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
> >> The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
> >> 29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
> >> Threadripper 1950X). No other Hyper-V use.
>
> Trond's report seems to be for a "4 core" Intel i7 context (as seen
> by FreeBSD in virtual box). So Ryzen seems to be non-essential for
> reproduction.
>
> Both of our reports are from some form of using FreeBSD in a virtual
> machine (Hyper-V and VirtualBox). I do not know if that is a required
> type of context or not.
>
> >> This happened during:
> >>
> >> # ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils
> >> Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20
> >>>>> Checking for old files
> >>
>
> I got another example but during a buildworld:
>
> >>> Deleting stale files in build tree...
> cd /usr/src; MACHINE_ARCH=powerpc64  MACHINE=powerpc  CPUTYPE=
> BUILD_TOOLS_META=.NOMETA
> CC="cc -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/"
> CXX="c++  -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/"
> CPP="cpp -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/"
> AS="/usr/local/powerpc64-unknown-freebsd12.0/bin/as"
> AR="/usr/local/powerpc64-unknown-freebsd12.0/bin/ar"
> LD="/usr/local/powerpc64-unknown-freebsd12.0/bin/ld" LLVM_LINK=""
> NM=/usr/local/powerpc64-unknown-freebsd12.0/bin/nm
> OBJCOPY="/usr/local/powerpc64-unknown-freebsd12.0/bin/objcopy"
> RANLIB=/usr/local/powerpc64-unknown-freebsd12.0/bin/ranlib
> STRINGS=/usr/local/bin/powerpc64-unknown-freebsd12.0-strings
> SIZE="/usr/local/powerpc64-unknown-freebsd12.0/bin/size"
> INSTALL="sh /usr/src/tools/install.sh"
> PATH=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/usr/sbin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/usr/bin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/bin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/sbin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin
> SYSROOT=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp
> make  -f Makefile.inc1  BWPHASE=worldtmp  DESTDIR=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -DBATCH_DELETE_OLD_FILES  delete-old delete-old-libs >/dev/null
>
> load: 0.68  cmd: make 62180 [select] 25.15r 0.00u 0.00s 0% 1468k
> make: Working in: /usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64
> packet_write_wait: Connection to 192.168.1.165 port 22: Broken pipe
>
>
> (I noticed the long pause and got the ^T in before the panic.)
>
> Yet again it is xargs related fork activity that gets the problem (from
> core.txt.1 ):
>
>   561 Thread 100836 (PID=69982: xargs)  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840
> . . .
> * 559 Thread 100811 (PID=62304: xargs)  doadump (textdump=-2122191464) at pcpu.h:230
>
> spin lock 0xffffffff81b3cf00 (sched lock 24) held by 0xfffff806aa6d5000 (tid 100836) too long
> panic: spin lock held too long
> cpuid = 24
> time = 1518974055
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f11304d0
> vpanic() at vpanic+0x18d/frame 0xfffffe00f1130530
> panic() at panic+0x43/frame 0xfffffe00f1130590
> _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f11305a0
> thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1130610
> statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1130650
> handleevents() at handleevents+0x113/frame 0xfffffe00f11306a0
> timercb() at timercb+0xa9/frame 0xfffffe00f11306f0
> lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1130730
> timerint_u() at timerint_u+0x96/frame 0xfffffe00f1130810
> thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1130880
> fork1() at fork1+0x1b9f/frame 0xfffffe00f1130930
> sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1130980
> amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1130ab0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffc5a0
>
>
>
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( markmi at dsl-only.net is
> going away in 2018-Feb, late)
>
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "[hidden email]"
>



--
Mateusz Guzik <mjguzik gmail.com>

Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

Trond Endrestøl
On Sun, 18 Feb 2018 19:08+0100, Mateusz Guzik wrote:

> Can you please bisect this? There is another report stating that r329418
> works fine.

My problems started yesterday with r329464. I decided to go back to
r329101 (ZFS BE), update the source tree, move forward to the latest
revision, and so on. I even emptied /usr/obj and /var/cache/ccache and
set WITHOUT_SYSTEM_COMPILER=yes in /etc/src.conf to get rid of any
bias.

I have tried with success r329418, r329419, r329420, and r329422.

I'm now at r329448 and have not seen any spin lock problems so far.

Sooner or later I'll reach r329464 and by then it should be clear
which revision is the likely culprit.
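
Walking forward one revision at a time works, but the range can also be halved at each step. The sketch below is only an illustration of that arithmetic: the function names are invented, r329448 is treated as good only tentatively (no failure seen so far), and it relies on the assumption that the tree can be updated to any revision number in the range.

```shell
# One bisection step over SVN revision numbers.
bisect_next() {
    # $1 = highest revision believed good, $2 = lowest revision known bad
    if [ $(( $2 - $1 )) -le 1 ]; then
        echo "first bad: r$2"
    else
        echo "test next: r$(( ($1 + $2) / 2 ))"
    fi
}

# Roughly how many build-and-test rounds remain for a given range.
bisect_steps() {
    # $1 = good revision, $2 = bad revision
    n=$(( $2 - $1 )); steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_next 329448 329464    # prints "test next: r329456"
bisect_steps 329448 329464   # prints "4"
```

Each "good" verdict is only as reliable as the amount of testing behind it, so a revision that merely has not failed yet may need to be revisited.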

--
Trond.

Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list
On 2018-Feb-18, at 10:08 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:

> Can you please bisect this? There is another report stating that r329418 works fine.

I saw that Trond indicated an intent to test -r329418 but I've not seen
any reports about -r329418 or how much activity was used to make any
judgment about its status. But I can assume -r329418 is good if you
want.

Bisecting is likely going to be problematical for self-updates: builds
and installs and such can crash, making the installs risky. I do not
have an alternate builder for amd64 set up.

Even without that, it is not clear how many hours of build-related activity
it takes to have a high probability that the problem is gone. (I've seen
widely variable amounts of activity between failures in -r329465 .) After a
failure it is obvious to try an earlier version, but without a failure it is
not obvious when to move on to a later version.
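
One rough way to put a number on "how many hours" is to assume that, on a bad kernel, panics arrive at random with a constant average rate, so the chance of surviving t hours is exp(-t/m), where m is the mean time between panics. That model is an assumption, not something measured here, and the 4 hour figure below is purely hypothetical. Under those assumptions the required failure-free run time is -m * ln(1 - confidence):

```shell
# Hours of failure-free running needed before calling a revision "good",
# ASSUMING exponentially distributed times between panics on a bad kernel.
soak_hours() {
    # $1 = assumed mean hours between panics on a bad kernel (a guess)
    # $2 = desired confidence, e.g. 0.95
    awk -v m="$1" -v c="$2" 'BEGIN { printf "%.1f\n", -m * log(1 - c) }'
}

soak_hours 4 0.95   # prints "12.0"
```

So with a bad kernel that panics about every 4 hours on average, about 12 hours of the same build activity without a panic would give roughly 95% confidence. The widely variable gaps between failures are what such a model predicts, which is also why a short clean run says little.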

My FreeBSD time is also rather limited (compared to historically over the
last few years), so the activity could be spread over parts of various
weekends, depending on how it goes.

>> On Sun, Feb 18, 2018 at 6:35 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
>>
>> On 2018-Feb-17, at 6:10 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
>>
>> > [Some more information added, from /usr/libexec/kgdb use.]
>> >
>> > On 2018-Feb-17, at 5:39 PM, Mark Millard <marklmi26-fbsd at yahoo.com> wrote:
>> >
>> >> This is for FreeBSD running under Hyper-V on a Windows 10 Pro machine.
>> >> The FreeBSD "disk" bindings are to SSDs, not the insides of NTFS files.
>> >> 29 logical processors assigned to FreeBSD (on a 32-thread Ryzen
>> >> Threadripper 1950X). No other Hyper-V use.
>>
>> Trond's report seems to be for a "4 core" Intel i7 context (as seen
>> by FreeBSD in virtual box). So Ryzen seems to be non-essential for
>> reproduction.
>>
>> Both of our reports are from some form of using FreeBSD in a virtual
>> machine (Hyper-V and VirtualBox). I do not know if that is a required
>> type of context or not.
>>
>> >> This happened during:
>> >>
>> >> # ~/sys_build_scripts.amd64-host/make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host.sh check-old DESTDIR=/usr/obj/DESTDIRs/clang-powerpc64-installworld_altbinutils
>> >> Script started, output file is /root/sys_typescripts/typescript_make_powerpc64vtsc_nodebug_clang_altbinutils-amd64-host-2018-02-17:15:56:20
>> >>>>> Checking for old files
>> >>
>>
>> I got another example but during a buildworld:
>>
>> >>> Deleting stale files in build tree...
>> cd /usr/src; MACHINE_ARCH=powerpc64  MACHINE=powerpc  CPUTYPE=
>> BUILD_TOOLS_META=.NOMETA
>> CC="cc -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/"
>> CXX="c++  -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/"
>> CPP="cpp -target powerpc64-unknown-freebsd12.0 --sysroot=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -B/usr/local/powerpc64-unknown-freebsd12.0/bin/"
>> AS="/usr/local/powerpc64-unknown-freebsd12.0/bin/as"
>> AR="/usr/local/powerpc64-unknown-freebsd12.0/bin/ar"
>> LD="/usr/local/powerpc64-unknown-freebsd12.0/bin/ld" LLVM_LINK=""
>> NM=/usr/local/powerpc64-unknown-freebsd12.0/bin/nm
>> OBJCOPY="/usr/local/powerpc64-unknown-freebsd12.0/bin/objcopy"
>> RANLIB=/usr/local/powerpc64-unknown-freebsd12.0/bin/ranlib
>> STRINGS=/usr/local/bin/powerpc64-unknown-freebsd12.0-strings
>> SIZE="/usr/local/powerpc64-unknown-freebsd12.0/bin/size"
>> INSTALL="sh /usr/src/tools/install.sh"
>> PATH=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/usr/sbin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/usr/bin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/legacy/bin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/sbin:/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin
>> SYSROOT=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp
>> make  -f Makefile.inc1  BWPHASE=worldtmp  DESTDIR=/usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp -DBATCH_DELETE_OLD_FILES  delete-old delete-old-libs >/dev/null
>>
>> load: 0.68  cmd: make 62180 [select] 25.15r 0.00u 0.00s 0% 1468k
>> make: Working in: /usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64
>> packet_write_wait: Connection to 192.168.1.165 port 22: Broken pipe
>>
>>
>> (I noticed the long pause and got the ^T in before the panic.)
>>
>> Yet again it is xargs-related fork activity that hits the problem (from core.txt.1):
>>
>>   561 Thread 100836 (PID=69982: xargs)  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840
>> . . .
>> * 559 Thread 100811 (PID=62304: xargs)  doadump (textdump=-2122191464) at pcpu.h:230
>>
>> spin lock 0xffffffff81b3cf00 (sched lock 24) held by 0xfffff806aa6d5000 (tid 100836) too long
>> panic: spin lock held too long
>> cpuid = 24
>> time = 1518974055
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00f11304d0
>> vpanic() at vpanic+0x18d/frame 0xfffffe00f1130530
>> panic() at panic+0x43/frame 0xfffffe00f1130590
>> _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x71/frame 0xfffffe00f11305a0
>> thread_lock_flags_() at thread_lock_flags_+0xdb/frame 0xfffffe00f1130610
>> statclock_cnt() at statclock_cnt+0xdc/frame 0xfffffe00f1130650
>> handleevents() at handleevents+0x113/frame 0xfffffe00f11306a0
>> timercb() at timercb+0xa9/frame 0xfffffe00f11306f0
>> lapic_handle_timer() at lapic_handle_timer+0xa7/frame 0xfffffe00f1130730
>> timerint_u() at timerint_u+0x96/frame 0xfffffe00f1130810
>> thread_lock_flags_() at thread_lock_flags_+0xc1/frame 0xfffffe00f1130880
>> fork1() at fork1+0x1b9f/frame 0xfffffe00f1130930
>> sys_vfork() at sys_vfork+0x4c/frame 0xfffffe00f1130980
>> amd64_syscall() at amd64_syscall+0xa48/frame 0xfffffe00f1130ab0
>> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffc5a0
>>
>
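The link between the panic message and the thread listing quoted above can be checked mechanically. The following is only a rough sketch: the two regular expressions are assumptions fitted to the exact lines quoted from core.txt.1 in this message, not a general parser for crash summaries, and the function names are made up for illustration.

```python
import re

# Lines copied verbatim from the core.txt.1 excerpt quoted above.
PANIC_LINE = ("spin lock 0xffffffff81b3cf00 (sched lock 24) held by "
              "0xfffff806aa6d5000 (tid 100836) too long")
THREAD_LINES = [
    "  561 Thread 100836 (PID=69982: xargs)  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:840",
    "* 559 Thread 100811 (PID=62304: xargs)  doadump (textdump=-2122191464) at pcpu.h:230",
]


def lock_holder_tid(panic_line):
    """Return the tid named as lock holder in the panic message, or None."""
    m = re.search(r"held by 0x[0-9a-f]+ \(tid (\d+)\) too long", panic_line)
    return int(m.group(1)) if m else None


def parse_threads(lines):
    """Map tid -> details for each thread-listing line (assumed layout)."""
    pattern = re.compile(r"Thread (\d+) \(PID=(\d+): ([^)]+)\)\s+(\S+)")
    threads = {}
    for line in lines:
        m = pattern.search(line)
        if m:
            tid, pid, name, func = m.groups()
            threads[int(tid)] = {
                "pid": int(pid),
                "name": name,
                "function": func,
                # A leading "*" marks the thread that was current at dump time.
                "current": line.lstrip().startswith("*"),
            }
    return threads


holder = lock_holder_tid(PANIC_LINE)
threads = parse_threads(THREAD_LINES)
print(holder, threads[holder]["name"], threads[holder]["function"])
```

With the quoted lines, the holder of the sched lock is the xargs thread sitting in fork_trampoline, while the thread that reported the panic is a different xargs thread.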


===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list
In reply to this post by Trond Endrestøl
On 2018-Feb-18, at 11:30 AM, Trond Endrestøl <Trond.Endrestol at fagskolen.gjovik.no> wrote:

> On Sun, 18 Feb 2018 19:08+0100, Mateusz Guzik wrote:
>
>> Can you please bisect this? There is another report stating that r329418
>> works fine.
>
> My problems started yesterday with r329464. I decided to go back to
> r329101 (ZFS BE), update the source tree, move forward to the latest
> revision, and so on. I even emptied /usr/obj and /var/cache/ccache and
> set WITHOUT_SYSTEM_COMPILER=yes in /etc/src.conf to get rid of any
> bias.
>
> I have tried with success r329418, r329419, r329420, and r329422.
>
> I'm now at r329448 and have not seen any spin lock problems so far.

Note: -r329448 was reverted in -r329461 : racy.

> Sooner or later I'll reach r329464 and by then it should be clear
> which revision is the likely culprit.

===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list
On 2018-Feb-18, at 1:33 PM, Mateusz Guzik <[hidden email]> wrote:

> On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
> [hidden email]> wrote:
>
>> On Sun, 18 Feb 2018 11:51-0800, Mark Millard wrote:
>>
>>> Note: -r329448 was reverted in -r329461 : racy.
>>
>> True. I got a crash when compiling r329451 while running r329449.
>> I've now booted the r329422 ZFS BE and I'm attempting to build
>> r329529.
>>
>
> Looking around strongly suggests r329448 is the culprit. If you can verify
> 329447 works fine we are mostly done here.
>
> Note the revision got reverted and a different variant went in in r329531.
>
> That said, if r329447 works then the issue should be already fixed and in
> particular fresh head should work fine.

My initial problem was with -r329465, which is after -r329461 reverted
-r329488 . Trond reported in one note that he had problems with
-r329464 , also after -r329488 was reverted. Trond has also reported
-r329449 failed.

I did manage to revert to -r329447 earlier and so far the results
suggest that it works.

From this I get that -r329449 is the one that is common to
all the so-far failing combinations. -r329448 is not common to
all of them.
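
The elimination above can be written out as a small sketch. It only mirrors the reasoning in this message, using the good/bad reports mentioned in the thread so far; it does not establish the actual cause. The names REPORTS, REVERTED_IN and the helper functions are made up for illustration, and the model assumes a revision's change stays in effect until the revision that reverts it.

```python
# Outcomes as reported in this thread so far (revision -> result).
REPORTS = {
    329418: "good", 329419: "good", 329420: "good", 329422: "good",  # Trond
    329447: "good",  # works so far
    329449: "bad",   # Trond: crash while running r329449
    329464: "bad",   # Trond
    329465: "bad",   # the original report
}

# r329448 was reverted in r329461.
REVERTED_IN = {329448: 329461}


def suspect_window(reports):
    """Revisions after the last known-good and up to the first known-bad."""
    bad = [r for r, v in reports.items() if v == "bad"]
    first_bad = min(bad)
    last_good = max(r for r, v in reports.items()
                    if v == "good" and r < first_bad)
    return list(range(last_good + 1, first_bad + 1))


def in_effect(candidate, running):
    """True if the candidate's change is present in the running revision."""
    reverted = REVERTED_IN.get(candidate)
    return candidate <= running and (reverted is None or running < reverted)


def common_to_all_failures(reports):
    """Candidates from the window that are in effect in every failing run."""
    bad = [r for r, v in reports.items() if v == "bad"]
    return [c for c in suspect_window(reports)
            if all(in_effect(c, b) for b in bad)]


print(suspect_window(REPORTS))          # candidates between good and bad
print(common_to_all_failures(REPORTS))  # candidates present in every failure
```

Under these assumptions the window is r329448 and r329449, and only r329449 is in effect in all three failing revisions, since r329448 is no longer present at r329464 and r329465.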


===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list


On 2018-Feb-18, at 1:46 PM, Mark Millard <[hidden email]> wrote:

> On 2018-Feb-18, at 1:33 PM, Mateusz Guzik <[hidden email]> wrote:
>
>> On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
>> [hidden email]> wrote:
>>
>>> On Sun, 18 Feb 2018 11:51-0800, Mark Millard wrote:
>>>
>>>> Note: -r329448 was reverted in -r329461 : racy.
>>>
>>> True. I got a crash when compiling r329451 while running r329449.
>>> I've now booted the r329422 ZFS BE and I'm attempting to build
>>> r329529.
>>>
>>
>> Looking around strongly suggests r329448 is the culprit. If you can verify
>> 329447 works fine we are mostly done here.
>>
>> Note the revision got reverted and a different variant went in in r329531.
>>
>> That said, if r329447 works then the issue should be already fixed and in
>> particular fresh head should work fine.
>
> My initial problem was with -r329465, which is after -r329461 reverted
> -r329488 . Trond reported in one note that he had problems with
> -r329464 , also after -r329488 was reverted. Trond has also reported
> -r329449 failed.

Dumb typos above: I meant -r329448 instead of -r329488 both times.

> I did manage to revert to -r329447 earlier and so far the results
> suggest that it works.
>
> From this I get that -r329449 is the one that is common to
> all the so-far failing combinations. -r329448 is not common to
> all of them.


===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)


Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

Trond Endrestøl
In reply to this post by freebsd-hackers mailing list
On Sun, 18 Feb 2018 22:33+0100, Mateusz Guzik wrote:

> On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
> [hidden email]> wrote:
>
> > On Sun, 18 Feb 2018 11:51-0800, Mark Millard wrote:
> >
> > > Note: -r329448 was reverted in -r329461 : racy.
> >
> > True. I got a crash when compiling r329451 while running r329449.
> > I've now booted the r329422 ZFS BE and I'm attempting to build
> > r329529.
> >
>
> Looking around strongly suggests r329448 is the culprit. If you can verify
> 329447 works fine we are mostly done here.

I noticed no errors in r329447. When r329529 is built and installed,
I'll try to incrementally build and install r329531.

> Note the revision got reverted and a different variant went in in r329531.
>
> That said, if r329447 works then the issue should be already fixed and in
> particular fresh head should work fine.

--
Trond.

Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

Mateusz Guzik
On Sun, Feb 18, 2018 at 11:24 PM, Trond Endrestøl <
[hidden email]> wrote:

> On Sun, 18 Feb 2018 22:33+0100, Mateusz Guzik wrote:
>
> > On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
> > [hidden email]> wrote:
> >
> > > On Sun, 18 Feb 2018 11:51-0800, Mark Millard wrote:
> > >
> > > > Note: -r329448 was reverted in -r329461 : racy.
> > >
> > > True. I got a crash when compiling r329451 while running r329449.
> > > I've now booted the r329422 ZFS BE and I'm attempting to build
> > > r329529.
> > >
> >
> > Looking around strongly suggests r329448 is the culprit. If you can verify
> > 329447 works fine we are mostly done here.
>
> I noticed no errors in r329447. When r329529 is built and installed,
> I'll try to incrementally build and install r329531.
>

Can you grab a panicking kernel and apply this:
https://people.freebsd.org/~mjg/wait_unlocked.diff

There may be debug printfs signifying that the problem condition was hit;
however, the patch should fix the panic.

--
Mateusz Guzik <mjguzik gmail.com>

Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

Mateusz Guzik
I committed the fix in
https://svnweb.freebsd.org/base?view=revision&revision=329542

i.e. should be stable from this point on.

On Sun, Feb 18, 2018 at 11:55 PM, Mateusz Guzik <[hidden email]> wrote:

> On Sun, Feb 18, 2018 at 11:24 PM, Trond Endrestøl <
> [hidden email]> wrote:
>
>> On Sun, 18 Feb 2018 22:33+0100, Mateusz Guzik wrote:
>>
>> > On Sun, Feb 18, 2018 at 9:38 PM, Trond Endrestøl <
>> > [hidden email]> wrote:
>> >
>> > > On Sun, 18 Feb 2018 11:51-0800, Mark Millard wrote:
>> > >
>> > > > Note: -r329448 was reverted in -r329461 : racy.
>> > >
>> > > True. I got a crash when compiling r329451 while running r329449.
>> > > I've now booted the r329422 ZFS BE and I'm attempting to build
>> > > r329529.
>> > >
>> >
>> > Looking around strongly suggests r329448 is the culprit. If you can verify
>> > 329447 works fine we are mostly done here.
>>
>> I noticed no errors in r329447. When r329529 is built and installed,
>> I'll try to incrementally build and install r329531.
>>
>
> Can you grab a panicking kernel and apply this:
> https://people.freebsd.org/~mjg/wait_unlocked.diff
>
> There may be debug printfs signifying that the problem condition was hit;
> however, the patch should fix the panic.
>
> --
> Mateusz Guzik <mjguzik gmail.com>
>



--
Mateusz Guzik <mjguzik gmail.com>

Re: amd64 head -r329465 (non-debug build, but with symbols): "panic: spin lock held too long" during make check-old, reported during a sys_vfork

freebsd-hackers mailing list
On 2018-Feb-18, at 4:55 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:

> I committed the fix in
> https://svnweb.freebsd.org/base?view=revision&revision=329542
>
> i.e. should be stable from this point on.

Thanks!



===
Mark Millard
marklmi at yahoo.com
( markmi at dsl-only.net is
going away in 2018-Feb, late)
