FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot


FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot

Mark Millard-2
I took my normal GENERIC-NODBG (that includes GENERIC)
and changed INVARIANTS and INVARIANT_SUPPORT to have
"options" status instead of "nooptions" status. The
result boots (so far no counterexamples). (This is
head -r323246 .)

So it appears that one or more INVARIANTS checks are
"fixing" the Pine64+ 2GB boot problem. I have no clue
which. But the other debug options are not required.

FYI. . .

# more /usr/src/sys/arm64/conf/GENERIC-NODBG                                                                                                                                              
#
# GENERIC -- Custom configuration for the arm64/aarch64
#

include "GENERIC"

ident   GENERIC-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

options         ALT_BREAK_TO_DEBUGGER

options         KDB                     # Enable kernel debugger support

# For minimum debugger support (stable branch) use:
#options        KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger

# Extra stuff:
#options        VERBOSE_SYSINIT         # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones
nooptions       BUF_TRACKING
nooptions       FULL_BUF_TRACKING

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "[hidden email]"

Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot

Mark Millard-2
[Back to nooptions for INVARIANTS and INVARIANT_SUPPORT,
but now with verbose booting. taskqgroup_adjust_softirq(0)...
is the one that does not get a "done." before the failure.]

On 2017-Sep-12, at 7:19 PM, Mark Millard <[hidden email]> wrote:

> . . .

I've changed to have:

options        VERBOSE_SYSINIT         # Enable verbose sysinit messages
options        BOOTVERBOSE=1
options        BOOTHOWTO=RB_VERBOSE

and:

nooptions         INVARIANTS              # Enable calls of extra sanity checking
nooptions         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS

The tail of the verbose failing boot looks like:

. . .
   vt_upgrade(&vt_consdev)... done.
subsystem b000000
   nfs_rootconf(0)... done.
   fhanew_init(0)... done.
subsystem d000000
   proc0_post(0)... done.
subsystem d800000
   sctp_syscalls_init(0)... done.
   selectinit(0)... done.
subsystem dffff9c
   linker_preload_finish(0)... done.
subsystem e000000
   kick_init(0)... done.
   kstack_cache_init(0)... done.
subsystem e400000
   vm_pageout_init(0)... done.
   $x.1(&page_kp)... done.
subsystem e800000
   $x.1(&vm_kp)... done.
subsystem ea00000
   $x.1(&bufspace_kp)... done.
   $x.1(&buf_kp)... done.
subsystem ec00000
   $x.1(&vnlru_kp)... done.
   $x.1(&up_kp)... done.
subsystem ee00000
   acpi_acad_ac_only(0)... done.
   nfsiod_setup(0)... done.
subsystem f000000
   release_aps(0)... Release APs
APs not started
done.
   tmr_setup_user_access(0)... done.
   intr_irq_shuffle(0)... done.
   tqg_record_smp_started(0)... done.
   netisr_start(0)... done.
   cpuset_init(0)... done.
   taskqgroup_adjust_if_config_tqg(0)... done.
   identify_cpu_sysinit(0)... CPU  0: ARM Cortex-A53 r0p4 affinity:  0
 Instruction Set Attributes 0 = <AES+PMULL,SHA1,SHA2,CRC32>
 Instruction Set Attributes 1 = <0>
         Processor Features 0 = <AdvSIMD,Float,EL3 32,EL2 32,EL1 32,EL0 32>
         Processor Features 1 = <0>
      Memory Model Features 0 = <4k Granule,64k Granule,MixedEndian,S/NS Mem,16bit ASID,1TB PA>
      Memory Model Features 1 = <>
             Debug Features 0 = <2 CTX Breakpoints,4 Watchpoints,6 Breakpoints,PMUv3,Debug v8>
             Debug Features 1 = <0>
         Auxiliary Features 0 = <0>
         Auxiliary Features 1 = <0>
CPU  1: (null) (null) r0p0 affinity:  0
CPU  2: (null) (null) r0p0 affinity:  0
CPU  3: (null) (null) r0p0 affinity:  0
done.
   taskqgroup_adjust_softirq(0)...   x0: ffff000000a1c080
  x1: fffffd0001031a80
  x2:                3
[ thread pid 0 tid 100055 ]
Stopped at      thread_lock_flags_+0x298:       ldr     w4, [x3, #156]
db>

taskqgroup_adjust_softirq seems to be from:

/usr/src/sys/kern/subr_gtaskqueue.c :

TASKQGROUP_DEFINE(softirq, mp_ncpus, 1);





===
Mark Millard
markmi at dsl-only.net


Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot

Mark Millard-2
[This time a debug kernel (including witness) and
verbose booting. Also shown: which CPUs hold spin
locks (none) and which backtraces mention critical
sections (critical_exit).]

On 2017-Sep-12, at 11:16 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [Back to nooptions for INVARIANTS and INVARIANT_SUPPORT
> but now verbose booting. taskqgroup_adjust_softirq(0)...
> is the one to not get a "done." before failure.]
>
>
> . . .
>
> The tail of the verbose failing boot looks like:
>
> . . .
>   taskqgroup_adjust_softirq(0)...   x0: ffff000000a1c080
>  x1: fffffd0001031a80
>  x2:                3
> [ thread pid 0 tid 100055 ]
> Stopped at      thread_lock_flags_+0x298:       ldr     w4, [x3, #156]
> db>
>
> taskqgroup_adjust_softirq seems to be from:
>
> /usr/src/sys/kern/subr_gtaskqueue.c :
>
> TASKQGROUP_DEFINE(softirq, mp_ncpus, 1);

[The above was a non-debug kernel with
verbose messages.]

So a debug kernel with verbose boot
messages:

CPU  1: (null) (null) r0p0 affinity:  0
CPU  2: (null) (null) r0p0 affinity:  0
CPU  3: (null) (null) r0p0 affinity:  0
done.
   taskqgroup_adjust_softirq(0)... panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13


Thus the non-debug kernel boot failures stop during
"taskqgroup_adjust_softirq(0)...", and that is also
where the debug kernel reports a problem via witness
(or via invariants testing when witness is disabled).
Witness catches the problem somewhat earlier in the
code sequence than invariants do (when the race happens).
Without invariants (and without witness) the failure
seems to happen reliably.

For this witness context. . .

db> show allpcpu
Current CPU: 0

cpuid        = 0
dynamic pcpu = 0x84af00
curthread    = 0xfffffd0001415a80: pid 0 tid 100058 "softirq_1"
curpcb       = 0xffff000069850cb0
fpcurthread  = none
idlethread   = 0xfffffd00005de000: tid 100003 "idle: cpu0"
spin locks held:

cpuid        = 1
dynamic pcpu = 0x81324f00
curthread    = none
curpcb       = 0
fpcurthread  = none
idlethread   = 0xfffffd00005dda80: tid 100004 "idle: cpu1"
spin locks held:

cpuid        = 2
dynamic pcpu = 0x81325f00
curthread    = none
curpcb       = 0
fpcurthread  = none
idlethread   = 0xfffffd00005dd540: tid 100005 "idle: cpu2"
spin locks held:

cpuid        = 3
dynamic pcpu = 0x81326f00
curthread    = none
curpcb       = 0
fpcurthread  = none
idlethread   = 0xfffffd00005dd000: tid 100006 "idle: cpu3"
spin locks held:


So no spin locks held. As for critical sections. . .


db> show allchains
. . . (just ones mentioning "on a run queue"). . .
chain 20:
 thread 100014 (pid 12, swi4: clock (0)) blocked on lock 0xffff000000c5d8e0 (sleep mutex) "Giant"
 thread 100000 (pid 0, swapper) is on a run queue
chain 21:
 thread 100002 (pid 1, kernel) blocked on lock 0xffff000000c5d8e0 (sleep mutex) "Giant"
 thread 100000 (pid 0, swapper) is on a run queue
. . .

db> thread 100000
[ thread pid 0 tid 100000 ]
0
db> bt
Tracing pid 0 tid 100000 td 0xffff000000acd580
sched_switch() at mi_switch+0x1b8
         pc = 0xffff00000033f494  lr = 0xffff000000321754
         sp = 0xffff0000000109f0  fp = 0xffff000000010a10

mi_switch() at critical_exit+0x84
         pc = 0xffff000000321754  lr = 0xffff00000031e72c
         sp = 0xffff000000010a20  fp = 0xffff000000010a30

critical_exit() at spinlock_exit+0x10
         pc = 0xffff00000031e72c  lr = 0xffff0000005f83b4
         sp = 0xffff000000010a40  fp = 0xffff000000010a50

spinlock_exit() at wakeup_one+0x30
         pc = 0xffff0000005f83b4  lr = 0xffff00000032157c
         sp = 0xffff000000010a60  fp = 0xffff000000010a70

wakeup_one() at grouptaskqueue_enqueue+0xcc
         pc = 0xffff00000032157c  lr = 0xffff0000003533ec
         sp = 0xffff000000010a80  fp = 0xffff000000010aa0
       
grouptaskqueue_enqueue() at taskqgroup_adjust+0x83c
         pc = 0xffff0000003533ec  lr = 0xffff00000035483c
         sp = 0xffff000000010ab0  fp = 0xffff000000010b40

taskqgroup_adjust() at mi_startup+0x254
         pc = 0xffff00000035483c  lr = 0xffff0000002b5704
         sp = 0xffff000000010b50  fp = 0xffff000000010bb0

mi_startup() at virtdone+0x54
         pc = 0xffff0000002b5704  lr = 0xffff000000001084
         sp = 0xffff000000010bc0  fp = 0x0000000000000000

From:

db> show threads
. . . (just ones mentioning critical_exit). . .
  100027 (0xfffffd000062e000) (stack 0xffff00006a58a000)
  100033 (0xfffffd0000796000) (stack 0xffff00006a5a9000)
  100034 (0xfffffd0000795a80) (stack 0xffff00006a5b6000)
  100003 (0xfffffd00005de000) (stack 0xffff000081baa000)
sched_switch() at mi_switch+0x1b8
         pc = 0xffff00000033f494  lr = 0xffff000000321754
         sp = 0xffff000081bada20  fp = 0xffff000081bada40

mi_switch() at critical_exit+0x84
         pc = 0xffff000000321754  lr = 0xffff00000031e72c
         sp = 0xffff000081bada50  fp = 0xffff000081bada60

critical_exit() at cpu_idle+0x3c
         pc = 0xffff00000031e72c  lr = 0xffff0000005f8308
         sp = 0xffff000081bada70  fp = 0xffff000081bada80

cpu_idle() at sched_idletd+0xf4
         pc = 0xffff0000005f8308  lr = 0xffff000000341b84
         sp = 0xffff000081bada90  fp = 0xffff000081badb50

sched_idletd() at fork_exit+0x7c
         pc = 0xffff000000341b84  lr = 0xffff0000002dbe74
         sp = 0xffff000081badb60  fp = 0xffff000081badb90

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000002dbe74  lr = 0xffff000000608664
         sp = 0xffff000081badba0  fp = 0x0000000000000000
. . .

I did not find any other references to
"critical". As far as critical_exit goes, only
the swapper was listed as on a run queue; the
other critical_exit's were from cpu_idle.

It appears fairly likely that what witness and
invariants sometimes report is the same thing
the non-debug kernels run into (more) reliably:
the non-debug kernel is likely hanging on the
lock attempt while (it appears that) a critical
section is still active.

As near as I can tell, some invariants logic
makes the critical-section vs. blockable-lock
conflict far less likely to happen: some form
of race. Thus the invariants-only and full-debug
kernels usually boot, but not always.

(But I make no claim to be expert in these areas.)

===
Mark Millard
markmi at dsl-only.net


Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: nondebug+INVARIANTS+INVARIANT_SUPPORT sufficient to boot

Mark Millard-2
[I've traced the failure back to the bad pointer
value that is in place when it was put to use. I
omit the prior details of earlier explorations
that got me into this area.]

Summary of the following investigations:

When the witness or invariant failure during:

taskqgroup_adjust_softirq(0)...

happens it traces back to the condition:

        pcpu_find(cpu)->pc_curthread == NULL

in the code:

        ctd = pcpu_find(cpu)->pc_curthread;

for cpu == 1 in tdq_notify (inlined here). It then
attempts to evaluate:

ctd->td_priority

which takes a data_abort, at a point when
blocked_lock is the container lock for the
thread.

In the witness included case this leads to:

panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm64/arm64/pmap.c:4710
cpuid = 0
time = 13

but that is a later consequence of the earlier
problem.

I'm not sure why pcpu_find(cpu)->pc_curthread
is still NULL, but since the behavior is intermittent
for debug kernel builds, it suggests a race: an
update that was initiated but not always finished
in time.

(I've had occasions where hours of reboots were
needed to get a failure, though mostly only a few
are needed. I've not run into a long sequence of
boot failures for a debug kernel, but have had
some back-to-back ones.)


Supporting detail that lead to the above:

int
pmap_fault(pmap_t pmap, uint64_t esr, uint64_t far)

The far (fault address register) argument to pmap_fault is the
3rd one (x2 below):

ffff000000606954 <pmap_fault> stp       x22, x21, [sp, #-48]!
ffff000000606958 <pmap_fault+0x4> stp   x20, x19, [sp, #16]
ffff00000060695c <pmap_fault+0x8> stp   x29, x30, [sp, #32]
ffff000000606960 <pmap_fault+0xc> add   x29, sp, #0x20
ffff000000606964 <pmap_fault+0x10> mov  x20, x2
ffff000000606968 <pmap_fault+0x14> mov  x22, x1
ffff00000060696c <pmap_fault+0x18> mov  x21, x0

For the failing call sequence far ends up stored on the stack
via the x20 save-to-stack in:

ffff0000002f8c0c <__mtx_lock_flags> stp x24, x23, [sp, #-64]!
ffff0000002f8c10 <__mtx_lock_flags+0x4> stp     x22, x21, [sp, #16]
ffff0000002f8c14 <__mtx_lock_flags+0x8> stp     x20, x19, [sp, #32]
ffff0000002f8c18 <__mtx_lock_flags+0xc> stp     x29, x30, [sp, #48]
ffff0000002f8c1c <__mtx_lock_flags+0x10> add    x29, sp, #0x30

Stack segment with a little context:

0xffff000069850470:     ffff0000698504b0                ffff0000002f8b80
0xffff000069850480:     ffff000000c6a528                0
0xffff000069850490:     96000004                        ffff000000c6a658
0xffff0000698504a0:     37e                             ffff000000c6a670
0xffff0000698504b0:     ffff0000698504e0                ffff000000606998

So it appears:

pmap_fault`esr == 0x96000004
pmap_fault`pmap == 0xffff000000c6a658 (vmspace0+0x130)
pmap_fault`far == 0x37e

I'll note that 0x37e = 894 so it matches up with
x8 == 0x0 for the likes of:

elr         0xffff00000033f0dc  sched_switch+0x2bc
. . .
sched_switch+0x2b8:     ldrb    w9, [x8, #894]

matching:

db> show reg
. . .
x8                           0
. . .

So apparently sched_switch tried to access 0x37e

db> x/gx 0xffff000000606998  
pmap_fault+0x44:        f100111f927e0ec8

Part of the back trace is (for the example
debug kernel):

kassert_panic() at witness_checkorder+0x160
         pc = 0xffff0000003174e4  lr = 0xffff000000374990
         sp = 0xffff0000698503f0  fp = 0xffff000069850470

witness_checkorder() at __mtx_lock_flags+0xa8
         pc = 0xffff000000374990  lr = 0xffff0000002f8b7c
         sp = 0xffff000069850480  fp = 0xffff0000698504b0

__mtx_lock_flags() at pmap_fault+0x40
         pc = 0xffff0000002f8b7c  lr = 0xffff000000606994
         sp = 0xffff0000698504c0  fp = 0xffff0000698504e0
       
pmap_fault() at data_abort+0xb8
         pc = 0xffff000000606994  lr = 0xffff000000608a9c
         sp = 0xffff0000698504f0  fp = 0xffff0000698505a0

data_abort() at do_el1h_sync+0xfc
         pc = 0xffff000000608a9c  lr = 0xffff0000006088f0
         sp = 0xffff0000698505b0  fp = 0xffff0000698505e0

do_el1h_sync() at handle_el1h_sync+0x74
         pc = 0xffff0000006088f0  lr = 0xffff0000005f1874
         sp = 0xffff0000698505f0  fp = 0xffff000069850700

handle_el1h_sync() at sched_switch+0x2a8
         pc = 0xffff0000005f1874  lr = 0xffff00000033f0c8
         sp = 0xffff000069850710  fp = 0xffff0000698507f0

sched_switch() at mi_switch+0x1b8
         pc = 0xffff00000033f0c8  lr = 0xffff00000032161c
         sp = 0xffff000069850800  fp = 0xffff000069850820

mi_switch() at taskqgroup_binder+0x7c
         pc = 0xffff00000032161c  lr = 0xffff00000035510c
         sp = 0xffff000069850830  fp = 0xffff000069850860

taskqgroup_binder() at gtaskqueue_run_locked+0x104
         pc = 0xffff00000035510c  lr = 0xffff000000354f74
         sp = 0xffff000069850870  fp = 0xffff0000698508e0
       
gtaskqueue_run_locked() at gtaskqueue_thread_loop+0x9c
         pc = 0xffff000000354f74  lr = 0xffff000000354d10
         sp = 0xffff0000698508f0  fp = 0xffff000069850910

gtaskqueue_thread_loop() at fork_exit+0x7c
         pc = 0xffff000000354d10  lr = 0xffff0000002dbd3c
         sp = 0xffff000069850920  fp = 0xffff000069850950

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000002dbd3c  lr = 0xffff000000608664
         sp = 0xffff000069850960  fp = 0x0000000000000000

Note:

ffff00000033f0bc <sched_switch+0x29c> ldr       w0, [x19, #1292]
ffff00000033f0c0 <sched_switch+0x2a0> ldrb      w27, [x19, #894]
ffff00000033f0c4 <sched_switch+0x2a4> str       w0, [sp, #12]
ffff00000033f0c8 <sched_switch+0x2a8> bl        ffff000000359708 <pcpu_find>
ffff00000033f0cc <sched_switch+0x2ac> ldr       x8, [x0]
ffff00000033f0d0 <sched_switch+0x2b0> mov       w11, w27
ffff00000033f0d4 <sched_switch+0x2b4> adrp      x27, ffff000000c85000 <dpcpu+0x158>
ffff00000033f0d8 <sched_switch+0x2b8> ldrb      w9, [x8, #894]
ffff00000033f0dc <sched_switch+0x2bc> cmp       w11, w9
ffff00000033f0e0 <sched_switch+0x2c0> b.cs      ffff00000033f150 <sched_switch+0x330>  // b.hs, b.nlast

This is code for the later part of what is
shown below:

static void
tdq_notify(struct tdq *tdq, struct thread *td)
{
        struct thread *ctd;
        int pri;
        int cpu;

        if (tdq->tdq_ipipending)
                return;
        cpu = td_get_sched(td)->ts_cpu;
        pri = td->td_priority;
        ctd = pcpu_find(cpu)->pc_curthread;
        if (!sched_shouldpreempt(pri, ctd->td_priority, 1))
                return;
. . .
}

(Where sched_shouldpreempt has been inlined and
some of it is interleaved with the rest.)

The failing [x8, #894] is the attempt to access: ctd->td_priority

In other words: ctd == NULL resulted from the pcpu_find
(i.e., x8 == 0 ). As for how it got to be zero:

db> show reg
spsr        0x9600000440000085
x0          0xffff000000ac1000  __pcpu+0x200
. . .

db> x/gx cpuid_to_pcpu,32
cpuid_to_pcpu:  ffff000000ac0e00                ffff000000ac1000
cpuid_to_pcpu+0x10:     ffff000000ac1200                ffff000000ac1400
cpuid_to_pcpu+0x20:     0                               0
. . .

(So cpu == 1 .)

db> x/gx 0xffff000000ac1000,8
__pcpu+0x200:   0                               fffffd00005dda80
__pcpu+0x210:   0                               0
__pcpu+0x220:   0                               0
__pcpu+0x230:   100000000                       ffff000000ac1200

Thus it seems that at the time for cpu==1 :

        pcpu_find(cpu)->pc_curthread == NULL

(at __pcpu+0x200).

===
Mark Millard
markmi at dsl-only.net


Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only, possible missing atomic_load_acq_int's?

Mark Millard-2
A new finding:

When verbose boot messages are enabled
there is an earlier contrast between when
booting works overall vs. when it later
fails:

When it works:

subsystem f000000
   release_aps(0)... Release APs
done.

When it fails:

subsystem f000000
   release_aps(0)... Release APs
APs not started
done.

And it explains well why ->pc_curthread
ends up NULL for secondaries (in particular
cpu == 1): init_secondary had never executed
the assignments shown below:

        while (!aps_ready)
                __asm __volatile("wfe");

        /* Initialize curthread */
        KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
        pcpup->pc_curthread = pcpup->pc_idlethread;
        pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

The messages under "subsystem f000000" are from:

static void
release_aps(void *dummy __unused)
{      
        int i;
 
        /* Only release CPUs if they exist */
        if (mp_ncpus == 1)
                return;

        intr_pic_ipi_setup(IPI_AST, "ast", ipi_ast, NULL);
        intr_pic_ipi_setup(IPI_PREEMPT, "preempt", ipi_preempt, NULL);
        intr_pic_ipi_setup(IPI_RENDEZVOUS, "rendezvous", ipi_rendezvous, NULL);
        intr_pic_ipi_setup(IPI_STOP, "stop", ipi_stop, NULL);
        intr_pic_ipi_setup(IPI_STOP_HARD, "stop hard", ipi_stop, NULL);
        intr_pic_ipi_setup(IPI_HARDCLOCK, "hardclock", ipi_hardclock, NULL);

        atomic_store_rel_int(&aps_ready, 1);
        /* Wake up the other CPUs */
        __asm __volatile("sev");

        printf("Release APs\n");

        for (i = 0; i < 2000; i++) {
                if (smp_started)
                        return;
                DELAY(1000);
        }
       
        printf("APs not started\n");
}      
SYSINIT(start_aps, SI_SUB_SMP, SI_ORDER_FIRST, release_aps, NULL);


init_secondary has an example or two of not using
atomic_load_acq_int when atomic_store_rel_int is in
use. One is:

        while (!aps_ready)
                __asm __volatile("wfe");

        /* Initialize curthread */
        KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
        pcpup->pc_curthread = pcpup->pc_idlethread;
        pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

where aps_ready was declared via:

/* Set to 1 once we're ready to let the APs out of the pen. */
volatile int aps_ready = 0;

where release_aps has the use of atomic_store_rel_int:

        atomic_store_rel_int(&aps_ready, 1);
        /* Wake up the other CPUs */
        __asm __volatile("sev");

There is also in init_secondary:

        atomic_add_rel_32(&smp_cpus, 1);

        if (smp_cpus == mp_ncpus) {
                /* enable IPI's, tlb shootdown, freezes etc */
                atomic_store_rel_int(&smp_started, 1);
        }

where smp_cpus is accessed without being explicitly
atomic. mp_ncpus seems to have no atomic use at all.

Where:

/usr/src/sys/sys/smp.h:extern int smp_cpus;
/usr/src/sys/kern/subr_smp.c:int smp_cpus = 1;  /* how many cpu's running */

So no "volatile", unlike the earlier example.

/usr/src/sys/kern/kern_umtx.c:          if (smp_cpus > 1) {
/usr/src/sys/kern/subr_smp.c:SYSCTL_INT(_kern_smp, OID_AUTO, cpus, CTLFLAG_RD|CTLFLAG_CAPRD, &smp_cpus, 0,

/usr/src/sys/sys/smp.h:extern int mp_ncpus;
/usr/src/sys/kern/subr_smp.c:int mp_ncpus;


smp_started is not accessed with explicit atomics
in release_aps, but in init_secondary its update
to 1 is via:

        mtx_lock_spin(&ap_boot_mtx);

        atomic_add_rel_32(&smp_cpus, 1);

        if (smp_cpus == mp_ncpus) {
                /* enable IPI's, tlb shootdown, freezes etc */
                atomic_store_rel_int(&smp_started, 1);
        }

        mtx_unlock_spin(&ap_boot_mtx);

where:

/usr/src/sys/sys/smp.h:extern volatile int smp_started;
/usr/src/sys/kern/subr_smp.c:volatile int smp_started;

("volatile" again for this context.)

I'll also note that for the sparc64 architecture
there is some code like:

     if (__predict_false(atomic_load_acq_int(&smp_started) == 0))

that is explicitly matched to the atomic_store_rel_int
in its mp_machdep.c .

I do not have enough background aarch64 knowledge to
know if it is provable that atomic_load_acq_int is not
needed in some of these cases.

But getting "APs not started" at least sometimes
suggests an intermittent failure of the code as
it is.


Another difference: smp_started has no explicit
initialization, while aps_ready and smp_cpus are
explicitly initialized.



I have no clue whether the boot sequence is supposed
to handle "APs not started" by reverting to a
non-SMP boot, or in some other specific way, instead
of trying to avoid use of what was never initialized
by:

        pcpup->pc_curthread = pcpup->pc_idlethread;
        pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

in init_secondary.

===
Mark Millard
markmi at dsl-only.net


Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only, possible missing atomic_load_acq_int's?

Mark Millard-2
[Adding a couple of numbers that help
interpret what I found as far as what
specifically did not work as expected.]

On 2017-Sep-16, at 3:17 PM, Mark Millard <[hidden email]> wrote:

> A new finding:
>
> When verbose boot messages are enabled
> there is an earlier contrast between when
> booting works overall vs. when it later
> fails:
>
> When it works:
>
> subsystem f000000
>   release_aps(0)... Release APs
> done.
>
> When it fails:
>
> subsystem f000000
>   release_aps(0)... Release APs
> APs not started
> done.
>
> And it explains well why ->pc_curthread
> ends up NULL for the secondaries (in particular
> cpu == 1): init_secondary had never executed
> the assignments shown below:
>
>        while (!aps_ready)
>                __asm __volatile("wfe");
>
>        /* Initialize curthread */
>        KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
>        pcpup->pc_curthread = pcpup->pc_idlethread;
>        pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;
>
> The subsystem messages are from:
>
> static void
> release_aps(void *dummy __unused)
> {      
>        int i;
>
>        /* Only release CPUs if they exist */
>        if (mp_ncpus == 1)
>                return;
>
>        intr_pic_ipi_setup(IPI_AST, "ast", ipi_ast, NULL);
>        intr_pic_ipi_setup(IPI_PREEMPT, "preempt", ipi_preempt, NULL);
>        intr_pic_ipi_setup(IPI_RENDEZVOUS, "rendezvous", ipi_rendezvous, NULL);
>        intr_pic_ipi_setup(IPI_STOP, "stop", ipi_stop, NULL);
>        intr_pic_ipi_setup(IPI_STOP_HARD, "stop hard", ipi_stop, NULL);
>        intr_pic_ipi_setup(IPI_HARDCLOCK, "hardclock", ipi_hardclock, NULL);
>
>        atomic_store_rel_int(&aps_ready, 1);
>        /* Wake up the other CPUs */
>        __asm __volatile("sev");
>
>        printf("Release APs\n");
>
>        for (i = 0; i < 2000; i++) {
>                if (smp_started)
>                        return;
>                DELAY(1000);
>        }
>
>        printf("APs not started\n");
> }      
> SYSINIT(start_aps, SI_SUB_SMP, SI_ORDER_FIRST, release_aps, NULL);
>
>
> init_secondary has an example or two of not using
> atomic_load_acq_int when atomic_store_rel_int is in
> use. One is:
>
>        while (!aps_ready)
>                __asm __volatile("wfe");
>
>        /* Initialize curthread */
>        KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
>        pcpup->pc_curthread = pcpup->pc_idlethread;
>        pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;
>
> where aps_ready was declared via:
>
> /* Set to 1 once we're ready to let the APs out of the pen. */
> volatile int aps_ready = 0;
>
> where release_aps has the use of atomic_store_rel_int:
>
>        atomic_store_rel_int(&aps_ready, 1);
>        /* Wake up the other CPUs */
>        __asm __volatile("sev");
>
> There is also in init_secondary:
>
>        atomic_add_rel_32(&smp_cpus, 1);
>
>        if (smp_cpus == mp_ncpus) {
>                /* enable IPI's, tlb shootdown, freezes etc */
>                atomic_store_rel_int(&smp_started, 1);
>        }
>
> where smp_cpus is accessed without being explicitly
> atomic. mp_ncpus seems to have no atomic use at all.
>
> Where:
>
> /usr/src/sys/sys/smp.h:extern int smp_cpus;
> /usr/src/sys/kern/subr_smp.c:int smp_cpus = 1;  /* how many cpu's running */
>
> So no "volatile", unlike the earlier example.
>
> /usr/src/sys/kern/kern_umtx.c:          if (smp_cpus > 1) {
> /usr/src/sys/kern/subr_smp.c:SYSCTL_INT(_kern_smp, OID_AUTO, cpus, CTLFLAG_RD|CTLFLAG_CAPRD, &smp_cpus, 0,
>
> /usr/src/sys/sys/smp.h:extern int mp_ncpus;
> /usr/src/sys/kern/subr_smp.c:int mp_ncpus;
>
>


Example mp_ncpus and smp_cpus figures from a
failed Pine64+ 2GB boot:

db> print/x *smp_cpus
       1
db> print/x *mp_ncpus
138800000004

But mp_ncpus is only 4 bytes wide: the 8-byte read
also picked up the neighboring scsi_delay (0x1388) in
the upper half. Showing some context for reference:

db> x/bx mp_ncpus-4,4
rebooting:      0   0   0   0
db> x/bx mp_ncpus,4
mp_ncpus:       4   0   0   0
db> x/bx mp_ncpus+4,4
scsi_delay:     88  13  0   0

For completeness:

db> x/bx smp_cpus-4,4
sysctl___kern_smp_disabled+0x5c:        0   0   0   0
db> x/bx smp_cpus,4
smp_cpus:       1   0   0   0
db> x/bx smp_cpus+4,4
smp_cpus+0x4:   0   0   0   0

So smp_cpus was not incremented in memory. This
is consistent with none of the

       pcpup->pc_curthread = pcpup->pc_idlethread;
       pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;

updates having happened in init_secondary:

        /* Spin until the BSP releases the APs */
        while (!aps_ready)
                __asm __volatile("wfe");
       
        /* Initialize curthread */
        KASSERT(PCPU_GET(idlethread) != NULL, ("no idle thread"));
        pcpup->pc_curthread = pcpup->pc_idlethread;
        pcpup->pc_curpcb = pcpup->pc_idlethread->td_pcb;
. . .
        mtx_lock_spin(&ap_boot_mtx);

        atomic_add_rel_32(&smp_cpus, 1);

        if (smp_cpus == mp_ncpus) {
                /* enable IPI's, tlb shootdown, freezes etc */
                atomic_store_rel_int(&smp_started, 1);
        }

        mtx_unlock_spin(&ap_boot_mtx);

Which seems to imply that the aps_ready
update:

        atomic_store_rel_int(&aps_ready, 1);
        /* Wake up the other CPUs */
        __asm __volatile("sev");

in release_aps was not seen in the:

        while (!aps_ready)
                __asm __volatile("wfe");

in init_secondary.

My guess is that the "while (!aps_ready)" loop needs
an explicit atomic (acquire) load. aps_ready is
already volatile, but apparently that is not enough
to make this context reliable.

The other potential needs for explicit atomics
are for later execution but may be required
overall as well.

===
Mark Millard
markmi at dsl-only.net


Re: FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only, possible missing atomic_load_acq_int's?

Mark Millard-2
[As the following does not flow well from the previous
message and largely stands on its own: I top-post it.]

ffff0000005fb14c <release_aps+0xb4> adrp        x8, ffff000000aaa000 <tmpbuffer+0x8a0>
ffff0000005fb150 <release_aps+0xb8> add x8, x8, #0x778
ffff0000005fb154 <release_aps+0xbc> adrp        x0, ffff0000006c9000 <digits+0x1e74e>
ffff0000005fb158 <release_aps+0xc0> add x0, x0, #0xb8a
ffff0000005fb15c <release_aps+0xc4> stlr        w20, [x8]
ffff0000005fb160 <release_aps+0xc8> sev

I ran into the following mention of SEV and of the
need to order it after preceding memory accesses:

> The mechanism that signals an event to other PEs is IMPLEMENTATION DEFINED. The PE is not required to guarantee the ordering of this event with respect to the completion of memory accesses by instructions before the SEV instruction. Therefore, ARM recommends that software includes a DSB instruction before any SEV instruction.
>
> The SEVL instruction appears to execute in program order relative to any subsequent WFE instruction executed on the same PE, without the need for any explicit insertion of barrier instructions.
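Per that recommendation, the tail of release_aps shown in the disassembly above would presumably gain a DSB between the store-release and the event signal (a hand-written sketch only, not compiler output; the register assignments are taken from the disassembly):

```
	stlr	w20, [x8]	// atomic_store_rel_int(&aps_ready, 1)
	dsb	ish		// complete the store before signaling
	sev			// then wake PEs waiting in WFE
```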

===
Mark Millard
markmi at dsl-only.net



SOLVED (patch): FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only: solved with patch

Mark Millard-2
On 2017-Sep-16, at 7:21 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [As the following does not flow well from the previous
> message and stands somewhat on its own in some respects:
> I top post this.]
>
> ffff0000005fb14c <release_aps+0xb4> adrp        x8, ffff000000aaa000 <tmpbuffer+0x8a0>
> ffff0000005fb150 <release_aps+0xb8> add x8, x8, #0x778
> ffff0000005fb154 <release_aps+0xbc> adrp        x0, ffff0000006c9000 <digits+0x1e74e>
> ffff0000005fb158 <release_aps+0xc0> add x0, x0, #0xb8a
> ffff0000005fb15c <release_aps+0xc4> stlr        w20, [x8]
> ffff0000005fb160 <release_aps+0xc8> sev
>
> I ran into the following mention of SEV and making sure
> it is appropriately delayed:
>
>> The mechanism that signals an event to other PEs is IMPLEMENTATION DEFINED. The PE is not required to guarantee the ordering of this event with respect to the completion of memory accesses by instructions before the SEV instruction. Therefore, ARM recommends that software includes a DSB instruction before any SEV instruction.
>>
>> The SEVL instruction appears to execute in program order relative to any subsequent WFE instruction executed on the same PE, without the need for any explicit insertion of barrier instructions.

The following patch has allowed me to boot
the pine64+ 2GB with even a non-debug kernel.

# svnlite diff /usr/src/sys/arm64/arm64/mp_machdep.c
Index: /usr/src/sys/arm64/arm64/mp_machdep.c
===================================================================
--- /usr/src/sys/arm64/arm64/mp_machdep.c (revision 323246)
+++ /usr/src/sys/arm64/arm64/mp_machdep.c (working copy)
@@ -236,7 +236,9 @@
 
 	atomic_store_rel_int(&aps_ready, 1);
 	/* Wake up the other CPUs */
-	__asm __volatile("sev");
+	__asm __volatile(
+	    "dsb  ish \n"
+	    "sev \n");
 
 	printf("Release APs\n");

I will add this patch to Bugzilla 222234.

===
Mark Millard
markmi at dsl-only.net


