Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list


On 2020-Jun-11, at 14:41, Brandon Bergren <bdragon at FreeBSD.org> wrote:

> An update from my end: I now have the ability to test dual processor G4 as well, now that mine is up and running.

Cool.

FYI:

Dual processors are not required for the
problem to happen: the stress-based testing
showed the problem just as easily in the
single-socket/single-core contexts that I
tried.

> On Thu, Jun 11, 2020, at 4:36 PM, Mark Millard wrote:
>>
>> How did you test?
>>
>> In my context it was far easier to see the problem
>> with builds that did not use MALLOC_PRODUCTION. In
>> other words: jemalloc having its asserts tested.
>>
>> The easiest way I found to get the asserts to fail
>> was to do (multiple processes (-m) and totaling to
>> more than enough to force paging/swapping):
>>
>> stress -m 2 --vm-bytes 1700M &
>>
>> (Possibly setting up some shells first
>> that can be exited later.)
>>
>> Normally stress itself would hit jemalloc
>> asserts. Apparently the asserts did not
>> stop the code and it ran until a failure
>> occurred (via dtv=0x0). I never had to
>> manually stop the stress processes.
>>
>> If no failures during, then exit shells
>> that likely were swapped out or partially
>> paged out during the stress run. They
>> hit jemalloc asserts during their cleanup
>> activity in my testing.
>>
>>
>>> That said, the attached patch effectively copies
>>> what's done in OEA64 into OEA pmap.  Can you test it?
>>
>> I'll try it once I get a chance, probably later
>> today.
>>
>> I gather from what I see that moea64_protect did not
>> need the changes that you originally thought might
>> be required? I only see moea_protect changes in the
>> patch.
>

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list
On 2020-Jun-11, at 14:42, Justin Hibbits <chmeeedalf at gmail.com> wrote:

On Thu, 11 Jun 2020 14:36:37 -0700
Mark Millard <[hidden email]> wrote:

> On 2020-Jun-11, at 13:55, Justin Hibbits <chmeeedalf at gmail.com>
> wrote:
>
>> On Wed, 10 Jun 2020 18:56:57 -0700
>> Mark Millard <[hidden email]> wrote:
. . .
>
>
>> That said, the attached patch effectively copies
>> what's done in OEA64 into OEA pmap.  Can you test it?
>
> I'll try it once I get a chance, probably later
> today.
> . . .

No luck at the change being a fix, I'm afraid.

I verified that the build ended up with

00926cb0 <moea_protect+0x2ec> bl      008e8dc8 <PHYS_TO_VM_PAGE>
00926cb4 <moea_protect+0x2f0> mr      r27,r3
00926cb8 <moea_protect+0x2f4> addi    r3,r3,36
00926cbc <moea_protect+0x2f8> hwsync
00926cc0 <moea_protect+0x2fc> lwarx   r25,0,r3
00926cc4 <moea_protect+0x300> li      r4,0
00926cc8 <moea_protect+0x304> stwcx.  r4,0,r3
00926ccc <moea_protect+0x308> bne-    00926cc0 <moea_protect+0x2fc>
00926cd0 <moea_protect+0x30c> andi.   r3,r25,128
00926cd4 <moea_protect+0x310> beq     00926ce0 <moea_protect+0x31c>
00926cd8 <moea_protect+0x314> mr      r3,r27
00926cdc <moea_protect+0x318> bl      008e9874 <vm_page_dirty_KBI>

in the installed kernel. So I doubt a
mis-build is involved. It is still a
head -r360311 based context. World is
built without MALLOC_PRODUCTION so that
the jemalloc code executes its asserts,
catching problems more often and earlier
than otherwise.

First test . . .

The only thing that the witness kernel reported was:

Jun 11 15:58:16 FBSDG4S2 kernel: lock order reversal:
Jun 11 15:58:16 FBSDG4S2 kernel:  1st 0x216fb00 Mountpoints (UMA zone) @ /usr/src/sys/vm/uma_core.c:4387
Jun 11 15:58:16 FBSDG4S2 kernel:  2nd 0x1192d2c kernelpmap (kernelpmap) @ /usr/src/sys/powerpc/aim/mmu_oea.c:1524
Jun 11 15:58:16 FBSDG4S2 kernel: stack backtrace:
Jun 11 15:58:16 FBSDG4S2 kernel: #0 0x5ec164 at witness_debugger+0x94
Jun 11 15:58:16 FBSDG4S2 kernel: #1 0x5ebe3c at witness_checkorder+0xb50
Jun 11 15:58:16 FBSDG4S2 kernel: #2 0x536d5c at __mtx_lock_flags+0xcc
Jun 11 15:58:16 FBSDG4S2 kernel: #3 0x92636c at moea_kextract+0x5c
Jun 11 15:58:16 FBSDG4S2 kernel: #4 0x965d30 at pmap_kextract+0x98
Jun 11 15:58:16 FBSDG4S2 kernel: #5 0x8bfdbc at zone_release+0xf0
Jun 11 15:58:16 FBSDG4S2 kernel: #6 0x8c7854 at bucket_drain+0x2f0
Jun 11 15:58:16 FBSDG4S2 kernel: #7 0x8c728c at bucket_free+0x54
Jun 11 15:58:16 FBSDG4S2 kernel: #8 0x8c74fc at bucket_cache_reclaim+0x1bc
Jun 11 15:58:16 FBSDG4S2 kernel: #9 0x8c7004 at zone_reclaim+0x128
Jun 11 15:58:16 FBSDG4S2 kernel: #10 0x8c3a40 at uma_reclaim+0x170
Jun 11 15:58:16 FBSDG4S2 kernel: #11 0x8c3f70 at uma_reclaim_worker+0x68
Jun 11 15:58:16 FBSDG4S2 kernel: #12 0x50fbac at fork_exit+0xb0
Jun 11 15:58:16 FBSDG4S2 kernel: #13 0x9684ac at fork_trampoline+0xc

The processes that were hit were listed as:

Jun 11 15:59:11 FBSDG4S2 kernel: pid 971 (cron), jid 0, uid 0: exited on signal 11 (core dumped)
Jun 11 16:02:59 FBSDG4S2 kernel: pid 1111 (stress), jid 0, uid 0: exited on signal 6 (core dumped)
Jun 11 16:03:27 FBSDG4S2 kernel: pid 871 (mountd), jid 0, uid 0: exited on signal 6 (core dumped)
Jun 11 16:03:40 FBSDG4S2 kernel: pid 1065 (su), jid 0, uid 0: exited on signal 6
Jun 11 16:04:13 FBSDG4S2 kernel: pid 1088 (su), jid 0, uid 0: exited on signal 6
Jun 11 16:04:28 FBSDG4S2 kernel: pid 968 (sshd), jid 0, uid 0: exited on signal 6

Jun 11 16:05:42 FBSDG4S2 kernel: pid 1028 (login), jid 0, uid 0: exited on signal 6

Jun 11 16:05:46 FBSDG4S2 kernel: pid 873 (nfsd), jid 0, uid 0: exited on signal 6 (core dumped)


Rebooting, rerunning, and this time showing the stress
output and such (I did not capture copies during the
first test, but it had similar messages at the same
sort of points):

Second test . . .

# stress -m 2 --vm-bytes 1700M
stress: info: [1166] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
^C

# exit
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200: Failed assertion: "ret == sz_index2size_compute(index)"
Abort trap

The other stuff was similar to the first test, not repeated here.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list
On 2020-Jun-11, at 16:49, Mark Millard <marklmi at yahoo.com> wrote:

> . . .

The updated code looks odd to me for how "m" is
handled (part of an egrep to ensure I show all the
uses of m):

moea_protect(mmu_t mmu, pmap_t pm, vm_offset_t sva, vm_offset_t eva,
        vm_page_t       m;
                        if (pm != kernel_pmap && m != NULL &&
                            (m->a.flags & PGA_EXECUTABLE) == 0 &&
                                if ((m->oflags & VPO_UNMANAGED) == 0)
                                        vm_page_aflag_set(m, PGA_EXECUTABLE);
                                m = PHYS_TO_VM_PAGE(old_pte.pte_lo & PTE_RPGN);
                                refchg = atomic_readandclear_32(&m->md.mdpg_attrs);
                                        vm_page_dirty(m);
                                        vm_page_aflag_set(m, PGA_REFERENCED);

Or more completely, with notes mixed in:

void
moea_protect(mmu_t mmu, pmap_t pm, vm_offset_t sva, vm_offset_t eva,
    vm_prot_t prot)
{
        . . .
        vm_page_t       m;
        . . .
        for (pvo = RB_NFIND(pvo_tree, &pm->pmap_pvo, &key);
            pvo != NULL && PVO_VADDR(pvo) < eva; pvo = tpvo) {
                . . .
                if (pt != NULL) {
                        . . .
                        if (pm != kernel_pmap && m != NULL &&

NOTE: m seems to be uninitialized but tested for being NULL
above.

                            (m->a.flags & PGA_EXECUTABLE) == 0 &&

Note: This looks to potentially be using a random,
non-NULL value for m during evaluation of m->a.flags.

                        . . .

                        if ((pvo->pvo_vaddr & PVO_MANAGED) &&
                            (pvo->pvo_pte.prot & VM_PROT_WRITE)) {
                                m = PHYS_TO_VM_PAGE(old_pte.pte_lo & PTE_RPGN);

Note: m finally is potentially initialized(/set).

                                refchg = atomic_readandclear_32(&m->md.mdpg_attrs);
                                if (refchg & PTE_CHG)
                                        vm_page_dirty(m);
                                if (refchg & PTE_REF)
                                        vm_page_aflag_set(m, PGA_REFERENCED);
. . .

Note: So, if m was set above, then later loop
iterations would use that stale value instead
of a freshly initialized one.

It looks to me like at least one assignment
to m is missing.

moea64_pvo_protect has pg that seems analogous to
m and has:

        pg = PHYS_TO_VM_PAGE(pvo->pvo_pte.pa & LPTE_RPGN);
. . .
        if (pm != kernel_pmap && pg != NULL &&
            (pg->a.flags & PGA_EXECUTABLE) == 0 &&
            (pvo->pvo_pte.pa & (LPTE_I | LPTE_G | LPTE_NOEXEC)) == 0) {
                if ((pg->oflags & VPO_UNMANAGED) == 0)
                        vm_page_aflag_set(pg, PGA_EXECUTABLE);

. . .
        if (pg != NULL && (pvo->pvo_vaddr & PVO_MANAGED) &&
            (oldprot & VM_PROT_WRITE)) {
                refchg |= atomic_readandclear_32(&pg->md.mdpg_attrs);
                if (refchg & LPTE_CHG)
                        vm_page_dirty(pg);
                if (refchg & LPTE_REF)
                        vm_page_aflag_set(pg, PGA_REFERENCED);


This might suggest some about what is missing.
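
For illustration, a minimal sketch (untested) of the kind of
assignment that appears to be missing, transliterating the pg
handling from moea64_pvo_protect into moea_protect's 32-bit names
(PTE_RPGN standing in for LPTE_RPGN, and old_pte as in the patched
code above):

                        /* Hypothetical: set m from the old pte before
                         * any test of m, as moea64_pvo_protect does
                         * with pg. */
                        m = PHYS_TO_VM_PAGE(old_pte.pte_lo & PTE_RPGN);
                        if (pm != kernel_pmap && m != NULL &&
                            (m->a.flags & PGA_EXECUTABLE) == 0 &&
                            (pvo->pvo_pte.pa & (PTE_I | PTE_G)) == 0) {
                                if ((m->oflags & VPO_UNMANAGED) == 0)
                                        vm_page_aflag_set(m, PGA_EXECUTABLE);
                        }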


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

Justin Hibbits
On Thu, 11 Jun 2020 17:30:24 -0700
Mark Millard <[hidden email]> wrote:

> On 2020-Jun-11, at 16:49, Mark Millard <marklmi at yahoo.com> wrote:
>
> > On 2020-Jun-11, at 14:42, Justin Hibbits <chmeeedalf at gmail.com>
> > wrote:
> >
> > On Thu, 11 Jun 2020 14:36:37 -0700
> > Mark Millard <[hidden email]> wrote:
> >  
> >> On 2020-Jun-11, at 13:55, Justin Hibbits <chmeeedalf at gmail.com>
> >> wrote:
> >>  
> >>> On Wed, 10 Jun 2020 18:56:57 -0700
> >>> Mark Millard <[hidden email]> wrote:  
> > . . .  
> >>
> >>  
> >>> That said, the attached patch effectively copies
> >>> what's done in OEA6464 into OEA pmap.  Can you test it?    
> >>
> >> I'll try it once I get a chance, probably later
> >> today.
> >> . . .  
> >
> > No luck at the change being a fix, I'm afraid.
> >
> > I verified that the build ended up with
> >
> > 00926cb0 <moea_protect+0x2ec> bl      008e8dc8 <PHYS_TO_VM_PAGE>
> > 00926cb4 <moea_protect+0x2f0> mr      r27,r3
> > 00926cb8 <moea_protect+0x2f4> addi    r3,r3,36
> > 00926cbc <moea_protect+0x2f8> hwsync
> > 00926cc0 <moea_protect+0x2fc> lwarx   r25,0,r3
> > 00926cc4 <moea_protect+0x300> li      r4,0
> > 00926cc8 <moea_protect+0x304> stwcx.  r4,0,r3
> > 00926ccc <moea_protect+0x308> bne-    00926cc0 <moea_protect+0x2fc>
> > 00926cd0 <moea_protect+0x30c> andi.   r3,r25,128
> > 00926cd4 <moea_protect+0x310> beq     00926ce0 <moea_protect+0x31c>
> > 00926cd8 <moea_protect+0x314> mr      r3,r27
> > 00926cdc <moea_protect+0x318> bl      008e9874 <vm_page_dirty_KBI>
> >
> > in the installed kernel. So I doubt a
> > mis-build would be involved. It is a
> > head -r360311 based context still. World is
> > without MALLOC_PRODUCTION so that jemalloc
> > code executes its asserts, catching more
> > and earlier than otherwise.
> >
> > First test . . .
> >
> > The only thing that the witness kernel reported was:
> >
> > Jun 11 15:58:16 FBSDG4S2 kernel: lock order reversal:
> > Jun 11 15:58:16 FBSDG4S2 kernel:  1st 0x216fb00 Mountpoints (UMA
> > zone) @ /usr/src/sys/vm/uma_core.c:4387 Jun 11 15:58:16 FBSDG4S2
> > kernel:  2nd 0x1192d2c kernelpmap (kernelpmap) @
> > /usr/src/sys/powerpc/aim/mmu_oea.c:1524 Jun 11 15:58:16 FBSDG4S2
> > kernel: stack backtrace: Jun 11 15:58:16 FBSDG4S2 kernel: #0
> > 0x5ec164 at witness_debugger+0x94 Jun 11 15:58:16 FBSDG4S2 kernel:
> > #1 0x5ebe3c at witness_checkorder+0xb50 Jun 11 15:58:16 FBSDG4S2
> > kernel: #2 0x536d5c at __mtx_lock_flags+0xcc Jun 11 15:58:16
> > FBSDG4S2 kernel: #3 0x92636c at moea_kextract+0x5c Jun 11 15:58:16
> > FBSDG4S2 kernel: #4 0x965d30 at pmap_kextract+0x98 Jun 11 15:58:16
> > FBSDG4S2 kernel: #5 0x8bfdbc at zone_release+0xf0 Jun 11 15:58:16
> > FBSDG4S2 kernel: #6 0x8c7854 at bucket_drain+0x2f0 Jun 11 15:58:16
> > FBSDG4S2 kernel: #7 0x8c728c at bucket_free+0x54 Jun 11 15:58:16
> > FBSDG4S2 kernel: #8 0x8c74fc at bucket_cache_reclaim+0x1bc Jun 11
> > 15:58:16 FBSDG4S2 kernel: #9 0x8c7004 at zone_reclaim+0x128 Jun 11
> > 15:58:16 FBSDG4S2 kernel: #10 0x8c3a40 at uma_reclaim+0x170 Jun 11
> > 15:58:16 FBSDG4S2 kernel: #11 0x8c3f70 at uma_reclaim_worker+0x68
> > Jun 11 15:58:16 FBSDG4S2 kernel: #12 0x50fbac at fork_exit+0xb0 Jun
> > 11 15:58:16 FBSDG4S2 kernel: #13 0x9684ac at fork_trampoline+0xc
> >
> > The processes that were hit were listed as:
> >
> > Jun 11 15:59:11 FBSDG4S2 kernel: pid 971 (cron), jid 0, uid 0:
> > exited on signal 11 (core dumped) Jun 11 16:02:59 FBSDG4S2 kernel:
> > pid 1111 (stress), jid 0, uid 0: exited on signal 6 (core dumped)
> > Jun 11 16:03:27 FBSDG4S2 kernel: pid 871 (mountd), jid 0, uid 0:
> > exited on signal 6 (core dumped) Jun 11 16:03:40 FBSDG4S2 kernel:
> > pid 1065 (su), jid 0, uid 0: exited on signal 6 Jun 11 16:04:13
> > FBSDG4S2 kernel: pid 1088 (su), jid 0, uid 0: exited on signal 6
> > Jun 11 16:04:28 FBSDG4S2 kernel: pid 968 (sshd), jid 0, uid 0:
> > exited on signal 6
> >
> > Jun 11 16:05:42 FBSDG4S2 kernel: pid 1028 (login), jid 0, uid 0:
> > exited on signal 6
> >
> > Jun 11 16:05:46 FBSDG4S2 kernel: pid 873 (nfsd), jid 0, uid 0:
> > exited on signal 6 (core dumped)
> >
> >
> > Rebooting and rerunning and showing the stress output and such
> > (I did not capture copies during the first test, but the first
> > test had similar messages at the same sort of points):
> >
> > Second test . . .
> >
> > # stress -m 2 --vm-bytes 1700M
> > stress: info: [1166] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
> > <jemalloc>:
> > /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258:
> > Failed assertion: "slab == extent_slab_get(extent)" <jemalloc>:
> > /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258:
> > Failed assertion: "slab == extent_slab_get(extent)" ^C
> >
> > # exit
> > <jemalloc>:
> > /usr/src/contrib/jemalloc/include/jemalloc/internal/sz.h:200:
> > Failed assertion: "ret == sz_index2size_compute(index)" Abort trap
> >
> > The other stuff was similar to to first test, not repeated here.  
>
> The updated code looks odd to me for how "m" is
> handled (part of a egrep to ensure I show all the
> usage of m):
>
> moea_protect(mmu_t mmu, pmap_t pm, vm_offset_t sva, vm_offset_t eva,
>         vm_page_t       m;
>                         if (pm != kernel_pmap && m != NULL &&
>                             (m->a.flags & PGA_EXECUTABLE) == 0 &&
>                                 if ((m->oflags & VPO_UNMANAGED) == 0)
>                                         vm_page_aflag_set(m,
> PGA_EXECUTABLE); m = PHYS_TO_VM_PAGE(old_pte.pte_lo & PTE_RPGN);
>                                 refchg =
> atomic_readandclear_32(&m->md.mdpg_attrs); vm_page_dirty(m);
>                                         vm_page_aflag_set(m,
> PGA_REFERENCED);
>
> Or more completely, with notes mixed in:
>
> void
> moea_protect(mmu_t mmu, pmap_t pm, vm_offset_t sva, vm_offset_t eva,
>     vm_prot_t prot)
> {
>         . . .
>         vm_page_t       m;
>         . . .
>         for (pvo = RB_NFIND(pvo_tree, &pm->pmap_pvo, &key);
>             pvo != NULL && PVO_VADDR(pvo) < eva; pvo = tpvo) {
>                 . . .
>                 if (pt != NULL) {
>                         . . .
>                         if (pm != kernel_pmap && m != NULL &&
>
> NOTE: m seems to be uninitialized but tested for being NULL
> above.
>
>                             (m->a.flags & PGA_EXECUTABLE) == 0 &&
>
> Note: This looks to potentially be using a random, non-NULL
> value for m during evaluation of m->a.flags .
>
>                         . . .
>
>                         if ((pvo->pvo_vaddr & PVO_MANAGED) &&
>                             (pvo->pvo_pte.prot & VM_PROT_WRITE)) {
>                                 m = PHYS_TO_VM_PAGE(old_pte.pte_lo &
> PTE_RPGN);
>
> Note: m finally is potentially initialized(/set).
>
>                                 refchg =
> atomic_readandclear_32(&m->md.mdpg_attrs); if (refchg & PTE_CHG)
>                                         vm_page_dirty(m);
>                                 if (refchg & PTE_REF)
>                                         vm_page_aflag_set(m,
> PGA_REFERENCED); . . .
>
> Note: So, if m is set above, then the next loop
> iteration(s) would use this then-old value
> instead of an initialized value.
>
> It looks to me like at least one assignment
> to m is missing.
>
> moea64_pvo_protect has pg that seems analogous to
> m and has:
>
>         pg = PHYS_TO_VM_PAGE(pvo->pvo_pte.pa & LPTE_RPGN);
> . . .
>         if (pm != kernel_pmap && pg != NULL &&
>             (pg->a.flags & PGA_EXECUTABLE) == 0 &&
>             (pvo->pvo_pte.pa & (LPTE_I | LPTE_G | LPTE_NOEXEC)) == 0)
> { if ((pg->oflags & VPO_UNMANAGED) == 0)
>                         vm_page_aflag_set(pg, PGA_EXECUTABLE);
>
> . . .
>         if (pg != NULL && (pvo->pvo_vaddr & PVO_MANAGED) &&
>             (oldprot & VM_PROT_WRITE)) {
>                 refchg |= atomic_readandclear_32(&pg->md.mdpg_attrs);
>                 if (refchg & LPTE_CHG)
>                         vm_page_dirty(pg);
>                 if (refchg & LPTE_REF)
>                         vm_page_aflag_set(pg, PGA_REFERENCED);
>
>
> This might suggest some about what is missing.

Can you try moving the assignment to 'm' to right below the
moea_pte_change() call?

- Justin

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list


On 2020-Jun-11, at 19:25, Justin Hibbits <chmeeedalf at gmail.com> wrote:

>> . . .
>>
>> It looks to me like at least one assignment
>> to m is missing.
>> . . .
>
> Can you try moving the assignment to 'm' to right below the
> moea_pte_change() call?

Panics during boot; the svnlite diff is shown below.

That change got me a panic just after the boot lines
reporting the ada0 and cd0 details. (Unknown what
internal stage.) Hand-transcribed from a picture of
the screen:

panic: vm_page_free_prep: mapping flags set in page 0xd032a078
. . .
panic
vm_page_free_prep
vm_page_free_toq
vm_page_free
vm_object_collapse
vm_object_deallocate
vm_map_process_deferred
vm_map_remove
exec_new_vmspace
exec_elf32_imgact
kern_execve
sys_execve
trap
powerpc_interrupt
user SC trap by 0x100d7af8 . . .




# svnlite diff /usr/src/sys/powerpc/aim/mmu_oea.c
Index: /usr/src/sys/powerpc/aim/mmu_oea.c
===================================================================
--- /usr/src/sys/powerpc/aim/mmu_oea.c (revision 360322)
+++ /usr/src/sys/powerpc/aim/mmu_oea.c (working copy)
@@ -1773,6 +1773,9 @@
 {
  struct pvo_entry *pvo, *tpvo, key;
  struct pte *pt;
+ struct pte old_pte;
+ vm_page_t m;
+ int32_t refchg;
 
  KASSERT(pm == &curproc->p_vmspace->vm_pmap || pm == kernel_pmap,
     ("moea_protect: non current pmap"));
@@ -1800,12 +1803,31 @@
  pvo->pvo_pte.pte.pte_lo &= ~PTE_PP;
  pvo->pvo_pte.pte.pte_lo |= PTE_BR;
 
+ old_pte = *pt;
+
  /*
  * If the PVO is in the page table, update that pte as well.
  */
  if (pt != NULL) {
  moea_pte_change(pt, &pvo->pvo_pte.pte, pvo->pvo_vaddr);
+ m = PHYS_TO_VM_PAGE(old_pte.pte_lo & PTE_RPGN);
+ if (pm != kernel_pmap && m != NULL &&
+    (m->a.flags & PGA_EXECUTABLE) == 0 &&
+    (pvo->pvo_pte.pa & (PTE_I | PTE_G)) == 0) {
+ if ((m->oflags & VPO_UNMANAGED) == 0)
+ vm_page_aflag_set(m, PGA_EXECUTABLE);
+ moea_syncicache(pvo->pvo_pte.pa & PTE_RPGN,
+    PAGE_SIZE);
+ }
  mtx_unlock(&moea_table_mutex);
+ if ((pvo->pvo_vaddr & PVO_MANAGED) &&
+    (pvo->pvo_pte.prot & VM_PROT_WRITE)) {
+ refchg = atomic_readandclear_32(&m->md.mdpg_attrs);
+ if (refchg & PTE_CHG)
+ vm_page_dirty(m);
+ if (refchg & PTE_REF)
+ vm_page_aflag_set(m, PGA_REFERENCED);
+ }
  }
  }
  rw_wunlock(&pvh_global_lock);


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ppc
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list
There is another oddity in the code structure:
if pt were ever NULL, the code would dereference
the NULL pointer before the non-NULL test is made:

                pt = moea_pvo_to_pte(pvo, -1);
. . .
                old_pte = *pt;

                /*
                 * If the PVO is in the page table, update that pte as well.
                 */
                if (pt != NULL) {

(I'm not claiming that this explains the panic.)
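
For illustration, a minimal sketch (untested) of a reordering that
would avoid the NULL dereference, simply moving the copy of *pt
inside the existing non-NULL test:

                pt = moea_pvo_to_pte(pvo, -1);
                . . .
                /*
                 * If the PVO is in the page table, update that pte as well.
                 */
                if (pt != NULL) {
                        /* Copy *pt only after pt is known non-NULL. */
                        old_pte = *pt;
                        moea_pte_change(pt, &pvo->pvo_pte.pte, pvo->pvo_vaddr);
                        . . .
                }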

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

Justin Hibbits
(Removing hackers and current, too many cross-lists already, and those
interested in reading this are probably already on ppc@)

Mark,

Can you try this updated patch?  Again, I've only compiled it, I
haven't tested it, so it may also explode.  However, it more closely
mimics exactly what moea64 does.

- Justin

[Attachment: moea_protect.diff (2K)]

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list


On 2020-Jun-16, at 19:32, Justin Hibbits <[hidden email]> wrote:

> (Removing hackers and current, too many cross-lists already, and those
> interested in reading this are probably already on ppc@)
>
> Mark,
>
> Can you try this updated patch?  Again, I've only compiled it, I
> haven't tested it, so it may also explode.  However, it more closely
> mimics exactly what moea64 does.

Sure . . . But no luck.

Same crash and same backtrace-related information,
other than the "in page" address and the "time"
value:

panic: vm_page_free_prep: mapping flags set in page 0xd0300fc0
cpuid = 0
time = 1592362496
KDB: stack backtrace:
0xd2dc4340: at kdb_backtrace+0x64
0xd2dc43a0: at vpanic+0x208
0xd2dc4410: at panic+0x64
0xd2dc4450: at vm_page_free_prep+0x348
0xd2dc4470: at vm_page_free_toq+0x3c
0xd2dc4490: at vm_page_free+0x20
0xd2dc44a0: at vm_object_collapse+0x4ac
0xd2dc4510: at vm_object_deallocate+0x430
0xd2dc4550: at vm_map_process_deferred+0xec
0xd2dc4570: at vm_map_remove+0x12c
0xd2dc4590: at exec_new_vmspace+0x20c
0xd2dc45f0: at exec_elf32_imgact+0xa70
0xd2dc46a0: at kern_execve+0x600
0xd2dc4910: at sys_execve+0x84
0xd2dc4970: at trap+0x748
0xd2dc4a10: at powerpc_interrupt+0x178
0xd2dc4a40: user SC trap by 0x100d71f8: srr1=0xf032
            r1=0xffffd810 cr=0x82000280 xer=0 ctr=0x10173810 frame=0xd2dc4a48
KDB: enter: panic

/wrkdirs/usr/ports/devel/gdb/work-py37/gdb-9.1/gdb/inferior.c:283: internal-error: struct inferior *find_inferior_pid(int): Assertion `pid != 0' failed.


FYI . . .

(m->a.flags & (PGA_EXECUTABLE | PGA_WRITEABLE)) == 0
is failing while (m->oflags & VPO_UNMANAGED) == 0 holds in
vm_page_free_prep. See the last KASSERT in the code
quoted below. Does this suggest that something that
should be clearing those flags in m->a.flags is not
doing so?

static bool
vm_page_free_prep(vm_page_t m)
{
 
        /*
         * Synchronize with threads that have dropped a reference to this
         * page.
         */
        atomic_thread_fence_acq();
 
#if defined(DIAGNOSTIC) && defined(PHYS_TO_DMAP)
        if (PMAP_HAS_DMAP && (m->flags & PG_ZERO) != 0) {
                uint64_t *p;
                int i;
                p = (uint64_t *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m));
                for (i = 0; i < PAGE_SIZE / sizeof(uint64_t); i++, p++)
                        KASSERT(*p == 0, ("vm_page_free_prep %p PG_ZERO %d %jx",
                            m, i, (uintmax_t)*p));
        }
#endif
        if ((m->oflags & VPO_UNMANAGED) == 0) {
                KASSERT(!pmap_page_is_mapped(m),
                    ("vm_page_free_prep: freeing mapped page %p", m));
                KASSERT((m->a.flags & (PGA_EXECUTABLE | PGA_WRITEABLE)) == 0,
                    ("vm_page_free_prep: mapping flags set in page %p", m));
        } else {
. . .



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list
[I found a cause of the crash problem for the patched code,
or so I expect. More missing code.]

On 2020-Jun-16, at 20:23, Mark Millard <marklmi at yahoo.com> wrote:

> . . .


From what I can tell there is another 32-bit vs. 64-bit
difference in the following code: the 32-bit side never
uses vm_page_aflag_clear(???,PGA_WRITEABLE | PGA_EXECUTABLE)
where the 64-bit side does.

Starting a trace of the issue at exec_new_vmspace . . .

int
exec_new_vmspace(struct image_params *imgp, struct sysentvec *sv)
{
        . . .
        if (vmspace->vm_refcnt == 1 && vm_map_min(map) == sv_minuser &&
            vm_map_max(map) == sv->sv_maxuser &&
            cpu_exec_vmspace_reuse(p, map)) {
                shmexit(vmspace);
                pmap_remove_pages(vmspace_pmap(vmspace));
                vm_map_remove(map, vm_map_min(map), vm_map_max(map));
. . .

The "pmap_remove_pages(vmspace_pmap(vmspace))" before the
vm_map_remove use has a very different handling for
64-bit (does something) vs. 32-bit (no-op) . . .

moea64_remove_pages is as follows (and eventually involves vm_page_aflag_clear):

void
moea64_remove_pages(mmu_t mmu, pmap_t pm)
{
        . . .
        while (!SLIST_EMPTY(&tofree)) {
                pvo = SLIST_FIRST(&tofree);
                SLIST_REMOVE_HEAD(&tofree, pvo_dlink);
                moea64_pvo_remove_from_page(mmu, pvo);
                free_pvo_entry(pvo);
        }
}

where moea64_pvo_remove_from_page involves
vm_page_aflag_clear(????,PGA_WRITEABLE | PGA_EXECUTABLE) via:

static inline void
moea64_pvo_remove_from_page_locked(mmu_t mmu, struct pvo_entry *pvo,
    vm_page_t m)
{
               
        . . .
        /*
         * Update vm about page writeability/executability if managed
         */
        PV_LOCKASSERT(pvo->pvo_pte.pa & LPTE_RPGN);
        if (pvo->pvo_vaddr & PVO_MANAGED) {
                if (m != NULL) {
                        LIST_REMOVE(pvo, pvo_vlink);
                        if (LIST_EMPTY(vm_page_to_pvoh(m)))
                                vm_page_aflag_clear(m,
                                    PGA_WRITEABLE | PGA_EXECUTABLE);
                }
        }
        . . .
}

But 32-bit has/uses:

        static void mmu_null_remove_pages(mmu_t mmu, pmap_t pmap)
        {
                return;
        }


so it does not involve:

    vm_page_aflag_clear(????,PGA_WRITEABLE | PGA_EXECUTABLE)

but apparently should, in order to pass:

               KASSERT((m->a.flags & (PGA_EXECUTABLE | PGA_WRITEABLE)) == 0,
                   ("vm_page_free_prep: mapping flags set in page %p", m));

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

Brandon Bergren
On Sat, Jun 27, 2020, at 5:32 PM, Mark Millard wrote:

> . . .
>
> But 32-bit has/uses:
>
>         static void mmu_null_remove_pages(mmu_t mmu, pmap_t pmap)
>         {
>                 return;
>         }
>
> so it does not involve:
>
>     vm_page_aflag_clear(????,PGA_WRITEABLE | PGA_EXECUTABLE)
>
> . . .

Looking at the history of the 64-bit code:
r233017 -- "Implement pmap_remove_pages(). This will be added later to the 32-bit MMU module."

Oops!


--
  Brandon Bergren
  [hidden email]
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ppc
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r360233 - in head: contrib/jemalloc . . . : This partially breaks a 2-socket 32-bit powerpc (old PowerMac G4) based on head -r360311

freebsd-ppc mailing list
On 2020-Jun-27, at 17:02, Brandon Bergren <bdragon at FreeBSD.org> wrote:

> On Sat, Jun 27, 2020, at 5:32 PM, Mark Millard wrote:
>
>> . . .
>
> Looking at the history of the 64-bit code:
> r233017 -- "Implement pmap_remove_pages(). This will be added later to the 32-bit MMU module."
>
> Oops!

Looks like -r233949 is the first version of mmu_oea64.c
to involve clearing PGA_EXECUTABLE from the a.flags.
Later versions changed various aspects over the years,
but clearing PGA_EXECUTABLE and PGA_WRITEABLE has been
a sustained property for PVO_MANAGED contexts from
what I see. (Not that I have any general understanding
of the code involved or what can be common for 32-bit
vs. what can not.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
