amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context)

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context)

Mark Millard-2
I'm still at a loss about how to figure out what stages are messed
up. (Memory coherency? Some memory not swapped out? Bad data swapped
out? Wrong data swapped in?)

But at least I've found a much smaller/simpler example to demonstrate
some problem with in my Pine64+_ 2GB context.

The Pine64+ 2GB is the only amd64 context that I have access to.


The following program fails its check for data
having its expected byte pattern in dynamically
allocated memory after a fork/swap-out/swap-in
sequence.

I'll note that the program sleeps for 60s after
forking to give time to do something else to
cause the parent and child processes to swap
out (RES=0 as seen in top).

Note the source code line:

    // test_check(); // Adding this line prevents failure.

It seem that accessing the region contents before forking
and swapping avoids the problem. But there is a problem
if the region was only written-to before the fork/swap.

Another point is the size of the region matters: <= 14K Bytes
fails and > 14K Bytes works for as much has I have tested.


# more swap_testing.c
// swap_testing.c

// Built via (c++ was clang++ 4.0 in my case):
//
// cc -g -std=c11 -Wpedantic swap_testing.c
// -O0 and -O2 also gets the problem.

#include <unistd.h>     // for fork(), sleep(.)
#include <sys/types.h>  // for pid_t
#include <sys/wait.h>   // for wait(.)

extern void test_setup(void); // Sets up the memory byte pattern.
extern void test_check(void); // Tests the memory byte pattern.

int main(void)
{
    test_setup();
    // test_check(); // Adding this line prevents failure.

    pid_t pid = fork();
    int wait_status = 0;;

    if (0<pid) { wait(&wait_status); }

    if (-1!=wait_status && 0<=pid)
    {
        if (0==pid)
        {
            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:

            // stress -m 1 --vm-bytes 1800M

            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using.
        }

        test_check();
    }
}

// The memory and test code follows.

#include <stdbool.h>    // for bool, true, false
#include <stddef.h>     // for size_t, NULL
#include <stdlib.h>     // for malloc(.), free(.)

#include <signal.h>     // for raise(.), SIGABRT

#define region_size (14u*1024u)
                        // Bad dyn_region pattern, parent and child
                        // processes:
                        //  256u, 4u*1024u, 8u*1024u, 9u*1024u,
                        // 12u*1024u, 14u*1024u

                        // Works:
                        // 14u*1024u+1u, 15u*1024u, 16u*1024u,
                        // 32u*1024u, 256u*1024u*1024u

typedef volatile unsigned char value_type;

struct region_struct { value_type array[region_size]; };
typedef struct region_struct region;

static region            gbl_region;
static region * volatile dyn_region = NULL;

static value_type value(size_t v) { return (value_type)v; }

void test_setup(void) {
    dyn_region = malloc(sizeof(region));
    if (!dyn_region) raise(SIGABRT);

    for(size_t i=0u; i<region_size; i++) {
        (*dyn_region).array[i] = gbl_region.array[i] = value(i);
    }
}

static volatile bool gbl_failed = false; // Until potentially disproved
static volatile size_t gbl_pos = 0u;

static volatile bool dyn_failed = false; // Until potentially disproved
static volatile size_t dyn_pos = 0u;

void test_check(void) {
    while (!gbl_failed && gbl_pos<region_size) {
        gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
        gbl_pos++;
    }

    while (!dyn_failed && dyn_pos<region_size) {
        dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
        // Note: When the memory pattern fails this case is that
        //       records the failure.
        dyn_pos++;
    }

    if (gbl_failed) raise(SIGABRT);
    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
                                    // when it fails (both parent and child processes).
}


Other details from lldb (not using -O2 so things are
simpler, not presented in the order examined):

# lldb a.out -c /var/crash/a.out.11575.core
(lldb) target create "a.out" --core "/var/crash/a.out.11575.core"
Core file '/var/crash/a.out.11575.core' (aarch64) was loaded.
(lldb) bt
* thread #1, name = 'a.out', stop reason = signal SIGABRT
  * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8
    frame #1: libc.so.7`__raise(s=6) at raise.c:52
    frame #2: a.out`test_check at swap_testing.c:103
    frame #3: a.out`main at swap_testing.c:42
    frame #4: 0x0000000000020184 a.out`__start + 364
    frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41

(lldb) up 2
frame #2: a.out`test_check at swap_testing.c:103
   100    }
   101
   102    if (gbl_failed) raise(SIGABRT);
-> 103    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
   104                                    // when it fails (both parent and child processes).
   105 }

(lldb) print dyn_pos
(size_t) $0 = 2

(That is one after the failure position.)


(lldb) print dyn_region
(region *volatile) $3 = 0x0000000040616000

(lldb) print *dyn_region
(region) $1 = {
  array = {
    [0] = '\0'
    [1] = '\0'
    [2] = '\0'
. . . (all '\0' bytes) . . .
    [251] = '\0'
    [252] = '\0'
    [253] = '\0'
    [254] = '\0'
    [255] = '\0'
    ...
  }
}

(lldb) print gbl_region
(region) $2 = {
  array = {
    [0] = '\0'
    [1] = '\x01'
    [2] = '\x02'
. . .
    [251] = '\xfb'
    [252] = '\xfc'
    [253] = '\xfd'
    [254] = '\xfe'
    [255] = '\xff'
    ...
  }
}

(lldb) disass -n main
a.out`main:
    0x2022c <+0>:   sub    sp, sp, #0x30             ; =0x30
    0x20230 <+4>:   stp    x29, x30, [sp, #0x20]
    0x20234 <+8>:   add    x29, sp, #0x20            ; =0x20
    0x20238 <+12>:  stur   wzr, [x29, #-0x4]
    0x2023c <+16>:  bl     0x202b0                   ; test_setup at swap_testing.c:74
    0x20240 <+20>:  bl     0x20580                   ; symbol stub for: fork
    0x20244 <+24>:  mov    w8, wzr
    0x20248 <+28>:  stur   w0, [x29, #-0x8]
    0x2024c <+32>:  stur   wzr, [x29, #-0xc]
    0x20250 <+36>:  ldur   w0, [x29, #-0x8]
    0x20254 <+40>:  cmp    w8, w0
    0x20258 <+44>:  b.ge   0x20268                   ; <+60> at swap_testing.c
    0x2025c <+48>:  sub    x0, x29, #0xc             ; =0xc
    0x20260 <+52>:  bl     0x20590                   ; symbol stub for: wait
    0x20264 <+56>:  str    w0, [sp, #0x10]
    0x20268 <+60>:  mov    w8, #-0x1
    0x2026c <+64>:  ldur   w9, [x29, #-0xc]
    0x20270 <+68>:  cmp    w8, w9
    0x20274 <+72>:  b.eq   0x202a0                   ; <+116> at swap_testing.c:44
    0x20278 <+76>:  mov    w8, wzr
    0x2027c <+80>:  ldur   w9, [x29, #-0x8]
    0x20280 <+84>:  cmp    w8, w9
    0x20284 <+88>:  b.gt   0x202a0                   ; <+116> at swap_testing.c:44
    0x20288 <+92>:  ldur   w8, [x29, #-0x8]
    0x2028c <+96>:  cbnz   w8, 0x2029c               ; <+112> at swap_testing.c:42
    0x20290 <+100>: orr    w0, wzr, #0x3c
    0x20294 <+104>: bl     0x205a0                   ; symbol stub for: sleep
    0x20298 <+108>: str    w0, [sp, #0xc]
    0x2029c <+112>: bl     0x20348                   ; test_check at swap_testing.c:89
    0x202a0 <+116>: ldur   w0, [x29, #-0x4]
    0x202a4 <+120>: ldp    x29, x30, [sp, #0x20]
    0x202a8 <+124>: add    sp, sp, #0x30             ; =0x30
    0x202ac <+128>: ret    

(lldb) disass -n value
a.out`value:
    0x204cc <+0>:  sub    sp, sp, #0x10             ; =0x10
    0x204d0 <+4>:  str    x0, [sp, #0x8]
    0x204d4 <+8>:  ldrb   w8, [sp, #0x8]
    0x204d8 <+12>: mov    w1, w8
    0x204dc <+16>: mov    w0, w8
    0x204e0 <+20>: str    w1, [sp, #0x4]
    0x204e4 <+24>: add    sp, sp, #0x10             ; =0x10
    0x204e8 <+28>: ret    

(lldb) disass -n test_setup
a.out`test_setup:
    0x202b0 <+0>:   sub    sp, sp, #0x20             ; =0x20
    0x202b4 <+4>:   stp    x29, x30, [sp, #0x10]
    0x202b8 <+8>:   add    x29, sp, #0x10            ; =0x10
    0x202bc <+12>:  orr    x0, xzr, #0x3800
    0x202c0 <+16>:  bl     0x205b0                   ; symbol stub for: malloc
    0x202c4 <+20>:  adrp   x30, 48
    0x202c8 <+24>:  add    x30, x30, #0x0            ; =0x0
    0x202cc <+28>:  str    x0, [x30]
    0x202d0 <+32>:  ldr    x0, [x30]
    0x202d4 <+36>:  cbnz   x0, 0x202e4               ; <+52> at swap_testing.c:78
    0x202d8 <+40>:  orr    w0, wzr, #0x6
    0x202dc <+44>:  bl     0x205c0                   ; symbol stub for: raise
    0x202e0 <+48>:  str    w0, [sp, #0x4]
    0x202e4 <+52>:  str    xzr, [sp, #0x8]
    0x202e8 <+56>:  orr    x8, xzr, #0x3800
    0x202ec <+60>:  ldr    x9, [sp, #0x8]
    0x202f0 <+64>:  cmp    x9, x8
    0x202f4 <+68>:  b.hs   0x2033c                   ; <+140> at swap_testing.c:81
    0x202f8 <+72>:  ldr    x0, [sp, #0x8]
    0x202fc <+76>:  bl     0x204cc                   ; value at swap_testing.c:72
    0x20300 <+80>:  adrp   x30, 48
    0x20304 <+84>:  add    x30, x30, #0x0            ; =0x0
    0x20308 <+88>:  adrp   x8, 48
    0x2030c <+92>:  add    x8, x8, #0x8              ; =0x8
    0x20310 <+96>:  ldr    x9, [sp, #0x8]
    0x20314 <+100>: add    x8, x8, x9
    0x20318 <+104>: strb   w0, [x8]
    0x2031c <+108>: ldr    x8, [x30]
    0x20320 <+112>: ldr    x9, [sp, #0x8]
    0x20324 <+116>: add    x8, x8, x9
    0x20328 <+120>: strb   w0, [x8]
    0x2032c <+124>: ldr    x8, [sp, #0x8]
    0x20330 <+128>: add    x8, x8, #0x1              ; =0x1
    0x20334 <+132>: str    x8, [sp, #0x8]
    0x20338 <+136>: b      0x202e8                   ; <+56> at swap_testing.c
    0x2033c <+140>: ldp    x29, x30, [sp, #0x10]
    0x20340 <+144>: add    sp, sp, #0x20             ; =0x20
    0x20344 <+148>: ret    

(lldb) disass -n test_check
a.out`test_check:
    0x20348 <+0>:   sub    sp, sp, #0x20             ; =0x20
    0x2034c <+4>:   stp    x29, x30, [sp, #0x10]
    0x20350 <+8>:   add    x29, sp, #0x10            ; =0x10
    0x20354 <+12>:  b      0x20358                   ; <+16> at swap_testing.c
    0x20358 <+16>:  mov    w8, wzr
    0x2035c <+20>:  adrp   x9, 51
    0x20360 <+24>:  add    x9, x9, #0x808            ; =0x808
    0x20364 <+28>:  ldrb   w10, [x9]
    0x20368 <+32>:  stur   w8, [x29, #-0x4]
    0x2036c <+36>:  tbnz   w10, #0x0, 0x2038c        ; <+68> at swap_testing.c
    0x20370 <+40>:  orr    x8, xzr, #0x3800
    0x20374 <+44>:  adrp   x9, 51
    0x20378 <+48>:  add    x9, x9, #0x810            ; =0x810
    0x2037c <+52>:  ldr    x9, [x9]
    0x20380 <+56>:  cmp    x9, x8
    0x20384 <+60>:  cset   w10, lo
    0x20388 <+64>:  stur   w10, [x29, #-0x4]
    0x2038c <+68>:  ldur   w8, [x29, #-0x4]
    0x20390 <+72>:  tbz    w8, #0x0, 0x203ec         ; <+164> at swap_testing.c:95
    0x20394 <+76>:  adrp   x8, 51
    0x20398 <+80>:  add    x8, x8, #0x810            ; =0x810
    0x2039c <+84>:  ldr    x0, [x8]
    0x203a0 <+88>:  bl     0x204cc                   ; value at swap_testing.c:72
    0x203a4 <+92>:  adrp   x8, 51
    0x203a8 <+96>:  add    x8, x8, #0x810            ; =0x810
    0x203ac <+100>: adrp   x30, 51
    0x203b0 <+104>: add    x30, x30, #0x808          ; =0x808
    0x203b4 <+108>: adrp   x9, 48
    0x203b8 <+112>: add    x9, x9, #0x8              ; =0x8
    0x203bc <+116>: uxtb   w0, w0
    0x203c0 <+120>: ldr    x10, [x8]
    0x203c4 <+124>: add    x9, x9, x10
    0x203c8 <+128>: ldrb   w11, [x9]
    0x203cc <+132>: cmp    w0, w11
    0x203d0 <+136>: cset   w11, ne
    0x203d4 <+140>: and    w11, w11, #0x1
    0x203d8 <+144>: strb   w11, [x30]
    0x203dc <+148>: ldr    x9, [x8]
    0x203e0 <+152>: add    x9, x9, #0x1              ; =0x1
    0x203e4 <+156>: str    x9, [x8]
    0x203e8 <+160>: b      0x20358                   ; <+16> at swap_testing.c
    0x203ec <+164>: b      0x203f0                   ; <+168> at swap_testing.c
    0x203f0 <+168>: mov    w8, wzr
    0x203f4 <+172>: adrp   x9, 51
    0x203f8 <+176>: add    x9, x9, #0x818            ; =0x818
    0x203fc <+180>: ldrb   w10, [x9]
    0x20400 <+184>: str    w8, [sp, #0x8]
    0x20404 <+188>: tbnz   w10, #0x0, 0x20424        ; <+220> at swap_testing.c
    0x20408 <+192>: orr    x8, xzr, #0x3800
    0x2040c <+196>: adrp   x9, 51
    0x20410 <+200>: add    x9, x9, #0x820            ; =0x820
    0x20414 <+204>: ldr    x9, [x9]
    0x20418 <+208>: cmp    x9, x8
    0x2041c <+212>: cset   w10, lo
    0x20420 <+216>: str    w10, [sp, #0x8]
    0x20424 <+220>: ldr    w8, [sp, #0x8]
    0x20428 <+224>: tbz    w8, #0x0, 0x20488         ; <+320> at swap_testing.c
    0x2042c <+228>: adrp   x8, 51
    0x20430 <+232>: add    x8, x8, #0x820            ; =0x820
    0x20434 <+236>: ldr    x0, [x8]
    0x20438 <+240>: bl     0x204cc                   ; value at swap_testing.c:72
    0x2043c <+244>: adrp   x8, 51
    0x20440 <+248>: add    x8, x8, #0x820            ; =0x820
    0x20444 <+252>: adrp   x30, 51
    0x20448 <+256>: add    x30, x30, #0x818          ; =0x818
    0x2044c <+260>: adrp   x9, 48
    0x20450 <+264>: add    x9, x9, #0x0              ; =0x0
    0x20454 <+268>: uxtb   w0, w0
    0x20458 <+272>: ldr    x9, [x9]
    0x2045c <+276>: ldr    x10, [x8]
    0x20460 <+280>: add    x9, x9, x10
    0x20464 <+284>: ldrb   w11, [x9]
    0x20468 <+288>: cmp    w0, w11
    0x2046c <+292>: cset   w11, ne
    0x20470 <+296>: and    w11, w11, #0x1
    0x20474 <+300>: strb   w11, [x30]
    0x20478 <+304>: ldr    x9, [x8]
    0x2047c <+308>: add    x9, x9, #0x1              ; =0x1
    0x20480 <+312>: str    x9, [x8]
    0x20484 <+316>: b      0x203f0                   ; <+168> at swap_testing.c
    0x20488 <+320>: adrp   x8, 51
    0x2048c <+324>: add    x8, x8, #0x808            ; =0x808
    0x20490 <+328>: ldrb   w9, [x8]
    0x20494 <+332>: tbz    w9, #0x0, 0x204a4         ; <+348> at swap_testing.c
    0x20498 <+336>: orr    w0, wzr, #0x6
    0x2049c <+340>: bl     0x205c0                   ; symbol stub for: raise
    0x204a0 <+344>: str    w0, [sp, #0x4]
    0x204a4 <+348>: adrp   x8, 51
    0x204a8 <+352>: add    x8, x8, #0x818            ; =0x818
    0x204ac <+356>: ldrb   w9, [x8]
    0x204b0 <+360>: tbz    w9, #0x0, 0x204c0         ; <+376> at swap_testing.c:105
    0x204b4 <+364>: orr    w0, wzr, #0x6
    0x204b8 <+368>: bl     0x205c0                   ; symbol stub for: raise
->  0x204bc <+372>: str    w0, [sp]
    0x204c0 <+376>: ldp    x29, x30, [sp, #0x10]
    0x204c4 <+380>: add    sp, sp, #0x20             ; =0x20
    0x204c8 <+384>: ret    

# uname -apKU
FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r314638M  arm64 aarch64 1200023 1200023

buildworld buildlkernel did not have MALLOC_PRODUCTION= defined. The kernel is a
non-debug kernel. (Previous to these experiments my other corruption examples
were not caught by a debug kernel. I'm not hopeful that this simpler context
would either.)



===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context)

Mark Millard-2

On 2017-Mar-13, at 11:52 PM, Mark Millard <[hidden email]> wrote:

> I'm still at a loss about how to figure out what stages are messed
> up. (Memory coherency? Some memory not swapped out? Bad data swapped
> out? Wrong data swapped in?)
>
> But at least I've found a much smaller/simpler example to demonstrate
> some problem with in my Pine64+_ 2GB context.
>
> The Pine64+ 2GB is the only amd64 context that I have access to.

Someday I'll learn to type arm64 the first time instead of amd64.

> The following program fails its check for data
> having its expected byte pattern in dynamically
> allocated memory after a fork/swap-out/swap-in
> sequence.
>
> I'll note that the program sleeps for 60s after
> forking to give time to do something else to
> cause the parent and child processes to swap
> out (RES=0 as seen in top).
>
> Note the source code line:
>
>    // test_check(); // Adding this line prevents failure.
>
> It seem that accessing the region contents before forking
> and swapping avoids the problem. But there is a problem
> if the region was only written-to before the fork/swap.
>
> Another point is the size of the region matters: <= 14K Bytes
> fails and > 14K Bytes works for as much has I have tested.
>
>
> # more swap_testing.c
> // swap_testing.c
>
> // Built via (c++ was clang++ 4.0 in my case):
> //
> // cc -g -std=c11 -Wpedantic swap_testing.c
> // -O0 and -O2 also gets the problem.
>
> #include <unistd.h>     // for fork(), sleep(.)
> #include <sys/types.h>  // for pid_t
> #include <sys/wait.h>   // for wait(.)
>
> extern void test_setup(void); // Sets up the memory byte pattern.
> extern void test_check(void); // Tests the memory byte pattern.
>
> int main(void)
> {
>    test_setup();
>    // test_check(); // Adding this line prevents failure.
>
>    pid_t pid = fork();
>    int wait_status = 0;;
>
>    if (0<pid) { wait(&wait_status); }
>
>    if (-1!=wait_status && 0<=pid)
>    {
>        if (0==pid)
>        {
>            sleep(60);
>
>            // During this manually force this process to
>            // swap out. I use something like:
>
>            // stress -m 1 --vm-bytes 1800M
>
>            // in another shell and ^C'ing it after top
>            // shows the swapped status desired. 1800M
>            // just happened to work on the Pine64+ 2GB
>            // that I was using.
>        }
>
>        test_check();
>    }
> }
>
> // The memory and test code follows.
>
> #include <stdbool.h>    // for bool, true, false
> #include <stddef.h>     // for size_t, NULL
> #include <stdlib.h>     // for malloc(.), free(.)
>
> #include <signal.h>     // for raise(.), SIGABRT
>
> #define region_size (14u*1024u)
>                        // Bad dyn_region pattern, parent and child
>                        // processes:
>                        //  256u, 4u*1024u, 8u*1024u, 9u*1024u,
>                        // 12u*1024u, 14u*1024u
>
>                        // Works:
>                        // 14u*1024u+1u, 15u*1024u, 16u*1024u,
>                        // 32u*1024u, 256u*1024u*1024u
>
> typedef volatile unsigned char value_type;
>
> struct region_struct { value_type array[region_size]; };
> typedef struct region_struct region;
>
> static region            gbl_region;
> static region * volatile dyn_region = NULL;
>
> static value_type value(size_t v) { return (value_type)v; }
>
> void test_setup(void) {
>    dyn_region = malloc(sizeof(region));
>    if (!dyn_region) raise(SIGABRT);
>
>    for(size_t i=0u; i<region_size; i++) {
>        (*dyn_region).array[i] = gbl_region.array[i] = value(i);
>    }
> }
>
> static volatile bool gbl_failed = false; // Until potentially disproved
> static volatile size_t gbl_pos = 0u;
>
> static volatile bool dyn_failed = false; // Until potentially disproved
> static volatile size_t dyn_pos = 0u;
>
> void test_check(void) {
>    while (!gbl_failed && gbl_pos<region_size) {
>        gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
>        gbl_pos++;
>    }
>
>    while (!dyn_failed && dyn_pos<region_size) {
>        dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
>        // Note: When the memory pattern fails this case is that
>        //       records the failure.
>        dyn_pos++;
>    }
>
>    if (gbl_failed) raise(SIGABRT);
>    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>                                    // when it fails (both parent and child processes).
> }
>
>
> Other details from lldb (not using -O2 so things are
> simpler, not presented in the order examined):
>
> # lldb a.out -c /var/crash/a.out.11575.core
> (lldb) target create "a.out" --core "/var/crash/a.out.11575.core"
> Core file '/var/crash/a.out.11575.core' (aarch64) was loaded.
> (lldb) bt
> * thread #1, name = 'a.out', stop reason = signal SIGABRT
>  * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8
>    frame #1: libc.so.7`__raise(s=6) at raise.c:52
>    frame #2: a.out`test_check at swap_testing.c:103
>    frame #3: a.out`main at swap_testing.c:42
>    frame #4: 0x0000000000020184 a.out`__start + 364
>    frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41
>
> (lldb) up 2
> frame #2: a.out`test_check at swap_testing.c:103
>   100    }
>   101
>   102    if (gbl_failed) raise(SIGABRT);
> -> 103    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>   104                                    // when it fails (both parent and child processes).
>   105 }
>
> (lldb) print dyn_pos
> (size_t) $0 = 2
>
> (That is one after the failure position.)
>
>
> (lldb) print dyn_region
> (region *volatile) $3 = 0x0000000040616000
>
> (lldb) print *dyn_region
> (region) $1 = {
>  array = {
>    [0] = '\0'
>    [1] = '\0'
>    [2] = '\0'
> . . . (all '\0' bytes) . . .
>    [251] = '\0'
>    [252] = '\0'
>    [253] = '\0'
>    [254] = '\0'
>    [255] = '\0'
>    ...
>  }
> }
>
> (lldb) print gbl_region
> (region) $2 = {
>  array = {
>    [0] = '\0'
>    [1] = '\x01'
>    [2] = '\x02'
> . . .
>    [251] = '\xfb'
>    [252] = '\xfc'
>    [253] = '\xfd'
>    [254] = '\xfe'
>    [255] = '\xff'
>    ...
>  }
> }
>
> (lldb) disass -n main
> a.out`main:
>    0x2022c <+0>:   sub    sp, sp, #0x30             ; =0x30
>    0x20230 <+4>:   stp    x29, x30, [sp, #0x20]
>    0x20234 <+8>:   add    x29, sp, #0x20            ; =0x20
>    0x20238 <+12>:  stur   wzr, [x29, #-0x4]
>    0x2023c <+16>:  bl     0x202b0                   ; test_setup at swap_testing.c:74
>    0x20240 <+20>:  bl     0x20580                   ; symbol stub for: fork
>    0x20244 <+24>:  mov    w8, wzr
>    0x20248 <+28>:  stur   w0, [x29, #-0x8]
>    0x2024c <+32>:  stur   wzr, [x29, #-0xc]
>    0x20250 <+36>:  ldur   w0, [x29, #-0x8]
>    0x20254 <+40>:  cmp    w8, w0
>    0x20258 <+44>:  b.ge   0x20268                   ; <+60> at swap_testing.c
>    0x2025c <+48>:  sub    x0, x29, #0xc             ; =0xc
>    0x20260 <+52>:  bl     0x20590                   ; symbol stub for: wait
>    0x20264 <+56>:  str    w0, [sp, #0x10]
>    0x20268 <+60>:  mov    w8, #-0x1
>    0x2026c <+64>:  ldur   w9, [x29, #-0xc]
>    0x20270 <+68>:  cmp    w8, w9
>    0x20274 <+72>:  b.eq   0x202a0                   ; <+116> at swap_testing.c:44
>    0x20278 <+76>:  mov    w8, wzr
>    0x2027c <+80>:  ldur   w9, [x29, #-0x8]
>    0x20280 <+84>:  cmp    w8, w9
>    0x20284 <+88>:  b.gt   0x202a0                   ; <+116> at swap_testing.c:44
>    0x20288 <+92>:  ldur   w8, [x29, #-0x8]
>    0x2028c <+96>:  cbnz   w8, 0x2029c               ; <+112> at swap_testing.c:42
>    0x20290 <+100>: orr    w0, wzr, #0x3c
>    0x20294 <+104>: bl     0x205a0                   ; symbol stub for: sleep
>    0x20298 <+108>: str    w0, [sp, #0xc]
>    0x2029c <+112>: bl     0x20348                   ; test_check at swap_testing.c:89
>    0x202a0 <+116>: ldur   w0, [x29, #-0x4]
>    0x202a4 <+120>: ldp    x29, x30, [sp, #0x20]
>    0x202a8 <+124>: add    sp, sp, #0x30             ; =0x30
>    0x202ac <+128>: ret    
>
> (lldb) disass -n value
> a.out`value:
>    0x204cc <+0>:  sub    sp, sp, #0x10             ; =0x10
>    0x204d0 <+4>:  str    x0, [sp, #0x8]
>    0x204d4 <+8>:  ldrb   w8, [sp, #0x8]
>    0x204d8 <+12>: mov    w1, w8
>    0x204dc <+16>: mov    w0, w8
>    0x204e0 <+20>: str    w1, [sp, #0x4]
>    0x204e4 <+24>: add    sp, sp, #0x10             ; =0x10
>    0x204e8 <+28>: ret    
>
> (lldb) disass -n test_setup
> a.out`test_setup:
>    0x202b0 <+0>:   sub    sp, sp, #0x20             ; =0x20
>    0x202b4 <+4>:   stp    x29, x30, [sp, #0x10]
>    0x202b8 <+8>:   add    x29, sp, #0x10            ; =0x10
>    0x202bc <+12>:  orr    x0, xzr, #0x3800
>    0x202c0 <+16>:  bl     0x205b0                   ; symbol stub for: malloc
>    0x202c4 <+20>:  adrp   x30, 48
>    0x202c8 <+24>:  add    x30, x30, #0x0            ; =0x0
>    0x202cc <+28>:  str    x0, [x30]
>    0x202d0 <+32>:  ldr    x0, [x30]
>    0x202d4 <+36>:  cbnz   x0, 0x202e4               ; <+52> at swap_testing.c:78
>    0x202d8 <+40>:  orr    w0, wzr, #0x6
>    0x202dc <+44>:  bl     0x205c0                   ; symbol stub for: raise
>    0x202e0 <+48>:  str    w0, [sp, #0x4]
>    0x202e4 <+52>:  str    xzr, [sp, #0x8]
>    0x202e8 <+56>:  orr    x8, xzr, #0x3800
>    0x202ec <+60>:  ldr    x9, [sp, #0x8]
>    0x202f0 <+64>:  cmp    x9, x8
>    0x202f4 <+68>:  b.hs   0x2033c                   ; <+140> at swap_testing.c:81
>    0x202f8 <+72>:  ldr    x0, [sp, #0x8]
>    0x202fc <+76>:  bl     0x204cc                   ; value at swap_testing.c:72
>    0x20300 <+80>:  adrp   x30, 48
>    0x20304 <+84>:  add    x30, x30, #0x0            ; =0x0
>    0x20308 <+88>:  adrp   x8, 48
>    0x2030c <+92>:  add    x8, x8, #0x8              ; =0x8
>    0x20310 <+96>:  ldr    x9, [sp, #0x8]
>    0x20314 <+100>: add    x8, x8, x9
>    0x20318 <+104>: strb   w0, [x8]
>    0x2031c <+108>: ldr    x8, [x30]
>    0x20320 <+112>: ldr    x9, [sp, #0x8]
>    0x20324 <+116>: add    x8, x8, x9
>    0x20328 <+120>: strb   w0, [x8]
>    0x2032c <+124>: ldr    x8, [sp, #0x8]
>    0x20330 <+128>: add    x8, x8, #0x1              ; =0x1
>    0x20334 <+132>: str    x8, [sp, #0x8]
>    0x20338 <+136>: b      0x202e8                   ; <+56> at swap_testing.c
>    0x2033c <+140>: ldp    x29, x30, [sp, #0x10]
>    0x20340 <+144>: add    sp, sp, #0x20             ; =0x20
>    0x20344 <+148>: ret    
>
> (lldb) disass -n test_check
> a.out`test_check:
>    0x20348 <+0>:   sub    sp, sp, #0x20             ; =0x20
>    0x2034c <+4>:   stp    x29, x30, [sp, #0x10]
>    0x20350 <+8>:   add    x29, sp, #0x10            ; =0x10
>    0x20354 <+12>:  b      0x20358                   ; <+16> at swap_testing.c
>    0x20358 <+16>:  mov    w8, wzr
>    0x2035c <+20>:  adrp   x9, 51
>    0x20360 <+24>:  add    x9, x9, #0x808            ; =0x808
>    0x20364 <+28>:  ldrb   w10, [x9]
>    0x20368 <+32>:  stur   w8, [x29, #-0x4]
>    0x2036c <+36>:  tbnz   w10, #0x0, 0x2038c        ; <+68> at swap_testing.c
>    0x20370 <+40>:  orr    x8, xzr, #0x3800
>    0x20374 <+44>:  adrp   x9, 51
>    0x20378 <+48>:  add    x9, x9, #0x810            ; =0x810
>    0x2037c <+52>:  ldr    x9, [x9]
>    0x20380 <+56>:  cmp    x9, x8
>    0x20384 <+60>:  cset   w10, lo
>    0x20388 <+64>:  stur   w10, [x29, #-0x4]
>    0x2038c <+68>:  ldur   w8, [x29, #-0x4]
>    0x20390 <+72>:  tbz    w8, #0x0, 0x203ec         ; <+164> at swap_testing.c:95
>    0x20394 <+76>:  adrp   x8, 51
>    0x20398 <+80>:  add    x8, x8, #0x810            ; =0x810
>    0x2039c <+84>:  ldr    x0, [x8]
>    0x203a0 <+88>:  bl     0x204cc                   ; value at swap_testing.c:72
>    0x203a4 <+92>:  adrp   x8, 51
>    0x203a8 <+96>:  add    x8, x8, #0x810            ; =0x810
>    0x203ac <+100>: adrp   x30, 51
>    0x203b0 <+104>: add    x30, x30, #0x808          ; =0x808
>    0x203b4 <+108>: adrp   x9, 48
>    0x203b8 <+112>: add    x9, x9, #0x8              ; =0x8
>    0x203bc <+116>: uxtb   w0, w0
>    0x203c0 <+120>: ldr    x10, [x8]
>    0x203c4 <+124>: add    x9, x9, x10
>    0x203c8 <+128>: ldrb   w11, [x9]
>    0x203cc <+132>: cmp    w0, w11
>    0x203d0 <+136>: cset   w11, ne
>    0x203d4 <+140>: and    w11, w11, #0x1
>    0x203d8 <+144>: strb   w11, [x30]
>    0x203dc <+148>: ldr    x9, [x8]
>    0x203e0 <+152>: add    x9, x9, #0x1              ; =0x1
>    0x203e4 <+156>: str    x9, [x8]
>    0x203e8 <+160>: b      0x20358                   ; <+16> at swap_testing.c
>    0x203ec <+164>: b      0x203f0                   ; <+168> at swap_testing.c
>    0x203f0 <+168>: mov    w8, wzr
>    0x203f4 <+172>: adrp   x9, 51
>    0x203f8 <+176>: add    x9, x9, #0x818            ; =0x818
>    0x203fc <+180>: ldrb   w10, [x9]
>    0x20400 <+184>: str    w8, [sp, #0x8]
>    0x20404 <+188>: tbnz   w10, #0x0, 0x20424        ; <+220> at swap_testing.c
>    0x20408 <+192>: orr    x8, xzr, #0x3800
>    0x2040c <+196>: adrp   x9, 51
>    0x20410 <+200>: add    x9, x9, #0x820            ; =0x820
>    0x20414 <+204>: ldr    x9, [x9]
>    0x20418 <+208>: cmp    x9, x8
>    0x2041c <+212>: cset   w10, lo
>    0x20420 <+216>: str    w10, [sp, #0x8]
>    0x20424 <+220>: ldr    w8, [sp, #0x8]
>    0x20428 <+224>: tbz    w8, #0x0, 0x20488         ; <+320> at swap_testing.c
>    0x2042c <+228>: adrp   x8, 51
>    0x20430 <+232>: add    x8, x8, #0x820            ; =0x820
>    0x20434 <+236>: ldr    x0, [x8]
>    0x20438 <+240>: bl     0x204cc                   ; value at swap_testing.c:72
>    0x2043c <+244>: adrp   x8, 51
>    0x20440 <+248>: add    x8, x8, #0x820            ; =0x820
>    0x20444 <+252>: adrp   x30, 51
>    0x20448 <+256>: add    x30, x30, #0x818          ; =0x818
>    0x2044c <+260>: adrp   x9, 48
>    0x20450 <+264>: add    x9, x9, #0x0              ; =0x0
>    0x20454 <+268>: uxtb   w0, w0
>    0x20458 <+272>: ldr    x9, [x9]
>    0x2045c <+276>: ldr    x10, [x8]
>    0x20460 <+280>: add    x9, x9, x10
>    0x20464 <+284>: ldrb   w11, [x9]
>    0x20468 <+288>: cmp    w0, w11
>    0x2046c <+292>: cset   w11, ne
>    0x20470 <+296>: and    w11, w11, #0x1
>    0x20474 <+300>: strb   w11, [x30]
>    0x20478 <+304>: ldr    x9, [x8]
>    0x2047c <+308>: add    x9, x9, #0x1              ; =0x1
>    0x20480 <+312>: str    x9, [x8]
>    0x20484 <+316>: b      0x203f0                   ; <+168> at swap_testing.c
>    0x20488 <+320>: adrp   x8, 51
>    0x2048c <+324>: add    x8, x8, #0x808            ; =0x808
>    0x20490 <+328>: ldrb   w9, [x8]
>    0x20494 <+332>: tbz    w9, #0x0, 0x204a4         ; <+348> at swap_testing.c
>    0x20498 <+336>: orr    w0, wzr, #0x6
>    0x2049c <+340>: bl     0x205c0                   ; symbol stub for: raise
>    0x204a0 <+344>: str    w0, [sp, #0x4]
>    0x204a4 <+348>: adrp   x8, 51
>    0x204a8 <+352>: add    x8, x8, #0x818            ; =0x818
>    0x204ac <+356>: ldrb   w9, [x8]
>    0x204b0 <+360>: tbz    w9, #0x0, 0x204c0         ; <+376> at swap_testing.c:105
>    0x204b4 <+364>: orr    w0, wzr, #0x6
>    0x204b8 <+368>: bl     0x205c0                   ; symbol stub for: raise
> ->  0x204bc <+372>: str    w0, [sp]
>    0x204c0 <+376>: ldp    x29, x30, [sp, #0x10]
>    0x204c4 <+380>: add    sp, sp, #0x20             ; =0x20
>    0x204c8 <+384>: ret    
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r314638M  arm64 aarch64 1200023 1200023
>
> buildworld buildlkernel did not have MALLOC_PRODUCTION= defined. The kernel is a
> non-debug kernel. (Previous to these experiments my other corruption examples
> were not caught by a debug kernel. I'm not hopeful that this simpler context
> would either.)



===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: amd64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context)

Mark Millard-2
[Another correction I'm afraid --about alternative program variations
this time.]

On 2017-Mar-13, at 11:52 PM, Mark Millard <[hidden email]> wrote:

> I'm still at a loss about how to figure out what stages are messed
> up. (Memory coherency? Some memory not swapped out? Bad data swapped
> out? Wrong data swapped in?)
>
> But at least I've found a much smaller/simpler example to demonstrate
> some problem with in my Pine64+_ 2GB context.
>
> The Pine64+ 2GB is the only amd64 context that I have access to.

Someday I'll learn to type arm64 the first time instead of amd64.

> The following program fails its check for data
> having its expected byte pattern in dynamically
> allocated memory after a fork/swap-out/swap-in
> sequence.
>
> I'll note that the program sleeps for 60s after
> forking to give time to do something else to
> cause the parent and child processes to swap
> out (RES=0 as seen in top).

The following about the extra test_check() was
wrong.

> Note the source code line:
>
>   // test_check(); // Adding this line prevents failure.
>
> It seem that accessing the region contents before forking
> and swapping avoids the problem. But there is a problem
> if the region was only written-to before the fork/swap.

This was because I'd carelessly moved some loop variables to
globals in a way that depended on the initialization of the
globals and the extra call changed those values.

I've noted code adjustments below (3 lines). I get the failures
with them as well.

> Another point is the size of the region matters: <= 14K Bytes
> fails and > 14K Bytes works for as much has I have tested.
>
>
> # more swap_testing.c
> // swap_testing.c
>
> // Built via (c++ was clang++ 4.0 in my case):
> //
> // cc -g -std=c11 -Wpedantic swap_testing.c
> // -O0 and -O2 also gets the problem.
>
> #include <unistd.h>     // for fork(), sleep(.)
> #include <sys/types.h>  // for pid_t
> #include <sys/wait.h>   // for wait(.)
>
> extern void test_setup(void); // Sets up the memory byte pattern.
> extern void test_check(void); // Tests the memory byte pattern.
>
> int main(void)
> {
>   test_setup();
   test_check(); // This test passes.

>
>   pid_t pid = fork();
>   int wait_status = 0;;
>
>   if (0<pid) { wait(&wait_status); }
>
>   if (-1!=wait_status && 0<=pid)
>   {
>       if (0==pid)
>       {
>           sleep(60);
>
>           // During this manually force this process to
>           // swap out. I use something like:
>
>           // stress -m 1 --vm-bytes 1800M
>
>           // in another shell and ^C'ing it after top
>           // shows the swapped status desired. 1800M
>           // just happened to work on the Pine64+ 2GB
>           // that I was using.
>       }
>
>       test_check();
>   }
> }
>
> // The memory and test code follows.
>
> #include <stdbool.h>    // for bool, true, false
> #include <stddef.h>     // for size_t, NULL
> #include <stdlib.h>     // for malloc(.), free(.)
>
> #include <signal.h>     // for raise(.), SIGABRT
>
> #define region_size (14u*1024u)
>                       // Bad dyn_region pattern, parent and child
>                       // processes:
>                       //  256u, 4u*1024u, 8u*1024u, 9u*1024u,
>                       // 12u*1024u, 14u*1024u
>
>                       // Works:
>                       // 14u*1024u+1u, 15u*1024u, 16u*1024u,
>                       // 32u*1024u, 256u*1024u*1024u
>
> typedef volatile unsigned char value_type;
>
> struct region_struct { value_type array[region_size]; };
> typedef struct region_struct region;
>
> static region            gbl_region;
> static region * volatile dyn_region = NULL;
>
> static value_type value(size_t v) { return (value_type)v; }
>
> void test_setup(void) {
>   dyn_region = malloc(sizeof(region));
>   if (!dyn_region) raise(SIGABRT);
>
>   for(size_t i=0u; i<region_size; i++) {
>       (*dyn_region).array[i] = gbl_region.array[i] = value(i);
>   }
> }
>
> static volatile bool gbl_failed = false; // Until potentially disproved
> static volatile size_t gbl_pos = 0u;
>
> static volatile bool dyn_failed = false; // Until potentially disproved
> static volatile size_t dyn_pos = 0u;
>
> void test_check(void) {
   gbl_pos = 0u;
>   while (!gbl_failed && gbl_pos<region_size) {
>       gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
>       gbl_pos++;
>   }
>
   dyn_pos = 0u;

>   while (!dyn_failed && dyn_pos<region_size) {
>       dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
>       // Note: When the memory pattern fails this case is that
>       //       records the failure.
>       dyn_pos++;
>   }
>
>   if (gbl_failed) raise(SIGABRT);
>   if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>                                   // when it fails (both parent and child processes).
> }

I'm not bothering to redo the details below for the
line number variations.

> Other details from lldb (not using -O2 so things are
> simpler, not presented in the order examined):
>
> # lldb a.out -c /var/crash/a.out.11575.core
> (lldb) target create "a.out" --core "/var/crash/a.out.11575.core"
> Core file '/var/crash/a.out.11575.core' (aarch64) was loaded.
> (lldb) bt
> * thread #1, name = 'a.out', stop reason = signal SIGABRT
> * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8
>   frame #1: libc.so.7`__raise(s=6) at raise.c:52
>   frame #2: a.out`test_check at swap_testing.c:103
>   frame #3: a.out`main at swap_testing.c:42
>   frame #4: 0x0000000000020184 a.out`__start + 364
>   frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41
>
> (lldb) up 2
> frame #2: a.out`test_check at swap_testing.c:103
>  100    }
>  101
>  102    if (gbl_failed) raise(SIGABRT);
> -> 103    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>  104                                    // when it fails (both parent and child processes).
>  105 }
>
> (lldb) print dyn_pos
> (size_t) $0 = 2
>
> (That is one after the failure position.)
>
>
> (lldb) print dyn_region
> (region *volatile) $3 = 0x0000000040616000
>
> (lldb) print *dyn_region
> (region) $1 = {
> array = {
>   [0] = '\0'
>   [1] = '\0'
>   [2] = '\0'
> . . . (all '\0' bytes) . . .
>   [251] = '\0'
>   [252] = '\0'
>   [253] = '\0'
>   [254] = '\0'
>   [255] = '\0'
>   ...
> }
> }
>
> (lldb) print gbl_region
> (region) $2 = {
> array = {
>   [0] = '\0'
>   [1] = '\x01'
>   [2] = '\x02'
> . . .
>   [251] = '\xfb'
>   [252] = '\xfc'
>   [253] = '\xfd'
>   [254] = '\xfe'
>   [255] = '\xff'
>   ...
> }
> }
>
> (lldb) disass -n main
> a.out`main:
>   0x2022c <+0>:   sub    sp, sp, #0x30             ; =0x30
>   0x20230 <+4>:   stp    x29, x30, [sp, #0x20]
>   0x20234 <+8>:   add    x29, sp, #0x20            ; =0x20
>   0x20238 <+12>:  stur   wzr, [x29, #-0x4]
>   0x2023c <+16>:  bl     0x202b0                   ; test_setup at swap_testing.c:74
>   0x20240 <+20>:  bl     0x20580                   ; symbol stub for: fork
>   0x20244 <+24>:  mov    w8, wzr
>   0x20248 <+28>:  stur   w0, [x29, #-0x8]
>   0x2024c <+32>:  stur   wzr, [x29, #-0xc]
>   0x20250 <+36>:  ldur   w0, [x29, #-0x8]
>   0x20254 <+40>:  cmp    w8, w0
>   0x20258 <+44>:  b.ge   0x20268                   ; <+60> at swap_testing.c
>   0x2025c <+48>:  sub    x0, x29, #0xc             ; =0xc
>   0x20260 <+52>:  bl     0x20590                   ; symbol stub for: wait
>   0x20264 <+56>:  str    w0, [sp, #0x10]
>   0x20268 <+60>:  mov    w8, #-0x1
>   0x2026c <+64>:  ldur   w9, [x29, #-0xc]
>   0x20270 <+68>:  cmp    w8, w9
>   0x20274 <+72>:  b.eq   0x202a0                   ; <+116> at swap_testing.c:44
>   0x20278 <+76>:  mov    w8, wzr
>   0x2027c <+80>:  ldur   w9, [x29, #-0x8]
>   0x20280 <+84>:  cmp    w8, w9
>   0x20284 <+88>:  b.gt   0x202a0                   ; <+116> at swap_testing.c:44
>   0x20288 <+92>:  ldur   w8, [x29, #-0x8]
>   0x2028c <+96>:  cbnz   w8, 0x2029c               ; <+112> at swap_testing.c:42
>   0x20290 <+100>: orr    w0, wzr, #0x3c
>   0x20294 <+104>: bl     0x205a0                   ; symbol stub for: sleep
>   0x20298 <+108>: str    w0, [sp, #0xc]
>   0x2029c <+112>: bl     0x20348                   ; test_check at swap_testing.c:89
>   0x202a0 <+116>: ldur   w0, [x29, #-0x4]
>   0x202a4 <+120>: ldp    x29, x30, [sp, #0x20]
>   0x202a8 <+124>: add    sp, sp, #0x30             ; =0x30
>   0x202ac <+128>: ret    
>
> (lldb) disass -n value
> a.out`value:
>   0x204cc <+0>:  sub    sp, sp, #0x10             ; =0x10
>   0x204d0 <+4>:  str    x0, [sp, #0x8]
>   0x204d4 <+8>:  ldrb   w8, [sp, #0x8]
>   0x204d8 <+12>: mov    w1, w8
>   0x204dc <+16>: mov    w0, w8
>   0x204e0 <+20>: str    w1, [sp, #0x4]
>   0x204e4 <+24>: add    sp, sp, #0x10             ; =0x10
>   0x204e8 <+28>: ret    
>
> (lldb) disass -n test_setup
> a.out`test_setup:
>   0x202b0 <+0>:   sub    sp, sp, #0x20             ; =0x20
>   0x202b4 <+4>:   stp    x29, x30, [sp, #0x10]
>   0x202b8 <+8>:   add    x29, sp, #0x10            ; =0x10
>   0x202bc <+12>:  orr    x0, xzr, #0x3800
>   0x202c0 <+16>:  bl     0x205b0                   ; symbol stub for: malloc
>   0x202c4 <+20>:  adrp   x30, 48
>   0x202c8 <+24>:  add    x30, x30, #0x0            ; =0x0
>   0x202cc <+28>:  str    x0, [x30]
>   0x202d0 <+32>:  ldr    x0, [x30]
>   0x202d4 <+36>:  cbnz   x0, 0x202e4               ; <+52> at swap_testing.c:78
>   0x202d8 <+40>:  orr    w0, wzr, #0x6
>   0x202dc <+44>:  bl     0x205c0                   ; symbol stub for: raise
>   0x202e0 <+48>:  str    w0, [sp, #0x4]
>   0x202e4 <+52>:  str    xzr, [sp, #0x8]
>   0x202e8 <+56>:  orr    x8, xzr, #0x3800
>   0x202ec <+60>:  ldr    x9, [sp, #0x8]
>   0x202f0 <+64>:  cmp    x9, x8
>   0x202f4 <+68>:  b.hs   0x2033c                   ; <+140> at swap_testing.c:81
>   0x202f8 <+72>:  ldr    x0, [sp, #0x8]
>   0x202fc <+76>:  bl     0x204cc                   ; value at swap_testing.c:72
>   0x20300 <+80>:  adrp   x30, 48
>   0x20304 <+84>:  add    x30, x30, #0x0            ; =0x0
>   0x20308 <+88>:  adrp   x8, 48
>   0x2030c <+92>:  add    x8, x8, #0x8              ; =0x8
>   0x20310 <+96>:  ldr    x9, [sp, #0x8]
>   0x20314 <+100>: add    x8, x8, x9
>   0x20318 <+104>: strb   w0, [x8]
>   0x2031c <+108>: ldr    x8, [x30]
>   0x20320 <+112>: ldr    x9, [sp, #0x8]
>   0x20324 <+116>: add    x8, x8, x9
>   0x20328 <+120>: strb   w0, [x8]
>   0x2032c <+124>: ldr    x8, [sp, #0x8]
>   0x20330 <+128>: add    x8, x8, #0x1              ; =0x1
>   0x20334 <+132>: str    x8, [sp, #0x8]
>   0x20338 <+136>: b      0x202e8                   ; <+56> at swap_testing.c
>   0x2033c <+140>: ldp    x29, x30, [sp, #0x10]
>   0x20340 <+144>: add    sp, sp, #0x20             ; =0x20
>   0x20344 <+148>: ret    
>
> (lldb) disass -n test_check
> a.out`test_check:
>   0x20348 <+0>:   sub    sp, sp, #0x20             ; =0x20
>   0x2034c <+4>:   stp    x29, x30, [sp, #0x10]
>   0x20350 <+8>:   add    x29, sp, #0x10            ; =0x10
>   0x20354 <+12>:  b      0x20358                   ; <+16> at swap_testing.c
>   0x20358 <+16>:  mov    w8, wzr
>   0x2035c <+20>:  adrp   x9, 51
>   0x20360 <+24>:  add    x9, x9, #0x808            ; =0x808
>   0x20364 <+28>:  ldrb   w10, [x9]
>   0x20368 <+32>:  stur   w8, [x29, #-0x4]
>   0x2036c <+36>:  tbnz   w10, #0x0, 0x2038c        ; <+68> at swap_testing.c
>   0x20370 <+40>:  orr    x8, xzr, #0x3800
>   0x20374 <+44>:  adrp   x9, 51
>   0x20378 <+48>:  add    x9, x9, #0x810            ; =0x810
>   0x2037c <+52>:  ldr    x9, [x9]
>   0x20380 <+56>:  cmp    x9, x8
>   0x20384 <+60>:  cset   w10, lo
>   0x20388 <+64>:  stur   w10, [x29, #-0x4]
>   0x2038c <+68>:  ldur   w8, [x29, #-0x4]
>   0x20390 <+72>:  tbz    w8, #0x0, 0x203ec         ; <+164> at swap_testing.c:95
>   0x20394 <+76>:  adrp   x8, 51
>   0x20398 <+80>:  add    x8, x8, #0x810            ; =0x810
>   0x2039c <+84>:  ldr    x0, [x8]
>   0x203a0 <+88>:  bl     0x204cc                   ; value at swap_testing.c:72
>   0x203a4 <+92>:  adrp   x8, 51
>   0x203a8 <+96>:  add    x8, x8, #0x810            ; =0x810
>   0x203ac <+100>: adrp   x30, 51
>   0x203b0 <+104>: add    x30, x30, #0x808          ; =0x808
>   0x203b4 <+108>: adrp   x9, 48
>   0x203b8 <+112>: add    x9, x9, #0x8              ; =0x8
>   0x203bc <+116>: uxtb   w0, w0
>   0x203c0 <+120>: ldr    x10, [x8]
>   0x203c4 <+124>: add    x9, x9, x10
>   0x203c8 <+128>: ldrb   w11, [x9]
>   0x203cc <+132>: cmp    w0, w11
>   0x203d0 <+136>: cset   w11, ne
>   0x203d4 <+140>: and    w11, w11, #0x1
>   0x203d8 <+144>: strb   w11, [x30]
>   0x203dc <+148>: ldr    x9, [x8]
>   0x203e0 <+152>: add    x9, x9, #0x1              ; =0x1
>   0x203e4 <+156>: str    x9, [x8]
>   0x203e8 <+160>: b      0x20358                   ; <+16> at swap_testing.c
>   0x203ec <+164>: b      0x203f0                   ; <+168> at swap_testing.c
>   0x203f0 <+168>: mov    w8, wzr
>   0x203f4 <+172>: adrp   x9, 51
>   0x203f8 <+176>: add    x9, x9, #0x818            ; =0x818
>   0x203fc <+180>: ldrb   w10, [x9]
>   0x20400 <+184>: str    w8, [sp, #0x8]
>   0x20404 <+188>: tbnz   w10, #0x0, 0x20424        ; <+220> at swap_testing.c
>   0x20408 <+192>: orr    x8, xzr, #0x3800
>   0x2040c <+196>: adrp   x9, 51
>   0x20410 <+200>: add    x9, x9, #0x820            ; =0x820
>   0x20414 <+204>: ldr    x9, [x9]
>   0x20418 <+208>: cmp    x9, x8
>   0x2041c <+212>: cset   w10, lo
>   0x20420 <+216>: str    w10, [sp, #0x8]
>   0x20424 <+220>: ldr    w8, [sp, #0x8]
>   0x20428 <+224>: tbz    w8, #0x0, 0x20488         ; <+320> at swap_testing.c
>   0x2042c <+228>: adrp   x8, 51
>   0x20430 <+232>: add    x8, x8, #0x820            ; =0x820
>   0x20434 <+236>: ldr    x0, [x8]
>   0x20438 <+240>: bl     0x204cc                   ; value at swap_testing.c:72
>   0x2043c <+244>: adrp   x8, 51
>   0x20440 <+248>: add    x8, x8, #0x820            ; =0x820
>   0x20444 <+252>: adrp   x30, 51
>   0x20448 <+256>: add    x30, x30, #0x818          ; =0x818
>   0x2044c <+260>: adrp   x9, 48
>   0x20450 <+264>: add    x9, x9, #0x0              ; =0x0
>   0x20454 <+268>: uxtb   w0, w0
>   0x20458 <+272>: ldr    x9, [x9]
>   0x2045c <+276>: ldr    x10, [x8]
>   0x20460 <+280>: add    x9, x9, x10
>   0x20464 <+284>: ldrb   w11, [x9]
>   0x20468 <+288>: cmp    w0, w11
>   0x2046c <+292>: cset   w11, ne
>   0x20470 <+296>: and    w11, w11, #0x1
>   0x20474 <+300>: strb   w11, [x30]
>   0x20478 <+304>: ldr    x9, [x8]
>   0x2047c <+308>: add    x9, x9, #0x1              ; =0x1
>   0x20480 <+312>: str    x9, [x8]
>   0x20484 <+316>: b      0x203f0                   ; <+168> at swap_testing.c
>   0x20488 <+320>: adrp   x8, 51
>   0x2048c <+324>: add    x8, x8, #0x808            ; =0x808
>   0x20490 <+328>: ldrb   w9, [x8]
>   0x20494 <+332>: tbz    w9, #0x0, 0x204a4         ; <+348> at swap_testing.c
>   0x20498 <+336>: orr    w0, wzr, #0x6
>   0x2049c <+340>: bl     0x205c0                   ; symbol stub for: raise
>   0x204a0 <+344>: str    w0, [sp, #0x4]
>   0x204a4 <+348>: adrp   x8, 51
>   0x204a8 <+352>: add    x8, x8, #0x818            ; =0x818
>   0x204ac <+356>: ldrb   w9, [x8]
>   0x204b0 <+360>: tbz    w9, #0x0, 0x204c0         ; <+376> at swap_testing.c:105
>   0x204b4 <+364>: orr    w0, wzr, #0x6
>   0x204b8 <+368>: bl     0x205c0                   ; symbol stub for: raise
> ->  0x204bc <+372>: str    w0, [sp]
>   0x204c0 <+376>: ldp    x29, x30, [sp, #0x10]
>   0x204c4 <+380>: add    sp, sp, #0x20             ; =0x20
>   0x204c8 <+384>: ret    
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r314638M  arm64 aarch64 1200023 1200023
>
> buildworld buildlkernel did not have MALLOC_PRODUCTION= defined. The kernel is a
> non-debug kernel. (Previous to these experiments my other corruption examples
> were not caught by a debug kernel. I'm not hopeful that this simpler context
> would either.)



===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
[This is just a correction to the subject-line text to say arm64
instead of amd64.]

On 2017-Mar-14, at 12:58 AM, Mark Millard <[hidden email]> wrote:

[Another correction I'm afraid --about alternative program variations
this time.]

On 2017-Mar-13, at 11:52 PM, Mark Millard <[hidden email]> wrote:

> I'm still at a loss about how to figure out what stages are messed
> up. (Memory coherency? Some memory not swapped out? Bad data swapped
> out? Wrong data swapped in?)
>
> But at least I've found a much smaller/simpler example to demonstrate
> some problem with in my Pine64+_ 2GB context.
>
> The Pine64+ 2GB is the only amd64 context that I have access to.

Someday I'll learn to type arm64 the first time instead of amd64.

> The following program fails its check for data
> having its expected byte pattern in dynamically
> allocated memory after a fork/swap-out/swap-in
> sequence.
>
> I'll note that the program sleeps for 60s after
> forking to give time to do something else to
> cause the parent and child processes to swap
> out (RES=0 as seen in top).

The following about the extra test_check() was
wrong.

> Note the source code line:
>
>  // test_check(); // Adding this line prevents failure.
>
> It seem that accessing the region contents before forking
> and swapping avoids the problem. But there is a problem
> if the region was only written-to before the fork/swap.

This was because I'd carelessly moved some loop variables to
globals in a way that depended on the initialization of the
globals and the extra call changed those values.

I've noted code adjustments below (3 lines). I get the failures
with them as well.

> Another point is the size of the region matters: <= 14K Bytes
> fails and > 14K Bytes works for as much has I have tested.
>
>
> # more swap_testing.c
> // swap_testing.c
>
> // Built via (c++ was clang++ 4.0 in my case):
> //
> // cc -g -std=c11 -Wpedantic swap_testing.c
> // -O0 and -O2 also gets the problem.
>
> #include <unistd.h>     // for fork(), sleep(.)
> #include <sys/types.h>  // for pid_t
> #include <sys/wait.h>   // for wait(.)
>
> extern void test_setup(void); // Sets up the memory byte pattern.
> extern void test_check(void); // Tests the memory byte pattern.
>
> int main(void)
> {
>  test_setup();
  test_check(); // This test passes.

>
>  pid_t pid = fork();
>  int wait_status = 0;;
>
>  if (0<pid) { wait(&wait_status); }
>
>  if (-1!=wait_status && 0<=pid)
>  {
>      if (0==pid)
>      {
>          sleep(60);
>
>          // During this manually force this process to
>          // swap out. I use something like:
>
>          // stress -m 1 --vm-bytes 1800M
>
>          // in another shell and ^C'ing it after top
>          // shows the swapped status desired. 1800M
>          // just happened to work on the Pine64+ 2GB
>          // that I was using.
>      }
>
>      test_check();
>  }
> }
>
> // The memory and test code follows.
>
> #include <stdbool.h>    // for bool, true, false
> #include <stddef.h>     // for size_t, NULL
> #include <stdlib.h>     // for malloc(.), free(.)
>
> #include <signal.h>     // for raise(.), SIGABRT
>
> #define region_size (14u*1024u)
>                      // Bad dyn_region pattern, parent and child
>                      // processes:
>                      //  256u, 4u*1024u, 8u*1024u, 9u*1024u,
>                      // 12u*1024u, 14u*1024u
>
>                      // Works:
>                      // 14u*1024u+1u, 15u*1024u, 16u*1024u,
>                      // 32u*1024u, 256u*1024u*1024u
>
> typedef volatile unsigned char value_type;
>
> struct region_struct { value_type array[region_size]; };
> typedef struct region_struct region;
>
> static region            gbl_region;
> static region * volatile dyn_region = NULL;
>
> static value_type value(size_t v) { return (value_type)v; }
>
> void test_setup(void) {
>  dyn_region = malloc(sizeof(region));
>  if (!dyn_region) raise(SIGABRT);
>
>  for(size_t i=0u; i<region_size; i++) {
>      (*dyn_region).array[i] = gbl_region.array[i] = value(i);
>  }
> }
>
> static volatile bool gbl_failed = false; // Until potentially disproved
> static volatile size_t gbl_pos = 0u;
>
> static volatile bool dyn_failed = false; // Until potentially disproved
> static volatile size_t dyn_pos = 0u;
>
> void test_check(void) {
  gbl_pos = 0u;
>  while (!gbl_failed && gbl_pos<region_size) {
>      gbl_failed = (value(gbl_pos) != gbl_region.array[gbl_pos]);
>      gbl_pos++;
>  }
>
  dyn_pos = 0u;

>  while (!dyn_failed && dyn_pos<region_size) {
>      dyn_failed = (value(dyn_pos) != (*dyn_region).array[dyn_pos]);
>      // Note: When the memory pattern fails this case is that
>      //       records the failure.
>      dyn_pos++;
>  }
>
>  if (gbl_failed) raise(SIGABRT);
>  if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
>                                  // when it fails (both parent and child processes).
> }

I'm not bothering to redo the details below for the
line number variations.

> Other details from lldb (not using -O2 so things are
> simpler, not presented in the order examined):
>
> # lldb a.out -c /var/crash/a.out.11575.core
> (lldb) target create "a.out" --core "/var/crash/a.out.11575.core"
> Core file '/var/crash/a.out.11575.core' (aarch64) was loaded.
> (lldb) bt
> * thread #1, name = 'a.out', stop reason = signal SIGABRT
> * frame #0: 0x0000000040113d38 libc.so.7`_thr_kill + 8
>  frame #1: libc.so.7`__raise(s=6) at raise.c:52
>  frame #2: a.out`test_check at swap_testing.c:103
>  frame #3: a.out`main at swap_testing.c:42
>  frame #4: 0x0000000000020184 a.out`__start + 364
>  frame #5: ld-elf.so.1`.rtld_start at rtld_start.S:41
>
> (lldb) up 2
> frame #2: a.out`test_check at swap_testing.c:103
> 100    }
> 101
> 102    if (gbl_failed) raise(SIGABRT);
> -> 103    if (dyn_failed) raise(SIGABRT); // lldb reports this line for the __raise call.
> 104                                    // when it fails (both parent and child processes).
> 105 }
>
> (lldb) print dyn_pos
> (size_t) $0 = 2
>
> (That is one after the failure position.)
>
>
> (lldb) print dyn_region
> (region *volatile) $3 = 0x0000000040616000
>
> (lldb) print *dyn_region
> (region) $1 = {
> array = {
>  [0] = '\0'
>  [1] = '\0'
>  [2] = '\0'
> . . . (all '\0' bytes) . . .
>  [251] = '\0'
>  [252] = '\0'
>  [253] = '\0'
>  [254] = '\0'
>  [255] = '\0'
>  ...
> }
> }
>
> (lldb) print gbl_region
> (region) $2 = {
> array = {
>  [0] = '\0'
>  [1] = '\x01'
>  [2] = '\x02'
> . . .
>  [251] = '\xfb'
>  [252] = '\xfc'
>  [253] = '\xfd'
>  [254] = '\xfe'
>  [255] = '\xff'
>  ...
> }
> }
>
> (lldb) disass -n main
> a.out`main:
>  0x2022c <+0>:   sub    sp, sp, #0x30             ; =0x30
>  0x20230 <+4>:   stp    x29, x30, [sp, #0x20]
>  0x20234 <+8>:   add    x29, sp, #0x20            ; =0x20
>  0x20238 <+12>:  stur   wzr, [x29, #-0x4]
>  0x2023c <+16>:  bl     0x202b0                   ; test_setup at swap_testing.c:74
>  0x20240 <+20>:  bl     0x20580                   ; symbol stub for: fork
>  0x20244 <+24>:  mov    w8, wzr
>  0x20248 <+28>:  stur   w0, [x29, #-0x8]
>  0x2024c <+32>:  stur   wzr, [x29, #-0xc]
>  0x20250 <+36>:  ldur   w0, [x29, #-0x8]
>  0x20254 <+40>:  cmp    w8, w0
>  0x20258 <+44>:  b.ge   0x20268                   ; <+60> at swap_testing.c
>  0x2025c <+48>:  sub    x0, x29, #0xc             ; =0xc
>  0x20260 <+52>:  bl     0x20590                   ; symbol stub for: wait
>  0x20264 <+56>:  str    w0, [sp, #0x10]
>  0x20268 <+60>:  mov    w8, #-0x1
>  0x2026c <+64>:  ldur   w9, [x29, #-0xc]
>  0x20270 <+68>:  cmp    w8, w9
>  0x20274 <+72>:  b.eq   0x202a0                   ; <+116> at swap_testing.c:44
>  0x20278 <+76>:  mov    w8, wzr
>  0x2027c <+80>:  ldur   w9, [x29, #-0x8]
>  0x20280 <+84>:  cmp    w8, w9
>  0x20284 <+88>:  b.gt   0x202a0                   ; <+116> at swap_testing.c:44
>  0x20288 <+92>:  ldur   w8, [x29, #-0x8]
>  0x2028c <+96>:  cbnz   w8, 0x2029c               ; <+112> at swap_testing.c:42
>  0x20290 <+100>: orr    w0, wzr, #0x3c
>  0x20294 <+104>: bl     0x205a0                   ; symbol stub for: sleep
>  0x20298 <+108>: str    w0, [sp, #0xc]
>  0x2029c <+112>: bl     0x20348                   ; test_check at swap_testing.c:89
>  0x202a0 <+116>: ldur   w0, [x29, #-0x4]
>  0x202a4 <+120>: ldp    x29, x30, [sp, #0x20]
>  0x202a8 <+124>: add    sp, sp, #0x30             ; =0x30
>  0x202ac <+128>: ret    
>
> (lldb) disass -n value
> a.out`value:
>  0x204cc <+0>:  sub    sp, sp, #0x10             ; =0x10
>  0x204d0 <+4>:  str    x0, [sp, #0x8]
>  0x204d4 <+8>:  ldrb   w8, [sp, #0x8]
>  0x204d8 <+12>: mov    w1, w8
>  0x204dc <+16>: mov    w0, w8
>  0x204e0 <+20>: str    w1, [sp, #0x4]
>  0x204e4 <+24>: add    sp, sp, #0x10             ; =0x10
>  0x204e8 <+28>: ret    
>
> (lldb) disass -n test_setup
> a.out`test_setup:
>  0x202b0 <+0>:   sub    sp, sp, #0x20             ; =0x20
>  0x202b4 <+4>:   stp    x29, x30, [sp, #0x10]
>  0x202b8 <+8>:   add    x29, sp, #0x10            ; =0x10
>  0x202bc <+12>:  orr    x0, xzr, #0x3800
>  0x202c0 <+16>:  bl     0x205b0                   ; symbol stub for: malloc
>  0x202c4 <+20>:  adrp   x30, 48
>  0x202c8 <+24>:  add    x30, x30, #0x0            ; =0x0
>  0x202cc <+28>:  str    x0, [x30]
>  0x202d0 <+32>:  ldr    x0, [x30]
>  0x202d4 <+36>:  cbnz   x0, 0x202e4               ; <+52> at swap_testing.c:78
>  0x202d8 <+40>:  orr    w0, wzr, #0x6
>  0x202dc <+44>:  bl     0x205c0                   ; symbol stub for: raise
>  0x202e0 <+48>:  str    w0, [sp, #0x4]
>  0x202e4 <+52>:  str    xzr, [sp, #0x8]
>  0x202e8 <+56>:  orr    x8, xzr, #0x3800
>  0x202ec <+60>:  ldr    x9, [sp, #0x8]
>  0x202f0 <+64>:  cmp    x9, x8
>  0x202f4 <+68>:  b.hs   0x2033c                   ; <+140> at swap_testing.c:81
>  0x202f8 <+72>:  ldr    x0, [sp, #0x8]
>  0x202fc <+76>:  bl     0x204cc                   ; value at swap_testing.c:72
>  0x20300 <+80>:  adrp   x30, 48
>  0x20304 <+84>:  add    x30, x30, #0x0            ; =0x0
>  0x20308 <+88>:  adrp   x8, 48
>  0x2030c <+92>:  add    x8, x8, #0x8              ; =0x8
>  0x20310 <+96>:  ldr    x9, [sp, #0x8]
>  0x20314 <+100>: add    x8, x8, x9
>  0x20318 <+104>: strb   w0, [x8]
>  0x2031c <+108>: ldr    x8, [x30]
>  0x20320 <+112>: ldr    x9, [sp, #0x8]
>  0x20324 <+116>: add    x8, x8, x9
>  0x20328 <+120>: strb   w0, [x8]
>  0x2032c <+124>: ldr    x8, [sp, #0x8]
>  0x20330 <+128>: add    x8, x8, #0x1              ; =0x1
>  0x20334 <+132>: str    x8, [sp, #0x8]
>  0x20338 <+136>: b      0x202e8                   ; <+56> at swap_testing.c
>  0x2033c <+140>: ldp    x29, x30, [sp, #0x10]
>  0x20340 <+144>: add    sp, sp, #0x20             ; =0x20
>  0x20344 <+148>: ret    
>
> (lldb) disass -n test_check
> a.out`test_check:
>  0x20348 <+0>:   sub    sp, sp, #0x20             ; =0x20
>  0x2034c <+4>:   stp    x29, x30, [sp, #0x10]
>  0x20350 <+8>:   add    x29, sp, #0x10            ; =0x10
>  0x20354 <+12>:  b      0x20358                   ; <+16> at swap_testing.c
>  0x20358 <+16>:  mov    w8, wzr
>  0x2035c <+20>:  adrp   x9, 51
>  0x20360 <+24>:  add    x9, x9, #0x808            ; =0x808
>  0x20364 <+28>:  ldrb   w10, [x9]
>  0x20368 <+32>:  stur   w8, [x29, #-0x4]
>  0x2036c <+36>:  tbnz   w10, #0x0, 0x2038c        ; <+68> at swap_testing.c
>  0x20370 <+40>:  orr    x8, xzr, #0x3800
>  0x20374 <+44>:  adrp   x9, 51
>  0x20378 <+48>:  add    x9, x9, #0x810            ; =0x810
>  0x2037c <+52>:  ldr    x9, [x9]
>  0x20380 <+56>:  cmp    x9, x8
>  0x20384 <+60>:  cset   w10, lo
>  0x20388 <+64>:  stur   w10, [x29, #-0x4]
>  0x2038c <+68>:  ldur   w8, [x29, #-0x4]
>  0x20390 <+72>:  tbz    w8, #0x0, 0x203ec         ; <+164> at swap_testing.c:95
>  0x20394 <+76>:  adrp   x8, 51
>  0x20398 <+80>:  add    x8, x8, #0x810            ; =0x810
>  0x2039c <+84>:  ldr    x0, [x8]
>  0x203a0 <+88>:  bl     0x204cc                   ; value at swap_testing.c:72
>  0x203a4 <+92>:  adrp   x8, 51
>  0x203a8 <+96>:  add    x8, x8, #0x810            ; =0x810
>  0x203ac <+100>: adrp   x30, 51
>  0x203b0 <+104>: add    x30, x30, #0x808          ; =0x808
>  0x203b4 <+108>: adrp   x9, 48
>  0x203b8 <+112>: add    x9, x9, #0x8              ; =0x8
>  0x203bc <+116>: uxtb   w0, w0
>  0x203c0 <+120>: ldr    x10, [x8]
>  0x203c4 <+124>: add    x9, x9, x10
>  0x203c8 <+128>: ldrb   w11, [x9]
>  0x203cc <+132>: cmp    w0, w11
>  0x203d0 <+136>: cset   w11, ne
>  0x203d4 <+140>: and    w11, w11, #0x1
>  0x203d8 <+144>: strb   w11, [x30]
>  0x203dc <+148>: ldr    x9, [x8]
>  0x203e0 <+152>: add    x9, x9, #0x1              ; =0x1
>  0x203e4 <+156>: str    x9, [x8]
>  0x203e8 <+160>: b      0x20358                   ; <+16> at swap_testing.c
>  0x203ec <+164>: b      0x203f0                   ; <+168> at swap_testing.c
>  0x203f0 <+168>: mov    w8, wzr
>  0x203f4 <+172>: adrp   x9, 51
>  0x203f8 <+176>: add    x9, x9, #0x818            ; =0x818
>  0x203fc <+180>: ldrb   w10, [x9]
>  0x20400 <+184>: str    w8, [sp, #0x8]
>  0x20404 <+188>: tbnz   w10, #0x0, 0x20424        ; <+220> at swap_testing.c
>  0x20408 <+192>: orr    x8, xzr, #0x3800
>  0x2040c <+196>: adrp   x9, 51
>  0x20410 <+200>: add    x9, x9, #0x820            ; =0x820
>  0x20414 <+204>: ldr    x9, [x9]
>  0x20418 <+208>: cmp    x9, x8
>  0x2041c <+212>: cset   w10, lo
>  0x20420 <+216>: str    w10, [sp, #0x8]
>  0x20424 <+220>: ldr    w8, [sp, #0x8]
>  0x20428 <+224>: tbz    w8, #0x0, 0x20488         ; <+320> at swap_testing.c
>  0x2042c <+228>: adrp   x8, 51
>  0x20430 <+232>: add    x8, x8, #0x820            ; =0x820
>  0x20434 <+236>: ldr    x0, [x8]
>  0x20438 <+240>: bl     0x204cc                   ; value at swap_testing.c:72
>  0x2043c <+244>: adrp   x8, 51
>  0x20440 <+248>: add    x8, x8, #0x820            ; =0x820
>  0x20444 <+252>: adrp   x30, 51
>  0x20448 <+256>: add    x30, x30, #0x818          ; =0x818
>  0x2044c <+260>: adrp   x9, 48
>  0x20450 <+264>: add    x9, x9, #0x0              ; =0x0
>  0x20454 <+268>: uxtb   w0, w0
>  0x20458 <+272>: ldr    x9, [x9]
>  0x2045c <+276>: ldr    x10, [x8]
>  0x20460 <+280>: add    x9, x9, x10
>  0x20464 <+284>: ldrb   w11, [x9]
>  0x20468 <+288>: cmp    w0, w11
>  0x2046c <+292>: cset   w11, ne
>  0x20470 <+296>: and    w11, w11, #0x1
>  0x20474 <+300>: strb   w11, [x30]
>  0x20478 <+304>: ldr    x9, [x8]
>  0x2047c <+308>: add    x9, x9, #0x1              ; =0x1
>  0x20480 <+312>: str    x9, [x8]
>  0x20484 <+316>: b      0x203f0                   ; <+168> at swap_testing.c
>  0x20488 <+320>: adrp   x8, 51
>  0x2048c <+324>: add    x8, x8, #0x808            ; =0x808
>  0x20490 <+328>: ldrb   w9, [x8]
>  0x20494 <+332>: tbz    w9, #0x0, 0x204a4         ; <+348> at swap_testing.c
>  0x20498 <+336>: orr    w0, wzr, #0x6
>  0x2049c <+340>: bl     0x205c0                   ; symbol stub for: raise
>  0x204a0 <+344>: str    w0, [sp, #0x4]
>  0x204a4 <+348>: adrp   x8, 51
>  0x204a8 <+352>: add    x8, x8, #0x818            ; =0x818
>  0x204ac <+356>: ldrb   w9, [x8]
>  0x204b0 <+360>: tbz    w9, #0x0, 0x204c0         ; <+376> at swap_testing.c:105
>  0x204b4 <+364>: orr    w0, wzr, #0x6
>  0x204b8 <+368>: bl     0x205c0                   ; symbol stub for: raise
> ->  0x204bc <+372>: str    w0, [sp]
>  0x204c0 <+376>: ldp    x29, x30, [sp, #0x10]
>  0x204c4 <+380>: add    sp, sp, #0x20             ; =0x20
>  0x204c8 <+384>: ret    
>
> # uname -apKU
> FreeBSD pine64 12.0-CURRENT FreeBSD 12.0-CURRENT  r314638M  arm64 aarch64 1200023 1200023
>
> buildworld buildlkernel did not have MALLOC_PRODUCTION= defined. The kernel is a
> non-debug kernel. (Previous to these experiments my other corruption examples
> were not caught by a debug kernel. I'm not hopeful that this simpler context
> would either.)



===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "[hidden email]"

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
[test_check() between the fork and the wait/sleep prevents the
failure from occurring. Even a small access to the memory at
that stage prevents the failure. Details follow.]

On 2017-Mar-14, at 11:07 AM, Mark Millard <[hidden email]> wrote:

> [This is just a correction to the subject-line text to say arm64
> instead of amd64.]
>
> On 2017-Mar-14, at 12:58 AM, Mark Millard <[hidden email]> wrote:
>
> [Another correction I'm afraid --about alternative program variations
> this time.]
>
> On 2017-Mar-13, at 11:52 PM, Mark Millard <[hidden email]> wrote:
>
>> I'm still at a loss about how to figure out what stages are messed
>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>> out? Wrong data swapped in?)
>>
>> But at least I've found a much smaller/simpler example to demonstrate
>> some problem with in my Pine64+_ 2GB context.
>>
>> The Pine64+ 2GB is the only amd64 context that I have access to.
>
> Someday I'll learn to type arm64 the first time instead of amd64.
>
>> The following program fails its check for data
>> having its expected byte pattern in dynamically
>> allocated memory after a fork/swap-out/swap-in
>> sequence.
>>
>> I'll note that the program sleeps for 60s after
>> forking to give time to do something else to
>> cause the parent and child processes to swap
>> out (RES=0 as seen in top).
>
> The following about the extra test_check() was
> wrong.
>
>> Note the source code line:
>>
>> // test_check(); // Adding this line prevents failure.
>>
>> It seem that accessing the region contents before forking
>> and swapping avoids the problem. But there is a problem
>> if the region was only written-to before the fork/swap.

There is a place that if a test_check call is put then the
problem does not happen at any stage: I tried putting a
call between the fork and the later wait/sleep code:

int main(void) {
    test_setup();
    test_check(); // Before fork() [passes]

    pid_t pid = fork();
    int wait_status = 0;;

    // test_check(); // After fork(); before wait/sleep.
                     // If used it prevents failure later!

    if (0<pid) { wait(&wait_status); }

    if (-1!=wait_status && 0<=pid) {
        if (0==pid) {
            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:

            // stress -m 1 --vm-bytes 1800M

            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using. I watch with top -PCwaopid .
        }

        test_check(); // After wait/sleep [fails for small-enough region_sizes]
    }
}

My guess is that the forced access attempt when the line is
uncommented causes local some sort of status/caching update
for that memory and with that in place swap-out gets the
right information swapped out and then later that information
is swapped back in.

But an interesting point is that the failing case fails in both
the parent process of the fork and the child process, both seeing
an all-zero pattern for the dynamically allocated region.

Even for using:

void partial_test_check(void) {
    if (1u!=gbl_region.array[1])    raise(SIGABRT);
    if (1u!=(*dyn_region).array[1]) raise(SIGABRT);
}

instead of test_check as what to call between the fork
and the wait/sleep the following no longer gets the
problem at any stage:

extern void partial_test_check(void); // Tests some of the memory byte pattern  
                                      // but not all of it.

int main(void) {
    test_setup();
    test_check(); // Before fork() [passes]

    pid_t pid = fork();
    int wait_status = 0;;

    // test_check(); // After fork(); before wait/sleep.
                     // If used it prevents failure later!

    partial_test_check(); // Does a small access do such?

    if (0<pid) { wait(&wait_status); }

    if (-1!=wait_status && 0<=pid) {
        if (0==pid) {
            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:

            // stress -m 1 --vm-bytes 1800M

            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using. I watch with top -PCwaopid .
        }

        test_check(); // After wait/sleep [fails for small-enough region_sizes]
    }
}

This suggests to me that the small access is forcing one or more things to
be initialized for memory access that fork is not establishing of itself.
It appears that if established correctly then the swap-out/swap-in
sequence would work okay without needing the manual access to the memory.


So far via this test I've not seen any evidence of problems with the global
region but only the dynamically allocated region.

However, the symptoms that started this investigation in a much more
complicated context had an area of global memory from a .so that ended
up being zero.

I think that things should be fixed for this simpler context first and
that further investigation of the sh/su related should wait to see what
things are like after this test case works.


Side note:

The "extern"s are from a stage where I was investigating
having a .so involved but it turned out no shared library
had to be involved for what I ran into during this.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Bernd Walter-4
On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> [test_check() between the fork and the wait/sleep prevents the
> failure from occurring. Even a small access to the memory at
> that stage prevents the failure. Details follow.]

Maybe a stupid question, since you might have written it somewhere.
What medium do you swap to?
I've seen broken firmware on microSD cards doing silent data
corruption for some access patterns.

--
B.Walter <[hidden email]> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
On 2017-Mar-14, at 4:44 PM, Bernd Walter <[hidden email]> wrote:

> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>> [test_check() between the fork and the wait/sleep prevents the
>> failure from occurring. Even a small access to the memory at
>> that stage prevents the failure. Details follow.]
>
> Maybe a stupid question, since you might have written it somewhere.
> What medium do you swap to?
> I've seen broken firmware on microSD cards doing silent data
> corruption for some access patterns.

The root filesystem is on a USB SSD on a powered hub.

Only the kernel is from the microSD card.

I have several examples of the USB SSD model and have
never observed such problems in any other context.


The original issue that started this investigation
has been reported by several people on the lists:

Failed assertion: "tsd_booted"

on arm64 specifically, no other contexts so far as
I know. Earlier I had discovered that:

A) I could use a swap-in to cause the messages from
   instances of sh or su that had swapped out earlier.

B) The core dumps showed that a large memory region
   containing the global tsd_booted had all turned
   to be zero bytes. The assert is just exposing one
   of those zeros. (tsd_booted is from jemalloc that
   is in a .so that is loaded.)

This prompted me to look for simpler contexts involving
swapping that also show memory corruption.

So far I've only managed to produce corrupted memory when
fork and later swapping are both involved. Being a shared
library global is not a requirement for the problem,
although such contexts can have an issue. I've not made a
simpler example of that yet, although I tried.

I have not explored vfork, rfork, or any other alternatives.

So far all failure examples end up with zeroed memory when
the memory does not match the original pattern from before
the fork. At least that is what the core dumps show for all
examples that I've looked at.

See bugzilla 217138 and 217239. In some respects this example
is more analogous to the 217239 context as I remember.

My tests on amd64, armv6 (really -mcpu=cortex-a7 so armv7),
and powerpc64 have never produced any problems, including
never getting the failed assertion. Only arm64. (But I've
access to only one arm64 system, a Pine64+ 2GB.)

Prior to this I tracked down a different arm64 problem to
the fork_trampline code (for the child process) modifying
a system register but in a place allowing interrupts that
could also change the value. Andrew Turner fixed that
one at the time.

For this fork/swapping kind of issue I'm not sure that
I'll be able to do more than provide the simpler
example and the steps that I used. My isolating the
internal stage(s) and specific problem(s) at the code
level of detail does not seem likely.

But whatever is found needs to be able to explain the
contrast with an access after the fork but before the
swap preventing the failing behavior. So what I've got
so far hopefully does provide some hints to someone.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
In reply to this post by Mark Millard-2
A single Byte access to a 4K Byte aligned region between
the fork and wait/sleep/swap-out prevents that specific
4K Byte region from having the (bad) zeros.

Sounds like a page sized unit of behavior to me.

Details follow.

On 2017-Mar-14, at 3:28 PM, Mark Millard <[hidden email]> wrote:

> [test_check() between the fork and the wait/sleep prevents the
> failure from occurring. Even a small access to the memory at
> that stage prevents the failure. Details follow.]
>
> On 2017-Mar-14, at 11:07 AM, Mark Millard <[hidden email]> wrote:
>
>> [This is just a correction to the subject-line text to say arm64
>> instead of amd64.]
>>
>> On 2017-Mar-14, at 12:58 AM, Mark Millard <[hidden email]> wrote:
>>
>> [Another correction I'm afraid --about alternative program variations
>> this time.]
>>
>> On 2017-Mar-13, at 11:52 PM, Mark Millard <[hidden email]> wrote:
>>
>>> I'm still at a loss about how to figure out what stages are messed
>>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>>> out? Wrong data swapped in?)
>>>
>>> But at least I've found a much smaller/simpler example to demonstrate
>>> some problem with in my Pine64+_ 2GB context.
>>>
>>> The Pine64+ 2GB is the only amd64 context that I have access to.
>>
>> Someday I'll learn to type arm64 the first time instead of amd64.
>>
>>> The following program fails its check for data
>>> having its expected byte pattern in dynamically
>>> allocated memory after a fork/swap-out/swap-in
>>> sequence.
>>>
>>> I'll note that the program sleeps for 60s after
>>> forking to give time to do something else to
>>> cause the parent and child processes to swap
>>> out (RES=0 as seen in top).
>>
>> The following about the extra test_check() was
>> wrong.
>>
>>> Note the source code line:
>>>
>>> // test_check(); // Adding this line prevents failure.
>>>
>>> It seem that accessing the region contents before forking
>>> and swapping avoids the problem. But there is a problem
>>> if the region was only written-to before the fork/swap.
>
> There is a place that if a test_check call is put then the
> problem does not happen at any stage: I tried putting a
> call between the fork and the later wait/sleep code:

I changed the byte sequence patterns to avoid
zero values since the bad values are zeros:

static value_type value(size_t v) { return (value_type)((v&0xFEu)|0x1u); }
                  // value now avoids the zero value since the failures
                  // are zeros.

With that I can then test accurately what bytes have
bad values vs. do not. I also changed to:

void partial_test_check(void) {
    if (value(0u)!=gbl_region.array[0])    raise(SIGABRT);
    if (value(0u)!=(*dyn_region).array[0]) raise(SIGABRT);
}

since previously [0] had a zero value and so I'd used [1].

On this basis I'm now using the below. See the comments tied
to partial_test_check() calls:

extern void test_setup(void);         // Sets up the memory byte patterns.
extern void test_check(void);         // Tests the memory byte patterns.
extern void partial_test_check(void); // Tests just [0] of each region
                                      // (gbl_region and dyn_region).

int main(void) {
    test_setup();
    test_check(); // Before fork() [passes]

    pid_t pid = fork();
    int wait_status = 0;;

    // After fork; before waitsleep/swap-out.

    if (0==pid) partial_test_check();
                     // Even the above is sufficient by
                     // itself to prevent failure for
                     // region_size 1u through
                     // 4u*1024u!
                     // But 4u*1024u+1u and above fail
                     // with this access to memory.
                     // The failing test is of
                     // (*dyn_region).array[4096u].
                     // This test never fails here.

    if (0<pid) partial_test_check(); // This never prevents
                                     // later failures (and
                                     // never fails here).

    if (0<pid) { wait(&wait_status); }

    if (-1!=wait_status && 0<=pid) {
        if (0==pid) {
            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:

            // stress -m 1 --vm-bytes 1800M

            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using. I watch with top -PCwaopid .
        }

        test_check(); // After wait/sleep [fails for small-enough region_sizes]
    }
}

> This suggests to me that the small access is forcing one or more things to
> be initialized for memory access that fork is not establishing of itself.
> It appears that if established correctly then the swap-out/swap-in
> sequence would work okay without needing the manual access to the memory.
>
>
> So far via this test I've not seen any evidence of problems with the global
> region but only the dynamically allocated region.
>
> However, the symptoms that started this investigation in a much more
> complicated context had an area of global memory from a .so that ended
> up being zero.
>
> I think that things should be fixed for this simpler context first and
> that further investigation of the sh/su related should wait to see what
> things are like after this test case works.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
In reply to this post by Mark Millard-2
[Something strange happened to the automatic CC: fill-in for my original
reply. Also I should have mentioned that for my test program if a
variant is made that does not fork the swapping works fine.]

On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
>
>>    On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>> <markmi at dsl-only.net> wrote:
>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <[hidden email]> wrote:
>>>
>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>> failure from occurring. Even a small access to the memory at
>>>>> that stage prevents the failure. Details follow.]
>>>>
>>>> Maybe a stupid question, since you might have written it somewhere.
>>>> What medium do you swap to?
>>>> I've seen broken firmware on microSD cards doing silent data
>>>> corruption for some access patterns.
>>>
>>> The root filesystem is on a USB SSD on a powered hub.
>>>
>>> Only the kernel is from the microSD card.
>>>
>>> I have several examples of the USB SSD model and have
>>> never observed such problems in any other context.
>>>
>>> [remainder of irrelevant material deleted  --SB]
>>
>>    You gave a very long-winded non-answer to Bernd's question, so I'll
>> repeat it here.  What medium do you swap to?
>
> My wording of:
>
> The root filesystem is on a USB SSD on a powered hub.
>
> was definitely poor. It should have explicitly mentioned the
> swap partition too:
>
> The root filesystem and swap partition are both on the same
> USB SSD on a powered hub.
>
> More detail from dmesg -a for usb:
>
> usbus0: 12Mbps Full Speed USB v1.0
> usbus1: 480Mbps High Speed USB v2.0
> usbus2: 12Mbps Full Speed USB v1.0
> usbus3: 480Mbps High Speed USB v2.0
> ugen0.1: <Generic OHCI root HUB> at usbus0
> uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
> ugen1.1: <Allwinner EHCI root HUB> at usbus1
> uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> ugen2.1: <Generic OHCI root HUB> at usbus2
> uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> ugen3.1: <Allwinner EHCI root HUB> at usbus3
> uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> . . .
> uhub0: 1 port with 1 removable, self powered
> uhub2: 1 port with 1 removable, self powered
> uhub1: 1 port with 1 removable, self powered
> uhub3: 1 port with 1 removable, self powered
> ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
> uhub4 on uhub3
> uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
> uhub4: MTT enabled
> uhub4: 4 ports with 4 removable, self powered
> ugen3.3: <OWC Envoy Pro mini> at usbus3
> umass0 on uhub4
> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
> umass0:  SCSI over Bulk-Only; quirks = 0x0100
> umass0:0:0: Attached to scbus0
> . . .
> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number <REPLACED>
> da0: 40.000MB/s transfers
>
> (Edited a bit because there is other material interlaced, even
> internal to some lines. Also: I removed the serial number of the
> specific example device.)
>
>>    I will further note that any kind of USB device cannot automatically
>> be trusted to behave properly.  USB devices are notorious, for example,
>> for momentarily dropping off-line and then immediately reconnecting.  (ZFS
>> reacts very poorly to such events, BTW.)  This misbehavior can be caused
>> by either processor involved, i.e., the one controlling either the
>> upstream or the downstream device.  Hubs are really bad about this, but
>> any USB device can be guilty.  You may have a defective storage device,
>> its controller may be defective, or any controller in the chain all the
>> way back to the motherboard may be defective or, not defective, but
>> corrupted by having been connected to another device with corrupted
>> (infected) firmware that tries to flash itself into the firmware flash
>> chips in its potential victim.
>>    Flash memory chips, spinning disks, or {S,}{D,}RAM chips can be
>> defective.  Without parity bits, the devices may return bad data and lie
>> about the presence of corrupted data.  That, for example, is where ZFS
>> is better than any kind of classical RAID because ZFS keeps checksums on
>> everything, so it has a reasonable chance of detecting corruption even
>> without parity support and, if there is any redundancy in the pool or the
>> data set, fixing the bad data machine.  Even having parity generally
>> allows only the detection of single-bit errors, but not of repairing them.
>>    You should identify where you page/swap to and then try substituting
>> a different device for that function as a test to eliminate the possibility
>> of a bad storage device/controller.  If the problem still occurs, that
>> means there still remains the possibility that another controller or its
>> firmware is defective instead.  It could be a kernel bug, it is true, but
>> making sure there is no hardware or firmware error occurring is important,
>> and as I say, USB devices should always be considered suspect unless and
>> until proven innocent.
>
> [FYI: This is a ufs context, not a zfs one.]
>
> I'm aware of such  things. There is no evidence that has resulted in
> suggesting the USB devices that I can replace are a problem. Otherwise
> I'd not be going down this path. I only have access to the one arm64
> device (a Pine64+ 2GB) so I've no ability to substitution-test what
> is on that board.
>
> It would be neat if some folks used my code to test other arm64
> contexts and reported the results. I'd be very interested.
> (This is easier to do on devices that do not have massive
> amounts of RAM, which may limit the range of devices or
> device configurations that are reasonable to test.)
>
> There is that other people using other devices have reported
> the behavior that started this investigation. I can produce the
> behavior that they reported, although I've not seen anyone else
> listing specific steps that lead to the problem or ways to tell
> if the symptom is going to happen before it actually does. Nor
> have I seen any other core dump analysis. (I have bugzilla
> submittals 217138 and 217239 tied to symptoms others have
> reported as well as this test program material.)
>
> Also, considering that for my test program I can control which pages
> get the zeroed-problem by read-accessing even one byte of any 4K
> Byte page that I want to make work normally, doing so in the child
> process of the fork, between the fork and the sleep/swap-out, it does
> not suggest USB-device-specific behavior. The read-access is changing
> the status of the page in some way as far as I can tell.
>
> (Such read-accesses in the parent process make no difference to the
> behavior.)

I should have noted another comparison/contrast between
having memory corruption and not in my context:

I've tried variants of my test program that do not fork but
just sleep for 60s to allow me to force the swap-out. I
did this before adding fork and before using
parital_test_check, for example. I gradually added things
apparently involved in the reports others had made
until I found a combination that produced a memory
corruption test failure.

These tests without fork involved find no problems with
the memory content after the swap-in.

For my test program it appears that fork-before-swap-out
or the like is essential to having the problem occur.

===
Mark Millard
markmi at dsl-only.net


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Scott Bennett
Mark Millard <markmi ta dsl-only.net> wrote:

> [Something strange happened to the automatic CC: fill-in for my original
> reply. Also I should have mentioned that for my test program if a
> variant is made that does not fork the swapping works fine.]
>
> On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>
> > On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
> >
> >>    On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
> >> <markmi at dsl-only.net> wrote:
> >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <[hidden email]> wrote:
> >>>
> >>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> >>>>> [test_check() between the fork and the wait/sleep prevents the
> >>>>> failure from occurring. Even a small access to the memory at
> >>>>> that stage prevents the failure. Details follow.]
> >>>>
> >>>> Maybe a stupid question, since you might have written it somewhere.
> >>>> What medium do you swap to?
> >>>> I've seen broken firmware on microSD cards doing silent data
> >>>> corruption for some access patterns.
> >>>
> >>> The root filesystem is on a USB SSD on a powered hub.
> >>>
> >>> Only the kernel is from the microSD card.
> >>>
> >>> I have several examples of the USB SSD model and have
> >>> never observed such problems in any other context.
> >>>
> >>> [remainder of irrelevant material deleted  --SB]
> >>
> >>    You gave a very long-winded non-answer to Bernd's question, so I'll
> >> repeat it here.  What medium do you swap to?
> >
> > My wording of:
> >
> > The root filesystem is on a USB SSD on a powered hub.
> >
> > was definitely poor. It should have explicitly mentioned the
> > swap partition too:
> >
> > The root filesystem and swap partition are both on the same
> > USB SSD on a powered hub.
> >
> > More detail from dmesg -a for usb:
> >
> > usbus0: 12Mbps Full Speed USB v1.0
> > usbus1: 480Mbps High Speed USB v2.0
> > usbus2: 12Mbps Full Speed USB v1.0
> > usbus3: 480Mbps High Speed USB v2.0
> > ugen0.1: <Generic OHCI root HUB> at usbus0
> > uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
> > ugen1.1: <Allwinner EHCI root HUB> at usbus1
> > uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> > ugen2.1: <Generic OHCI root HUB> at usbus2
> > uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> > ugen3.1: <Allwinner EHCI root HUB> at usbus3
> > uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> > . . .
> > uhub0: 1 port with 1 removable, self powered
> > uhub2: 1 port with 1 removable, self powered
> > uhub1: 1 port with 1 removable, self powered
> > uhub3: 1 port with 1 removable, self powered
> > ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
> > uhub4 on uhub3
> > uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
> > uhub4: MTT enabled
> > uhub4: 4 ports with 4 removable, self powered
> > ugen3.3: <OWC Envoy Pro mini> at usbus3
> > umass0 on uhub4
> > umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
> > umass0:  SCSI over Bulk-Only; quirks = 0x0100
> > umass0:0:0: Attached to scbus0
> > . . .
> > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> > da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
> > da0: Serial Number <REPLACED>
> > da0: 40.000MB/s transfers
> >
> > (Edited a bit because there is other material interlaced, even
> > internal to some lines. Also: I removed the serial number of the
> > specific example device.)

     Thank you.  That presents a much clearer picture.

> >
> >>    I will further note that any kind of USB device cannot automatically
> >> be trusted to behave properly.  USB devices are notorious, for example,
> >>
> >>   [reasons why deleted  --SB]
> >>
> >>    You should identify where you page/swap to and then try substituting
> >> a different device for that function as a test to eliminate the possibility
> >> of a bad storage device/controller.  If the problem still occurs, that
> >> means there still remains the possibility that another controller or its
> >> firmware is defective instead.  It could be a kernel bug, it is true, but
> >> making sure there is no hardware or firmware error occurring is important,
> >> and as I say, USB devices should always be considered suspect unless and
> >> until proven innocent.
> >
> > [FYI: This is a ufs context, not a zfs one.]

     Right.  It's only a Pi, after all. :-)
> >
> > I'm aware of such  things. There is no evidence that has resulted in
> > suggesting the USB devices that I can replace are a problem. Otherwise
> > I'd not be going down this path. I only have access to the one arm64
> > device (a Pine64+ 2GB) so I've no ability to substitution-test what
> > is on that board.

     There isn't even one open port on that hub that you could plug a
flash drive into temporarily to be the paging device?  You could then
try your tests before returning to the normal configuration.  If there
isn't an open port, then how about plugging a second hub into one of
the first hub's ports and moving the displaced device to the second
hub?  A flash drive could then be plugged in.  That kind of configuration
is obviously a bad idea for the long run, but just to try your tests it
ought to work well enough.  (BTW, if a USB storage device containing a
paging area drops off=line even momentarily and the system needs to use
it, that is the beginning of the end, even though it may take up to a few
minutes for everything to lock up.  You probably won't be able to do an
orderly shutdown, but will instead have to crash it with the power switch.
In the case of something like a Pi, this is an unpleasant fact of life,
to be sure.)
     I think I buy your arguments, given the evidence you've collected
thus far, including what you've added below.  I just like to eliminate
possibilities that are much simpler to deal with before facing nastinesses
like bugs in the VM subsystem. :-)

> >
> > It would be neat if some folks used my code to test other arm64
> > contexts and reported the results. I'd be very interested.
> > (This is easier to do on devices that do not have massive
> > amounts of RAM, which may limit the range of devices or
> > device configurations that are reasonable to test.)
> >
> > There is that other people using other devices have reported
> > the behavior that started this investigation. I can produce the
> > behavior that they reported, although I've not seen anyone else
> > listing specific steps that lead to the problem or ways to tell
> > if the symptom is going to happen before it actually does. Nor
> > have I seen any other core dump analysis. (I have bugzilla
> > submittals 217138 and 217239 tied to symptoms others have
> > reported as well as this test program material.)
> >
> > Also, considering that for my test program I can control which pages
> > get the zeroed-problem by read-accessing even one byte of any 4K
> > Byte page that I want to make work normally, doing so in the child
> > process of the fork, between the fork and the sleep/swap-out, it does
> > not suggest USB-device-specific behavior. The read-access is changing
> > the status of the page in some way as far as I can tell.
> >
> > (Such read-accesses in the parent process make no difference to the
> > behavior.)
>
> I should have noted another comparison/contrast between
> having memory corruption and not in my context:
>
> I've tried variants of my test program that do not fork but
> just sleep for 60s to allow me to force the swap-out. I
> did this before adding fork and before using
> parital_test_check, for example. I gradually added things
> apparently involved in the reports others had made
> until I found a combination that produced a memory
> corruption test failure.
>
> These tests without fork involved find no problems with
> the memory content after the swap-in.
>
> For my test program it appears that fork-before-swap-out
> or the like is essential to having the problem occur.
>
     A comment about terminology seems in order here.  It bothers
me considerably to see you writing "swap out" or "swapping" where
it seems like you mean to write "page out" or "paging".  A BSD
system whose swapping mechanism gets activated has already waded
very deeply into the quicksand and frequently cannot be gotten out
in a reasonable amount of time even with manual assistance.  It is
often quicker to crash it, reboot, and wait for the fsck(8) cleanups
to complete.  Orderly shutdowns, even of the kind that results from
a quick poke to the power button, typically get mired in the same
mess that already has the system in knots.  Also, BSD systems since
3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in,
just out.  A swapped out process, once the system determines that it
has adequate resources again to attempt to run the process, will have
the interrupted text page paged in and the rest will be paged in by
the normal mechanism of page faults and page-in operations.  I assume
you must already know all this, which is a large part of why it grates
on me that you appear to be using the wrong terms.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
On 2017-Mar-15, at 11:07 PM, Scott Bennett <bennett at sdf.org> wrote:

> Mark Millard <markmi ta dsl-only.net> wrote:
>
>> [Something strange happened to the automatic CC: fill-in for my original
>> reply. Also I should have mentioned that for my test program if a
>> variant is made that does not fork the swapping works fine.]
>>
>> On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
>>>
>>>>   On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>>> <markmi at dsl-only.net> wrote:
>>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <[hidden email]> wrote:
>>>>>
>>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>>>> failure from occurring. Even a small access to the memory at
>>>>>>> that stage prevents the failure. Details follow.]
>>>>>>
>>>>>> Maybe a stupid question, since you might have written it somewhere.
>>>>>> What medium do you swap to?
>>>>>> I've seen broken firmware on microSD cards doing silent data
>>>>>> corruption for some access patterns.
>>>>>
>>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>>
>>>>> Only the kernel is from the microSD card.
>>>>>
>>>>> I have several examples of the USB SSD model and have
>>>>> never observed such problems in any other context.
>>>>>
>>>>> [remainder of irrelevant material deleted  --SB]
>>>>
>>>>   You gave a very long-winded non-answer to Bernd's question, so I'll
>>>> repeat it here.  What medium do you swap to?
>>>
>>> My wording of:
>>>
>>> The root filesystem is on a USB SSD on a powered hub.
>>>
>>> was definitely poor. It should have explicitly mentioned the
>>> swap partition too:
>>>
>>> The root filesystem and swap partition are both on the same
>>> USB SSD on a powered hub.
>>>
>>> More detail from dmesg -a for usb:
>>>
>>> usbus0: 12Mbps Full Speed USB v1.0
>>> usbus1: 480Mbps High Speed USB v2.0
>>> usbus2: 12Mbps Full Speed USB v1.0
>>> usbus3: 480Mbps High Speed USB v2.0
>>> ugen0.1: <Generic OHCI root HUB> at usbus0
>>> uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
>>> ugen1.1: <Allwinner EHCI root HUB> at usbus1
>>> uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
>>> ugen2.1: <Generic OHCI root HUB> at usbus2
>>> uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
>>> ugen3.1: <Allwinner EHCI root HUB> at usbus3
>>> uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
>>> . . .
>>> uhub0: 1 port with 1 removable, self powered
>>> uhub2: 1 port with 1 removable, self powered
>>> uhub1: 1 port with 1 removable, self powered
>>> uhub3: 1 port with 1 removable, self powered
>>> ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
>>> uhub4 on uhub3
>>> uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
>>> uhub4: MTT enabled
>>> uhub4: 4 ports with 4 removable, self powered
>>> ugen3.3: <OWC Envoy Pro mini> at usbus3
>>> umass0 on uhub4
>>> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
>>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>>> umass0:0:0: Attached to scbus0
>>> . . .
>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
>>> da0: Serial Number <REPLACED>
>>> da0: 40.000MB/s transfers
>>>
>>> (Edited a bit because there is other material interlaced, even
>>> internal to some lines. Also: I removed the serial number of the
>>> specific example device.)
>
>     Thank you.  That presents a much clearer picture.
>>>
>>>>   I will further note that any kind of USB device cannot automatically
>>>> be trusted to behave properly.  USB devices are notorious, for example,
>>>>
>>>>  [reasons why deleted  --SB]
>>>>
>>>>   You should identify where you page/swap to and then try substituting
>>>> a different device for that function as a test to eliminate the possibility
>>>> of a bad storage device/controller.  If the problem still occurs, that
>>>> means there still remains the possibility that another controller or its
>>>> firmware is defective instead.  It could be a kernel bug, it is true, but
>>>> making sure there is no hardware or firmware error occurring is important,
>>>> and as I say, USB devices should always be considered suspect unless and
>>>> until proven innocent.
>>>
>>> [FYI: This is a ufs context, not a zfs one.]
>
>     Right.  It's only a Pi, after all. :-)

It is a Pine64+ 2GB, not an rpi3.

>>>
>>> I'm aware of such  things. There is no evidence that has resulted in
>>> suggesting the USB devices that I can replace are a problem. Otherwise
>>> I'd not be going down this path. I only have access to the one arm64
>>> device (a Pine64+ 2GB) so I've no ability to substitution-test what
>>> is on that board.
>
>     There isn't even one open port on that hub that you could plug a
> flash drive into temporarily to be the paging device?

Why do you think that I've never tried alternative devices? It
is just that the result was no evidence that my usually-in-use
SSD is having a special/local problem: the behavior continues
across all such contexts when the Pine64+ 2GB is involved. (Again
I have not had access to an alternate to the one arm64 board.
That limits my substitution testing possibilities.)

Why would you expect a Flash drive to be better than another SSD
for such testing? (The SSD that I usually use even happens to be
a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So
is the hub that I usually use for that matter.)

> You could then
> try your tests before returning to the normal configuration.  If there
> isn't an open port, then how about plugging a second hub into one of
> the first hub's ports and moving the displaced device to the second
> hub?  A flash drive could then be plugged in.  That kind of configuration
> is obviously a bad idea for the long run, but just to try your tests it
> ought to work well enough.

I have access to more SSDs that I can use than I do to Flash drives. I
see no reason to specifically use a Flash drive.

> (BTW, if a USB storage device containing a
> paging area drops off=line even momentarily and the system needs to use
> it, that is the beginning of the end, even though it may take up to a few
> minutes for everything to lock up.

The system does not lock up, even days or weeks later, with having done
dozens of experiments that show memory corruption failures over those
days. The only processes showing memory corruption so far are those
that were the parent or child for a fork that were later swapped out
to have zero RES(ident memory) and then even later swapped back in.

The context has no such issues. You are inventing problems that do
not exist in my context. That is why none of my list submittals
mention such problems: they did not occur.

> You probably won't be able to do an
> orderly shutdown, but will instead have to crash it with the power switch.
> In the case of something like a Pi, this is an unpleasant fact of life,
> to be sure.)

Such things did not occur and has nothing to do with my context so far.

>     I think I buy your arguments, given the evidence you've collected
> thus far, including what you've added below.  I just like to eliminate
> possibilities that are much simpler to deal with before facing nastinesses
> like bugs in the VM subsystem. :-)

When I started this I found no evidence of device-specific problems.
My investigation activity goes back to long before my list submittals.

And I repeat: Other people have reported the symptoms that started
this investigation. They did so before I ever started my activities.
They were using none of the specific devices that I have access to.
Likely the types of devices were frequently even different, such as
a rpi3 instead of a Pine64+ 2GB or a different USB drive. I was able
to get the symptoms that they reported.

>>> It would be neat if some folks used my code to test other arm64
>>> contexts and reported the results. I'd be very interested.
>>> (This is easier to do on devices that do not have massive
>>> amounts of RAM, which may limit the range of devices or
>>> device configurations that are reasonable to test.)
>>>
>>> There is that other people using other devices have reported
>>> the behavior that started this investigation. I can produce the
>>> behavior that they reported, although I've not seen anyone else
>>> listing specific steps that lead to the problem or ways to tell
>>> if the symptom is going to happen before it actually does. Nor
>>> have I seen any other core dump analysis. (I have bugzilla
>>> submittals 217138 and 217239 tied to symptoms others have
>>> reported as well as this test program material.)
>>>
>>> Also, considering that for my test program I can control which pages
>>> get the zeroed-problem by read-accessing even one byte of any 4K
>>> Byte page that I want to make work normally, doing so in the child
>>> process of the fork, between the fork and the sleep/swap-out, it does
>>> not suggest USB-device-specific behavior. The read-access is changing
>>> the status of the page in some way as far as I can tell.
>>>
>>> (Such read-accesses in the parent process make no difference to the
>>> behavior.)
>>
>> I should have noted another comparison/contrast between
>> having memory corruption and not in my context:
>>
>> I've tried variants of my test program that do not fork but
>> just sleep for 60s to allow me to force the swap-out. I
>> did this before adding fork and before using
>> parital_test_check, for example. I gradually added things
>> apparently involved in the reports others had made
>> until I found a combination that produced a memory
>> corruption test failure.
>>
>> These tests without fork involved find no problems with
>> the memory content after the swap-in.
>>
>> For my test program it appears that fork-before-swap-out
>> or the like is essential to having the problem occur.
>>
>     A comment about terminology seems in order here.  It bothers
> me considerably to see you writing "swap out" or "swapping" where
> it seems like you mean to write "page out" or "paging".  A BSD
> system whose swapping mechanism gets activated has already waded
> very deeply into the quicksand and frequently cannot be gotten out
> in a reasonable amount of time even with manual assistance.  It is
> often quicker to crash it, reboot, and wait for the fsck(8) cleanups
> to complete.  Orderly shutdowns, even of the kind that results from
> a quick poke to the power button, typically get mired in the same
> mess that already has the system in knots.  Also, BSD systems since
> 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in,
> just out.  A swapped out process, once the system determines that it
> has adequate resources again to attempt to run the process, will have
> the interrupted text page paged in and the rest will be paged in by
> the normal mechanism of page faults and page-in operations.  I assume
> you must already know all this, which is a large part of why it grates
> on me that you appear to be using the wrong terms.

You apparently did not read any of the material about how the test
is done or are unfamiliar with what "stress -m 1 --vm-bytes 1800M"
does when there is only 2GB of RAM. I am deliberately inducing
swapping in other processes, including the 2 from my test program
(after the fork), not just paging. (stress is a port, not part of
the base system.)

When I say swap-out and swap-in I mean it.

From the source code of my test program:

            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:

            // stress -m 1 --vm-bytes 1800M

            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using. I watch with top -PCwaopid .

That type of stress run uses about 1.8 GiBytes after a bit,
which is enough to cause the swapping of other processes,
including the two that I am testing (post-fork). (Some RAM
is in use already before the stress run, which explains not
needing 2 GiBytes to be in use by stress.)

Look at a "top -PCwaopid" display: there are columns for
RES(ident memory) and SWAP. I cause my 2 test processes to
show zero RES and everything under SWAP, starting sometime
during the 60s sleep/wait.

Why would I cause swapping? Because buildworld causes such
swap-outs at times when there is only 2GBytes of RAM,
including processes that forked earlier, and as a result
the corrupted memory problems show up later in some processes
that were swapped out at the time. The build eventually
stops for process failures tied to the corruptions of memory
in the failing processes. (At least that is what my testing
strongly suggests.)

But that is a very complicated context to use for analysis or
testing of the problem. My test program is vastly simpler
and easier/quicker to set up and test when used with stress
as well. Such was the kind of thing I was trying to find.

I want the Pine64+ 2GB to work well enough to be able to have
buildworld (-j 4) complete correctly without having to restart
the build --even when everything has to be rebuilt. So I'm
trying to find and provide enough evidence to help someone fix
the problems that are observed to block such buildworld
activity.

Again: others have reported such arm64 problems on the lists
before I ever got into this activity. The evidence is that
the issues are not a local property of my environment.

Swapping is supposed to work. I can do buildworld (-j 4)
on armv6 (really -mcpu=cortex-a7 so armv7-a) and the
swapping it causes works fine. This is true for both a
bpim3 (2 GiBytes of RAM) and a rpi2 (1 GiByte of RAM
so even more swapping). On a powerpc64 with 16 GiBytes
I've built things that caused 26 GiBytes of swap to be
in use some of the time (during 4 ld's running in
parallel), with lots of processes having zero for
RES(ident memory) and all their space listed under SWAP
in a "top -PCwaopid" display. This too has no problems
with swapping of previously forked processes (or of any
other processes).

For the likes of a Pine64+ 2GB to be "self hosted"
for source-code based updates, swapping of previously
forked processes must work and currently such
swapping is unreliable.

===
Mark Millard
markmi at dsl-only.net


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
[Summary: I've now tested on a rpi3 in addition to a
pine64+ 2GB. Both contexts show the problem.]

On 2017-Mar-16, at 2:07 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Mar-15, at 11:07 PM, Scott Bennett <bennett at sdf.org> wrote:
>
>> Mark Millard <markmi ta dsl-only.net> wrote:
>>
>>> [Something strange happened to the automatic CC: fill-in for my original
>>> reply. Also I should have mentioned that for my test program if a
>>> variant is made that does not fork the swapping works fine.]
>>>
>>> On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>>
>>>> On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
>>>>
>>>>>  On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>>>>> <markmi at dsl-only.net> wrote:
>>>>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <[hidden email]> wrote:
>>>>>>
>>>>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>>>>> failure from occurring. Even a small access to the memory at
>>>>>>>> that stage prevents the failure. Details follow.]
>>>>>>>
>>>>>>> Maybe a stupid question, since you might have written it somewhere.
>>>>>>> What medium do you swap to?
>>>>>>> I've seen broken firmware on microSD cards doing silent data
>>>>>>> corruption for some access patterns.
>>>>>>
>>>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>>>
>>>>>> Only the kernel is from the microSD card.
>>>>>>
>>>>>> I have several examples of the USB SSD model and have
>>>>>> never observed such problems in any other context.
>>>>>>
>>>>>> [remainder of irrelevant material deleted  --SB]
>>>>>
>>>>>  You gave a very long-winded non-answer to Bernd's question, so I'll
>>>>> repeat it here.  What medium do you swap to?
>>>>
>>>> My wording of:
>>>>
>>>> The root filesystem is on a USB SSD on a powered hub.
>>>>
>>>> was definitely poor. It should have explicitly mentioned the
>>>> swap partition too:
>>>>
>>>> The root filesystem and swap partition are both on the same
>>>> USB SSD on a powered hub.
>>>>
>>>> More detail from dmesg -a for usb:
>>>>
>>>> usbus0: 12Mbps Full Speed USB v1.0
>>>> usbus1: 480Mbps High Speed USB v2.0
>>>> usbus2: 12Mbps Full Speed USB v1.0
>>>> usbus3: 480Mbps High Speed USB v2.0
>>>> ugen0.1: <Generic OHCI root HUB> at usbus0
>>>> uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
>>>> ugen1.1: <Allwinner EHCI root HUB> at usbus1
>>>> uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
>>>> ugen2.1: <Generic OHCI root HUB> at usbus2
>>>> uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
>>>> ugen3.1: <Allwinner EHCI root HUB> at usbus3
>>>> uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
>>>> . . .
>>>> uhub0: 1 port with 1 removable, self powered
>>>> uhub2: 1 port with 1 removable, self powered
>>>> uhub1: 1 port with 1 removable, self powered
>>>> uhub3: 1 port with 1 removable, self powered
>>>> ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
>>>> uhub4 on uhub3
>>>> uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
>>>> uhub4: MTT enabled
>>>> uhub4: 4 ports with 4 removable, self powered
>>>> ugen3.3: <OWC Envoy Pro mini> at usbus3
>>>> umass0 on uhub4
>>>> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
>>>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>>>> umass0:0:0: Attached to scbus0
>>>> . . .
>>>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>>>> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
>>>> da0: Serial Number <REPLACED>
>>>> da0: 40.000MB/s transfers
>>>>
>>>> (Edited a bit because there is other material interlaced, even
>>>> internal to some lines. Also: I removed the serial number of the
>>>> specific example device.)
>>
>>    Thank you.  That presents a much clearer picture.
>>>>
>>>>>  I will further note that any kind of USB device cannot automatically
>>>>> be trusted to behave properly.  USB devices are notorious, for example,
>>>>>
>>>>> [reasons why deleted  --SB]
>>>>>
>>>>>  You should identify where you page/swap to and then try substituting
>>>>> a different device for that function as a test to eliminate the possibility
>>>>> of a bad storage device/controller.  If the problem still occurs, that
>>>>> means there still remains the possibility that another controller or its
>>>>> firmware is defective instead.  It could be a kernel bug, it is true, but
>>>>> making sure there is no hardware or firmware error occurring is important,
>>>>> and as I say, USB devices should always be considered suspect unless and
>>>>> until proven innocent.
>>>>
>>>> [FYI: This is a ufs context, not a zfs one.]
>>
>>    Right.  It's only a Pi, after all. :-)
>
> It is a Pine64+ 2GB, not an rpi3.
>
>>>>
>>>> I'm aware of such  things. There is no evidence that has resulted in
>>>> suggesting the USB devices that I can replace are a problem. Otherwise
>>>> I'd not be going down this path. I only have access to the one arm64
>>>> device (a Pine64+ 2GB) so I've no ability to substitution-test what
>>>> is on that board.
>>
>>    There isn't even one open port on that hub that you could plug a
>> flash drive into temporarily to be the paging device?
>
> Why do you think that I've never tried alternative devices? It
> is just that the result was no evidence that my usually-in-use
> SSD is having a special/local problem: the behavior continues
> across all such contexts when the Pine64+ 2GB is involved. (Again
> I have not had access to an alternate to the one arm64 board.
> That limits my substitution testing possibilities.)
>
> Why would you expect a Flash drive to be better than another SSD
> for such testing? (The SSD that I usually use even happens to be
> a USB 3.0 SSD, capable of USB 3.0 speeds in USB 3.0 contexts. So
> is the hub that I usually use for that matter.)

FYI: I now have access to a rpi3 in addition to a pine64+ 2GB.

I've tested on the rpi3 using a different USB hub and a different
SSD: no hardware device in common with the recent Pine64+ 2GB
tests (other than console cabling and what handles the serial
console).

The fork-then-swap-out-then-swap-in failure happens in the
rpi3 context as well.

Because the rpi3 has only 1 GiByte of RAM the stress commands
that I used were more like:

stress -m 1 --vm-bytes 1000M

to get zero RES(ident memory) for the two processes from my
test program after it forks while they are waiting/sleeping.


>> You could then
>> try your tests before returning to the normal configuration.  If there
>> isn't an open port, then how about plugging a second hub into one of
>> the first hub's ports and moving the displaced device to the second
>> hub?  A flash drive could then be plugged in.  That kind of configuration
>> is obviously a bad idea for the long run, but just to try your tests it
>> ought to work well enough.
>
> I have access to more SSDs that I can use than I do to Flash drives. I
> see no reason to specifically use a Flash drive.
>
>> (BTW, if a USB storage device containing a
>> paging area drops off=line even momentarily and the system needs to use
>> it, that is the beginning of the end, even though it may take up to a few
>> minutes for everything to lock up.
>
> The system does not lock up, even days or weeks later, with having done
> dozens of experiments that show memory corruption failures over those
> days. The only processes showing memory corruption so far are those
> that were the parent or child for a fork that were later swapped out
> to have zero RES(ident memory) and then even later swapped back in.
>
> The context has no such issues. You are inventing problems that do
> not exist in my context. That is why none of my list submittals
> mention such problems: they did not occur.
>
>> You probably won't be able to do an
>> orderly shutdown, but will instead have to crash it with the power switch.
>> In the case of something like a Pi, this is an unpleasant fact of life,
>> to be sure.)
>
> Such things did not occur and has nothing to do with my context so far.
>
>>    I think I buy your arguments, given the evidence you've collected
>> thus far, including what you've added below.  I just like to eliminate
>> possibilities that are much simpler to deal with before facing nastinesses
>> like bugs in the VM subsystem. :-)
>
> When I started this I found no evidence of device-specific problems.
> My investigation activity goes back to long before my list submittals.
>
> And I repeat: Other people have reported the symptoms that started
> this investigation. They did so before I ever started my activities.
> They were using none of the specific devices that I have access to.
> Likely the types of devices were frequently even different, such as
> a rpi3 instead of a Pine64+ 2GB or a different USB drive. I was able
> to get the symptoms that they reported.
>
>>>> It would be neat if some folks used my code to test other arm64
>>>> contexts and reported the results. I'd be very interested.
>>>> (This is easier to do on devices that do not have massive
>>>> amounts of RAM, which may limit the range of devices or
>>>> device configurations that are reasonable to test.)
>>>>
>>>> There is that other people using other devices have reported
>>>> the behavior that started this investigation. I can produce the
>>>> behavior that they reported, although I've not seen anyone else
>>>> listing specific steps that lead to the problem or ways to tell
>>>> if the symptom is going to happen before it actually does. Nor
>>>> have I seen any other core dump analysis. (I have bugzilla
>>>> submittals 217138 and 217239 tied to symptoms others have
>>>> reported as well as this test program material.)
>>>>
>>>> Also, considering that for my test program I can control which pages
>>>> get the zeroed-problem by read-accessing even one byte of any 4K
>>>> Byte page that I want to make work normally, doing so in the child
>>>> process of the fork, between the fork and the sleep/swap-out, it does
>>>> not suggest USB-device-specific behavior. The read-access is changing
>>>> the status of the page in some way as far as I can tell.
>>>>
>>>> (Such read-accesses in the parent process make no difference to the
>>>> behavior.)
>>>
>>> I should have noted another comparison/contrast between
>>> having memory corruption and not in my context:
>>>
>>> I've tried variants of my test program that do not fork but
>>> just sleep for 60s to allow me to force the swap-out. I
>>> did this before adding fork and before using
>>> parital_test_check, for example. I gradually added things
>>> apparently involved in the reports others had made
>>> until I found a combination that produced a memory
>>> corruption test failure.
>>>
>>> These tests without fork involved find no problems with
>>> the memory content after the swap-in.
>>>
>>> For my test program it appears that fork-before-swap-out
>>> or the like is essential to having the problem occur.
>>>
>>    A comment about terminology seems in order here.  It bothers
>> me considerably to see you writing "swap out" or "swapping" where
>> it seems like you mean to write "page out" or "paging".  A BSD
>> system whose swapping mechanism gets activated has already waded
>> very deeply into the quicksand and frequently cannot be gotten out
>> in a reasonable amount of time even with manual assistance.  It is
>> often quicker to crash it, reboot, and wait for the fsck(8) cleanups
>> to complete.  Orderly shutdowns, even of the kind that results from
>> a quick poke to the power button, typically get mired in the same
>> mess that already has the system in knots.  Also, BSD systems since
>> 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems, do not swap in,
>> just out.  A swapped out process, once the system determines that it
>> has adequate resources again to attempt to run the process, will have
>> the interrupted text page paged in and the rest will be paged in by
>> the normal mechanism of page faults and page-in operations.  I assume
>> you must already know all this, which is a large part of why it grates
>> on me that you appear to be using the wrong terms.
>
> You apparently did not read any of the material about how the test
> is done or are unfamiliar with what "stress -m 1 --vm-bytes 1800M"
> does when there is only 2GB of RAM. I am deliberately inducing
> swapping in other processes, including the 2 from my test program
> (after the fork), not just paging. (stress is a port, not part of
> the base system.)
>
> When I say swap-out and swap-in I mean it.
>
> From the source code of my test program:
>
>            sleep(60);
>
>            // During this manually force this process to
>            // swap out. I use something like:
>
>            // stress -m 1 --vm-bytes 1800M
>
>            // in another shell and ^C'ing it after top
>            // shows the swapped status desired. 1800M
>            // just happened to work on the Pine64+ 2GB
>            // that I was using. I watch with top -PCwaopid .
>
> That type of stress run uses about 1.8 GiBytes after a bit,
> which is enough to cause the swapping of other processes,
> including the two that I am testing (post-fork). (Some RAM
> is in use already before the stress run, which explains not
> needing 2 GiBytes to be in use by stress.)
>
> Look at a "top -PCwaopid" display: there are columns for
> RES(ident memory) and SWAP. I cause my 2 test processes to
> show zero RES and everything under SWAP, starting sometime
> during the 60s sleep/wait.
>
> Why would I cause swapping? Because buildworld causes such
> swap-outs at times when there is only 2GBytes of RAM,
> including processes that forked earlier, and as a result
> the corrupted memory problems show up later in some processes
> that were swapped out at the time. The build eventually
> stops for process failures tied to the corruptions of memory
> in the failing processes. (At least that is what my testing
> strongly suggests.)
>
> But that is a very complicated context to use for analysis or
> testing of the problem. My test program is vastly simpler
> and easier/quicker to set up and test when used with stress
> as well. Such was the kind of thing I was trying to find.
>
> I want the Pine64+ 2GB to work well enough to be able to have
> buildworld (-j 4) complete correctly without having to restart
> the build --even when everything has to be rebuilt. So I'm
> trying to find and provide enough evidence to help someone fix
> the problems that are observed to block such buildworld
> activity.
>
> Again: others have reported such arm64 problems on the lists
> before I ever got into this activity. The evidence is that
> the issues are not a local property of my environment.
>
> Swapping is supposed to work. I can do buildworld (-j 4)
> on armv6 (really -mcpu=cortex-a7 so armv7-a) and the
> swapping it causes works fine. This is true for both a
> bpim3 (2 GiBytes of RAM) and a rpi2 (1 GiByte of RAM
> so even more swapping). On a powerpc64 with 16 GiBytes
> I've built things that caused 26 GiBytes of swap to be
> in use some of the time (during 4 ld's running in
> parallel), with lots of processes having zero for
> RES(ident memory) and all their space listed under SWAP
> in a "top -PCwaopid" display. This too has no problems
> with swapping of previously forked processes (or of any
> other processes).
>
> For the likes of a Pine64+ 2GB to be "self hosted"
> for source-code based updates, swapping of previously
> forked processes must work and currently such
> swapping is unreliable.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
In reply to this post by Mark Millard-2
A new, significant discovery follows. . .

While checking out use of procstat -v I ran
into the following common property for the 3
programs that I looked at:

A) My small test program that fails for
   a dynamically allocated space.

B) sh reporting Failed assertion: "tsd_booted".

C) su reporting Failed assertion: "tsd_booted".

Here are example addresses from the area of
incorrectly zeroed memory (A then B then C):

(lldb) print dyn_region
(region *volatile) $0 = 0x0000000040616000

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x0000000040618520

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x0000000040618520

The first is from dynamic allocation ending up
in the area. The other two are from libc.so.7
globals/statics ending up in the general area.

It looks like something is trashing a specific
memory area for some reason, rather independently
of what the program specifics are.


Other notes:

At least for my small program showing failure:

Being explicit about the combined conditions for failure
for my test program. . .

Both tcache enabled and allocations fitting in SMALL_MAXCLASS
are required in order to make the program fail.

Note:

lldb) print __je_tcache_maxclass
(size_t) $0 = 32768

which is larger than SMALL_MAXCLASS. I've not observed
failures for sizes above SMALL_MAXCLASS but not exceeding
__je_tcache_maxclass.

Thus tcache use by itself does not seen sufficient for
my program to get corruption of its dynamically allocated
memory: the small allocation size also matters.


Be warned that I can not eliminate the possibility that
the trashing changed what region of memory it trashed
for larger allocations or when tcache is disabled.


===
Mark Millard
markmi at dsl-only.net


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2

On 2017-Mar-18, at 5:53 PM, Mark Millard <[hidden email]> wrote:

> A new, significant discovery follows. . .
>
> While checking out use of procstat -v I ran
> into the following common property for the 3
> programs that I looked at:
>
> A) My small test program that fails for
>   a dynamically allocated space.
>
> B) sh reporting Failed assertion: "tsd_booted".
>
> C) su reporting Failed assertion: "tsd_booted".
>
> Here are example addresses from the area of
> incorrectly zeroed memory (A then B then C):
>
> (lldb) print dyn_region
> (region *volatile) $0 = 0x0000000040616000
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x0000000040618520
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x0000000040618520

That last above was a copy/paste error. Correction:

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x000000004061d520

> The first is from dynamic allocation ending up
> in the area. The other two are from libc.so.7
> globals/statics ending up in the general area.
>
> It looks like something is trashing a specific
> memory area for some reason, rather independently
> of what the program specifics are.
>
>
> Other notes:
>
> At least for my small program showing failure:
>
> Being explicit about the combined conditions for failure
> for my test program. . .
>
> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
> are required in order to make the program fail.
>
> Note:
>
> lldb) print __je_tcache_maxclass
> (size_t) $0 = 32768
>
> which is larger than SMALL_MAXCLASS. I've not observed
> failures for sizes above SMALL_MAXCLASS but not exceeding
> __je_tcache_maxclass.
>
> Thus tcache use by itself does not seen sufficient for
> my program to get corruption of its dynamically allocated
> memory: the small allocation size also matters.
>
>
> Be warned that I can not eliminate the possibility that
> the trashing changed what region of memory it trashed
> for larger allocations or when tcache is disabled.

The pine64+ 2GB eventually got into a state where:

/etc/malloc.conf -> tcache:false

made no difference and the failure kept occurring
with that symbolic link in place.

But after a reboot of the pin46+ 2GB
/etc/malloc.conf -> tcache:false was again effective
for my test program. (It was still present from
before the reboot.)

I checked the .core files and the allocated address
assigned to dyn_region was the same in the tries
before and after the reboot. (I had put in an
additional raise(SIGABRT) so I'd always have
a core file to look at.)

Apparently /etc/malloc.conf -> tcache:false was
being ignored before the reboot for some reason?


===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
On 2017-Mar-18, at 9:10 PM, Mark Millard <[hidden email]> wrote:

>
> On 2017-Mar-18, at 5:53 PM, Mark Millard <[hidden email]> wrote:
>
>> A new, significant discovery follows. . .
>>
>> While checking out use of procstat -v I ran
>> into the following common property for the 3
>> programs that I looked at:
>>
>> A) My small test program that fails for
>>  a dynamically allocated space.
>>
>> B) sh reporting Failed assertion: "tsd_booted".
>>
>> C) su reporting Failed assertion: "tsd_booted".
>>
>> Here are example addresses from the area of
>> incorrectly zeroed memory (A then B then C):
>>
>> (lldb) print dyn_region
>> (region *volatile) $0 = 0x0000000040616000
>>
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x0000000040618520
>>
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x0000000040618520
>
> That last above was a copy/paste error. Correction:
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x000000004061d520
>
>> The first is from dynamic allocation ending up
>> in the area. The other two are from libc.so.7
>> globals/statics ending up in the general area.
>>
>> It looks like something is trashing a specific
>> memory area for some reason, rather independently
>> of what the program specifics are.

I probably should have noted that the processes
involved were: child/parent then grandparent
and then great grandparent. The grandparent
was sh and the great grandparent was su.

The ancestors in the process tree are being
damaged, not just the instances of the
program that demonstrates the problem.

>> Other notes:
>>
>> At least for my small program showing failure:
>>
>> Being explicit about the combined conditions for failure
>> for my test program. . .
>>
>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>> are required in order to make the program fail.
>>
>> Note:
>>
>> lldb) print __je_tcache_maxclass
>> (size_t) $0 = 32768
>>
>> which is larger than SMALL_MAXCLASS. I've not observed
>> failures for sizes above SMALL_MAXCLASS but not exceeding
>> __je_tcache_maxclass.
>>
>> Thus tcache use by itself does not seen sufficient for
>> my program to get corruption of its dynamically allocated
>> memory: the small allocation size also matters.
>>
>>
>> Be warned that I can not eliminate the possibility that
>> the trashing changed what region of memory it trashed
>> for larger allocations or when tcache is disabled.
>
> The pine64+ 2GB eventually got into a state where:
>
> /etc/malloc.conf -> tcache:false
>
> made no difference and the failure kept occurring
> with that symbolic link in place.
>
> But after a reboot of the pin46+ 2GB
> /etc/malloc.conf -> tcache:false was again effective
> for my test program. (It was still present from
> before the reboot.)
>
> I checked the .core files and the allocated address
> assigned to dyn_region was the same in the tries
> before and after the reboot. (I had put in an
> additional raise(SIGABRT) so I'd always have
> a core file to look at.)
>
> Apparently /etc/malloc.conf -> tcache:false was
> being ignored before the reboot for some reason?

I have also discovered that if the child process
in an example like my program does a:

(void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);

after the fork but before the sleep/swap-out/wait
then the problem does not happen. This is without
any read or write access to the memory between the
fork and sleep/swap-out/wait.

By contrast such POSIX_MADV_WILLNEED use in the parent
process does not change the failure behavior.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]

Mark Millard-2
On 2017-Mar-21, at 7:21 PM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Mar-18, at 9:10 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>>
>> On 2017-Mar-18, at 5:53 PM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>>> A new, significant discovery follows. . .
>>>
>>> While checking out use of procstat -v I ran
>>> into the following common property for the 3
>>> programs that I looked at:
>>>
>>> A) My small test program that fails for
>>> a dynamically allocated space.
>>>
>>> B) sh reporting Failed assertion: "tsd_booted".
>>>
>>> C) su reporting Failed assertion: "tsd_booted".
>>>
>>> Here are example addresses from the area of
>>> incorrectly zeroed memory (A then B then C):
>>>
>>> (lldb) print dyn_region
>>> (region *volatile) $0 = 0x0000000040616000
>>>
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x0000000040618520
>>>
>>> (lldb) print &__je_tsd_booted
>>> (bool *) $0 = 0x0000000040618520
>>
>> That last above was a copy/paste error. Correction:
>>
>> (lldb) print &__je_tsd_booted
>> (bool *) $0 = 0x000000004061d520
>>
>>> The first is from dynamic allocation ending up
>>> in the area. The other two are from libc.so.7
>>> globals/statics ending up in the general area.
>>>
>>> It looks like something is trashing a specific
>>> memory area for some reason, rather independently
>>> of what the program specifics are.
>
> I probably should have noted that the processes
> involved were: child/parent then grandparent
> and then great grandparent. The grandparent
> was sh and the great grandparent was su.
>
> The ancestors in the process tree are being
> damaged, not just the instances of the
> program that demonstrates the problem.
>
>>> Other notes:
>>>
>>> At least for my small program showing failure:
>>>
>>> Being explicit about the combined conditions for failure
>>> for my test program. . .
>>>
>>> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
>>> are required in order to make the program fail.
>>>
>>> Note:
>>>
>>> lldb) print __je_tcache_maxclass
>>> (size_t) $0 = 32768
>>>
>>> which is larger than SMALL_MAXCLASS. I've not observed
>>> failures for sizes above SMALL_MAXCLASS but not exceeding
>>> __je_tcache_maxclass.
>>>
>>> Thus tcache use by itself does not seen sufficient for
>>> my program to get corruption of its dynamically allocated
>>> memory: the small allocation size also matters.
>>>
>>>
>>> Be warned that I can not eliminate the possibility that
>>> the trashing changed what region of memory it trashed
>>> for larger allocations or when tcache is disabled.
>>
>> The pine64+ 2GB eventually got into a state where:
>>
>> /etc/malloc.conf -> tcache:false
>>
>> made no difference and the failure kept occurring
>> with that symbolic link in place.
>>
>> But after a reboot of the pin46+ 2GB
>> /etc/malloc.conf -> tcache:false was again effective
>> for my test program. (It was still present from
>> before the reboot.)
>>
>> I checked the .core files and the allocated address
>> assigned to dyn_region was the same in the tries
>> before and after the reboot. (I had put in an
>> additional raise(SIGABRT) so I'd always have
>> a core file to look at.)
>>
>> Apparently /etc/malloc.conf -> tcache:false was
>> being ignored before the reboot for some reason?
>
> I have also discovered that if the child process
> in an example like my program does a:
>
> (void) posix_madvise(dyn_region, region_size, POSIX_MADV_WILLNEED);
>
> after the fork but before the sleep/swap-out/wait
> then the problem does not happen. This is without
> any read or write access to the memory between the
> fork and sleep/swap-out/wait.
>
> By contrast such POSIX_MADV_WILLNEED use in the parent
> process does not change the failure behavior.

I've added another test program to bugzilla
217239 and 217138, one with thousands of 14
KiByte allocations.

The test program usually ends up with them all being
zeroed in the parent and child of the fork.

But I've had a couple of runs where a much smaller
prefix was messed up and then there were normal,
expected values.

#define region_size (14u*1024u)
. . .
#define num_regions (256u*1024u*1024u/region_size)

So num_regions==18724, using up most of 256 MiBytes.

Note: each region has its own 14 KiByte allocation.

But dyn_regions[1296].array[0] in one example was
the first normal value.

In another example dyn_regions[2180].array[4096] was
the first normal value.

The last is interesting for being part way through
an allocation's space. That but aligning with a 4
KiByte page size would seem odd for a pure-jemalloc
issue.

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[hidden email]"