kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

freebsd-ppc mailing list
This leads up to questioning if .sbss and .bss
in /sbin/init are always correctly zeroed. But
I may have missed something in the sequencing.
(The code is not familiar material.)

If I've tracked it down right:

sys/kern/kern_exec.c uses kern_execve to deal with
starting up /sbin/init.

kern_execve uses do_execve.

do_execve uses:

/*
 * Each of the items is a pointer to a `const struct execsw', hence the
 * double pointer here.
 */
static const struct execsw **execsw;
. . .
        /*
         *      Loop through the list of image activators, calling each one.
         *      An activator returns -1 if there is no match, 0 on success,
         *      and an error otherwise.
         */
        for (i = 0; error == -1 && execsw[i]; ++i) {
                if (execsw[i]->ex_imgact == NULL ||
                    execsw[i]->ex_imgact == img_first) {
                        continue;
                }
                error = (*execsw[i]->ex_imgact)(imgp);
        }
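
The dispatch loop above can be modeled in user space. This is only a
sketch: struct image, struct activator, the two toy activators and
run_activators are hypothetical stand-ins, not the kernel's types, and
the img_first special case is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel's execsw and image_params differ. */
struct image { const unsigned char *hdr; size_t len; };
typedef int (*imgact_fn)(struct image *);
struct activator { imgact_fn ex_imgact; const char *ex_name; };

/* Each toy activator returns -1 for "not my format" and 0 on success. */
static int
toy_elf_imgact(struct image *img)
{
	if (img->len < 4 || img->hdr[0] != 0x7f || img->hdr[1] != 'E' ||
	    img->hdr[2] != 'L' || img->hdr[3] != 'F')
		return (-1);
	return (0);
}

static int
toy_shell_imgact(struct image *img)
{
	if (img->len < 2 || img->hdr[0] != '#' || img->hdr[1] != '!')
		return (-1);
	return (0);
}

/* Walk a NULL-terminated table until an activator stops returning -1. */
static int
run_activators(const struct activator *const *sw, struct image *img,
    const char **matched)
{
	int error = -1;

	assert(sw != NULL && img != NULL && matched != NULL);
	*matched = NULL;
	for (size_t i = 0; error == -1 && sw[i] != NULL; ++i) {
		if (sw[i]->ex_imgact == NULL)
			continue;
		error = sw[i]->ex_imgact(img);
		if (error != -1)
			*matched = sw[i]->ex_name;
	}
	return (error);
}
```

With a table holding the shell activator first and the ELF activator
second, an image starting with 0x7f 'E' 'L' 'F' falls through the first
entry (-1) and is claimed by the second (0).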

/usr/src/sys/kern/imgact_elf.c has:

/*
 * Tell kern_execve.c about it, with a little help from the linker.
 */
static struct execsw __elfN(execsw) = {
        .ex_imgact = __CONCAT(exec_, __elfN(imgact)),
        .ex_name = __XSTRING(__CONCAT(ELF, __ELF_WORD_SIZE))
};
EXEC_SET(__CONCAT(elf, __ELF_WORD_SIZE), __elfN(execsw));


__CONCAT(exec_, __elfN(imgact)) uses
__elfN(load_sections) .

__elfN(load_sections) uses __elfN(load_section).

__elfN(load_section) uses vm_imgact_map_page
to set up for its copyout. This appears to be
how the FileSiz (not including .sbss or .bss)
vs. MemSiz (including .sbss and .bss) is
handled (attempted?).
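
To make the FileSiz vs. MemSiz distinction concrete, here is a small
sketch of the range that has no file backing and so must read as zero.
The struct and function names are made up for illustration; only the
arithmetic comes from the ELF program header fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the part of a PT_LOAD with no file backing. */
struct zero_range {
	uint64_t start;		/* first byte not read from the file */
	uint64_t end;		/* one past the last byte of the segment */
};

static struct zero_range
pt_load_zero_range(uint64_t vaddr, uint64_t filesz, uint64_t memsz)
{
	struct zero_range r;

	assert(memsz >= filesz);	/* MemSiz is never below FileSiz */
	r.start = vaddr + filesz;
	r.end = vaddr + memsz;
	return (r);
}
```

For the second LOAD in the readelf output further down (VirtAddr
0x01933000, FileSiz 0x0618c, MemSiz 0x32e88) this gives the range
0x0193918c up to 0x01965e88, which starts exactly at .sbss and ends
where .bss ends.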

vm_imgact_map_page uses vm_imgact_hold_page.

vm_imgact_hold_page uses vm_pager_get_pages.

vm_pager_get_pages uses vm_page_zero_invalid
to "Zero out partially filled data".

But vm_page_zero_invalid does not zero every "invalid"
byte but works in terms of units of DEV_BSIZE :

void
vm_page_zero_invalid(vm_page_t m, boolean_t setvalid)
{
        int b;
        int i;

        VM_OBJECT_ASSERT_WLOCKED(m->object);
        /*
         * Scan the valid bits looking for invalid sections that
         * must be zeroed.  Invalid sub-DEV_BSIZE'd areas ( where the
         * valid bit may be set ) have already been zeroed by
         * vm_page_set_validclean().
         */
        for (b = i = 0; i <= PAGE_SIZE / DEV_BSIZE; ++i) {
                if (i == (PAGE_SIZE / DEV_BSIZE) ||
                    (m->valid & ((vm_page_bits_t)1 << i))) {
                        if (i > b) {
                                pmap_zero_page_area(m,
                                    b << DEV_BSHIFT, (i - b) << DEV_BSHIFT);
                        }
                        b = i + 1;
                }
        }

        /*
         * setvalid is TRUE when we can safely set the zero'd areas
         * as being valid.  We can do this if there are no cache consistancy
         * issues.  e.g. it is ok to do with UFS, but not ok to do with NFS.
         */
        if (setvalid)
                m->valid = VM_PAGE_BITS_ALL;
}

The comment indicates that areas of "sub-DEV_BSIZE"
should have been handled previously by
vm_page_set_validclean .

But no part of the sequence appears to use
vm_page_set_validclean .
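
The granularity concern can be shown with a user-space model of the
loop. This is a sketch under assumptions: a 4096-byte page, a 512-byte
DEV_BSIZE, memset standing in for pmap_zero_page_area, and a made-up
struct toy_page instead of the real vm_page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE	4096	/* assumed page size */
#define TOY_DEV_BSIZE	512	/* assumed DEV_BSIZE */
#define TOY_DEV_BSHIFT	9	/* log2(TOY_DEV_BSIZE) */
#define TOY_NBLOCKS	(TOY_PAGE_SIZE / TOY_DEV_BSIZE)

struct toy_page {
	uint8_t valid;			/* one bit per DEV_BSIZE block */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Same loop shape as the quoted routine: zero only fully invalid blocks. */
static void
toy_zero_invalid(struct toy_page *m)
{
	int b, i;

	assert(m != NULL);
	for (b = i = 0; i <= TOY_NBLOCKS; ++i) {
		if (i == TOY_NBLOCKS || (m->valid & (1u << i)) != 0) {
			if (i > b)
				memset(m->data + (b << TOY_DEV_BSHIFT), 0,
				    (size_t)(i - b) << TOY_DEV_BSHIFT);
			b = i + 1;
		}
	}
}
```

If only block 0 is marked valid, every byte of block 0 is left alone,
including bytes past the point where the meaningful data stopped, while
blocks 1 through 7 are zeroed.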


So, if, say, char**environ ends up at the start of .sbss
consistently, does environ always end up zeroed independently
of FileSz for the PT_LOAD that spans them?

The following is not necessarily an example of problematical
figures but is just for showing an example structure of what
FileSiz covers vs. MemSiz for PT_LOAD's that involve .sbss
and .bss :

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x01800000 0x01800000 0x1222dc 0x1222dc R E 0x10000
  LOAD           0x123000 0x01933000 0x01933000 0x0618c 0x32e88 RWE 0x10000
  NOTE           0x0000d4 0x018000d4 0x018000d4 0x00048 0x00048 R   0x4
  TLS            0x123000 0x01933000 0x01933000 0x00b10 0x00b1d R   0x10
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

 Section to Segment mapping:
  Segment Sections...
   00     .note.tag .init .text .fini .rodata .eh_frame
   01     .tdata .tbss .init_array .fini_array .ctors .dtors .jcr .data.rel.ro .data .got .sbss .bss
   02     .note.tag
   03     .tdata .tbss
   04    
There are 24 section headers, starting at offset 0x14eb20:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            018000d4 0000d4 000048 00   A  0   0  4
  [ 2] .init             PROGBITS        0180011c 00011c 000034 00  AX  0   0  4
  [ 3] .text             PROGBITS        01800150 000150 111e14 00  AX  0   0 16
  [ 4] .fini             PROGBITS        01911f64 111f64 000030 00  AX  0   0  4
  [ 5] .rodata           PROGBITS        01911fc0 111fc0 010318 00   A  0   0 64
  [ 6] .eh_frame         PROGBITS        019222d8 1222d8 000004 00   A  0   0  4
  [ 7] .tdata            PROGBITS        01933000 123000 000b10 00 WAT  0   0 16
  [ 8] .tbss             NOBITS          01933b10 123b10 00000d 00 WAT  0   0  4
  [ 9] .init_array       INIT_ARRAY      01933b10 123b10 000008 04  WA  0   0  4
  [10] .fini_array       FINI_ARRAY      01933b18 123b18 000004 04  WA  0   0  4
  [11] .ctors            PROGBITS        01933b1c 123b1c 000008 00  WA  0   0  4
  [12] .dtors            PROGBITS        01933b24 123b24 000008 00  WA  0   0  4
  [13] .jcr              PROGBITS        01933b2c 123b2c 000004 00  WA  0   0  4
  [14] .data.rel.ro      PROGBITS        01933b30 123b30 002ee4 00  WA  0   0  4
  [15] .data             PROGBITS        01936a18 126a18 002763 00  WA  0   0  8
  [16] .got              PROGBITS        0193917c 12917c 000010 04 WAX  0   0  4
  [17] .sbss             NOBITS          0193918c 12918c 0000b0 00  WA  0   0  4
  [18] .bss              NOBITS          01939240 12918c 02cc48 00  WA  0   0 64
  [19] .comment          PROGBITS        00000000 12918c 0073d4 01  MS  0   0  1
  [20] .gnu_debuglink    PROGBITS        00000000 130560 000010 00      0   0  4
  [21] .symtab           SYMTAB          00000000 130570 00fc40 10     22 1681  4
  [22] .strtab           STRTAB          00000000 1401b0 00e8b3 00      0   0  1
  [23] .shstrtab         STRTAB          00000000 14ea63 0000bc 00      0   0  1
. . .
  2652: 000000000193918c     4 OBJECT  GLOBAL DEFAULT   17 environ
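
As a quick check of where environ sits relative to the granularity in
question, the following sketch computes page offsets and DEV_BSIZE block
indexes. The 4096-byte page and 512-byte DEV_BSIZE are assumptions, and
the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size */
#define TOY_DEV_BSIZE	512u	/* assumed DEV_BSIZE */

/* Byte offset of addr within its page. */
static uint32_t
page_off(uint64_t addr)
{
	return ((uint32_t)(addr & (TOY_PAGE_SIZE - 1)));
}

/* Index, within the page, of the DEV_BSIZE block that holds addr. */
static uint32_t
block_index(uint64_t addr)
{
	return (page_off(addr) / TOY_DEV_BSIZE);
}

/* Nonzero when both addresses land in the same block of the same page. */
static int
shares_block(uint64_t last_file_byte, uint64_t first_bss_byte)
{
	assert(last_file_byte < first_bss_byte);
	return (last_file_byte / TOY_PAGE_SIZE ==
	    first_bss_byte / TOY_PAGE_SIZE &&
	    block_index(last_file_byte) == block_index(first_bss_byte));
}
```

With these assumptions environ at 0x0193918c is at page offset 0x18c, in
block 0, and it shares that block with the last file-backed byte at
0x0193918b.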


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ppc
To unsubscribe, send any mail to "[hidden email]"

Re: kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

Conrad Meyer-2
Hi Mark,

On Sun, Jun 9, 2019 at 11:17 PM Mark Millard via freebsd-hackers
<[hidden email]> wrote:

> ...
> vm_pager_get_pages uses vm_page_zero_invalid
> to "Zero out partially filled data".
>
> But vm_page_zero_invalid does not zero every "invalid"
> byte but works in terms of units of DEV_BSIZE :
> ...
> The comment indicates that areas of "sub-DEV_BSIZE"
> should have been handled previously by
> vm_page_set_validclean .

Or another VM routine, yes (e.g., vm_page_set_valid_range).  The valid
and dirty bitmasks in vm_page only have a single bit per DEV_BSIZE
region, so care must be taken when marking any sub-DEV_BSIZE region as
valid to zero out the rest of the DEV_BSIZE region.  This is part of
the VM page contract.  I'm not sure it's related to the BSS, though.
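
The one-bit-per-DEV_BSIZE point can be sketched as a function that maps
a byte range to the bits it would mark. This is a user-space model with
an assumed 4096-byte page and 512-byte DEV_BSIZE; toy_range_bits is a
name made up here, not a kernel interface.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096	/* assumed page size */
#define TOY_DEV_BSHIFT	9	/* log2 of the assumed DEV_BSIZE of 512 */

/* Bits covering every DEV_BSIZE block touched by [base, base + size). */
static uint8_t
toy_range_bits(int base, int size)
{
	int first, last;

	assert(base >= 0 && size > 0 && base + size <= TOY_PAGE_SIZE);
	first = base >> TOY_DEV_BSHIFT;
	last = (base + size - 1) >> TOY_DEV_BSHIFT;
	return ((uint8_t)((2u << last) - (1u << first)));
}
```

Marking the first 0x18c bytes valid yields the same mask (0x01) as
marking all 512 bytes of block 0, which is why whoever sets the bit is
expected to zero the rest of that block.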

> So, if, say, char**environ ends up at the start of .sbss
> consistently, does environ always end up zeroed independently
> of FileSz for the PT_LOAD that spans them?

It is required to be zeroed, yes.  If not, there is a bug.  If FileSz
covers BSS, that's a bug in the linker.  Either the trailing bytes of
the corresponding page in the executable should be zero (wasteful; on
amd64 ".comment" is packed in there instead), or the linker/loader
must zero them at initialization.  I'm not familiar with the
particular details here, but if you are interested I would suggest
looking at __elfN(load_section) in sys/kern/imgact_elf.c.

> The following is not necessarily an example of problematical
> figures but is just for showing an example structure of what
> FileSiz covers vs. MemSiz for PT_LOAD's that involve .sbss
> and .bss :
> ...

Your 2nd LOAD phdr's FileSiz matches up exactly with Segment .sbss
Offset minus Segment .tdata Offset, i.e., none of the FileSiz
corresponds to the (s)bss regions.  (Good!  At least the static linker
part looks sane.)  That said, the boundary is not page-aligned and the
section alignment requirement is much lower than page_size, so the
beginning of bss will share a file page with some data.  Something
should zero it at image activation.
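
The arithmetic behind that observation, as a small sketch. The constants
are copied from the readelf output in the original message; the 4096-byte
page size and the helper name are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* File offsets and sizes from the quoted 32-bit powerpc /sbin/init. */
enum {
	TDATA_OFF = 0x123000,		/* first section of the 2nd LOAD */
	SBSS_OFF = 0x12918c,		/* first trailing NOBITS section */
	LOAD2_FILESIZ = 0x0618c,	/* FileSiz of the 2nd LOAD */
	TOY_PAGE_SIZE = 4096		/* assumed page size */
};

/* Bytes the file contributes before the trailing NOBITS sections begin. */
static uint32_t
file_backed_bytes(uint32_t first_off, uint32_t first_nobits_off)
{
	assert(first_nobits_off >= first_off);
	return (first_nobits_off - first_off);
}
```

file_backed_bytes(TDATA_OFF, SBSS_OFF) equals LOAD2_FILESIZ (0x618c),
and SBSS_OFF % TOY_PAGE_SIZE is 0x18c rather than 0, so the boundary
falls in the middle of a file page.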

(Tangent: sbss/bss probably do not need to be RWE on PPC!  On amd64,
init has three LOAD segments rather than two: one for rodata (R), one
for .text, .init, etc (RX); and one for .data (RW).)

Best,
Conrad

Re: kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

freebsd-ppc mailing list
[Looks like Conrad M. is partially confirming that my trace of the
issue is reasonable.]

On 2019-Jun-10, at 07:37, Conrad Meyer <[hidden email]> wrote:

> Hi Mark,
>
> On Sun, Jun 9, 2019 at 11:17 PM Mark Millard via freebsd-hackers
> <[hidden email]> wrote:
>> ...
>> vm_pager_get_pages uses vm_page_zero_invalid
>> to "Zero out partially filled data".
>>
>> But vm_page_zero_invalid does not zero every "invalid"
>> byte but works in terms of units of DEV_BSIZE :
>> ...
>> The comment indicates that areas of "sub-DEV_BSIZE"
>> should have been handled previously by
>> vm_page_set_validclean .
>
> Or another VM routine, yes (e.g., vm_page_set_valid_range).  The valid
> and dirty bitmasks in vm_page only have a single bit per DEV_BSIZE
> region, so care must be taken when marking any sub-DEV_BSIZE region as
> valid to zero out the rest of the DEV_BSIZE region.  This is part of
> the VM page contract.  I'm not sure it's related to the BSS, though.

Yea, I had written from what I'd seen in __elfN(load_section):

QUOTE
__elfN(load_section) uses vm_imgact_map_page
to set up for its copyout. This appears to be
how the FileSiz (not including .sbss or .bss)
vs. MemSiz (including .sbss and .bss) is
handled (attempted?).
END QUOTE

The copyout only copies through the last byte for filesz
but the vm_imgact_map_page does not zero out all the
bytes after that on that page:

        /*
         * We have to get the remaining bit of the file into the first part
         * of the oversized map segment.  This is normally because the .data
         * segment in the file is extended to provide bss.  It's a neat idea
         * to try and save a page, but it's a pain in the behind to implement.
         */
        copy_len = filsz == 0 ? 0 : (offset + filsz) - trunc_page(offset +
            filsz);
        map_addr = trunc_page((vm_offset_t)vmaddr + filsz);
        map_len = round_page((vm_offset_t)vmaddr + memsz) - map_addr;
. . .
        if (copy_len != 0) {
                sf = vm_imgact_map_page(object, offset + filsz);
                if (sf == NULL)
                        return (EIO);

                /* send the page fragment to user space */
                off = trunc_page(offset + filsz) - trunc_page(offset + filsz);
                error = copyout((caddr_t)sf_buf_kva(sf) + off,
                    (caddr_t)map_addr, copy_len);
                vm_imgact_unmap_page(sf);
                if (error != 0)
                        return (error);
        }
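
Plugging the example PT_LOAD into those three assignments may help. The
sketch below mirrors only the arithmetic; the 4096-byte page size, the
toy_ macros and struct tail_layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE		4096ULL	/* assumed page size */
#define toy_trunc_page(x)	((x) & ~(TOY_PAGE_SIZE - 1))
#define toy_round_page(x)	toy_trunc_page((x) + TOY_PAGE_SIZE - 1)

struct tail_layout {
	uint64_t copy_len;	/* file bytes on the last, partly backed page */
	uint64_t map_addr;	/* start of that page in the new address space */
	uint64_t map_len;	/* bytes from map_addr to the segment's end page */
};

/* Mirrors the three assignments in the quoted fragment. */
static struct tail_layout
toy_tail_layout(uint64_t offset, uint64_t vmaddr, uint64_t filsz,
    uint64_t memsz)
{
	struct tail_layout t;

	assert(memsz >= filsz);
	t.copy_len = filsz == 0 ? 0 :
	    (offset + filsz) - toy_trunc_page(offset + filsz);
	t.map_addr = toy_trunc_page(vmaddr + filsz);
	t.map_len = toy_round_page(vmaddr + memsz) - t.map_addr;
	return (t);
}
```

For offset 0x123000, vmaddr 0x01933000, filsz 0x0618c and memsz 0x32e88
this gives copy_len 0x18c, map_addr 0x01939000 and map_len 0x2d000, so
only the first 0x18c bytes of the page at map_addr come from the file.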

I looked into the details of the DEV_BSIZE code after sending
the original message and so realized that my provided example
/sbin/init readelf material was a good example of the issue
if I'd not missed something.

>> So, if, say, char**environ ends up at the start of .sbss
>> consistently, does environ always end up zeroed independently
>> of FileSz for the PT_LOAD that spans them?
>
> It is required to be zeroed, yes.  If not, there is a bug.  If FileSz
> covers BSS, that's a bug in the linker.  Either the trailing bytes of
> the corresponding page in the executable should be zero (wasteful; on
> amd64 ".comment" is packed in there instead), or the linker/loader
> must zero them at initialization.  I'm not familiar with the
> particular details here, but if you are interested I would suggest
> looking at __elfN(load_section) in sys/kern/imgact_elf.c.

I had looked at it some, see the material around the earlier quote
above.

>> The following is not necessarily an example of problematical
>> figures but is just for showing an example structure of what
>> FileSiz covers vs. MemSiz for PT_LOAD's that involve .sbss
>> and .bss :
>> ...
>
> Your 2nd LOAD phdr's FileSiz matches up exactly with Segment .sbss
> Offset minus Segment .tdata Offset, i.e., none of the FileSiz
> corresponds to the (s)bss regions.  (Good!  At least the static linker
> part looks sane.)  That said, the boundary is not page-aligned and the
> section alignment requirement is much lower than page_size, so the
> beginning of bss will share a file page with some data.  Something
> should zero it at image activation.

And, so far, I've not found anything in _start or before that does
zero any "sub-DEV_BSIZE" part after FileSz for the PT_LOAD in
question.

Thanks for checking my trace of the issue. It is good to have some
confirmation that I'd not missed something.

> (Tangent: sbss/bss probably do not need to be RWE on PPC!  On amd64,
> init has three LOAD segments rather than two: one for rodata (R), one
> for .text, .init, etc (RX); and one for .data (RW).)

Yea, the section header flags indicate just WA for .sbss and .bss (but
WAX for .got).

But such page sharing is more general: for example, the beginning of .rodata
(not executable) shares the tail part of a page with .fini
(executable) in the example. .got has executable code but is in
the middle of sections that do not. For something like /sbin/init it
is so small that the middle of a page can be the only part that is
executable, as in the example. (It is not forced onto its own page.)
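
The page sharing described here can be checked from the section table.
This is a sketch assuming 4096-byte pages; sections_share_page is a
made-up helper.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size */

/* Nonzero when the last byte of section A and the first of B share a page. */
static int
sections_share_page(uint32_t a_addr, uint32_t a_size, uint32_t b_addr)
{
	assert(a_size > 0 && a_addr + a_size <= b_addr);
	return ((a_addr + a_size - 1) / TOY_PAGE_SIZE ==
	    b_addr / TOY_PAGE_SIZE);
}
```

Using the addresses from the original message, .fini (0x01911f64, size
0x30) and .rodata (0x01911fc0) share a page, as do .data (0x01936a18,
size 0x2763) and .got (0x0193917c), and .got (size 0x10) and .sbss
(0x0193918c). By contrast .eh_frame (0x019222d8, size 4) and .tdata
(0x01933000) do not.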

The form of .got used is also writable: WAX for section header flags.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

freebsd-ppc mailing list
[I decided to compare some readelf information from some
other architectures. I was surprised by some of it. But
.bss seems to be forced to start with a large alignment
to avoid such issues as I originally traced.]

On 2019-Jun-10, at 11:24, Mark Millard <marklmi at yahoo.com> wrote:

> ...



amd64's /sbin/init :

There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R   0x8
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R   0x1000
  LOAD           0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
  LOAD           0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW  0x1000
  TLS            0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R   0x10
  GNU_RELRO      0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R   0x1
  GNU_EH_FRAME   0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00    
   01     .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame
   02     .text .init .fini .plt
   03     .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss
   04     .tdata .tbss
   05     .tdata .tbss .ctors .dtors .jcr .init_array .fini_array
   06     .eh_frame_hdr
   07    
   08     .note.tag
There are 27 section headers, starting at offset 0x157938:

Section Headers:
  [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            0000000000200238 000238 000048 00   A  0   0  4
  [ 2] .rela.plt         RELA            0000000000200280 000280 000030 18  AI  0  11  8
  [ 3] .rodata           PROGBITS        00000000002002c0 0002c0 01afb0 00 AMS  0   0 64
  [ 4] .eh_frame_hdr     PROGBITS        000000000021b270 01b270 00504c 00   A  0   0  4
  [ 5] .eh_frame         PROGBITS        00000000002202c0 0202c0 019bd4 00   A  0   0  8
  [ 6] .text             PROGBITS        000000000023a000 03a000 0e8dfc 00  AX  0   0 16
  [ 7] .init             PROGBITS        0000000000322dfc 122dfc 00000e 00  AX  0   0  4
  [ 8] .fini             PROGBITS        0000000000322e0c 122e0c 00000e 00  AX  0   0  4
  [ 9] .plt              PROGBITS        0000000000322e20 122e20 000020 00  AX  0   0 16
  [10] .data             PROGBITS        0000000000323000 123000 003a80 00  WA  0   0 16
  [11] .got.plt          PROGBITS        0000000000326a80 126a80 000010 00  WA  0   0  8
  [12] .tdata            PROGBITS        0000000000327000 127000 001800 00 WAT  0   0 16
  [13] .tbss             NOBITS          0000000000328800 128800 000020 00 WAT  0   0  8
  [14] .ctors            PROGBITS        0000000000328800 128800 000010 00  WA  0   0  8
  [15] .dtors            PROGBITS        0000000000328810 128810 000010 00  WA  0   0  8
  [16] .jcr              PROGBITS        0000000000328820 128820 000008 00  WA  0   0  8
  [17] .init_array       INIT_ARRAY      0000000000328828 128828 000018 00  WA  0   0  8
  [18] .fini_array       FINI_ARRAY      0000000000328840 128840 000008 00  WA  0   0  8
  [19] .bss              NOBITS          0000000000329000 128848 2321d9 00  WA  0   0 64
  [20] .comment          PROGBITS        0000000000000000 128848 0074d4 01  MS  0   0  1
  [21] .gnu.warning.mkte PROGBITS        0000000000000000 12fd1c 000043 00      0   0  1
  [22] .gnu.warning.f_pr PROGBITS        0000000000000000 12fd5f 000043 00      0   0  1
  [23] .gnu_debuglink    PROGBITS        0000000000000000 1478b0 000010 00      0   0  1
  [24] .shstrtab         STRTAB          0000000000000000 1478c0 0000f1 00      0   0  1
  [25] .symtab           SYMTAB          0000000000000000 12fda8 017b08 18     26 1707  8
  [26] .strtab           STRTAB          0000000000000000 1479b1 00ff84 00      0   0  1

Note that there is space after .fini_array+8 before .bss starts
with a sizable alignment. The MemSiz for 03 does span .bss .
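
The size of that gap can be computed from the figures above. This is a
sketch assuming 4096-byte pages, with helper names invented for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096ULL	/* assumed page size */

/* Padding between the end of the file-backed bytes and the start of .bss. */
static uint64_t
pad_before_bss(uint64_t seg_vaddr, uint64_t filesz, uint64_t bss_addr)
{
	assert(seg_vaddr + filesz <= bss_addr);
	return (bss_addr - (seg_vaddr + filesz));
}

static int
page_aligned(uint64_t addr)
{
	return (addr % TOY_PAGE_SIZE == 0);
}
```

For amd64 (LOAD at 0x323000, FileSiz 0x5848, .bss at 0x329000) the
padding is 0x7b8 bytes and .bss is page aligned. For the 32-bit powerpc
figures from the first message (LOAD at 0x01933000, FileSiz 0x618c,
.sbss at 0x0193918c) the padding is 0 and .sbss is not page aligned.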

armv7's /sbin/init is different about MemSiz spanning .bss:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00010034 0x00010034 0x00120 0x00120 R   0x4
  LOAD           0x000000 0x00010000 0x00010000 0x10674 0x10674 R   0x1000
  LOAD           0x011000 0x00021000 0x00021000 0xe9c54 0xe9c54 R E 0x1000
  LOAD           0x0fb000 0x0010b000 0x0010b000 0x03b88 0x30ccd RW  0x1000
  TLS            0x0fe000 0x0010e000 0x0010e000 0x00b60 0x00b70 R   0x20
  GNU_RELRO      0x0fe000 0x0010e000 0x0010e000 0x00b88 0x00b88 R   0x1
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0
  NOTE           0x000154 0x00010154 0x00010154 0x00064 0x00064 R   0x4
  ARM_EXIDX      0x0001b8 0x000101b8 0x000101b8 0x00220 0x00220 R   0x4

(NOTE: 0x0010b000+0x30ccd==0x13BCCD . Compare this to the later .bss
Addr of 0x10f000.)

 Section to Segment mapping:
  Segment Sections...
   00    
   01     .note.tag .ARM.exidx .rodata .ARM.extab
   02     .text .init .fini
   03     .data .tdata .tbss .jcr .init_array .fini_array .got .bss
   04     .tdata .tbss
   05     .tdata .tbss .jcr .init_array .fini_array .got
   06    
   07     .note.tag
   08     .ARM.exidx
There are 24 section headers, starting at offset 0x12be3c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            00010154 000154 000064 00   A  0   0  4
  [ 2] .ARM.exidx        ARM_EXIDX       000101b8 0001b8 000220 00   A  5   0  4
  [ 3] .rodata           PROGBITS        00010400 000400 01022c 00 AMS  0   0 64
  [ 4] .ARM.extab        PROGBITS        0002062c 01062c 000048 00   A  0   0  4
  [ 5] .text             PROGBITS        00021000 011000 0e9c14 00  AX  0   0 128
  [ 6] .init             PROGBITS        0010ac20 0fac20 000014 00  AX  0   0 16
  [ 7] .fini             PROGBITS        0010ac40 0fac40 000014 00  AX  0   0 16
  [ 8] .data             PROGBITS        0010b000 0fb000 002734 00  WA  0   0  8
  [ 9] .tdata            PROGBITS        0010e000 0fe000 000b60 00 WAT  0   0 16
  [10] .tbss             NOBITS          0010eb60 0feb60 000010 00 WAT  0   0  4
  [11] .jcr              PROGBITS        0010eb60 0feb60 000000 00  WA  0   0  4
  [12] .init_array       INIT_ARRAY      0010eb60 0feb60 000008 00  WA  0   0  4
  [13] .fini_array       FINI_ARRAY      0010eb68 0feb68 000004 00  WA  0   0  4
  [14] .got              PROGBITS        0010eb6c 0feb6c 00001c 00  WA  0   0  4
  [15] .bss              NOBITS          0010f000 0feb88 02cccd 00  WA  0   0 64
  [16] .comment          PROGBITS        00000000 0feb88 0074b6 01  MS  0   0  1
  [17] .ARM.attributes   ARM_ATTRIBUTES  00000000 10603e 00004f 00      0   0  1
  [18] .gnu.warning.mkte PROGBITS        00000000 10608d 000043 00      0   0  1
  [19] .gnu.warning.f_pr PROGBITS        00000000 1060d0 000043 00      0   0  1
  [20] .gnu_debuglink    PROGBITS        00000000 11b314 000010 00      0   0  1
  [21] .shstrtab         STRTAB          00000000 11b324 0000e3 00      0   0  1
  [22] .symtab           SYMTAB          00000000 106114 015200 10     23 3063  4
  [23] .strtab           STRTAB          00000000 11b407 010a32 00      0   0  1

Note that there is space after .got+0x1c before .bss starts
with a sizable alignment. The MemSiz for 03 does *not* span
.bss , unlike for amd64 (and the rest).


aarch64's /sbin/init is similar to amd64 instead of armv7:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001c0 0x0001c0 R   0x8
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x01624f 0x01624f R   0x10000
  LOAD           0x020000 0x0000000000220000 0x0000000000220000 0x0dd354 0x0dd354 R E 0x10000
  LOAD           0x100000 0x0000000000300000 0x0000000000300000 0x011840 0x252111 RW  0x10000
  TLS            0x110000 0x0000000000310000 0x0000000000310000 0x001800 0x001820 R   0x40
  GNU_RELRO      0x110000 0x0000000000310000 0x0000000000310000 0x001840 0x001840 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x000200 0x0000000000200200 0x0000000000200200 0x000048 0x000048 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00    
   01     .note.tag .rodata
   02     .text .init .fini
   03     .data .tdata .tbss .jcr .init_array .fini_array .got .bss
   04     .tdata .tbss
   05     .tdata .tbss .jcr .init_array .fini_array .got
   06    
   07     .note.tag
There are 21 section headers, starting at offset 0x14b6f0:

Section Headers:
  [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            0000000000200200 000200 000048 00   A  0   0  4
  [ 2] .rodata           PROGBITS        0000000000200280 000280 015fcf 00 AMS  0   0 64
  [ 3] .text             PROGBITS        0000000000220000 020000 0dd31c 00  AX  0   0 64
  [ 4] .init             PROGBITS        00000000002fd320 0fd320 000014 00  AX  0   0 16
  [ 5] .fini             PROGBITS        00000000002fd340 0fd340 000014 00  AX  0   0 16
  [ 6] .data             PROGBITS        0000000000300000 100000 003a20 00  WA  0   0 16
  [ 7] .tdata            PROGBITS        0000000000310000 110000 001800 00 WAT  0   0 16
  [ 8] .tbss             NOBITS          0000000000311800 111800 000020 00 WAT  0   0  8
  [ 9] .jcr              PROGBITS        0000000000311800 111800 000000 00  WA  0   0  8
  [10] .init_array       INIT_ARRAY      0000000000311800 111800 000018 00  WA  0   0  8
  [11] .fini_array       FINI_ARRAY      0000000000311818 111818 000008 00  WA  0   0  8
  [12] .got              PROGBITS        0000000000311820 111820 000020 00  WA  0   0  8
  [13] .bss              NOBITS          0000000000320000 111840 232111 00  WA  0   0 64
  [14] .comment          PROGBITS        0000000000000000 111840 007191 01  MS  0   0  1
  [15] .gnu.warning.mkte PROGBITS        0000000000000000 1189d1 000043 00      0   0  1
  [16] .gnu.warning.f_pr PROGBITS        0000000000000000 118a14 000043 00      0   0  1
  [17] .gnu_debuglink    PROGBITS        0000000000000000 13b7f8 000010 00      0   0  1
  [18] .shstrtab         STRTAB          0000000000000000 13b808 0000bd 00      0   0  1
  [19] .symtab           SYMTAB          0000000000000000 118a58 022da0 18     20 3621  8
  [20] .strtab           STRTAB          0000000000000000 13b8c5 00fe2b 00      0   0  1

Note that there is space after .got+0x20 before .bss starts
with a sizable alignment. The MemSiz for 03 does span
.bss , like for amd64 (and all but armv7).

powerpc64's /sbin/init is similar to amd64 as well:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R   0x8
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R   0x1000
  LOAD           0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
  LOAD           0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW  0x1000
  TLS            0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R   0x10
  GNU_RELRO      0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R   0x1
  GNU_EH_FRAME   0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00    
   01     .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame
   02     .text .init .fini .plt
   03     .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss
   04     .tdata .tbss
   05     .tdata .tbss .ctors .dtors .jcr .init_array .fini_array
   06     .eh_frame_hdr
   07    
   08     .note.tag
There are 27 section headers, starting at offset 0x157938:

Section Headers:
  [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            0000000000200238 000238 000048 00   A  0   0  4
  [ 2] .rela.plt         RELA            0000000000200280 000280 000030 18  AI  0  11  8
  [ 3] .rodata           PROGBITS        00000000002002c0 0002c0 01afb0 00 AMS  0   0 64
  [ 4] .eh_frame_hdr     PROGBITS        000000000021b270 01b270 00504c 00   A  0   0  4
  [ 5] .eh_frame         PROGBITS        00000000002202c0 0202c0 019bd4 00   A  0   0  8
  [ 6] .text             PROGBITS        000000000023a000 03a000 0e8dfc 00  AX  0   0 16
  [ 7] .init             PROGBITS        0000000000322dfc 122dfc 00000e 00  AX  0   0  4
  [ 8] .fini             PROGBITS        0000000000322e0c 122e0c 00000e 00  AX  0   0  4
  [ 9] .plt              PROGBITS        0000000000322e20 122e20 000020 00  AX  0   0 16
  [10] .data             PROGBITS        0000000000323000 123000 003a80 00  WA  0   0 16
  [11] .got.plt          PROGBITS        0000000000326a80 126a80 000010 00  WA  0   0  8
  [12] .tdata            PROGBITS        0000000000327000 127000 001800 00 WAT  0   0 16
  [13] .tbss             NOBITS          0000000000328800 128800 000020 00 WAT  0   0  8
  [14] .ctors            PROGBITS        0000000000328800 128800 000010 00  WA  0   0  8
  [15] .dtors            PROGBITS        0000000000328810 128810 000010 00  WA  0   0  8
  [16] .jcr              PROGBITS        0000000000328820 128820 000008 00  WA  0   0  8
  [17] .init_array       INIT_ARRAY      0000000000328828 128828 000018 00  WA  0   0  8
  [18] .fini_array       FINI_ARRAY      0000000000328840 128840 000008 00  WA  0   0  8
  [19] .bss              NOBITS          0000000000329000 128848 2321d9 00  WA  0   0 64
  [20] .comment          PROGBITS        0000000000000000 128848 0074d4 01  MS  0   0  1
  [21] .gnu.warning.mkte PROGBITS        0000000000000000 12fd1c 000043 00      0   0  1
  [22] .gnu.warning.f_pr PROGBITS        0000000000000000 12fd5f 000043 00      0   0  1
  [23] .gnu_debuglink    PROGBITS        0000000000000000 1478b0 000010 00      0   0  1
  [24] .shstrtab         STRTAB          0000000000000000 1478c0 0000f1 00      0   0  1
  [25] .symtab           SYMTAB          0000000000000000 12fda8 017b08 18     26 1707  8
  [26] .strtab           STRTAB          0000000000000000 1479b1 00ff84 00      0   0  1


Note that there is space after .fini_array+8 before .bss starts
with a sizable alignment. The MemSiz for 03 does span
.bss , like for amd64 (and all but armv7).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-ppc
To unsubscribe, send any mail to "[hidden email]"
Re: kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

freebsd-ppc mailing list
[I have confirmed that .sbss is not being zeroed out and
that environ thereby starts out non-zero (garbage), using
a debug.minidump=0 style dump.]

> On 2019-Jun-10, at 16:19, Mark Millard <[hidden email]> wrote:
>
> [Forcing an appropriate large .sbss alignment was not enough
> to avoid the clang-based problem for *sp++ related environ
> code in _init_tls .]
>
> On 2019-Jun-10, at 12:20, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [I decided to compare some readelf information from some
>> other architectures. I was surprised by some of it. But
>> .bss seems to be forced to start with a large alignment
>> to avoid such issues as I originally traced.]
>>
>> On 2019-Jun-10, at 11:24, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> [Looks like Conrad M. is partially confirming my trace of the
>>> issue is reasonable.]
>>>
>>> On 2019-Jun-10, at 07:37, Conrad Meyer <[hidden email]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> On Sun, Jun 9, 2019 at 11:17 PM Mark Millard via freebsd-hackers
>>>> <[hidden email]> wrote:
>>>>> ...
>>>>> vm_pager_get_pages uses vm_page_zero_invalid
>>>>> to "Zero out partially filled data".
>>>>>
>>>>> But vm_page_zero_invalid does not zero every "invalid"
>>>>> byte but works in terms of units of DEV_BSIZE :
>>>>> ...
>>>>> The comment indicates that areas of "sub-DEV_BSIZE"
>>>>> should have been handled previously by
>>>>> vm_page_set_validclean .
>>>>
>>>> Or another VM routine, yes (e.g., vm_page_set_valid_range).  The valid
>>>> and dirty bitmasks in vm_page only have a single bit per DEV_BSIZE
>>>> region, so care must be taken when marking any sub-DEV_BSIZE region as
>>>> valid to zero out the rest of the DEV_BSIZE region.  This is part of
>>>> the VM page contract.  I'm not sure it's related to the BSS, though.
>>>
>>> Yea, I had written from what I'd seen in __elfN(load_section):
>>>
>>> QUOTE
>>> __elfN(load_section) uses vm_imgact_map_page
>>> to set up for its copyout. This appears to be
>>> how the FileSiz (not including .sbss or .bss)
>>> vs. MemSiz (including .sbss and .bss) is
>>> handled (attempted?).
>>> END QUOTE
>>>
>>> The copyout only copies through the last byte for filesz
>>> but the vm_imgact_map_page does not zero out all the
>>> bytes after that on that page:
>>>
>>>      /*
>>>       * We have to get the remaining bit of the file into the first part
>>>       * of the oversized map segment.  This is normally because the .data
>>>       * segment in the file is extended to provide bss.  It's a neat idea
>>>       * to try and save a page, but it's a pain in the behind to implement.
>>>       */
>>>      copy_len = filsz == 0 ? 0 : (offset + filsz) - trunc_page(offset +
>>>          filsz);
>>>      map_addr = trunc_page((vm_offset_t)vmaddr + filsz);
>>>      map_len = round_page((vm_offset_t)vmaddr + memsz) - map_addr;
>>> . . .
>>>      if (copy_len != 0) {
>>>              sf = vm_imgact_map_page(object, offset + filsz);
>>>              if (sf == NULL)
>>>                      return (EIO);
>>>
>>>              /* send the page fragment to user space */
>>>              off = trunc_page(offset + filsz) - trunc_page(offset + filsz);
>>>              error = copyout((caddr_t)sf_buf_kva(sf) + off,
>>>                  (caddr_t)map_addr, copy_len);
>>>              vm_imgact_unmap_page(sf);
>>>              if (error != 0)
>>>                      return (error);
>>>      }
>>>
>>> I looked into the details of the DEV_BSIZE code after sending
>>> the original message and so realized that my provided example
>>> /sbin/init readelf material was a good example of the issue
>>> if I'd not missed something.
>>>
>>>>> So, if, say, char**environ ends up at the start of .sbss
>>>>> consistently, does environ always end up zeroed independently
>>>>> of FileSz for the PT_LOAD that spans them?
>>>>
>>>> It is required to be zeroed, yes.  If not, there is a bug.  If FileSz
>>>> covers BSS, that's a bug in the linker.  Either the trailing bytes of
>>>> the corresponding page in the executable should be zero (wasteful; on
>>>> amd64 ".comment" is packed in there instead), or the linker/loader
>>>> must zero them at initialization.  I'm not familiar with the
>>>> particular details here, but if you are interested I would suggest
>>>> looking at __elfN(load_section) in sys/kern/imgact_elf.c.
>>>
>>> I had looked at it some, see the material around the earlier quote
>>> above.
>>>
>>>>> The following is not necessarily an example of problematical
>>>>> figures but is just for showing an example structure of what
>>>>> FileSiz covers vs. MemSiz for PT_LOAD's that involve .sbss
>>>>> and .bss :
>>>>> ...
>>>>
>>>> Your 2nd LOAD phdr's FileSiz matches up exactly with Segment .sbss
>>>> Offset minus Segment .tdata Offset, i.e., none of the FileSiz
>>>> corresponds to the (s)bss regions.  (Good!  At least the static linker
>>>> part looks sane.)  That said, the boundary is not page-aligned and the
>>>> section alignment requirement is much lower than page_size, so the
>>>> beginning of bss will share a file page with some data.  Something
>>>> should zero it at image activation.
>>>
>>> And, so far, I've not found anything in _start or before that does
>>> zero any "sub-DEV_BSIZE" part after FileSz for the PT_LOAD in
>>> question.
>>>
>>> Thanks for checking my trace of the issue. It is good to have some
>>> confirmation that I'd not missed something.
>>>
>>>> (Tangent: sbss/bss probably do not need to be RWE on PPC!  On amd64,
>>>> init has three LOAD segments rather than two: one for rodata (R), one
>>>> for .text, .init, etc (RX); and one for .data (RW).)
>>>
>>> Yea, the section header flags indicate just WA for .sbss and .bss (but
>>> WAX for .got).
>>>
>>> But such is more general: for example, the beginning of .rodata
>>> (not executable) shares the tail part of a page with .fini
>>> (executable) in the example. .got has executable code but is in
>>> the middle of sections that do not. For something like /sbin/init it
>>> is so small that the middle of a page can be the only part that is
>>> executable, as in the example. (It is not forced onto its own page.)
>>>
>>> The form of .got used is also writable: WAX for section header flags.
>>
>>
>>
>> amd64's /sbin/init :
>>
>> There are 9 program headers, starting at offset 64
>>
>> Program Headers:
>> Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>> PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R   0x8
>> LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R   0x1000
>> LOAD           0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
>> LOAD           0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW  0x1000
>> TLS            0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R   0x10
>> GNU_RELRO      0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R   0x1
>> GNU_EH_FRAME   0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R   0x4
>> GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
>> NOTE           0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R   0x4
>>
>> Section to Segment mapping:
>> Segment Sections...
>>  00    
>>  01     .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame
>>  02     .text .init .fini .plt
>>  03     .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss
>>  04     .tdata .tbss
>>  05     .tdata .tbss .ctors .dtors .jcr .init_array .fini_array
>>  06     .eh_frame_hdr
>>  07    
>>  08     .note.tag
>> There are 27 section headers, starting at offset 0x157938:
>>
>> Section Headers:
>> [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
>> [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
>> [ 1] .note.tag         NOTE            0000000000200238 000238 000048 00   A  0   0  4
>> [ 2] .rela.plt         RELA            0000000000200280 000280 000030 18  AI  0  11  8
>> [ 3] .rodata           PROGBITS        00000000002002c0 0002c0 01afb0 00 AMS  0   0 64
>> [ 4] .eh_frame_hdr     PROGBITS        000000000021b270 01b270 00504c 00   A  0   0  4
>> [ 5] .eh_frame         PROGBITS        00000000002202c0 0202c0 019bd4 00   A  0   0  8
>> [ 6] .text             PROGBITS        000000000023a000 03a000 0e8dfc 00  AX  0   0 16
>> [ 7] .init             PROGBITS        0000000000322dfc 122dfc 00000e 00  AX  0   0  4
>> [ 8] .fini             PROGBITS        0000000000322e0c 122e0c 00000e 00  AX  0   0  4
>> [ 9] .plt              PROGBITS        0000000000322e20 122e20 000020 00  AX  0   0 16
>> [10] .data             PROGBITS        0000000000323000 123000 003a80 00  WA  0   0 16
>> [11] .got.plt          PROGBITS        0000000000326a80 126a80 000010 00  WA  0   0  8
>> [12] .tdata            PROGBITS        0000000000327000 127000 001800 00 WAT  0   0 16
>> [13] .tbss             NOBITS          0000000000328800 128800 000020 00 WAT  0   0  8
>> [14] .ctors            PROGBITS        0000000000328800 128800 000010 00  WA  0   0  8
>> [15] .dtors            PROGBITS        0000000000328810 128810 000010 00  WA  0   0  8
>> [16] .jcr              PROGBITS        0000000000328820 128820 000008 00  WA  0   0  8
>> [17] .init_array       INIT_ARRAY      0000000000328828 128828 000018 00  WA  0   0  8
>> [18] .fini_array       FINI_ARRAY      0000000000328840 128840 000008 00  WA  0   0  8
>> [19] .bss              NOBITS          0000000000329000 128848 2321d9 00  WA  0   0 64
>> [20] .comment          PROGBITS        0000000000000000 128848 0074d4 01  MS  0   0  1
>> [21] .gnu.warning.mkte PROGBITS        0000000000000000 12fd1c 000043 00      0   0  1
>> [22] .gnu.warning.f_pr PROGBITS        0000000000000000 12fd5f 000043 00      0   0  1
>> [23] .gnu_debuglink    PROGBITS        0000000000000000 1478b0 000010 00      0   0  1
>> [24] .shstrtab         STRTAB          0000000000000000 1478c0 0000f1 00      0   0  1
>> [25] .symtab           SYMTAB          0000000000000000 12fda8 017b08 18     26 1707  8
>> [26] .strtab           STRTAB          0000000000000000 1479b1 00ff84 00      0   0  1
>>
>> Note that there is space after .fini_array+8 before .bss starts
>> with a sizable alignment. The MemSiz for 03 does span .bss .
>>
>> armv7's /sbin/init is different about MemSiz spanning .bss:
>>
>> Program Headers:
>> Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> PHDR           0x000034 0x00010034 0x00010034 0x00120 0x00120 R   0x4
>> LOAD           0x000000 0x00010000 0x00010000 0x10674 0x10674 R   0x1000
>> LOAD           0x011000 0x00021000 0x00021000 0xe9c54 0xe9c54 R E 0x1000
>> LOAD           0x0fb000 0x0010b000 0x0010b000 0x03b88 0x30ccd RW  0x1000
>> TLS            0x0fe000 0x0010e000 0x0010e000 0x00b60 0x00b70 R   0x20
>> GNU_RELRO      0x0fe000 0x0010e000 0x0010e000 0x00b88 0x00b88 R   0x1
>> GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0
>> NOTE           0x000154 0x00010154 0x00010154 0x00064 0x00064 R   0x4
>> ARM_EXIDX      0x0001b8 0x000101b8 0x000101b8 0x00220 0x00220 R   0x4
>>
>> (NOTE: 0x0010b000+0x30ccd==0x13BCCD . Compare this to the later .bss
>> Addr of 0x10f000.)
>>
>> Section to Segment mapping:
>> Segment Sections...
>>  00    
>>  01     .note.tag .ARM.exidx .rodata .ARM.extab
>>  02     .text .init .fini
>>  03     .data .tdata .tbss .jcr .init_array .fini_array .got .bss
>>  04     .tdata .tbss
>>  05     .tdata .tbss .jcr .init_array .fini_array .got
>>  06    
>>  07     .note.tag
>>  08     .ARM.exidx
>> There are 24 section headers, starting at offset 0x12be3c:
>>
>> Section Headers:
>> [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>> [ 0]                   NULL            00000000 000000 000000 00      0   0  0
>> [ 1] .note.tag         NOTE            00010154 000154 000064 00   A  0   0  4
>> [ 2] .ARM.exidx        ARM_EXIDX       000101b8 0001b8 000220 00   A  5   0  4
>> [ 3] .rodata           PROGBITS        00010400 000400 01022c 00 AMS  0   0 64
>> [ 4] .ARM.extab        PROGBITS        0002062c 01062c 000048 00   A  0   0  4
>> [ 5] .text             PROGBITS        00021000 011000 0e9c14 00  AX  0   0 128
>> [ 6] .init             PROGBITS        0010ac20 0fac20 000014 00  AX  0   0 16
>> [ 7] .fini             PROGBITS        0010ac40 0fac40 000014 00  AX  0   0 16
>> [ 8] .data             PROGBITS        0010b000 0fb000 002734 00  WA  0   0  8
>> [ 9] .tdata            PROGBITS        0010e000 0fe000 000b60 00 WAT  0   0 16
>> [10] .tbss             NOBITS          0010eb60 0feb60 000010 00 WAT  0   0  4
>> [11] .jcr              PROGBITS        0010eb60 0feb60 000000 00  WA  0   0  4
>> [12] .init_array       INIT_ARRAY      0010eb60 0feb60 000008 00  WA  0   0  4
>> [13] .fini_array       FINI_ARRAY      0010eb68 0feb68 000004 00  WA  0   0  4
>> [14] .got              PROGBITS        0010eb6c 0feb6c 00001c 00  WA  0   0  4
>> [15] .bss              NOBITS          0010f000 0feb88 02cccd 00  WA  0   0 64
>> [16] .comment          PROGBITS        00000000 0feb88 0074b6 01  MS  0   0  1
>> [17] .ARM.attributes   ARM_ATTRIBUTES  00000000 10603e 00004f 00      0   0  1
>> [18] .gnu.warning.mkte PROGBITS        00000000 10608d 000043 00      0   0  1
>> [19] .gnu.warning.f_pr PROGBITS        00000000 1060d0 000043 00      0   0  1
>> [20] .gnu_debuglink    PROGBITS        00000000 11b314 000010 00      0   0  1
>> [21] .shstrtab         STRTAB          00000000 11b324 0000e3 00      0   0  1
>> [22] .symtab           SYMTAB          00000000 106114 015200 10     23 3063  4
>> [23] .strtab           STRTAB          00000000 11b407 010a32 00      0   0  1
>>
>> Note that there is space after .got+0x1c before .bss starts
>> with a sizable alignment. The MemSiz for 03 does *not* span
>> .bss , unlike for amd64 (and the rest).
>>
>>
>> aarch64's /sbin/init is similar to amd64 instead of armv7:
>>
>> Program Headers:
>> Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>> PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001c0 0x0001c0 R   0x8
>> LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x01624f 0x01624f R   0x10000
>> LOAD           0x020000 0x0000000000220000 0x0000000000220000 0x0dd354 0x0dd354 R E 0x10000
>> LOAD           0x100000 0x0000000000300000 0x0000000000300000 0x011840 0x252111 RW  0x10000
>> TLS            0x110000 0x0000000000310000 0x0000000000310000 0x001800 0x001820 R   0x40
>> GNU_RELRO      0x110000 0x0000000000310000 0x0000000000310000 0x001840 0x001840 R   0x1
>> GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
>> NOTE           0x000200 0x0000000000200200 0x0000000000200200 0x000048 0x000048 R   0x4
>>
>> Section to Segment mapping:
>> Segment Sections...
>>  00    
>>  01     .note.tag .rodata
>>  02     .text .init .fini
>>  03     .data .tdata .tbss .jcr .init_array .fini_array .got .bss
>>  04     .tdata .tbss
>>  05     .tdata .tbss .jcr .init_array .fini_array .got
>>  06    
>>  07     .note.tag
>> There are 21 section headers, starting at offset 0x14b6f0:
>>
>> Section Headers:
>> [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
>> [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
>> [ 1] .note.tag         NOTE            0000000000200200 000200 000048 00   A  0   0  4
>> [ 2] .rodata           PROGBITS        0000000000200280 000280 015fcf 00 AMS  0   0 64
>> [ 3] .text             PROGBITS        0000000000220000 020000 0dd31c 00  AX  0   0 64
>> [ 4] .init             PROGBITS        00000000002fd320 0fd320 000014 00  AX  0   0 16
>> [ 5] .fini             PROGBITS        00000000002fd340 0fd340 000014 00  AX  0   0 16
>> [ 6] .data             PROGBITS        0000000000300000 100000 003a20 00  WA  0   0 16
>> [ 7] .tdata            PROGBITS        0000000000310000 110000 001800 00 WAT  0   0 16
>> [ 8] .tbss             NOBITS          0000000000311800 111800 000020 00 WAT  0   0  8
>> [ 9] .jcr              PROGBITS        0000000000311800 111800 000000 00  WA  0   0  8
>> [10] .init_array       INIT_ARRAY      0000000000311800 111800 000018 00  WA  0   0  8
>> [11] .fini_array       FINI_ARRAY      0000000000311818 111818 000008 00  WA  0   0  8
>> [12] .got              PROGBITS        0000000000311820 111820 000020 00  WA  0   0  8
>> [13] .bss              NOBITS          0000000000320000 111840 232111 00  WA  0   0 64
>> [14] .comment          PROGBITS        0000000000000000 111840 007191 01  MS  0   0  1
>> [15] .gnu.warning.mkte PROGBITS        0000000000000000 1189d1 000043 00      0   0  1
>> [16] .gnu.warning.f_pr PROGBITS        0000000000000000 118a14 000043 00      0   0  1
>> [17] .gnu_debuglink    PROGBITS        0000000000000000 13b7f8 000010 00      0   0  1
>> [18] .shstrtab         STRTAB          0000000000000000 13b808 0000bd 00      0   0  1
>> [19] .symtab           SYMTAB          0000000000000000 118a58 022da0 18     20 3621  8
>> [20] .strtab           STRTAB          0000000000000000 13b8c5 00fe2b 00      0   0  1
>>
>> Note that there is space after .got+0x20 before .bss starts
>> with a sizable alignment. The MemSiz for 03 does span
>> .bss , like for amd64 (and all but armv7).
>>
>> powerpc64's /sbin/init is similar to amd64 as well:
>>
>> Program Headers:
>> Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>> PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R   0x8
>> LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R   0x1000
>> LOAD           0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
>> LOAD           0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW  0x1000
>> TLS            0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R   0x10
>> GNU_RELRO      0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R   0x1
>> GNU_EH_FRAME   0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R   0x4
>> GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
>> NOTE           0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R   0x4
>>
>> Section to Segment mapping:
>> Segment Sections...
>>  00    
>>  01     .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame
>>  02     .text .init .fini .plt
>>  03     .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss
>>  04     .tdata .tbss
>>  05     .tdata .tbss .ctors .dtors .jcr .init_array .fini_array
>>  06     .eh_frame_hdr
>>  07    
>>  08     .note.tag
>> There are 27 section headers, starting at offset 0x157938:
>>
>> Section Headers:
>> [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
>> [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
>> [ 1] .note.tag         NOTE            0000000000200238 000238 000048 00   A  0   0  4
>> [ 2] .rela.plt         RELA            0000000000200280 000280 000030 18  AI  0  11  8
>> [ 3] .rodata           PROGBITS        00000000002002c0 0002c0 01afb0 00 AMS  0   0 64
>> [ 4] .eh_frame_hdr     PROGBITS        000000000021b270 01b270 00504c 00   A  0   0  4
>> [ 5] .eh_frame         PROGBITS        00000000002202c0 0202c0 019bd4 00   A  0   0  8
>> [ 6] .text             PROGBITS        000000000023a000 03a000 0e8dfc 00  AX  0   0 16
>> [ 7] .init             PROGBITS        0000000000322dfc 122dfc 00000e 00  AX  0   0  4
>> [ 8] .fini             PROGBITS        0000000000322e0c 122e0c 00000e 00  AX  0   0  4
>> [ 9] .plt              PROGBITS        0000000000322e20 122e20 000020 00  AX  0   0 16
>> [10] .data             PROGBITS        0000000000323000 123000 003a80 00  WA  0   0 16
>> [11] .got.plt          PROGBITS        0000000000326a80 126a80 000010 00  WA  0   0  8
>> [12] .tdata            PROGBITS        0000000000327000 127000 001800 00 WAT  0   0 16
>> [13] .tbss             NOBITS          0000000000328800 128800 000020 00 WAT  0   0  8
>> [14] .ctors            PROGBITS        0000000000328800 128800 000010 00  WA  0   0  8
>> [15] .dtors            PROGBITS        0000000000328810 128810 000010 00  WA  0   0  8
>> [16] .jcr              PROGBITS        0000000000328820 128820 000008 00  WA  0   0  8
>> [17] .init_array       INIT_ARRAY      0000000000328828 128828 000018 00  WA  0   0  8
>> [18] .fini_array       FINI_ARRAY      0000000000328840 128840 000008 00  WA  0   0  8
>> [19] .bss              NOBITS          0000000000329000 128848 2321d9 00  WA  0   0 64
>> [20] .comment          PROGBITS        0000000000000000 128848 0074d4 01  MS  0   0  1
>> [21] .gnu.warning.mkte PROGBITS        0000000000000000 12fd1c 000043 00      0   0  1
>> [22] .gnu.warning.f_pr PROGBITS        0000000000000000 12fd5f 000043 00      0   0  1
>> [23] .gnu_debuglink    PROGBITS        0000000000000000 1478b0 000010 00      0   0  1
>> [24] .shstrtab         STRTAB          0000000000000000 1478c0 0000f1 00      0   0  1
>> [25] .symtab           SYMTAB          0000000000000000 12fda8 017b08 18     26 1707  8
>> [26] .strtab           STRTAB          0000000000000000 1479b1 00ff84 00      0   0  1
>>
>>
>> Note that there is space after .fini_array+8 before .bss starts
>> with a sizable alignment. The MemSiz for 03 does span
>> .bss , like for amd64 (and all but armv7).
>
> I temporarily forced my 32-bit powerpc /sbin/init to have:
>
> Section Headers:
>  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> . . .
>  [16] .got              PROGBITS        0193845c 12845c 000010 04 WAX  0   0  4
>  [17] .sbss             NOBITS          01939000 12846c 0000b0 00  WA  0   0  4
>  [18] .bss              NOBITS          019390c0 12846c 02cc48 00  WA  0   0 64
> . . .
>
> It was not enough to avoid the problems I've elsewhere
> reported for *sp++ getting SIGSEGV ( environ related
> activity in _init_tls ).

I used debug.minidump=0 in /boot/loader.conf to
cause a dump for the crash, plus a libkvm modified
enough for my working boot environment to allow me
to examine the memory-image bytes of such a dump,
with libkvm used via /usr/local/bin/kgdb . (There is
no support for automatically translating user-space
addresses or other such.)

For the clang based debug buildworld and debug buildkernel
context with /sbin/init having:

  [16] .got              PROGBITS        01956ccc 146ccc 000010 04 WAX  0   0  4
  [17] .sbss             NOBITS          01956cdc 146cdc 0000b0 00  WA  0   0  4
  [18] .bss              NOBITS          01956dc0 146cdc 02ee28 00  WA  0   0 64

I confirmed that .sbss in /sbin/init's address space
is not zeroed (so environ is not assigned by handle_argv ).
I also confirmed that _start was given a good env value
(in %r5) based on where the value was stored on the
stack. It is just that the value was not used.
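
To make the granularity concrete for the section figures just listed, here is a minimal user-space sketch. It assumes 4 KiB pages and a DEV_BSIZE of 512, and its function names are hypothetical. It shows only address arithmetic, not what the kernel actually does: which DEV_BSIZE-sized block of a page an address falls in, and whether a NOBITS range begins in the same block as the last file-backed byte.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT32_C(4096) /* assumption: 4 KiB pages */
#define SKETCH_DEV_BSIZE UINT32_C(512)  /* assumption: DEV_BSIZE == 512 */

/* Index, within its page, of the DEV_BSIZE-sized block holding addr. */
unsigned
block_in_page(uint32_t addr)
{
	return ((unsigned)((addr % SKETCH_PAGE_SIZE) / SKETCH_DEV_BSIZE));
}

/* Bytes from addr to the end of its DEV_BSIZE-sized block. */
uint32_t
bytes_left_in_block(uint32_t addr)
{
	return (SKETCH_DEV_BSIZE - addr % SKETCH_DEV_BSIZE);
}

/*
 * Nonzero when a NOBITS range starting at nobits_addr begins in the same
 * page and the same block as the last file-backed byte before file_end.
 */
int
starts_in_file_backed_block(uint32_t file_end, uint32_t nobits_addr)
{
	uint32_t last = file_end - 1;

	assert(file_end != 0 && nobits_addr >= file_end);
	return (last / SKETCH_PAGE_SIZE == nobits_addr / SKETCH_PAGE_SIZE &&
	    block_in_page(last) == block_in_page(nobits_addr));
}
```

With .got ending at 0x01956ccc+0x10 and .sbss starting at 0x01956cdc, both fall in block 6 of the page, and 0x124 bytes remain in that block after the end of .got. That is more than the 0xb0 bytes of .sbss, so under these assumptions all of .sbss sits in a block that also holds file-backed data.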

The point of obvious failure (the crash) can change based
on the garbage in .sbss. For the build that I used this
time, it happened in __je_arena_malloc_hard instead of
before _init_tls called _libc_allocate_tls . (I traced
the call chain in the dump.)


From what I've seen in the dump there seem to be special
uses of some values (that also have normal uses, of
course):

0xfa5005af: as yet invalid page content.
0x1c000020: as yet unassigned user-space-stack memory for /sbin/init.

These are the same locations that I previously reported as
showing up in the DSI read trap reports for /sbin/init failing.
The specific build here failed with a different value.

For reference relative to libkvm:

# svnlite diff /usr/src/lib/libkvm/
Index: /usr/src/lib/libkvm/kvm_powerpc.c
===================================================================
--- /usr/src/lib/libkvm/kvm_powerpc.c (revision 347549)
+++ /usr/src/lib/libkvm/kvm_powerpc.c (working copy)
@@ -211,6 +211,53 @@
  if (be32toh(vm->ph->p_paddr) == 0xffffffff)
  return ((int)powerpc_va2off(kd, va, ofs));
 
+ // HACK in something for what I observe in
+ // a debug.minidump=0 vmcore.* for 32-bit powerpc
+ //
+ if (  be32toh(vm->ph->p_vaddr)  == 0xffffffff
+   && be32toh(vm->ph->p_paddr)  == 0
+   && be16toh(vm->eh->e_phnum)  == 1
+   ) {
+ // Presumes p_memsz is either unsigned
+ // 32-bit or is 64-bit, same for va .
+
+ if (be32toh(vm->ph->p_memsz) <= va)
+ return 0; // Like powerpc_va2off
+
+ // If ofs was (signed) 32-bit there
+ // would be a problem for sufficiently
+ // large positive memsz's and va's
+ // near the end --because of p_offset
+ // and dmphdrsz causing overflow/wrapping
+ // for some large va values.
+ // Presumes 64-bit ofs for such cases.
+ // Also presumes dmphdrsz+p_offset
+ // is non-negative so that small
+ // non-negative va values have no
+ // problems with ofs going negative.
+
+ *ofs =    vm->dmphdrsz
+ + be32toh(vm->ph->p_offset)
+ + va;
+
+ // The normal return value overflows/wraps
+ // for p_memsz == 0x80000000u when va == 0 .
+ // Avoid this by depending on calling code's
+ // loop for sufficiently large cases.
+ // This code presumes p_memsz/2 <= MAX_INT .
+ // 32-bit powerpc FreeBSD does not allow
+ // using more than 2 GiBytes of RAM but
+ // does allow using 2 GiBytes on 64-bit
+ // hardware.
+ //
+ if (  (int)be32toh(vm->ph->p_memsz) < 0
+   && va < be32toh(vm->ph->p_memsz)/2
+   )
+ return be32toh(vm->ph->p_memsz)/2;
+
+ return be32toh(vm->ph->p_memsz) - va;
+ }
+
  _kvm_err(kd, kd->program, "Raw corefile not supported");
  return (0);
 }
Index: /usr/src/lib/libkvm/kvm_private.c
===================================================================
--- /usr/src/lib/libkvm/kvm_private.c (revision 347549)
+++ /usr/src/lib/libkvm/kvm_private.c (working copy)
@@ -131,7 +131,9 @@
 {
 
  return (kd->nlehdr.e_ident[EI_CLASS] == class &&
-    kd->nlehdr.e_type == ET_EXEC &&
+    (  kd->nlehdr.e_type == ET_EXEC ||
+       kd->nlehdr.e_type == ET_DYN
+    ) &&
     kd->nlehdr.e_machine == machine);
 }
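
The return-value handling in the kvm_powerpc.c hack above can be exercised in isolation with a user-space sketch like the following. It is not libkvm code: the function name and parameter list are hypothetical, the header fields are passed in already converted to host byte order, a 32-bit int is assumed, and the segment size is assumed not to exceed 2 GiB, as the comments in the diff presume.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the hacked branch.  Returns the number of
 * contiguous bytes available at va (0 when va is out of range) and
 * stores the corresponding dump-file offset in *ofs.
 */
int
raw_va2off(uint32_t memsz, uint32_t p_offset, uint32_t dmphdrsz,
    uint64_t va, int64_t *ofs)
{
	/* Assumption taken from the diff's comments: at most 2 GiB. */
	assert(memsz <= UINT32_C(0x80000000));

	if (memsz <= va)
		return (0);

	/* 64-bit offset, so large va values do not wrap. */
	*ofs = (int64_t)dmphdrsz + (int64_t)p_offset + (int64_t)va;

	/*
	 * memsz - va does not fit in an int when memsz is 2 GiB and va is
	 * small, so report half and rely on the caller's loop for the rest.
	 */
	if (memsz > (uint32_t)INT_MAX && va < memsz / 2)
		return ((int)(memsz / 2));

	return ((int)(memsz - va));
}
```

For a 2 GiB segment, a request at va 0 reports 0x40000000 available bytes rather than a wrapped negative value, and a request at or past the end of the segment reports 0.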
 



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

Re: kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

freebsd-ppc mailing list
[It looks to me like the ->valid mask is used only for the
last page of the /sbin/init file, not based on the size
and alignment of the data requested for the PT_LOAD.]
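
As a rough illustration of why that distinction matters, the following user-space sketch builds a one-bit-per-DEV_BSIZE mask for a byte range within a page, following the description of the valid bitmask quoted in this thread. It assumes 4 KiB pages and a DEV_BSIZE of 512, the function names are hypothetical, and it models only the bit arithmetic, not how any pager actually sets ->valid.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096u /* assumption: 4 KiB pages */
#define SKETCH_DEV_BSHIFT 9     /* assumption: DEV_BSIZE == 512 */

/* One bit per DEV_BSIZE block touched by [base, base + size) in a page. */
uint8_t
valid_bits(unsigned base, unsigned size)
{
	unsigned first, last;

	assert(base + size <= SKETCH_PAGE_SIZE);
	if (size == 0)
		return (0);
	first = base >> SKETCH_DEV_BSHIFT;
	last = (base + size - 1) >> SKETCH_DEV_BSHIFT;
	return ((uint8_t)((2u << last) - (1u << first)));
}

/* Page offset of the first block whose bit is clear, or the page size. */
unsigned
first_clear_offset(uint8_t valid)
{
	unsigned bit;

	for (bit = 0; bit < (SKETCH_PAGE_SIZE >> SKETCH_DEV_BSHIFT); bit++)
		if ((valid & (1u << bit)) == 0)
			return (bit << SKETCH_DEV_BSHIFT);
	return (SKETCH_PAGE_SIZE);
}
```

If the mask were derived from a PT_LOAD whose file-backed bytes end at page offset 0xcdc, it would be 0x7f, and block-granular zeroing would start only at offset 0xe00. If it is derived from a page that lies wholly inside the file, it is 0xff and nothing in the page would be zeroed on that basis. In both cases the bytes from 0xcdc up to 0xe00 are left to some other mechanism.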

On 2019-Jun-11, at 21:53, Mark Millard <marklmi at yahoo.com> wrote:

> [The garbage after .got up to the page boundary is
> .comment section strings. The context here is
> targeting 32-bit powerpc via system-clang-8 and
> devel/powerpc64-binutils for buildworld and
> buildkernel . ]
>
> On 2019-Jun-11, at 19:55, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [I have confirmed that .sbss is not being zeroed out and
>> that environ thereby starts out non-zero (garbage), using
>> a debug.minidump=0 style dump.]
>>
>>> On 2019-Jun-10, at 16:19, Mark Millard <[hidden email]> wrote:
>>>
>>> . . . (omitted) . . .
>>
>> I used debug.minidump=0 in /boot/loader.conf to
>> cause a dump for the crash, plus a libkvm modified
>> enough for my working boot environment to allow me
>> to examine the memory-image bytes of such a dump,
>> with libkvm used via /usr/local/bin/kgdb . (There is
>> no support for automatically translating user-space
>> addresses or other such.)
>>
>> For the clang based debug buildworld and debug buildkernel
>> context with /sbin/init having:
>>
>> [16] .got              PROGBITS        01956ccc 146ccc 000010 04 WAX  0   0  4
>> [17] .sbss             NOBITS          01956cdc 146cdc 0000b0 00  WA  0   0  4
>> [18] .bss              NOBITS          01956dc0 146cdc 02ee28 00  WA  0   0 64
>>
>> I confirmed that .sbss in /sbin/init's address space
>> is not zeroed (so environ is not assigned by handle_argv ).
>> I also confirmed that _start was given a good env value
>> (in %r5) based on where the value was stored on the
>> stack. It is just that the value was not used.
>>
>> The detailed obvious-failure point (crash) can change based
>> on the garbage in the .sbss and, for the build that I used
>> this time, that happened in __je_arena_malloc_hard instead
>> of before _init_tls called _libc_allocate_tls . (I traced
>> the call chain in the dump.)
>>
>>
>> From what I've seen in the dump there seem to be special
>> uses of some values (that also have normal uses, of
>> course):
>>
>> 0xfa5005af: as yet invalid page content.
>> 0x1c000020: as yet unassigned user-space-stack memory for /sbin/init.
>>
>> These are the same locations that I previously reported as
>> showing up in the DSI read trap reports for /sbin/init failing.
>> The specific build here failed with a different value.
>>
>> For reference relative to libkvm:
>>
>> # svnlite diff /usr/src/lib/libkvm/
>> Index: /usr/src/lib/libkvm/kvm_powerpc.c
>> ===================================================================
>> --- /usr/src/lib/libkvm/kvm_powerpc.c (revision 347549)
>> +++ /usr/src/lib/libkvm/kvm_powerpc.c (working copy)
>> @@ -211,6 +211,53 @@
>> if (be32toh(vm->ph->p_paddr) == 0xffffffff)
>> return ((int)powerpc_va2off(kd, va, ofs));
>>
>> + // HACK in something for what I observe in
>> + // a debug.minidump=0 vmcore.* for 32-bit powerpc
>> + //
>> + if (  be32toh(vm->ph->p_vaddr)  == 0xffffffff
>> +   && be32toh(vm->ph->p_paddr)  == 0
>> +   && be16toh(vm->eh->e_phnum)  == 1
>> +   ) {
>> + // Presumes p_memsz is either unsigned
>> + // 32-bit or is 64-bit, same for va .
>> +
>> + if (be32toh(vm->ph->p_memsz) <= va)
>> + return 0; // Like powerpc_va2off
>> +
>> + // If ofs was (signed) 32-bit there
>> + // would be a problem for sufficiently
>> + // large positive memsz's and va's
>> + // near the end --because of p_offset
>> + // and dmphdrsz causing overflow/wrapping
>> + // for some large va values.
>> + // Presumes 64-bit ofs for such cases.
>> + // Also presumes dmphdrsz+p_offset
>> + // is non-negative so that small
>> + // non-negative va values have no
>> + // problems with ofs going negative.
>> +
>> + *ofs =    vm->dmphdrsz
>> + + be32toh(vm->ph->p_offset)
>> + + va;
>> +
>> + // The normal return value overflows/wraps
>> + // for p_memsz == 0x80000000u when va == 0 .
>> + // Avoid this by depending on calling code's
>> + // loop for sufficiently large cases.
>> + // This code presumes p_memsz/2 <= MAX_INT .
>> + // 32-bit powerpc FreeBSD does not allow
>> + // using more than 2 GiBytes of RAM but
>> + // does allow using 2 GiBytes on 64-bit
>> + // hardware.
>> + //
>> + if (  (int)be32toh(vm->ph->p_memsz) < 0
>> +   && va < be32toh(vm->ph->p_memsz)/2
>> +   )
>> + return be32toh(vm->ph->p_memsz)/2;
>> +
>> + return be32toh(vm->ph->p_memsz) - va;
>> + }
>> +
>> _kvm_err(kd, kd->program, "Raw corefile not supported");
>> return (0);
>> }
>> Index: /usr/src/lib/libkvm/kvm_private.c
>> ===================================================================
>> --- /usr/src/lib/libkvm/kvm_private.c (revision 347549)
>> +++ /usr/src/lib/libkvm/kvm_private.c (working copy)
>> @@ -131,7 +131,9 @@
>> {
>>
>> return (kd->nlehdr.e_ident[EI_CLASS] == class &&
>> -    kd->nlehdr.e_type == ET_EXEC &&
>> +    (  kd->nlehdr.e_type == ET_EXEC ||
>> +       kd->nlehdr.e_type == ET_DYN
>> +    ) &&
>>    kd->nlehdr.e_machine == machine);
>> }
>>
>>
>>
>
> The following is what is in the .sbss/.bss up to the page
> boundary (after the .got bytes):
>
> (kgdb) x/s 0x2a66cdc
> 0x2a66cdc: "$FreeBSD: head/lib/csu/powerpc/crt1.c 326219 2017-11-26 02:00:33Z pfg $"
>
> (kgdb) x/s 0x2a66d24
> 0x2a66d24: "$FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $"
>
> (kgdb) x/s 0x2a66d72
> 0x2a66d72: "$FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $"
>
> (kgdb) x/s 0x2a66dc3
> 0x2a66dc3: "FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)"
>
> (kgdb) x/s 0x2a66e15
> 0x2a66e15: "$FreeBSD: head/lib/csu/powerpc/crti.S 217399 2011-01-14 11:34:58Z kib $"
>
> (kgdb) x/s 0x2a66e5d
> 0x2a66e5d: "$FreeBSD: head/sbin/mount/getmntopts.c 326025 2017-11-20 19:49:47Z pfg $"
>
> (kgdb) x/s 0x2a66ea6
> 0x2a66ea6: "$FreeBSD: head/lib/libutil/login_tty.c 334106 2018-05-23 17:02:12Z jhb $"
>
> (kgdb) x/s 0x2a66eef
> 0x2a66eef: "$FreeBSD: head/lib/libutil/login_class.c 296723 2016-03-12 14:54:34Z kib $"
>
> (kgdb) x/s 0x2a66f83
> 0x2a66f83: "$FreeBSD: head/lib/libutil/_secure_path.c 139012 2004-12-18 12:31:12Z ru $"
>
> (kgdb) x/s 0x2a66fce
> 0x2a66fce: "$FreeBSD: head/lib/libcrypt/crypt.c 326219 2017-11
>
> (I truncated that last to avoid the 0xfa5005af's on the next page
> in RAM.)
>
> Compare ( from readelf /sbin/init ):
>
> String dump of section '.comment':
>  [     0]  $FreeBSD: head/lib/csu/powerpc/crt1.c 326219 2017-11-26 02:00:33Z pfg $
>  [    48]  $FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $
>  [    96]  $FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $
>  [    e7]  FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
>  [   139]  $FreeBSD: head/lib/csu/powerpc/crti.S 217399 2011-01-14 11:34:58Z kib $
>  [   181]  $FreeBSD: head/sbin/mount/getmntopts.c 326025 2017-11-20 19:49:47Z pfg $
>  [   1ca]  $FreeBSD: head/lib/libutil/login_tty.c 334106 2018-05-23 17:02:12Z jhb $
>  [   213]  $FreeBSD: head/lib/libutil/login_class.c 296723 2016-03-12 14:54:34Z kib $
>  [   25e]  $FreeBSD: head/lib/libutil/login_cap.c 317265 2017-04-21 19:27:33Z pfg $
>  [   2a7]  $FreeBSD: head/lib/libutil/_secure_path.c 139012 2004-12-18 12:31:12Z ru $
>  [   2f2]  $FreeBSD: head/lib/libcrypt/crypt.c 326219 2017-11-26 02:00:33Z pfg $
> . . .
>
> Note:
>
> Program Headers:
>  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>  LOAD           0x000000 0x01800000 0x01800000 0x140ad4 0x140ad4 R E 0x10000
>  LOAD           0x140ae0 0x01950ae0 0x01950ae0 0x061fc 0x35108 RWE 0x10000
>  NOTE           0x0000d4 0x018000d4 0x018000d4 0x00048 0x00048 R   0x4
>  TLS            0x140ae0 0x01950ae0 0x01950ae0 0x00b10 0x00b1d R   0x10
>  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
>
> Section to Segment mapping:
>  Segment Sections...
>   00     .note.tag .init .text .fini .rodata .eh_frame
>   01     .tdata .tbss .init_array .fini_array .ctors .dtors .jcr .data.rel.ro .data .got .sbss .bss
>   02     .note.tag
>   03     .tdata .tbss
>   04    
> There are 24 section headers, starting at offset 0x16cec8:
>
> Section Headers:
>  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> . . .
>  [16] .got              PROGBITS        01956ccc 146ccc 000010 04 WAX  0   0  4
>  [17] .sbss             NOBITS          01956cdc 146cdc 0000b0 00  WA  0   0  4
>  [18] .bss              NOBITS          01956dc0 146cdc 02ee28 00  WA  0   0 64
>  [19] .comment          PROGBITS        00000000 146cdc 0073d4 01  MS  0   0  1
>
> It looks like material after the .got is being copied,
> spanning the in-file-empty .sbss and .bss sections and
> implicitly initializing (the first part of) those
> sections.
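
As a rough cross-check of the numbers quoted above, the arithmetic can be
sketched in C. The helper names and the 4 KiB page size are assumptions
made for illustration (this is not kernel code); the constants are taken
from the second PT_LOAD line, the section headers, and the kgdb addresses
in the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000u

/* Hypothetical helper: first address past the file-backed bytes. */
static uint32_t
file_backed_end(uint32_t vaddr, uint32_t filesz)
{
	return (vaddr + filesz);
}

/* Hypothetical helper: bytes from addr up to the next page boundary. */
static uint32_t
bytes_to_page_end(uint32_t addr)
{
	uint32_t rem = addr & (SKETCH_PAGE_SIZE - 1);

	return (rem == 0 ? 0 : SKETCH_PAGE_SIZE - rem);
}

/* Hypothetical helper: dump address for an offset into .comment. */
static uint32_t
dump_addr_for_offset(uint32_t dump_base, uint32_t comment_off)
{
	return (dump_base + comment_off);
}

/* Checks the values quoted from readelf and kgdb against each other. */
static void
check_sbin_init_numbers(void)
{
	/* VirtAddr + FileSiz of the RWE PT_LOAD is the start of .sbss. */
	assert(file_backed_end(0x01950ae0u, 0x061fcu) == 0x01956cdcu);
	/* Offset + FileSiz is the file offset of .comment. */
	assert(0x140ae0u + 0x061fcu == 0x146cdcu);
	/* The last string seen in the dump is .comment offset 0x2f2. */
	assert(dump_addr_for_offset(0x2a66cdcu, 0x2f2u) == 0x2a66fceu);
	/* The dump page ends where the tail of the segment page ends. */
	assert(0x2a66cdcu + bytes_to_page_end(0x01956cdcu) == 0x2a67000u);
}
```

With these inputs the file-backed bytes end at 0x01956cdc (the start of
.sbss) and 0x324 bytes remain before the page boundary. Those are the
bytes that show .comment text in the dump.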


The ->valid assignments appear to trace to code like:

        /*
         * The last page has valid blocks.  Invalid part can only
         * exist at the end of file, and the page is made fully valid
         * by zeroing in vm_pager_get_pages().
         */
        if (m[count - 1]->valid != 0 && --count == 0) {
                if (iodone != NULL)
                        iodone(arg, m, 1, 0);
                return (VM_PAGER_OK);
        }

regardless of whether the requested data stops short of
the file's last page while also ending partway through
a page.

So it appears that the use of:

QUOTE
vm_imgact_map_page uses vm_imgact_hold_page.

vm_imgact_hold_page uses vm_pager_get_pages.

vm_pager_get_pages uses vm_page_zero_invalid
to "Zero out partially filled data"
END QUOTE

simply does not do the right thing for .sbss
or .bss handling. The m->valid related code
for zeroing is basically irrelevant to .sbss
and .bss.
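
To illustrate why a file-size-based valid mask says nothing about where a
PT_LOAD's FileSiz ends, here is a simplified user-space model. It is an
assumption about the behavior described above, not the pager's code:
model_valid_mask is a hypothetical name, and 4096-byte pages with 512-byte
DEV_BSIZE blocks are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_DEV_BSIZE 512u
#define SKETCH_BITS_ALL  0xffu	/* 8 blocks per page */

/*
 * Hypothetical model: valid bits for the page that starts at file
 * offset page_off, derived only from the file size. One bit is set
 * per DEV_BSIZE block that holds at least one byte of file data.
 */
static uint32_t
model_valid_mask(uint64_t file_size, uint64_t page_off)
{
	uint64_t avail, nblocks;

	assert(page_off % SKETCH_PAGE_SIZE == 0);
	if (page_off >= file_size)
		return (0);
	avail = file_size - page_off;
	if (avail >= SKETCH_PAGE_SIZE)
		return (SKETCH_BITS_ALL);
	nblocks = (avail + SKETCH_DEV_BSIZE - 1) / SKETCH_DEV_BSIZE;
	return ((uint32_t)((1u << nblocks) - 1));
}
```

Taking the section header table offset 0x16cec8 as a lower bound on the
size of /sbin/init, the page at file offset 0x146000 (the one holding .got
and the start of .comment) lies entirely inside the file, so in this model
its mask has every bit set and there is nothing marked as needing zeroing.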

Note that the below code requires a m->valid bit
to be asserted in order to do any
pmap_zero_page_area operations. Thus it does not
zero out pages that are completely invalid either.
This explains why I see 0xfa5005af on the full
pages in the .sbss/.bss area for debug builds:
nothing is zeroing the full pages either.

void
vm_page_zero_invalid(vm_page_t m, boolean_t setvalid)
{
       int b;
       int i;

       VM_OBJECT_ASSERT_WLOCKED(m->object);
       /*
        * Scan the valid bits looking for invalid sections that
        * must be zeroed.  Invalid sub-DEV_BSIZE'd areas ( where the
        * valid bit may be set ) have already been zeroed by
        * vm_page_set_validclean().
        */
       for (b = i = 0; i <= PAGE_SIZE / DEV_BSIZE; ++i) {
               if (i == (PAGE_SIZE / DEV_BSIZE) ||
                   (m->valid & ((vm_page_bits_t)1 << i))) {
                       if (i > b) {
                               pmap_zero_page_area(m,
                                   b << DEV_BSHIFT, (i - b) << DEV_BSHIFT);
                       }
                       b = i + 1;
               }
       }

       /*
        * setvalid is TRUE when we can safely set the zero'd areas
        * as being valid.  We can do this if there are no cache consistancy
        * issues.  e.g. it is ok to do with UFS, but not ok to do with NFS.
        */
       if (setvalid)
               m->valid = VM_PAGE_BITS_ALL;
}

This code simply does not do the right thing for .sbss and
.bss handling.
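
One way to check what the scan does for a given mask is a user-space model
of the loop. This is only a sketch under stated assumptions: the page is a
plain buffer, memset() stands in for pmap_zero_page_area(), the SKETCH_*
constants assume 4096-byte pages and 512-byte blocks, and
model_zero_invalid is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096
#define SKETCH_DEV_BSIZE  512
#define SKETCH_DEV_BSHIFT 9
#define SKETCH_FILL       0xa5	/* arbitrary stand-in for stale bytes */

/*
 * Model of the valid-bit scan: zeroes every run of blocks whose
 * valid bit is clear and returns the number of bytes zeroed.
 */
static int
model_zero_invalid(unsigned char *page, uint32_t valid)
{
	int b, i, zeroed = 0;

	assert(page != NULL);
	for (b = i = 0; i <= SKETCH_PAGE_SIZE / SKETCH_DEV_BSIZE; ++i) {
		if (i == SKETCH_PAGE_SIZE / SKETCH_DEV_BSIZE ||
		    (valid & ((uint32_t)1 << i))) {
			if (i > b) {
				memset(page + (b << SKETCH_DEV_BSHIFT), 0,
				    (size_t)(i - b) << SKETCH_DEV_BSHIFT);
				zeroed += (i - b) << SKETCH_DEV_BSHIFT;
			}
			b = i + 1;
		}
	}
	return (zeroed);
}
```

In this model a mask of 0xff (every block valid) zeroes nothing, so bytes
from page offset 0xcdc onward keep whatever the read brought in, which is
consistent with the .comment strings seen after .got. A mask of 0 is caught
by the i == PAGE_SIZE / DEV_BSIZE arm of the condition, so the all-invalid
case may be worth re-checking against the 0xfa5005af pages.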

_start in /sbin/init (for example) expects .sbss and .bss
to have already been initialized to zero (and possibly
further adjusted after that for something like environ).

So far I find nothing to cover that.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
