HEADSUP: Something has gone south with -current

Steve Kargl
Dell 7510 laptop was happily running FreeBSD12-alpha9
from Oct. 10th.  I decided to update to top-of-tree
today, which would be FreeBSD13 at r341703.

% cd /usr/obj
% rm -rf usr
% cd ../src
% svn update
% make -j6 buildworld   (OK)
% make -j6 buildkernel  (OK)
% make installkernel    (OK)
% mergemaster -p
% <reboot into single user mode>
% mount -a
% cd /usr/src
% make installworld

Dies with a segfault in make(1) halfway through the update.
/sbin has been updated.

Rebooted with new kernel. Laptop locks up.
Rebooted with kernel.old/kernel (known good kernel).  Laptop locks up.
Rebooted with verbose info.  Lockup occurs right after

Starting /sbin/init

is printed to console.

Reboot to Dell laptop BIOS and run system diagnostics.

Reboot with old FreeBSD installation cdrom.  Mounted the
laptop's root filesystem on /mnt.

% chflags noschg /mnt/sbin/init
% cp /mnt/sbin/init.bak /mnt/sbin/init

Reboot laptop and finally get back to multi-user mode.  Post-trauma
analysis:

make core dumps.
devd core dumps.
init core dumps.
cc   core dumps.  
c++  core dumps.

Something seems to be broken.

--
Steve
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

Re: HEADSUP: Something has gone south with -current

Shawn Webb-3
On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:

> Dell 7510 laptop was happily running FreeBSD12-alpha9
> from Oct. 10th.  I decided to update to top-of-tree
> today, which would be FreeBSD13 at r341703.
>
> % cd /usr/obj
> % rm -rf usr
> % cd ../src
> % svn update
> % make -j6 buildworld   (OK)
> % make -j6 buildkernel  (OK)
> % make installkernel    (OK)
> % mergemaster -p
> % <reboot into single user mode>
> % mount -a
> % cd /usr/src
> % make installworld
>
> Dies with a segfault in make(1) halfway through the update.
> /sbin has been updated.
>
> Rebooted with new kernel. Laptop locks up.
> Rebooted with kernel.old/kernel (known good kernel).  Laptop locks up.
> Rebooted with verbose info.  Lockup occurs right after
>
> Starting /sbin/init
>
> is printed to console.
>
> Reboot to Dell laptop BIOS and run system diagnostics.
>
> Reboot with old FreeBSD installation cdrom.  Mounted the
> laptop's root filesystem on /mnt.
>
> % chflags noschg /mnt/sbin/init
> % cp /mnt/sbin/init.bak /mnt/sbin/init
>
> Reboot laptop and finally get back to multi-user mode.  Post trauma
> analysis
>
> make core dumps.
> devd core dumps.
> init core dumps.
> cc   core dumps.  
> c++  core dumps.
>
> Something seems to be broken.
There have been (and still are) issues with the introduction of ifunc
in libc (r339898). The symptoms you're describing sound a lot like the
symptoms I experienced early on.

Do you have any non-standard settings in make.conf/src.conf?

Thanks,

--
Shawn Webb
Cofounder and Security Engineer
HardenedBSD

Tor-ified Signal:    +1 443-546-8752
Tor+XMPP+OTR:        [hidden email]
GPG Key ID:          0x6A84658F52456EEE
GPG Key Fingerprint: 2ABA B6BD EF6A F486 BE89  3D9E 6A84 658F 5245 6EEE


Re: HEADSUP: Something has gone south with -current

Steve Kargl
In reply to this post by Steve Kargl
On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:
>
> make core dumps.
> devd core dumps.
> init core dumps.
> cc   core dumps.  
> c++  core dumps.
>
> Something seems to be broken.
>

Further investigation,
as core dumps.
cpp core dumps.
/rescue/vi core dumps.

All of these programs are statically linked.  Note that ar and ranlib
are also statically linked and still appear to work, but they were not
replaced by the failing 'make installworld'.

Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
is static and not stripped and works!  But, if I do

cp ar ar.new
strip ar
./ar

This ar core dumps.  So, stripping static binaries seems to
break the binary.
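
The cp/strip/run experiment above can be wrapped in a small helper so
that other static binaries can be checked the same way.  This is only a
minimal sketch: the function name strip_check and the STRIP variable are
made up for illustration, and it assumes that an exit status above 128
means the stripped copy was killed by a signal, which is the usual shell
convention.

```sh
#!/bin/sh
# strip_check BINARY [ARGS...]   (hypothetical helper, not part of FreeBSD)
# Copy BINARY to a scratch directory, strip the copy, run it with ARGS,
# and report whether the stripped copy was killed by a signal.
STRIP=${STRIP:-strip}            # which strip(1) to test; override as needed

strip_check() {
    bin=$1
    shift
    tmp=$(mktemp -d) || return 2
    cp "$bin" "$tmp/stripped" || { rm -rf "$tmp"; return 2; }
    "$STRIP" "$tmp/stripped" || { rm -rf "$tmp"; return 2; }

    # Run the stripped copy; only its exit status matters here.
    "$tmp/stripped" "$@" >/dev/null 2>&1
    status=$?
    rm -rf "$tmp"

    # By shell convention, a status above 128 means death by signal.
    if [ "$status" -gt 128 ]; then
        echo "stripped copy of $bin died with signal $((status - 128))"
        return 1
    fi
    echo "stripped copy of $bin ran (exit status $status)"
    return 0
}
```

For example, "strip_check ./ar" run from the object directory would
repeat the test above without touching the original file.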

--
Steve

Re: HEADSUP: Something has gone south with -current

Steve Kargl
In reply to this post by Shawn Webb-3
On Fri, Dec 07, 2018 at 06:23:57PM -0500, Shawn Webb wrote:

> On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:
> > Dell 7510 laptop was happily running FreeBSD12-alpha9
> > from Oct. 10th.  I decided to update to top-of-tree
> > today, which would be FreeBSD13 at r341703.
> > analysis
> >
> > make core dumps.
> > devd core dumps.
> > init core dumps.
> > cc   core dumps.  
> > c++  core dumps.
> >
> > Something seems to be broken.
>
> There have been (and still are) issues with the introduction of ifunc
> in libc (r339898). The symptoms you're describing sound a lot like the
> symptoms I experienced early on.
>
> Do you have any non-standard settings in make.conf/src.conf?
>

Both are fairly benign.  make.conf contains MALLOC_PRODUCTION="YES"
and src.conf contains a few WITHOUT_* options (eg, CTM, PPP, NDIS).

It seems to be associated with stripping static binaries.  See
my follow-up post.

--
Steve

Re: HEADSUP: Something has gone south with -current

Steve Kargl
In reply to this post by Steve Kargl
On Fri, Dec 07, 2018 at 03:30:19PM -0800, Steve Kargl wrote:

> On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:
> >
> > make core dumps.
> > devd core dumps.
> > init core dumps.
> > cc   core dumps.  
> > c++  core dumps.
> >
> > Something seems to be broken.
> >
>
> Further investigation,
> as core dumps.
> cpp core dumps.
> /rescue/vi core dumps.
>
> All of these programs are statically linked.  Note, ar and ranlib
> have static linkage, and appear to still work but these were not
> replaced by the failing 'make installworld'.
>
> Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
> is static and not stripped and works!  But, if I do
>
> cp ar ar.new
> strip ar
> ./ar
>
> This ar core dumps.  So, stripping static binaries seems to
> break the binary.
>

Yep, definitely, a problem with stripping static binaries.

I copied both init and devd from /usr/obj to /sbin without
stripping the binaries.  System rebooted as expected.

--
Steve

Re: HEADSUP: Something has gone south with -current

Steve Kargl
On Fri, Dec 07, 2018 at 03:52:33PM -0800, Steve Kargl wrote:

> On Fri, Dec 07, 2018 at 03:30:19PM -0800, Steve Kargl wrote:
> > On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:
> > >
> > > make core dumps.
> > > devd core dumps.
> > > init core dumps.
> > > cc   core dumps.  
> > > c++  core dumps.
> > >
> > > Something seems to be broken.
> > >
> >
> > Further investigation,
> > as core dumps.
> > cpp core dumps.
> > /rescue/vi core dumps.
> >
> > All of these programs are statically linked.  Note, ar and ranlib
> > have static linkage, and appear to still work but these were not
> > replaced by the failing 'make installworld'.
> >
> > Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
> > is static and not stripped and works!  But, if I do
> >
> > cp ar ar.new
> > strip ar
> > ./ar
> >
> > This ar core dumps.  So, stripping static binaries seems to
> > break the binary.
> >
>
> Yep, definitely, a problem with stripping static binaries.
>
> I copied both init and devd from /usr/obj to /sbin without
> stripping the binaries.  System rebooted as expected.
>

Don't know if it's valid, but

% ./ar
% gdb82 ar.new ar.core
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000029386c in __je_malloc_tsd_boot0 ()
(gdb) bt
#0  0x000000000029386c in __je_malloc_tsd_boot0 ()
#1  0x00000000002b6d08 in calloc ()
#2  0x000000000028275b in _thr_alloc ()
#3  0x000000000027ec98 in _libpthread_init ()
#4  0x000000000024d239 in handle_static_init ()
#5  0x000000000024d10e in _start ()


--
Steve

Re: HEADSUP: Something has gone south with -current

Steve Kargl
In reply to this post by Steve Kargl
On Sat, Dec 08, 2018 at 02:08:20AM +0200, Konstantin Belousov wrote:

> On Fri, Dec 07, 2018 at 03:52:33PM -0800, Steve Kargl wrote:
> > On Fri, Dec 07, 2018 at 03:30:19PM -0800, Steve Kargl wrote:
> > > On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:
> > > >
> > > > make core dumps.
> > > > devd core dumps.
> > > > init core dumps.
> > > > cc   core dumps.  
> > > > c++  core dumps.
> > > >
> > > > Something seems to be broken.
> > > >
> > >
> > > Further investigation,
> > > as core dumps.
> > > cpp core dumps.
> > > /rescue/vi core dumps.
> > >
> > > All of these programs are statically linked.  Note, ar and ranlib
> > > have static linkage, and appear to still work but these were not
> > > replaced by the failing 'make installworld'.
> > >
> > > Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
> > > is static and not stripped and works!  But, if I do
> > >
> > > cp ar ar.new
> > > strip ar
> > > ./ar
> > >
> > > This ar core dumps.  So, stripping static binaries seems to
> > > break the binary.
> > >
> >
> > Yep, definitely, a problem with stripping static binaries.
> >
> > I copied both init and devd from /usr/obj to /sbin without
> > stripping the binaries.  System rebooted as expected.
>
> Most likely this is an issue fixed by r339350.

My tree is at r341703.  The last paragraph of the commit
message for r339350 is

  Just remove filter_reloc.  This fixes certain cases including statically
  linked binaries containing ifuncs.  Stripping binaries with relocations
  referencing removed symbols was already broken, and after this change
  may still be broken in a different way.

So, I guess I'm hitting the "broken in a different way".

The gdb82 backtrace ends up in jemalloc.  I do build world with
MALLOC_PRODUCTION="YES".  Perhaps ifuncs+jemalloc aren't at
production level.  I have a few more broken static binaries that
I need to replace before I can rebuild without MALLOC_PRODUCTION.

--
Steve

Re: HEADSUP: Something has gone south with -current

Konstantin Belousov
On Fri, Dec 07, 2018 at 04:25:39PM -0800, Steve Kargl wrote:

> On Sat, Dec 08, 2018 at 02:08:20AM +0200, Konstantin Belousov wrote:
> > On Fri, Dec 07, 2018 at 03:52:33PM -0800, Steve Kargl wrote:
> > > On Fri, Dec 07, 2018 at 03:30:19PM -0800, Steve Kargl wrote:
> > > > On Fri, Dec 07, 2018 at 03:06:22PM -0800, Steve Kargl wrote:
> > > > >
> > > > > make core dumps.
> > > > > devd core dumps.
> > > > > init core dumps.
> > > > > cc   core dumps.  
> > > > > c++  core dumps.
> > > > >
> > > > > Something seems to be broken.
> > > > >
> > > >
> > > > Further investigation,
> > > > as core dumps.
> > > > cpp core dumps.
> > > > /rescue/vi core dumps.
> > > >
> > > > All of these programs are statically linked.  Note, ar and ranlib
> > > > have static linkage, and appear to still work but these were not
> > > > replaced by the failing 'make installworld'.
> > > >
> > > > Ah, so if I go into /usr/obj/usr/src/amd64.amd64/ar, this ar
> > > > is static and not stripped and works!  But, if I do
> > > >
> > > > cp ar ar.new
> > > > strip ar
> > > > ./ar
> > > >
> > > > This ar core dumps.  So, stripping static binaries seems to
> > > > break the binary.
> > > >
> > >
> > > Yep, definitely, a problem with stripping static binaries.
> > >
> > > I copied both init and devd from /usr/obj to /sbin without
> > > stripping the binaries.  System rebooted as expected.
> >
> > Most likely this is an issue fixed by r339350.
>
> My tree is at r341703.  The last paragraph of the commit
> message for r339350 is
Which tree?  The strip that is used by install should be past this
revision.

>
>   Just remove filter_reloc.  This fixes certain cases including statically
>   linked binaries containing ifuncs.  Stripping binaries with relocations
>   referencing removed symbols was already broken, and after this change
>   may still be broken in a different way.
>
> So, I guess I'm hitting the "broken in a different way".
>
> The gdb82 backtrace ends up in jemalloc.  I do build world with
> MALLOC_PRODUCTION="YES".  Perhaps, ifuncs+jemalloc aren't at
> production level.  I have a few more broken static binaries that
> I need to replace before I can rebuild without MALLOC_PRODUCTION.
>
> --
> Steve

Re: HEADSUP: Something has gone south with -current

Steve Kargl
On Sat, Dec 08, 2018 at 02:43:17AM +0200, Konstantin Belousov wrote:

> On Fri, Dec 07, 2018 at 04:25:39PM -0800, Steve Kargl wrote:
> > On Sat, Dec 08, 2018 at 02:08:20AM +0200, Konstantin Belousov wrote:
> > >
> > > Most likely this is an issue fixed by r339350.
> >
> > My tree is at r341703.  The last paragraph of the commit
> > message for r339350 is
> Which tree ?  The strip that is used by install should be past this
> revision.
>

% cd /usr/src
% svn info
Path: .
Working Copy Root Path: /usr/src
URL: svn://svn.freebsd.org/base/head
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341703
Node Kind: directory
Schedule: normal
Last Changed Author: emaste
Last Changed Rev: 341703
Last Changed Date: 2018-12-07 08:52:52 -0800 (Fri, 07 Dec 2018)

This is the /usr/src that has led to the broken static binaries.

Looking at timestamps, I have

% ls -l  /usr/bin/strip
-r-xr-xr-x  2 root  wheel  - 131144 Oct 10 17:10 /usr/bin/strip*

which is the strip from my Oct 10 build.  This strip did not get
updated because 'make installworld' died.  Does install during
an installworld use the old strip instead of freshly built strip?
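
The timestamp comparison above can also be scripted.  A minimal sketch,
assuming the freshly built strip is somewhere under /usr/obj; the
function name needs_refresh and the BUILT path in the usage note are
hypothetical.  It relies on the "-ot" (older than) test, which FreeBSD
sh and most other shells support but which is not strictly POSIX.

```sh
#!/bin/sh
# needs_refresh INSTALLED BUILT   (hypothetical helper)
# Exit 0 if INSTALLED is missing or older than BUILT,
# exit 1 if INSTALLED is at least as new as BUILT,
# exit 2 if BUILT does not exist (nothing to compare against).
needs_refresh() {
    installed=$1
    built=$2
    [ -e "$built" ] || return 2
    [ -e "$installed" ] || return 0
    [ "$installed" -ot "$built" ]
}
```

Usage would look like

    needs_refresh /usr/bin/strip "$BUILT" && echo "installed strip is stale"

where BUILT points at the strip binary in the object tree.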

--
Steve

Re: HEADSUP: Something has gone south with -current

Steve Kargl
On Fri, Dec 07, 2018 at 05:02:03PM -0800, Steve Kargl wrote:

> On Sat, Dec 08, 2018 at 02:43:17AM +0200, Konstantin Belousov wrote:
> > On Fri, Dec 07, 2018 at 04:25:39PM -0800, Steve Kargl wrote:
> > > On Sat, Dec 08, 2018 at 02:08:20AM +0200, Konstantin Belousov wrote:
> > > >
> > > > Most likely this is an issue fixed by r339350.
> > >
> > > My tree is at r341703.  The last paragraph of the commit
> > > message for r339350 is
> > Which tree ?  The strip that is used by install should be past this
> > revision.
> >
> This is the /usr/src that has led to the broken static binaries.
>
> Looking at timestamps, I have
>
> % ls -l  /usr/bin/strip
> -r-xr-xr-x  2 root  wheel  - 131144 Oct 10 17:10 /usr/bin/strip*
>
> which is the strip from my Oct 10 build.  This strip did not get
> updated because 'make installworld' died.  Does install during
> an installworld use the old strip instead of freshly built strip?
>

Looks like /usr/src/UPDATING could use an entry about r339350.

I was updating an r339290 world to r341703.  This jumps across
r339350.  /usr/bin/strip from r339290 apparently is used during
installworld, which renders a system rather broken.  

20181013:
   At r339350, /usr/bin/strip was updated to deal with the introduction
   of ifuncs into FreeBSD.  In particular, a /usr/bin/strip from an earlier
   revision can lead to a broken system.  To avoid mayhem, it is suggested
   that one does

   cd /usr/src/usr.bin/objcopy
   make install

   prior to 'make installworld'
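
The "jumps across r339350" condition is simple arithmetic on revision
numbers: the update is affected when the old world predates r339350 and
the new tree is at or past it.  A minimal sketch follows; the function
name crosses_revision is made up for illustration, and the revision
numbers in the usage note are the ones from this thread.

```sh
#!/bin/sh
# crosses_revision OLD NEW MARKER   (hypothetical helper)
# Exit 0 when an update from revision OLD to revision NEW passes over
# MARKER, i.e. OLD < MARKER <= NEW.  A leading "r" on any argument is
# accepted and ignored.
crosses_revision() {
    old=${1#r}
    new=${2#r}
    marker=${3#r}
    [ "$old" -lt "$marker" ] && [ "$marker" -le "$new" ]
}
```

For the update described here,

    crosses_revision r339290 r341703 r339350 && echo "update strip first"

prints the warning, since 339290 < 339350 <= 341703.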

--
Steve

Re: HEADSUP: Something has gone south with -current

Konstantin Belousov
In reply to this post by Steve Kargl
On Fri, Dec 07, 2018 at 05:02:03PM -0800, Steve Kargl wrote:

> On Sat, Dec 08, 2018 at 02:43:17AM +0200, Konstantin Belousov wrote:
> > On Fri, Dec 07, 2018 at 04:25:39PM -0800, Steve Kargl wrote:
> > > On Sat, Dec 08, 2018 at 02:08:20AM +0200, Konstantin Belousov wrote:
> > > >
> > > > Most likely this is an issue fixed by r339350.
> > >
> > > My tree is at r341703.  The last paragraph of the commit
> > > message for r339350 is
> > Which tree ?  The strip that is used by install should be past this
> > revision.
> >
>
> % cd /usr/src
> % svn info
> Path: .
> Working Copy Root Path: /usr/src
> URL: svn://svn.freebsd.org/base/head
> Relative URL: ^/head
> Repository Root: svn://svn.freebsd.org/base
> Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
> Revision: 341703
> Node Kind: directory
> Schedule: normal
> Last Changed Author: emaste
> Last Changed Rev: 341703
> Last Changed Date: 2018-12-07 08:52:52 -0800 (Fri, 07 Dec 2018)
>
> This is the /usr/src that has led to the broken static binaries.
>
> Looking at timestamps, I have
>
> % ls -l  /usr/bin/strip
> -r-xr-xr-x  2 root  wheel  - 131144 Oct 10 17:10 /usr/bin/strip*
>
> which is the strip from my Oct 10 build.  This strip did not get
> updated because 'make installworld' died.  Does install during
> an installworld use the old strip instead of freshly built strip?

It is the installed (host) strip that is used, AFAIK.  You can build
a static lib/libelftc and usr.bin/strip from the later date and install
them to get past the issue.