Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

freebsd-emulation mailing list
[I messed up the freebsd-emulation email address the first time I sent
this. I also forgot to indicate the qemu-user-static vintage relationship.]

I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7} port cross
builds in another message sequence. But it turns out that one thing I ran into
has hung up every time, in the same way, for amd64->armv7 cross builds:
multimedia/gstreamer1-qt@qt5 . So I have extracted the material into this separate
report with some updated notes.

A little context: I had previously built from ports head -r484783 under FreeBSD head
-r340287 (as I remember the version). Back then it did not have this problem that it
now has under FreeBSD head -r341836 . One ports-specific change was forcing perl5.28
as the default instead of the original perl5.26; in fact, that is what drove the
rebuilds in the experiment that caught this. But I doubt the perl version is
important to the problem. The context has a Ryzen Threadripper 1950X and has been
tested both for FreeBSD under Hyper-V and for the same media native-booted. Both
hang up at the same point as seen via ps or top. The native tools for cross-build
speedup were in use. Cross-builds targeting aarch64 did not get this problem but
those targeting armv7 did. 121 of 129 armv7 ports built before the hang-up on the
first armv7 try.

ADDED: The qemu-user-static back with head -r340287, before installing the
updated ports, would likely be different from the -r484783 vintage. So both
FreeBSD and qemu-user-static may have changed over the comparison.


The hang-up:

In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5 hung up and timed
out. Looking at the processes during the wait in later tries shows something much like
the following (from one of the examples):

root       33719    0.0  0.0  12920  3528  0  I    11:40       0:00.03 | |           `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root       41551    0.0  0.0  12920  3520  0  I    11:43       0:00.00 | |             `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
root       41552    0.0  0.0  10340  1744  0  IJ   11:43       0:00.01 | |               `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
root       41566    0.0  0.0  10236  1796  0  IJ   11:43       0:00.00 | |                 `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
root       41567    0.0  0.0  89976 12896  0  IJ   11:43       0:00.07 | |                   `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
root       41585    0.0  0.0 102848 25056  0  IJ   11:43       0:00.10 | |                     |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
root       41586    0.0  0.0 102852 25072  0  IJ   11:43       0:00.11 | |                     `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g

or as top showed it:

41552 root          1  52    0    10M  1744K    0 wait    15   0:00   0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
41566 root          1  52    0    10M  1796K    0 wait     1   0:00   0.00% /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
41567 root          2  52    0    88M    13M    0 select   4   0:00   0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
41585 root          2  52    0   100M    24M    0 kqread   8   0:00   0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
41586 root          2  52    0   100M    24M    0 kqread  22   0:00   0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.

So: waiting in kqread trying to run cmake.

Unlike some intermittent hang-ups, attaching and then detaching via gdb does not
resume the hung processes. Killing the processes waiting in kqread stops
the build.

With the prerequisite ports already built, building just
multimedia/gstreamer1-qt@qt5 still hangs up at the same point.

Building anything that requires multimedia/gstreamer1-qt@qt5 seems to be
solidly blocked in my environment.



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

freebsd-emulation mailing list
[A native poudriere-devel based build of
multimedia/gstreamer1-qt@qt5 did not hang up
and worked fine. Official package build history
also provides some evidence.]

On 2018-Dec-22, at 12:55, Mark Millard <[hidden email]> wrote:

> [I found my E-mail records reporting successful builds using
> qemu-user-static from ports head -r484783 under FreeBSD
> head -r340287.]
>
> On 2018-Dec-22, at 00:10, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [...]
>> ADDED: The qemu-user-static back with head -r340287 before installing the
>> updated ports would likely be different than the -r484783 vintage. So both
>> FreeBSD and qemu-user-static may have changed over the comparison.
>
> CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds
> based on qemu-user-static from ports head -r484783, all built under FreeBSD
> head -r340287 . So the use of perl5.28 as the forced default and the
> newer FreeBSD head version -r341836 as the context are the differences here.
>
>> [...]


I tried building multimedia/gstreamer1-qt@qt5 on an Orange Pi 2 2nd
Edition and the build did not hang up. This was also based on FreeBSD
head -r341836 and ports head -r484783 .

This test was set up in part by copying over the
/usr/local/poudriere/data/packages/ material from the machine that did
the cross builds. So, for example, the cmake used should be an exact
binary match.

The FreeBSD head -r341836 was installed from the same buildworld
buildkernel tree that the cross-build's installworld was based on.

The problem is somehow specific to cross-builds (and so to
qemu-user-static being involved).


Other evidence (official package build attempts):

I looked at beefy16.nyi.freebsd.org 's head-armv7-default and
beefy8.nyi.freebsd.org 's head-armv6-default histories and
the problem does not exist for the:

FreeBSD -r332419 ports -r467121

combination but exists for the later ones, starting with:

FreeBSD -r332632 ports -r467547

Interestingly, qemu-sbruno (the master port for qemu-user-static)
was not updated in that ports range, it being from -r463452 .

There was a cmake change at -r467437, but the more modern
native result suggests cmake is not currently contributing
(and so likely was not the issue back then).

That possibly leaves qemu-user-static, when targeting armv7 (and v6),
misinterpreting something that differs between the FreeBSD
versions. For example, there was a change to return
EAGAIN instead of EIO for certain conditions between -r332419
and -r332632 : at -r332631 . (I do not know whether it is
involved.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

freebsd-emulation mailing list
[The historical notes are removed and replaced by partial trace
information from example hang-ups, not that I've figured out
what contributes yet.]

I ran into the following while trying to get evidence
about the hang-up for an amd64->armv7 cross-build of
multimedia/gstreamer1-qt@qt5 .

The material below is from a manual run of "make FLAVOR=qt5" for
multimedia/gstreamer1-qt, in a poudriere bulk -i interactive session,
for the context that gets the hang-up in normal poudriere-devel runs.


From top after the hang-up (to identify some context):

14528 root          2  52    0   100M    24M    0 kqread  11   0:00   0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
14527 root          2  52    0    88M    13M    0 select  22   0:00   0.00% /usr/local/bin/qemu-arm-static ninja -j1 -v all

from ps -auxd as well (to identify more context):

root       10114    0.0  0.0  10328  1756  1  I+J  13:47       0:00.01 |                 `-- make FLAVOR=qt5
root       14526    0.0  0.0  10204  1792  1  I+J  13:50       0:00.00 |                   `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
root       14527    0.0  0.0  90304 13084  1  I+J  13:50       0:00.09 |                     `-- /usr/local/bin/qemu-arm-static ninja -j1 -v all
root       14528    0.0  0.0 102876 25060  1  IJ   13:50       0:00.12 |                       `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g

I had made a qemu-user-static that enabled do_strace when
it is used to run cmake or ninja.
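
For reference, a minimal stand-alone sketch of that kind of gating (the names
and structure are my assumptions for illustration, not qemu's actual bsd-user
code): tracing gets enabled only when the basename of the emulated program is
cmake or ninja.

/* Hypothetical sketch, not qemu source: gate syscall tracing on the
   basename of the emulated program so only cmake and ninja runs
   produce do_strace-style output. */
#include <stdio.h>
#include <string.h>

static int do_strace = 0;   /* stands in for the emulator's tracing flag */

static void maybe_enable_strace(const char *target_path) {
    const char *base = strrchr(target_path, '/');
    base = (base != NULL) ? base + 1 : target_path;
    if (strcmp(base, "cmake") == 0 || strcmp(base, "ninja") == 0)
        do_strace = 1;
}

int main(int argc, char **argv) {
    if (argc > 1)
        maybe_enable_strace(argv[1]);
    printf("tracing %s\n", do_strace ? "enabled" : "disabled");
    return 0;
}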

The only do_strace lines from qemu-arm-static running cmake
or ninja mentioning process 14528 are included in the sequence:

(Before the below was a long list of "14527 fstatat" lines.
I'll note that "Unknown syscall 545" is from ppoll use.)

82400 sigprocmask(1,-1610620016,-191968524,-186261416,0,24) = 0
82400 sigaction(2,-1610620040,-191968596,-186261584,210460,0) = 0
82400 sigaction(15,-1610620040,-191968572,-186261584,210460,0) = 0
82400 sigaction(1,-1610620040,-191968548,-186261584,210460,0) = 0
82400 gettimeofday(-1610619984,0,4,-186261584,-1610619440,-1610619528) = 0
82400 gettimeofday(-1610619984,0,4,359949,1545969996,0) = 0
82400 gettimeofday(-1610620120,0,4,2,-184666112,-1610619520) = 0
82400 fstatat(-100,"elements/gstqtvideosink/CMakeFiles", 0x9fffe200, 0) = 0
82400 fstatat(-100,"elements/gstqtvideosink/gstqt5videosink_autogen", 0x9fffe200, 0) = 0
82400 pipe2(-1610620176,0,-1610620108,0,-1610620120,167084) = 0
82400 fcntl(5,1,-1610620108,-185863932,-192200556,-1610620228) = 0
82400 fcntl(5,2,1,-185863932,-192200556,-1610620228) = 0
82400 vfork(0,66450,-186876196,-1610620184,-1610620240,0) = 82401
82400 close(6) = 0
 = 0
82400 Unknown syscall 545
82401 setpgid(0,0,-186876196,-1610620184,-1610620240,0) = 0
82401 sigprocmask(3,-191586912,0,-1610620184,-1610620240,0) = 0
82401 close(5) = 0
82401 open("/dev/null",0,0) = 5
82401 dup2(5,0,0,-1610620184,-1610620240,0) = 0
82401 close(5) = 0
82401 fcntl(0,2,0,-1610620184,-1610620240,0) = 0
82401 dup2(6,1,0,-1610620184,-1610620240,0) = 1
82401 fcntl(1,2,0,-1610620184,-1610620240,0) = 0
82401 dup2(6,2,0,-1610620184,-1610620240,0)82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0

The vfork then close(6) sequence for 82400 vs. the later
use of fd 6 in dup2 in 82401 may look rather odd. But it looks
like qemu-*-static uses do_freebsd_fork to implement
do_freebsd_vfork, despite reporting vfork before
calling do_freebsd_vfork. (Does the close(6) indicate a
race for native operation of ninja, during the period when
the address space would be shared?)

Ninja has Subprocess::Start code that has:

#ifdef POSIX_SPAWN_USEVFORK
  flags |= POSIX_SPAWN_USEVFORK;
#endif


  if (posix_spawnattr_setflags(&attr, flags) != 0)
    Fatal("posix_spawnattr_setflags: %s", strerror(errno));

  const char* spawned_args[] = { "/bin/sh", "-c", command.c_str(), NULL };
  if (posix_spawn(&pid_, "/bin/sh", &action, &attr,
                  const_cast<char**>(spawned_args), environ) != 0)
    Fatal("posix_spawn: %s", strerror(errno));

that is in use here. I think that this explains the vfork use.
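
As a point of reference, here is a self-contained sketch of that spawn path,
simplified (no file actions are set up, and the helper name spawn_shell is
mine, so this is not ninja's code): spawn "/bin/sh -c <command>" via
posix_spawn, requesting vfork semantics where POSIX_SPAWN_USEVFORK exists.

#include <spawn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn "/bin/sh -c command"; returns the child pid, or -1 on error. */
static pid_t spawn_shell(const char *command) {
    posix_spawnattr_t attr;
    short flags = 0;
    pid_t pid;

    if (posix_spawnattr_init(&attr) != 0)
        return -1;
#ifdef POSIX_SPAWN_USEVFORK
    flags |= POSIX_SPAWN_USEVFORK;  /* glibc extension: ask for vfork() */
#endif
    if (posix_spawnattr_setflags(&attr, flags) != 0) {
        posix_spawnattr_destroy(&attr);
        return -1;
    }

    const char *spawned_args[] = { "/bin/sh", "-c", command, NULL };
    int rc = posix_spawn(&pid, "/bin/sh", NULL, &attr,
                         (char * const *)spawned_args, environ);
    posix_spawnattr_destroy(&attr);
    return rc == 0 ? pid : (pid_t)-1;
}

int main(void) {
    pid_t pid = spawn_shell("echo spawned via posix_spawn");
    if (pid < 0) {
        fprintf(stderr, "spawn failed\n");
        return 1;
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

On FreeBSD, POSIX_SPAWN_USEVFORK is not defined, so the #ifdef contributes
nothing there; the vfork seen in the trace would then come from libc's own
posix_spawn implementation, if that is how it is implemented.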


It turns out that putting the hung-up build in the background
and then killing 82401 with the likes of kill -6 leads to more
output that had apparently been buffered. It shows the use of
the (amd64 native) /bin/sh that in turn leads to
/usr/local/bin/cmake via qemu-arm-static. /bin/sh, being
native, gets no do_strace output from qemu-arm-static.

82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
82400 read(5,0x9fffd368,4096) = 58
82400 Unknown syscall 545
82400 sigpending(-1610620072,1,0,-191968524,0,0) = 0
82400 read(5,0x9fffd368,4096) = 0
82400 close(5) = 0
82400 wait4(82401,-1610620004,0,0,-191968640,0) = 82401
82400 mmap(0,86016,3,201330690,-1,-1610620169) = 0xf4777000
82400 gettimeofday(-1610620224,0,4,-1610619944,31,16777216) = 0
82400 write(1,0xf4950000,283)[1/129] cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqtvideosink && /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqtvideosink/CMakeFiles/gstqt5videosink_autogen.dir/AutogenInfo.cmake Debug
 = 283
82400 write(1,0xf4950000,137)FAILED: elements/gstqtvideosink/CMakeFiles/gstqt5videosink_autogen elements/gstqtvideosink/gstqt5videosink_autogen/mocs_compilation.cpp
 = 137
82400 write(1,0xf4950000,275)cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqtvideosink && /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqtvideosink/CMakeFiles/gstqt5videosink_autogen.dir/AutogenInfo.cmake Debug
 = 275
82400 write(1,0xf4950000,5) = 2
 = 5

(Note that some 82400 writes are reporting 82401 information:)

82400 write(1,0xf4950000,49)82401 fcntl(2,2,0,-1610620184,-1610620240,0) = 0
 = 49
82400 write(1,0xf4950000,19)82401 close(6) = 0
 = 19
82400 write(1,0xf4950000,401)82401 execve("/bin/sh",{"/bin/sh","-c","cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqtvideosink && /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build/elements/gstqtvideosink/CMakeFiles/gstqt5videosink_autogen.dir/AutogenInfo.cmake Debug",NULL})82401 __sysctl({ 0 3 }, 2, 0x9fffda80, 0x9fffdf64, 0xf5002097, 0x0000000c) = 0
 = 401

(The /bin/sh activity is not logged: /bin/sh is native amd64 code here. The
below is from the later /usr/local/bin/cmake via qemu-arm-static.)

. . . (much omitted) . . .

82400 write(1,0xf4950000,60)82401 mmap(0,28672,3,201330690,-1,-1610621989) = 0xf41a0000
 = 60
82400 write(1,0xf4950000,74)82401 clock_gettime(4,-1610621832,4,-199622492,-199622492,-199622656) = 0
 = 74
82400 write(1,0xf4950000,62)82401 kqueue(-199622656,0,53102,0,-199622656,-1610621444) = 3
 = 62
82400 write(1,0xf4950000,81)82401 ioctl(3, 0x20006601 { IO GRP:0x66('f') CMD:1 LEN:0 }, 0x0000cf6e, ...) = 0
 = 81

. . . (some omitted) . . .

(Then there is a fairly long sequence of access calls and then a sequence of
fstatat calls just before:)


82400 write(1,0xf4950000,32)82401 write(9,0xf4e1a945,1) = 1
 = 32
82400 write(1,0xf4950000,61)82401 clock_gettime(4,-1610622624,4,100863,1,-199483392) = 0
 = 61
82400 write(1,0xf4950000,106)82401 kevent(3,-1610688200,2,-1610688200,1024,0)qemu: uncaught target signal 6 (Abort trap) - core dumped
 = 106
82400 write(1,0xf4950000,41)ninja: build stopped: subcommand failed.
 = 41

So it was hung at the kevent until the kill -6 .
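
For reference, a generic, self-contained sketch (not cmake's code; just an
assumed illustration) of the kind of kevent(2) wait that shows up as a kqread
wait channel: register EVFILT_READ for a descriptor and block in kevent()
until data arrives. In the hung build, the corresponding wakeup apparently
never comes.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1)
        err(1, "pipe");

    int kq = kqueue();
    if (kq == -1)
        err(1, "kqueue");

    /* Watch the pipe's read end for readability. */
    struct kevent change;
    EV_SET(&change, fds[0], EVFILT_READ, EV_ADD, 0, 0, NULL);

    /* Write a byte so this demo returns promptly; without it the
       kevent() below would sit in the kqread wait channel forever. */
    (void)write(fds[1], "x", 1);

    struct kevent event;
    int n = kevent(kq, &change, 1, &event, 1, NULL);  /* blocks: "kqread" */
    if (n == -1)
        err(1, "kevent");

    printf("fd %ju readable, %jd byte(s) pending\n",
           (uintmax_t)event.ident, (intmax_t)event.data);
    close(fds[0]);
    close(fds[1]);
    close(kq);
    return 0;
}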


Via another experiment ninja was at the time waiting
in ppoll:

Reading symbols from ninja...done.
[New LWP 73023]
Core was generated by `ninja'.
Program terminated with signal SIGABRT, Aborted.
#0  0xf4e5e0dc in _ppoll () from /lib/libc.so.7
(gdb) bt
#0  0xf4e5e0dc in _ppoll () from /lib/libc.so.7
#1  0x00033bf0 in SubprocessSet::DoWork (this=<optimized out>) at src/subprocess-posix.cc:237
Backtrace stopped: previous frame inner to this frame (corrupt stack?)



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

Michal Meloun-2
In reply to this post by freebsd-emulation mailing list


On 24.12.2018 8:28, Mark Millard wrote:

> [I built a FreeBSD head -r340288 context and tried ports head
> -r484783 and the problem repeated.]
>
> [...]
>
> I built a FreeBSD head -r340288 context and tried cross-building an
> amd64->armv7 ports head -r484783 build of my usual ports, and the problem
> repeated. I also found evidence that, back in the old time frame,
> I'd disabled part of my originally-intended port builds because of
> other problems, so multimedia/gstreamer1-qt 's build was not being
> tried.
>
> So the qemu-user-static vintage or content may be what to vary to
> narrow down the problem instead of bisecting FreeBSD kernel or world
> vintages. clang7 building qemu-user-static or the kernel/world has
> been eliminated.
>
>
> (I used -r340288 to match an artifact.ci.freebsd.org build, incorrectly
> expecting to bisect via kernel substitutions.)
>

Mark,
this is a known problem with qemu-user-static.
Emulation of every interruptible syscall is broken by design (it
has signal-related races). These races cannot be solved without a major
rewrite of the syscall emulation code.
Unfortunately, nobody is actively working on this, I think.

Michal

Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

freebsd-emulation mailing list
In reply to this post by freebsd-emulation mailing list

On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com> wrote:
>
>> [...]
>
> Thanks for the note setting some expectations.
>
> On the evidence that I have, I expect that more is going on than that:
>
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
>
> B) (A) is true even when varying the number of builders in parallel
> (so other builds also happening) and the number of jobs allowed per
> builder. It also fails with only one builder allowed only one process.
> (I get traces from that last kind of context.)
>
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
>
> D) The problem is only observed when targeting armv7 and armv6 as
> far as I can tell. I've never seen it for aarch64, neither in my
> own builds nor when I looked at the package-building server
> history.
>
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
>
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
>
> 93172 qemu-arm-static CALL  kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
> 93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 }
>             { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 } }
>
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
>
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
>
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)


The details of the signal usage involved leading up to the hang-up,
starting from just before I pressed return for the "make FLAVOR=qt5"
command that I had entered:

The only "Interrupted system call" prior to my killing the hung cmake
process was (kdump -H -r -S output):

 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)
 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call
 93172 100717 qemu-arm-static NAMI  "/bin/sh"
 93172 100717 sh       RET   execve[59] JUSTRETURN
 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh (which
in turn later runs cmake via qemu-arm-static). (This was after the fork [for the
requested vfork].) So the interrupted nanosleep is from the close-down of the
thread that had been in nanosleep.

There were no PSIGs and no sigreturns prior to the kill, according to the
kdump output.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

freebsd-emulation mailing list
On 2018-Dec-30, at 21:01, Jonathan Chen <jonc at chen.org.nz> wrote:

> On Mon, 31 Dec 2018 at 14:34, Mark Millard via freebsd-ports
> <[hidden email]> wrote:
>>
>> [Removing __packed did make the size and offsets match armv7
>> and the build worked based on the reconstructed qemu-arm-static.]
>
> Thanks for the analysis Mark! I've been suffering quite a few hangups
> with my ports crossbuilds on amd64->armv7 on 12-STABLE, and I'll be
> trying your suggestions to see whether it resolves the issue.

If you have something like a kqread state for a hang-up consistently
in the same place, then Mikael Urankar's fix (or any other
way of getting the right sizes and field offsets for kevent) has a
chance of fixing what you have observed.

But if you have a form of hang-up that shows no sign of being tied
to kevent, or that hangs up only sometimes, I'd be surprised if the __packed
change(s) would fix the issue.
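
To illustrate why the sizes and field offsets matter, here is a small
stand-alone C program (the field layout only approximates FreeBSD's armv7
struct kevent and is not qemu's source): marking the structure __packed drops
the padding the 64-bit members need for 8-byte alignment, so the size and the
offsets of later fields no longer match what the non-packed definition (and
the kernel) use.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct kevent_like {            /* natural, ABI-style layout */
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    int64_t  data;              /* wants 8-byte alignment on armv7 */
    uint32_t udata;
    uint64_t ext[4];
};

struct kevent_like_packed {     /* same fields, but __packed */
    uint32_t ident;
    int16_t  filter;
    uint16_t flags;
    uint32_t fflags;
    int64_t  data;
    uint32_t udata;
    uint64_t ext[4];
} __attribute__((__packed__));

int main(void) {
    printf("natural: size %zu, data at offset %zu, ext at offset %zu\n",
           sizeof(struct kevent_like),
           offsetof(struct kevent_like, data),
           offsetof(struct kevent_like, ext));
    printf("packed:  size %zu, data at offset %zu, ext at offset %zu\n",
           sizeof(struct kevent_like_packed),
           offsetof(struct kevent_like_packed, data),
           offsetof(struct kevent_like_packed, ext));
    return 0;
}

With a mismatch like that, data copied between the emulated caller's notion
of the structure and the host's notion ends up at the wrong offsets, which
would fit the "apparently-odd 2nd entry" seen in the ktrace of the kevent
call.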

I've seen such racy hang-ups from lld's creation of (#cpu)+2 threads,
as FreeBSD counts cpus. I've selectively forced -Wl,--no-threads at
times in specific contexts to avoid that. binutils ld does not tolerate
the option. ports does not appear to have an equivalent of:

LDFLAGS.lld+= -Wl,--no-threads

that would be lld-specific.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved) [details of a specific qemu-arm-static source code problem]

freebsd-emulation mailing list
On 2018-Dec-31, at 10:16, Jonathan Chen <jonc at chen.org.nz> wrote:

> On Mon, 31 Dec 2018 at 21:05, Mark Millard <marklmi at yahoo.com> wrote:
> [...]
>> But if you have a form of hang-up that shows no sign of being tied
>> to kevent or hangs-up only sometimes, I'd be surprised if the __packed
>> change(s) would fix the issue.
>
> With the __packed-modified qemu-user-static, the amd64->armv7
> crossbuilds do not hang anymore, but I get build failures instead.
> Interestingly enough, an unmodified qemu-user-static gets further
> along in an amd64->armv6 crossbuild, with only one reproducible hang.

I tend to compare cross-build failures to native-build attempts. The
multimedia/gstreamer1-qt@qt5 hang-up was qemu-arm-static specific,
not occurring natively. That, and its hanging up reliably, is
what prompted the investigation.

The lld thread-fanout hang-up has also only happened under
qemu-arm-static, but I do not have a context with more than 4 cores for
armv7: far fewer than the 28 (FreeBSD under Hyper-V) or 32 cpus (FreeBSD
native) that I use for cross-builds.

I do not know if you care to, but it is possible to check whether the FreeBSD
package builders get failures or hangs for the same ports. I use
head ports-build examples below:

http://beefy16.nyi.freebsd.org/jail.html?mastername=head-armv7-default

http://beefy8.nyi.freebsd.org/jail.html?mastername=head-armv6-default

The pages displayed show a list of ports-version (p??????) and FreeBSD-version
(s??????) combinations, looking like p??????_s?????? . Those links take you
to pages for exploring the built, failed, skipped, and ignored
ports.

Of course, for race-condition problems in builds, checking is messier
because of needing to look at possibly many port/system combinations.

My attempts to build x11/lumina fail for:

[00:01:02] [01] [00:00:00] Building multimedia/libvpx | libvpx-1.7.0_2
[00:02:23] [01] [00:01:21] Saved multimedia/libvpx | libvpx-1.7.0_2 wrkdir to: /usr/local/poudriere/data/wrkdirs/FBSDFSSDjailArmV7-default/default/libvpx-1.7.0_2.tar
[00:02:23] [01] [00:01:21] Finished multimedia/libvpx | libvpx-1.7.0_2: Failed: build
[00:02:24] [01] [00:01:22] Skipping multimedia/ffmpeg | ffmpeg-4.1,1: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-libav | gstreamer1-libav-1.14.4_2: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping multimedia/gstreamer1-plugins-core | gstreamer1-plugins-core-1.14: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping x11/lumina | lumina-1.4.1,3: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
[00:02:24] [01] [00:01:22] Skipping x11/lumina-core | lumina-core-1.4.1: Dependent port multimedia/libvpx | libvpx-1.7.0_2 failed
. . .
[00:06:19] Failed ports: multimedia/libvpx:build
[00:06:19] Skipped ports: multimedia/ffmpeg multimedia/gstreamer1-libav multimedia/gstreamer1-plugins-core x11/lumina x11/lumina-core
[FBSDFSSDjailArmV7-default] [2018-12-30_17h04m02s] [committing:] Queued: 7  Built: 1  Failed: 1  Skipped: 5  Ignored: 0  Tobuild: 0   Time: 00:06:16

Native build attempts on an armv7 system get the same result.

But I'm still at:

# svnlite info | grep "Re[plv]"
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 341836
Last Changed Rev: 341836

because I froze at that revision while investigating the reliable hang and
have not started updating again yet. Last I looked, the
head-armv7-default package builds were also failing for libvpx, if
I remember right.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
