qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 (with backtrace, code, and value details)

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 (with backtrace, code, and value details)

freebsd-emulation mailing list
The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike
the last hang-up that I analyzed. I do not yet know how repeatable this is
but the original hang-up and the one experiment the below is from.

From top:

  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     CPU COMMAND
12789 root          4  52    0   166M    33M    0 uwait    6  36:06  97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0

Note: The vast margority of the 36:06 has been stuck in the uwait loop involved.

From ps -auxd:

root       94075    0.0  0.0  12932  3552  1  S+   10:42        0:01.21 |           `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
root        1944    0.0  0.0  12932  3540  1  I+   10:42        0:00.00 |             |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
root        1957    0.0  0.0  12932  3556  1  I    10:42        0:00.04 |             |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
root       12328    0.0  0.0  12932  3548  1  I    10:49        0:00.00 |             | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
root       12329    0.0  0.0  10328  1756  1  IJ   10:49        0:00.01 |             |   `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
root       12350    0.0  0.0   9860  1248  1  IJ   10:49        0:00.00 |             |     `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
root       12353    0.0  0.0  10236  1664  1  IJ   10:49        0:00.05 |             |       `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
root       12787    0.0  0.0   9856  1236  1  IJ   10:50        0:00.00 |             |         `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
root       12789  100.0  0.0 169868 33528  1  IJ   10:50       36:35.26 |             |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
root       94423    0.0  0.0  12932  3484  1  S+   10:42        0:12.91 |             `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5


(gdb) attach 12789
Attaching to process 12789
Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
[New LWP 101168 of process 12789]
[New LWP 101178 of process 12789]
[New LWP 101499 of process 12789]
[Switching to LWP 100304 of process 12789]
_umtx_op () at _umtx_op.S:3
3 RSYSCALL(_umtx_op)
(gdb) info threads
  Id   Target Id                   Frame
* 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
  2    LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
  3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
  4    LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
(gdb) thread 4
[Switching to thread 4 (LWP 101499 of process 12789)]
#0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
220 ATOMIC_CMPSET(int);

(gdb) bt
#0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
#1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
#2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
#3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
#4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
#5  0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
#6  0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
#7  0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
#8  0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291
#9  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000

181 #define ATOMIC_CMPSET(TYPE) \
182 static __inline int \
183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \
184 { \
(gdb)
185 u_char res; \
186 \
187 __asm __volatile( \
188 " " MPLOCKED " " \
189 " cmpxchg %3,%1 ; " \
190 " sete %0 ; " \
191 "# atomic_cmpset_" #TYPE " " \
192 : "=q" (res), /* 0 */ \
193  "+m" (*dst), /* 1 */ \
194  "+a" (expect) /* 2 */ \
(gdb)
195 : "r" (src) /* 3 */ \
196 : "memory", "cc"); \
197 return (res); \
198 } \
. . .
218 ATOMIC_CMPSET(char);
219 ATOMIC_CMPSET(short);
220 ATOMIC_CMPSET(int);
221 ATOMIC_CMPSET(long);

(gdb) up
#1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
178    return atomic_cmpset_acq_32(addr, current, new);
(gdb) list
173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
174 {
175    uint32_t current = tswap32(a);
176    uint32_t new = tswap32(b);
177
178    return atomic_cmpset_acq_32(addr, current, new);
179 }

(gdb) up
#2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
(gdb) list
1259            }
1260        }
1261    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
1262        /* decrement reader count */
1263        for (;;) {
1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
1266                    unlock_user_struct(target_urwlock,
1267                        target_addr, 1);
1268                        return -TARGET_EPERM;
(gdb)
1269                 }
1270            } else {
1271                break;
1272            }
1273        }
1274    } else {
1275        unlock_user_struct(target_urwlock, target_addr, 1);
1276        return -TARGET_EPERM;
1277    }
1278

The above loop is where the code is stuck looping, using most of the capacity of
one cpu (as FreeBSD classifies cpus). The value of state is stuck at:

(gdb) print state
$2 = 536870913

I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
is running.



To show a little more context before the assembler for the routine that ends up
containing that loop:

(gdb) up
#3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
474        ret = freebsd_rw_unlock(obj);
(gdb) list
469            ret = freebsd_rw_wrlock(obj, val, 0, NULL);
470        }
471        break;
472
473    case TARGET_UMTX_OP_RW_UNLOCK:
474        ret = freebsd_rw_unlock(obj);
475        break;
476
477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
478 #ifdef UMTX_OP_MUTEX_WAKE2

(gdb) up
#4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
(gdb) list
1359        ret = do_freebsd__umtx_unlock(arg1);
1360        break;
1361 #endif
1362
1363    case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
1365        break;
1366
1367        /*
1368         * ioctl(2)

As for the assembler level view for the routine containing the
loop (via inlining and such):

Dump of assembler code for function freebsd_rw_unlock:
   0x0000000060051bd0 <+0>: push   %rbp
   0x0000000060051bd1 <+1>: mov    %rsp,%rbp
   0x0000000060051bd4 <+4>: push   %rbx
   0x0000000060051bd5 <+5>: push   %rax
   0x0000000060051bd6 <+6>: mov    %edi,%ebx
   0x0000000060051bd8 <+8>: mov    $0x20,%esi
   0x0000000060051bdd <+13>: mov    $0x3,%edx
   0x0000000060051be2 <+18>: callq  0x60036b70 <page_check_range>
   0x0000000060051be7 <+23>: mov    %eax,%ecx
   0x0000000060051be9 <+25>: mov    $0xfffffff2,%eax
   0x0000000060051bee <+30>: test   %ecx,%ecx
   0x0000000060051bf0 <+32>: jne    0x60051c50 <freebsd_rw_unlock+128>
   0x0000000060051bf2 <+34>: mov    %ebx,%edi
   0x0000000060051bf4 <+36>: add    0x26a6355(%rip),%rdi        # 0x626f7f50 <guest_base>
   0x0000000060051bfb <+43>: je     0x60051c50 <freebsd_rw_unlock+128>
   0x0000000060051bfd <+45>: mov    (%rdi),%ecx
   0x0000000060051bff <+47>: mov    0x4(%rdi),%edx
   0x0000000060051c02 <+50>: test   %ecx,%ecx
   0x0000000060051c04 <+52>: js     0x60051c30 <freebsd_rw_unlock+96>
   0x0000000060051c06 <+54>: test   $0x1fffffff,%ecx
   0x0000000060051c0c <+60>: je     0x60051c4b <freebsd_rw_unlock+123>
   0x0000000060051c0e <+62>: lea    -0x1(%rcx),%esi
   0x0000000060051c11 <+65>: nopw   %cs:0x0(%rax,%rax,1)
   0x0000000060051c1b <+75>: nopl   0x0(%rax,%rax,1)
NOTE: loop is below:
   0x0000000060051c20 <+80>: mov    %ecx,%eax
=> 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
   0x0000000060051c26 <+86>: sete   %bl
   0x0000000060051c29 <+89>: test   %bl,%bl
   0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
Note: loop is above
   0x0000000060051c2d <+93>: jmp    0x60051c57 <freebsd_rw_unlock+135>
   0x0000000060051c2f <+95>: nop
   0x0000000060051c30 <+96>: mov    %ecx,%esi
   0x0000000060051c32 <+98>: and    $0x7fffffff,%esi
   0x0000000060051c38 <+104>: mov    %ecx,%eax
   0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi)
   0x0000000060051c3e <+110>: sete   %bl
   0x0000000060051c41 <+113>: test   %bl,%bl
   0x0000000060051c43 <+115>: jne    0x60051c57 <freebsd_rw_unlock+135>
   0x0000000060051c45 <+117>: mov    (%rdi),%ecx
   0x0000000060051c47 <+119>: test   %ecx,%ecx
   0x0000000060051c49 <+121>: js     0x60051c30 <freebsd_rw_unlock+96>
   0x0000000060051c4b <+123>: mov    $0xffffffff,%eax
   0x0000000060051c50 <+128>: add    $0x8,%rsp
   0x0000000060051c54 <+132>: pop    %rbx
   0x0000000060051c55 <+133>: pop    %rbp
   0x0000000060051c56 <+134>: retq  
   0x0000000060051c57 <+135>: test   $0x2,%dl
   0x0000000060051c5a <+138>: jne    0x60051c70 <freebsd_rw_unlock+160>
   0x0000000060051c5c <+140>: test   $0x40000000,%ecx
   0x0000000060051c62 <+146>: jne    0x60051c82 <freebsd_rw_unlock+178>
   0x0000000060051c64 <+148>: xor    %eax,%eax
   0x0000000060051c66 <+150>: test   $0x20000000,%ecx
   0x0000000060051c6c <+156>: je     0x60051c50 <freebsd_rw_unlock+128>
   0x0000000060051c6e <+158>: jmp    0x60051c89 <freebsd_rw_unlock+185>
   0x0000000060051c70 <+160>: test   $0x20000000,%ecx
   0x0000000060051c76 <+166>: jne    0x60051c89 <freebsd_rw_unlock+185>
   0x0000000060051c78 <+168>: xor    %eax,%eax
   0x0000000060051c7a <+170>: test   $0x40000000,%ecx
   0x0000000060051c80 <+176>: je     0x60051c50 <freebsd_rw_unlock+128>
   0x0000000060051c82 <+178>: mov    $0x1,%edx
   0x0000000060051c87 <+183>: jmp    0x60051c8e <freebsd_rw_unlock+190>
   0x0000000060051c89 <+185>: mov    $0x7fffffff,%edx
   0x0000000060051c8e <+190>: mov    $0x3,%esi
   0x0000000060051c93 <+195>: xor    %ecx,%ecx
   0x0000000060051c95 <+197>: xor    %r8d,%r8d
   0x0000000060051c98 <+200>: callq  0x601b5970 <_umtx_op>
   0x0000000060051c9d <+205>: mov    %eax,%edi
   0x0000000060051c9f <+207>: add    $0x8,%rsp
   0x0000000060051ca3 <+211>: pop    %rbx
   0x0000000060051ca4 <+212>: pop    %rbp
--Type <RET> for more, q to quit, c to continue without paging--
   0x0000000060051ca5 <+213>: jmpq   0x60040a40 <get_errno>
End of assembler dump.

For just before the "lock cmpxchg %esi,(%rdi)" in the loop:

rax            0x20000001          536870913
rbx            0xf4dede00          4108246528
rcx            0x20000001          536870913
rdx            0x2                 2
rsi            0x20000000          536870912
rdi            0xf4dede80          4108246656

(gdb) x 0xf4dede80
0xf4dede80: 0x40000001

   0x0000000060051c20 <+80>: mov    %ecx,%eax
=> 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
   0x0000000060051c26 <+86>: sete   %bl
   0x0000000060051c29 <+89>: test   %bl,%bl
   0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>


At this point I do not have interpretation of the details, not even
a comparison to the source code.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [problem possibly found]

freebsd-emulation mailing list
[It looks to me like the assembler code has some code moved out of the
loop that it should not be (by intent). It appears that calling
tcmpset_32 does not prevent code motion to before the call and
the variable involved was not declared volatile.]

On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:

> The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
> cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike
> the last hang-up that I analyzed. I do not yet know how repeatable this is
> but the original hang-up and the one experiment the below is from.
>
> From top:
>
>  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     CPU COMMAND
> 12789 root          4  52    0   166M    33M    0 uwait    6  36:06  97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
>
> Note: The vast margority of the 36:06 has been stuck in the uwait loop involved.
>
> From ps -auxd:
>
> root       94075    0.0  0.0  12932  3552  1  S+   10:42        0:01.21 |           `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root        1944    0.0  0.0  12932  3540  1  I+   10:42        0:00.00 |             |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root        1957    0.0  0.0  12932  3556  1  I    10:42        0:00.04 |             |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> root       12328    0.0  0.0  12932  3548  1  I    10:49        0:00.00 |             | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> root       12329    0.0  0.0  10328  1756  1  IJ   10:49        0:00.01 |             |   `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
> root       12350    0.0  0.0   9860  1248  1  IJ   10:49        0:00.00 |             |     `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
> root       12353    0.0  0.0  10236  1664  1  IJ   10:49        0:00.05 |             |       `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
> root       12787    0.0  0.0   9856  1236  1  IJ   10:50        0:00.00 |             |         `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
> root       12789  100.0  0.0 169868 33528  1  IJ   10:50       36:35.26 |             |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
> root       94423    0.0  0.0  12932  3484  1  S+   10:42        0:12.91 |             `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>
>
> (gdb) attach 12789
> Attaching to process 12789
> Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
> [New LWP 101168 of process 12789]
> [New LWP 101178 of process 12789]
> [New LWP 101499 of process 12789]
> [Switching to LWP 100304 of process 12789]
> _umtx_op () at _umtx_op.S:3
> 3 RSYSCALL(_umtx_op)
> (gdb) info threads
>  Id   Target Id                   Frame
> * 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
>  2    LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>  3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
>  4    LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> (gdb) thread 4
> [Switching to thread 4 (LWP 101499 of process 12789)]
> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> 220 ATOMIC_CMPSET(int);
>
> (gdb) bt
> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
>    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
>    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> #5  0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
> #6  0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
> #7  0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
> #8  0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291
> #9  0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
>
> 181 #define ATOMIC_CMPSET(TYPE) \
> 182 static __inline int \
> 183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \
> 184 { \
> (gdb)
> 185 u_char res; \
> 186 \
> 187 __asm __volatile( \
> 188 " " MPLOCKED " " \
> 189 " cmpxchg %3,%1 ; " \
> 190 " sete %0 ; " \
> 191 "# atomic_cmpset_" #TYPE " " \
> 192 : "=q" (res), /* 0 */ \
> 193  "+m" (*dst), /* 1 */ \
> 194  "+a" (expect) /* 2 */ \
> (gdb)
> 195 : "r" (src) /* 3 */ \
> 196 : "memory", "cc"); \
> 197 return (res); \
> 198 } \
> . . .
> 218 ATOMIC_CMPSET(char);
> 219 ATOMIC_CMPSET(short);
> 220 ATOMIC_CMPSET(int);
> 221 ATOMIC_CMPSET(long);
>
> (gdb) up
> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> 178    return atomic_cmpset_acq_32(addr, current, new);
> (gdb) list
> 173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
> 174 {
> 175    uint32_t current = tswap32(a);
> 176    uint32_t new = tswap32(b);
> 177
> 178    return atomic_cmpset_acq_32(addr, current, new);
> 179 }
>
> (gdb) up
> #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
> (gdb) list
> 1259            }
> 1260        }
> 1261    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> 1262        /* decrement reader count */
> 1263        for (;;) {
> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
> 1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266                    unlock_user_struct(target_urwlock,
> 1267                        target_addr, 1);
> 1268                        return -TARGET_EPERM;
> (gdb)
> 1269                 }
> 1270            } else {
> 1271                break;
> 1272            }
> 1273        }
> 1274    } else {
> 1275        unlock_user_struct(target_urwlock, target_addr, 1);
> 1276        return -TARGET_EPERM;
> 1277    }
> 1278
>
> The above loop is where the code is stuck looping, using most of the capacity of
> one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
>
> (gdb) print state
> $2 = 536870913
>
> I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
> is running.
>
>
>
> To show a little more context before the assembler for the routine that ends up
> containing that loop:
>
> (gdb) up
> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
>    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> 474        ret = freebsd_rw_unlock(obj);
> (gdb) list
> 469            ret = freebsd_rw_wrlock(obj, val, 0, NULL);
> 470        }
> 471        break;
> 472
> 473    case TARGET_UMTX_OP_RW_UNLOCK:
> 474        ret = freebsd_rw_unlock(obj);
> 475        break;
> 476
> 477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
> 478 #ifdef UMTX_OP_MUTEX_WAKE2
>
> (gdb) up
> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
>    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> 1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> (gdb) list
> 1359        ret = do_freebsd__umtx_unlock(arg1);
> 1360        break;
> 1361 #endif
> 1362
> 1363    case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
> 1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> 1365        break;
> 1366
> 1367        /*
> 1368         * ioctl(2)
>
> As for the assembler level view for the routine containing the
> loop (via inlining and such):
>
> Dump of assembler code for function freebsd_rw_unlock:
>   0x0000000060051bd0 <+0>: push   %rbp
>   0x0000000060051bd1 <+1>: mov    %rsp,%rbp
>   0x0000000060051bd4 <+4>: push   %rbx
>   0x0000000060051bd5 <+5>: push   %rax
>   0x0000000060051bd6 <+6>: mov    %edi,%ebx
>   0x0000000060051bd8 <+8>: mov    $0x20,%esi
>   0x0000000060051bdd <+13>: mov    $0x3,%edx
>   0x0000000060051be2 <+18>: callq  0x60036b70 <page_check_range>
>   0x0000000060051be7 <+23>: mov    %eax,%ecx
>   0x0000000060051be9 <+25>: mov    $0xfffffff2,%eax
>   0x0000000060051bee <+30>: test   %ecx,%ecx
>   0x0000000060051bf0 <+32>: jne    0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051bf2 <+34>: mov    %ebx,%edi
>   0x0000000060051bf4 <+36>: add    0x26a6355(%rip),%rdi        # 0x626f7f50 <guest_base>
>   0x0000000060051bfb <+43>: je     0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051bfd <+45>: mov    (%rdi),%ecx
>   0x0000000060051bff <+47>: mov    0x4(%rdi),%edx
>   0x0000000060051c02 <+50>: test   %ecx,%ecx
>   0x0000000060051c04 <+52>: js     0x60051c30 <freebsd_rw_unlock+96>
>   0x0000000060051c06 <+54>: test   $0x1fffffff,%ecx
>   0x0000000060051c0c <+60>: je     0x60051c4b <freebsd_rw_unlock+123>
>   0x0000000060051c0e <+62>: lea    -0x1(%rcx),%esi
>   0x0000000060051c11 <+65>: nopw   %cs:0x0(%rax,%rax,1)
>   0x0000000060051c1b <+75>: nopl   0x0(%rax,%rax,1)
> NOTE: loop is below:
>   0x0000000060051c20 <+80>: mov    %ecx,%eax
> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>   0x0000000060051c26 <+86>: sete   %bl
>   0x0000000060051c29 <+89>: test   %bl,%bl
>   0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
> Note: loop is above
>   0x0000000060051c2d <+93>: jmp    0x60051c57 <freebsd_rw_unlock+135>
>   0x0000000060051c2f <+95>: nop
>   0x0000000060051c30 <+96>: mov    %ecx,%esi
>   0x0000000060051c32 <+98>: and    $0x7fffffff,%esi
>   0x0000000060051c38 <+104>: mov    %ecx,%eax
>   0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi)
>   0x0000000060051c3e <+110>: sete   %bl
>   0x0000000060051c41 <+113>: test   %bl,%bl
>   0x0000000060051c43 <+115>: jne    0x60051c57 <freebsd_rw_unlock+135>
>   0x0000000060051c45 <+117>: mov    (%rdi),%ecx
>   0x0000000060051c47 <+119>: test   %ecx,%ecx
>   0x0000000060051c49 <+121>: js     0x60051c30 <freebsd_rw_unlock+96>
>   0x0000000060051c4b <+123>: mov    $0xffffffff,%eax
>   0x0000000060051c50 <+128>: add    $0x8,%rsp
>   0x0000000060051c54 <+132>: pop    %rbx
>   0x0000000060051c55 <+133>: pop    %rbp
>   0x0000000060051c56 <+134>: retq  
>   0x0000000060051c57 <+135>: test   $0x2,%dl
>   0x0000000060051c5a <+138>: jne    0x60051c70 <freebsd_rw_unlock+160>
>   0x0000000060051c5c <+140>: test   $0x40000000,%ecx
>   0x0000000060051c62 <+146>: jne    0x60051c82 <freebsd_rw_unlock+178>
>   0x0000000060051c64 <+148>: xor    %eax,%eax
>   0x0000000060051c66 <+150>: test   $0x20000000,%ecx
>   0x0000000060051c6c <+156>: je     0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051c6e <+158>: jmp    0x60051c89 <freebsd_rw_unlock+185>
>   0x0000000060051c70 <+160>: test   $0x20000000,%ecx
>   0x0000000060051c76 <+166>: jne    0x60051c89 <freebsd_rw_unlock+185>
>   0x0000000060051c78 <+168>: xor    %eax,%eax
>   0x0000000060051c7a <+170>: test   $0x40000000,%ecx
>   0x0000000060051c80 <+176>: je     0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051c82 <+178>: mov    $0x1,%edx
>   0x0000000060051c87 <+183>: jmp    0x60051c8e <freebsd_rw_unlock+190>
>   0x0000000060051c89 <+185>: mov    $0x7fffffff,%edx
>   0x0000000060051c8e <+190>: mov    $0x3,%esi
>   0x0000000060051c93 <+195>: xor    %ecx,%ecx
>   0x0000000060051c95 <+197>: xor    %r8d,%r8d
>   0x0000000060051c98 <+200>: callq  0x601b5970 <_umtx_op>
>   0x0000000060051c9d <+205>: mov    %eax,%edi
>   0x0000000060051c9f <+207>: add    $0x8,%rsp
>   0x0000000060051ca3 <+211>: pop    %rbx
>   0x0000000060051ca4 <+212>: pop    %rbp
> --Type <RET> for more, q to quit, c to continue without paging--
>   0x0000000060051ca5 <+213>: jmpq   0x60040a40 <get_errno>
> End of assembler dump.
>
> For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
>
> rax            0x20000001          536870913
> rbx            0xf4dede00          4108246528
> rcx            0x20000001          536870913
> rdx            0x2                 2
> rsi            0x20000000          536870912
> rdi            0xf4dede80          4108246656
>
> (gdb) x 0xf4dede80
> 0xf4dede80: 0x40000001
>
>   0x0000000060051c20 <+80>: mov    %ecx,%eax
> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>   0x0000000060051c26 <+86>: sete   %bl
>   0x0000000060051c29 <+89>: test   %bl,%bl
>   0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
>
>
> At this point I do not have interpretation of the details, not even
> a comparison to the source code.


In the source code there is:

1261    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
1262        /* decrement reader count */
1263        for (;;) {
1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
1266                    unlock_user_struct(target_urwlock,
1267                        target_addr, 1);
1268                        return -TARGET_EPERM;
(gdb)
1269                 }
1270            } else {
1271                break;
1272            }
1273        }

But in the assembler code it appears to me that the code:

1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
1266                    unlock_user_struct(target_urwlock,
1267                        target_addr, 1);
1268                        return -TARGET_EPERM;
(gdb)
1269                 }

was factored out of the loop, with the test moved to before the loop.
The TARGET_URWLOCK_READER_COUNT use masks off based on 0x1fffffff .
But the code for the conditional test is before the loop:

  0x0000000060051c06 <+54>: test   $0x1fffffff,%ecx
  0x0000000060051c0c <+60>: je     0x60051c4b <freebsd_rw_unlock+123>
. . .
NOTE: loop is below:
  0x0000000060051c20 <+80>: mov    %ecx,%eax
=> 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
  0x0000000060051c26 <+86>: sete   %bl
  0x0000000060051c29 <+89>: test   %bl,%bl
  0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
Note: loop is above

I may try declaring state volatile in an attempt to prevent the
code motion. (But what of other uses of tcmpset_32 controlling
subordinate code in other places?)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [problem not found]

freebsd-emulation mailing list
[I was wrong: it is code elimination, not motion. volatile is
not a fix.]

On 2019-Jan-1, at 19:37, Mark Millard <marklmi at yahoo.com> wrote:

> [It looks to me like the assembler code has some code moved out of the
> loop that it should not be (by intent). It appears that calling
> tcmpset_32 does not prevent code motion to before the call and
> the variable involved was not declared volatile.]
>
> On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:
>
>> The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
>> cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike
>> the last hang-up that I analyzed. I do not yet know how repeatable this is
>> but the original hang-up and the one experiment the below is from.
>>
>> From top:
>>
>> PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     CPU COMMAND
>> 12789 root          4  52    0   166M    33M    0 uwait    6  36:06  97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
>>
>> Note: The vast margority of the 36:06 has been stuck in the uwait loop involved.
>>
>> From ps -auxd:
>>
>> root       94075    0.0  0.0  12932  3552  1  S+   10:42        0:01.21 |           `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>> root        1944    0.0  0.0  12932  3540  1  I+   10:42        0:00.00 |             |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>> root        1957    0.0  0.0  12932  3556  1  I    10:42        0:00.04 |             |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
>> root       12328    0.0  0.0  12932  3548  1  I    10:49        0:00.00 |             | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
>> root       12329    0.0  0.0  10328  1756  1  IJ   10:49        0:00.01 |             |   `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
>> root       12350    0.0  0.0   9860  1248  1  IJ   10:49        0:00.00 |             |     `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
>> root       12353    0.0  0.0  10236  1664  1  IJ   10:49        0:00.05 |             |       `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
>> root       12787    0.0  0.0   9856  1236  1  IJ   10:50        0:00.00 |             |         `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
>> root       12789  100.0  0.0 169868 33528  1  IJ   10:50       36:35.26 |             |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
>> root       94423    0.0  0.0  12932  3484  1  S+   10:42        0:12.91 |             `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>>
>>
>> (gdb) attach 12789
>> Attaching to process 12789
>> Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
>> [New LWP 101168 of process 12789]
>> [New LWP 101178 of process 12789]
>> [New LWP 101499 of process 12789]
>> [Switching to LWP 100304 of process 12789]
>> _umtx_op () at _umtx_op.S:3
>> 3 RSYSCALL(_umtx_op)
>> (gdb) info threads
>> Id   Target Id                   Frame
>> * 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
>> 2    LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>> 3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
>> 4    LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
>> (gdb) thread 4
>> [Switching to thread 4 (LWP 101499 of process 12789)]
>> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
>> 220 ATOMIC_CMPSET(int);
>>
>> (gdb) bt
>> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
>> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
>> #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
>> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
>>   at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
>> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
>>   arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
>> #5  0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
>> #6  0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
>> #7  0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
>> #8  0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291
>> #9  0x0000000000000000 in ?? ()
>> Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
>>
>> 181 #define ATOMIC_CMPSET(TYPE) \
>> 182 static __inline int \
>> 183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \
>> 184 { \
>> (gdb)
>> 185 u_char res; \
>> 186 \
>> 187 __asm __volatile( \
>> 188 " " MPLOCKED " " \
>> 189 " cmpxchg %3,%1 ; " \
>> 190 " sete %0 ; " \
>> 191 "# atomic_cmpset_" #TYPE " " \
>> 192 : "=q" (res), /* 0 */ \
>> 193  "+m" (*dst), /* 1 */ \
>> 194  "+a" (expect) /* 2 */ \
>> (gdb)
>> 195 : "r" (src) /* 3 */ \
>> 196 : "memory", "cc"); \
>> 197 return (res); \
>> 198 } \
>> . . .
>> 218 ATOMIC_CMPSET(char);
>> 219 ATOMIC_CMPSET(short);
>> 220 ATOMIC_CMPSET(int);
>> 221 ATOMIC_CMPSET(long);
>>
>> (gdb) up
>> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
>> 178    return atomic_cmpset_acq_32(addr, current, new);
>> (gdb) list
>> 173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
>> 174 {
>> 175    uint32_t current = tswap32(a);
>> 176    uint32_t new = tswap32(b);
>> 177
>> 178    return atomic_cmpset_acq_32(addr, current, new);
>> 179 }
>>
>> (gdb) up
>> #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
>> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
>> (gdb) list
>> 1259            }
>> 1260        }
>> 1261    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
>> 1262        /* decrement reader count */
>> 1263        for (;;) {
>> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
>> 1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
>> 1266                    unlock_user_struct(target_urwlock,
>> 1267                        target_addr, 1);
>> 1268                        return -TARGET_EPERM;
>> (gdb)
>> 1269                 }
>> 1270            } else {
>> 1271                break;
>> 1272            }
>> 1273        }
>> 1274    } else {
>> 1275        unlock_user_struct(target_urwlock, target_addr, 1);
>> 1276        return -TARGET_EPERM;
>> 1277    }
>> 1278
>>
>> The above loop is where the code is stuck looping, using most of the capacity of
>> one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
>>
>> (gdb) print state
>> $2 = 536870913
>>
>> I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
>> is running.
>>
>>
>>
>> To show a little more context before the assembler for the routine that ends up
>> containing that loop:
>>
>> (gdb) up
>> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
>>   at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
>> 474        ret = freebsd_rw_unlock(obj);
>> (gdb) list
>> 469            ret = freebsd_rw_wrlock(obj, val, 0, NULL);
>> 470        }
>> 471        break;
>> 472
>> 473    case TARGET_UMTX_OP_RW_UNLOCK:
>> 474        ret = freebsd_rw_unlock(obj);
>> 475        break;
>> 476
>> 477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
>> 478 #ifdef UMTX_OP_MUTEX_WAKE2
>>
>> (gdb) up
>> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
>>   arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
>> 1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
>> (gdb) list
>> 1359        ret = do_freebsd__umtx_unlock(arg1);
>> 1360        break;
>> 1361 #endif
>> 1362
>> 1363    case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
>> 1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
>> 1365        break;
>> 1366
>> 1367        /*
>> 1368         * ioctl(2)
>>
>> As for the assembler level view for the routine containing the
>> loop (via inlining and such):
>>
>> Dump of assembler code for function freebsd_rw_unlock:
>>  0x0000000060051bd0 <+0>: push   %rbp
>>  0x0000000060051bd1 <+1>: mov    %rsp,%rbp
>>  0x0000000060051bd4 <+4>: push   %rbx
>>  0x0000000060051bd5 <+5>: push   %rax
>>  0x0000000060051bd6 <+6>: mov    %edi,%ebx
>>  0x0000000060051bd8 <+8>: mov    $0x20,%esi
>>  0x0000000060051bdd <+13>: mov    $0x3,%edx
>>  0x0000000060051be2 <+18>: callq  0x60036b70 <page_check_range>
>>  0x0000000060051be7 <+23>: mov    %eax,%ecx
>>  0x0000000060051be9 <+25>: mov    $0xfffffff2,%eax
>>  0x0000000060051bee <+30>: test   %ecx,%ecx
>>  0x0000000060051bf0 <+32>: jne    0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051bf2 <+34>: mov    %ebx,%edi
>>  0x0000000060051bf4 <+36>: add    0x26a6355(%rip),%rdi        # 0x626f7f50 <guest_base>
>>  0x0000000060051bfb <+43>: je     0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051bfd <+45>: mov    (%rdi),%ecx
>>  0x0000000060051bff <+47>: mov    0x4(%rdi),%edx
>>  0x0000000060051c02 <+50>: test   %ecx,%ecx
>>  0x0000000060051c04 <+52>: js     0x60051c30 <freebsd_rw_unlock+96>
>>  0x0000000060051c06 <+54>: test   $0x1fffffff,%ecx
>>  0x0000000060051c0c <+60>: je     0x60051c4b <freebsd_rw_unlock+123>
>>  0x0000000060051c0e <+62>: lea    -0x1(%rcx),%esi
>>  0x0000000060051c11 <+65>: nopw   %cs:0x0(%rax,%rax,1)
>>  0x0000000060051c1b <+75>: nopl   0x0(%rax,%rax,1)
>> NOTE: loop is below:
>>  0x0000000060051c20 <+80>: mov    %ecx,%eax
>> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>>  0x0000000060051c26 <+86>: sete   %bl
>>  0x0000000060051c29 <+89>: test   %bl,%bl
>>  0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
>> Note: loop is above
>>  0x0000000060051c2d <+93>: jmp    0x60051c57 <freebsd_rw_unlock+135>
>>  0x0000000060051c2f <+95>: nop
>>  0x0000000060051c30 <+96>: mov    %ecx,%esi
>>  0x0000000060051c32 <+98>: and    $0x7fffffff,%esi
>>  0x0000000060051c38 <+104>: mov    %ecx,%eax
>>  0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi)
>>  0x0000000060051c3e <+110>: sete   %bl
>>  0x0000000060051c41 <+113>: test   %bl,%bl
>>  0x0000000060051c43 <+115>: jne    0x60051c57 <freebsd_rw_unlock+135>
>>  0x0000000060051c45 <+117>: mov    (%rdi),%ecx
>>  0x0000000060051c47 <+119>: test   %ecx,%ecx
>>  0x0000000060051c49 <+121>: js     0x60051c30 <freebsd_rw_unlock+96>
>>  0x0000000060051c4b <+123>: mov    $0xffffffff,%eax
>>  0x0000000060051c50 <+128>: add    $0x8,%rsp
>>  0x0000000060051c54 <+132>: pop    %rbx
>>  0x0000000060051c55 <+133>: pop    %rbp
>>  0x0000000060051c56 <+134>: retq  
>>  0x0000000060051c57 <+135>: test   $0x2,%dl
>>  0x0000000060051c5a <+138>: jne    0x60051c70 <freebsd_rw_unlock+160>
>>  0x0000000060051c5c <+140>: test   $0x40000000,%ecx
>>  0x0000000060051c62 <+146>: jne    0x60051c82 <freebsd_rw_unlock+178>
>>  0x0000000060051c64 <+148>: xor    %eax,%eax
>>  0x0000000060051c66 <+150>: test   $0x20000000,%ecx
>>  0x0000000060051c6c <+156>: je     0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051c6e <+158>: jmp    0x60051c89 <freebsd_rw_unlock+185>
>>  0x0000000060051c70 <+160>: test   $0x20000000,%ecx
>>  0x0000000060051c76 <+166>: jne    0x60051c89 <freebsd_rw_unlock+185>
>>  0x0000000060051c78 <+168>: xor    %eax,%eax
>>  0x0000000060051c7a <+170>: test   $0x40000000,%ecx
>>  0x0000000060051c80 <+176>: je     0x60051c50 <freebsd_rw_unlock+128>
>>  0x0000000060051c82 <+178>: mov    $0x1,%edx
>>  0x0000000060051c87 <+183>: jmp    0x60051c8e <freebsd_rw_unlock+190>
>>  0x0000000060051c89 <+185>: mov    $0x7fffffff,%edx
>>  0x0000000060051c8e <+190>: mov    $0x3,%esi
>>  0x0000000060051c93 <+195>: xor    %ecx,%ecx
>>  0x0000000060051c95 <+197>: xor    %r8d,%r8d
>>  0x0000000060051c98 <+200>: callq  0x601b5970 <_umtx_op>
>>  0x0000000060051c9d <+205>: mov    %eax,%edi
>>  0x0000000060051c9f <+207>: add    $0x8,%rsp
>>  0x0000000060051ca3 <+211>: pop    %rbx
>>  0x0000000060051ca4 <+212>: pop    %rbp
>> --Type <RET> for more, q to quit, c to continue without paging--
>>  0x0000000060051ca5 <+213>: jmpq   0x60040a40 <get_errno>
>> End of assembler dump.
>>
>> For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
>>
>> rax            0x20000001          536870913
>> rbx            0xf4dede00          4108246528
>> rcx            0x20000001          536870913
>> rdx            0x2                 2
>> rsi            0x20000000          536870912
>> rdi            0xf4dede80          4108246656
>>
>> (gdb) x 0xf4dede80
>> 0xf4dede80: 0x40000001
>>
>>  0x0000000060051c20 <+80>: mov    %ecx,%eax
>> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>>  0x0000000060051c26 <+86>: sete   %bl
>>  0x0000000060051c29 <+89>: test   %bl,%bl
>>  0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
>>
>>
>> At this point I do not have interpretation of the details, not even
>> a comparison to the source code.
>
>
> . . . (misleading notes deleted) . . .

I was wrong about the code motion: in . . .

   } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
        /* decrement reader count */
        for (;;) {
            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
                    unlock_user_struct(target_urwlock,
                        target_addr, 1);
                        return -TARGET_EPERM;
                 }
            } else {
                break;
            }
        }

the outer test was leading to the inner-if being eliminated. While
"volatile state" stopped that from happening it did not fix the hang-up
in the loop. (There are likely other consequences to the use of memory
involved instead of just a register as well.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [a tested fix included]

freebsd-emulation mailing list
In reply to this post by freebsd-emulation mailing list
On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:

> The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
> cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike
> the last hang-up that I analyzed. I do not yet know how repeatable this is
> but the original hang-up and the one experiment the below is from.
>
> From top:
>
>  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     CPU COMMAND
> 12789 root          4  52    0   166M    33M    0 uwait    6  36:06  97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
>
> Note: The vast margority of the 36:06 has been stuck in the uwait loop involved.
>
> From ps -auxd:
>
> root       94075    0.0  0.0  12932  3552  1  S+   10:42        0:01.21 |           `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root        1944    0.0  0.0  12932  3540  1  I+   10:42        0:00.00 |             |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root        1957    0.0  0.0  12932  3556  1  I    10:42        0:00.04 |             |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> root       12328    0.0  0.0  12932  3548  1  I    10:49        0:00.00 |             | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> root       12329    0.0  0.0  10328  1756  1  IJ   10:49        0:00.01 |             |   `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
> root       12350    0.0  0.0   9860  1248  1  IJ   10:49        0:00.00 |             |     `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
> root       12353    0.0  0.0  10236  1664  1  IJ   10:49        0:00.05 |             |       `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
> root       12787    0.0  0.0   9856  1236  1  IJ   10:50        0:00.00 |             |         `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
> root       12789  100.0  0.0 169868 33528  1  IJ   10:50       36:35.26 |             |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
> root       94423    0.0  0.0  12932  3484  1  S+   10:42        0:12.91 |             `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>
>
> (gdb) attach 12789
> Attaching to process 12789
> Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
> [New LWP 101168 of process 12789]
> [New LWP 101178 of process 12789]
> [New LWP 101499 of process 12789]
> [Switching to LWP 100304 of process 12789]
> _umtx_op () at _umtx_op.S:3
> 3 RSYSCALL(_umtx_op)
> (gdb) info threads
>  Id   Target Id                   Frame
> * 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
>  2    LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>  3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
>  4    LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> (gdb) thread 4
> [Switching to thread 4 (LWP 101499 of process 12789)]
> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> 220 ATOMIC_CMPSET(int);
>
> (gdb) bt
> #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
>    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
>    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> #5  0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
> #6  0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
> #7  0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
> #8  0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291
> #9  0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
>
> 181 #define ATOMIC_CMPSET(TYPE) \
> 182 static __inline int \
> 183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \
> 184 { \
> (gdb)
> 185 u_char res; \
> 186 \
> 187 __asm __volatile( \
> 188 " " MPLOCKED " " \
> 189 " cmpxchg %3,%1 ; " \
> 190 " sete %0 ; " \
> 191 "# atomic_cmpset_" #TYPE " " \
> 192 : "=q" (res), /* 0 */ \
> 193  "+m" (*dst), /* 1 */ \
> 194  "+a" (expect) /* 2 */ \
> (gdb)
> 195 : "r" (src) /* 3 */ \
> 196 : "memory", "cc"); \
> 197 return (res); \
> 198 } \
> . . .
> 218 ATOMIC_CMPSET(char);
> 219 ATOMIC_CMPSET(short);
> 220 ATOMIC_CMPSET(int);
> 221 ATOMIC_CMPSET(long);
>
> (gdb) up
> #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> 178    return atomic_cmpset_acq_32(addr, current, new);
> (gdb) list
> 173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
> 174 {
> 175    uint32_t current = tswap32(a);
> 176    uint32_t new = tswap32(b);
> 177
> 178    return atomic_cmpset_acq_32(addr, current, new);
> 179 }
>
> (gdb) up
> #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
> (gdb) list
> 1259            }
> 1260        }
> 1261    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> 1262        /* decrement reader count */
> 1263        for (;;) {
> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
> 1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266                    unlock_user_struct(target_urwlock,
> 1267                        target_addr, 1);
> 1268                        return -TARGET_EPERM;
> (gdb)
> 1269                 }
> 1270            } else {
> 1271                break;
> 1272            }
> 1273        }
> 1274    } else {
> 1275        unlock_user_struct(target_urwlock, target_addr, 1);
> 1276        return -TARGET_EPERM;
> 1277    }
> 1278
>
> The above loop is where the code is stuck looping, using most of the capacity of
> one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
>
> (gdb) print state
> $2 = 536870913
>
> I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
> is running.
>
>
>
> To show a little more context before the assembler for the routine that ends up
> containing that loop:
>
> (gdb) up
> #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
>    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> 474        ret = freebsd_rw_unlock(obj);
> (gdb) list
> 469            ret = freebsd_rw_wrlock(obj, val, 0, NULL);
> 470        }
> 471        break;
> 472
> 473    case TARGET_UMTX_OP_RW_UNLOCK:
> 474        ret = freebsd_rw_unlock(obj);
> 475        break;
> 476
> 477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
> 478 #ifdef UMTX_OP_MUTEX_WAKE2
>
> (gdb) up
> #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
>    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> 1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> (gdb) list
> 1359        ret = do_freebsd__umtx_unlock(arg1);
> 1360        break;
> 1361 #endif
> 1362
> 1363    case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
> 1364        ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> 1365        break;
> 1366
> 1367        /*
> 1368         * ioctl(2)
>
> As for the assembler level view for the routine containing the
> loop (via inlining and such):
>
> Dump of assembler code for function freebsd_rw_unlock:
>   0x0000000060051bd0 <+0>: push   %rbp
>   0x0000000060051bd1 <+1>: mov    %rsp,%rbp
>   0x0000000060051bd4 <+4>: push   %rbx
>   0x0000000060051bd5 <+5>: push   %rax
>   0x0000000060051bd6 <+6>: mov    %edi,%ebx
>   0x0000000060051bd8 <+8>: mov    $0x20,%esi
>   0x0000000060051bdd <+13>: mov    $0x3,%edx
>   0x0000000060051be2 <+18>: callq  0x60036b70 <page_check_range>
>   0x0000000060051be7 <+23>: mov    %eax,%ecx
>   0x0000000060051be9 <+25>: mov    $0xfffffff2,%eax
>   0x0000000060051bee <+30>: test   %ecx,%ecx
>   0x0000000060051bf0 <+32>: jne    0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051bf2 <+34>: mov    %ebx,%edi
>   0x0000000060051bf4 <+36>: add    0x26a6355(%rip),%rdi        # 0x626f7f50 <guest_base>
>   0x0000000060051bfb <+43>: je     0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051bfd <+45>: mov    (%rdi),%ecx
>   0x0000000060051bff <+47>: mov    0x4(%rdi),%edx
>   0x0000000060051c02 <+50>: test   %ecx,%ecx
>   0x0000000060051c04 <+52>: js     0x60051c30 <freebsd_rw_unlock+96>
>   0x0000000060051c06 <+54>: test   $0x1fffffff,%ecx
>   0x0000000060051c0c <+60>: je     0x60051c4b <freebsd_rw_unlock+123>
>   0x0000000060051c0e <+62>: lea    -0x1(%rcx),%esi
>   0x0000000060051c11 <+65>: nopw   %cs:0x0(%rax,%rax,1)
>   0x0000000060051c1b <+75>: nopl   0x0(%rax,%rax,1)
> NOTE: loop is below:
>   0x0000000060051c20 <+80>: mov    %ecx,%eax
> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>   0x0000000060051c26 <+86>: sete   %bl
>   0x0000000060051c29 <+89>: test   %bl,%bl
>   0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
> Note: loop is above
>   0x0000000060051c2d <+93>: jmp    0x60051c57 <freebsd_rw_unlock+135>
>   0x0000000060051c2f <+95>: nop
>   0x0000000060051c30 <+96>: mov    %ecx,%esi
>   0x0000000060051c32 <+98>: and    $0x7fffffff,%esi
>   0x0000000060051c38 <+104>: mov    %ecx,%eax
>   0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi)
>   0x0000000060051c3e <+110>: sete   %bl
>   0x0000000060051c41 <+113>: test   %bl,%bl
>   0x0000000060051c43 <+115>: jne    0x60051c57 <freebsd_rw_unlock+135>
>   0x0000000060051c45 <+117>: mov    (%rdi),%ecx
>   0x0000000060051c47 <+119>: test   %ecx,%ecx
>   0x0000000060051c49 <+121>: js     0x60051c30 <freebsd_rw_unlock+96>
>   0x0000000060051c4b <+123>: mov    $0xffffffff,%eax
>   0x0000000060051c50 <+128>: add    $0x8,%rsp
>   0x0000000060051c54 <+132>: pop    %rbx
>   0x0000000060051c55 <+133>: pop    %rbp
>   0x0000000060051c56 <+134>: retq  
>   0x0000000060051c57 <+135>: test   $0x2,%dl
>   0x0000000060051c5a <+138>: jne    0x60051c70 <freebsd_rw_unlock+160>
>   0x0000000060051c5c <+140>: test   $0x40000000,%ecx
>   0x0000000060051c62 <+146>: jne    0x60051c82 <freebsd_rw_unlock+178>
>   0x0000000060051c64 <+148>: xor    %eax,%eax
>   0x0000000060051c66 <+150>: test   $0x20000000,%ecx
>   0x0000000060051c6c <+156>: je     0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051c6e <+158>: jmp    0x60051c89 <freebsd_rw_unlock+185>
>   0x0000000060051c70 <+160>: test   $0x20000000,%ecx
>   0x0000000060051c76 <+166>: jne    0x60051c89 <freebsd_rw_unlock+185>
>   0x0000000060051c78 <+168>: xor    %eax,%eax
>   0x0000000060051c7a <+170>: test   $0x40000000,%ecx
>   0x0000000060051c80 <+176>: je     0x60051c50 <freebsd_rw_unlock+128>
>   0x0000000060051c82 <+178>: mov    $0x1,%edx
>   0x0000000060051c87 <+183>: jmp    0x60051c8e <freebsd_rw_unlock+190>
>   0x0000000060051c89 <+185>: mov    $0x7fffffff,%edx
>   0x0000000060051c8e <+190>: mov    $0x3,%esi
>   0x0000000060051c93 <+195>: xor    %ecx,%ecx
>   0x0000000060051c95 <+197>: xor    %r8d,%r8d
>   0x0000000060051c98 <+200>: callq  0x601b5970 <_umtx_op>
>   0x0000000060051c9d <+205>: mov    %eax,%edi
>   0x0000000060051c9f <+207>: add    $0x8,%rsp
>   0x0000000060051ca3 <+211>: pop    %rbx
>   0x0000000060051ca4 <+212>: pop    %rbp
> --Type <RET> for more, q to quit, c to continue without paging--
>   0x0000000060051ca5 <+213>: jmpq   0x60040a40 <get_errno>
> End of assembler dump.
>
> For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
>
> rax            0x20000001          536870913
> rbx            0xf4dede00          4108246528
> rcx            0x20000001          536870913
> rdx            0x2                 2
> rsi            0x20000000          536870912
> rdi            0xf4dede80          4108246656
>
> (gdb) x 0xf4dede80
> 0xf4dede80: 0x40000001
>
>   0x0000000060051c20 <+80>: mov    %ecx,%eax
> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
>   0x0000000060051c26 <+86>: sete   %bl
>   0x0000000060051c29 <+89>: test   %bl,%bl
>   0x0000000060051c2b <+91>: je     0x60051c20 <freebsd_rw_unlock+80>
>
>
> At this point I do not have interpretation of the details, not even
> a comparison to the source code.


Note: The problem was observed in building ports head -r488859 's graphics/poppler-qt5 .

In the below code I've added a missing statement:

> 1261    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> 1262        /* decrement reader count */
> 1263        for (;;) {
> 1264            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
                         __get_user(state, &target_urwlock->rw_state);

> 1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266                    unlock_user_struct(target_urwlock,
> 1267                        target_addr, 1);
> 1268                        return -TARGET_EPERM;
> (gdb)
> 1269                 }
> 1270            } else {
> 1271                break;
> 1272            }
> 1273        }

The lack of the update to 'state' is why it was valid for the
compiler to eliminate the code for:

> 1265                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266                    unlock_user_struct(target_urwlock,
> 1267                        target_addr, 1);
> 1268                        return -TARGET_EPERM;
> (gdb)
> 1269                 }

So (without old line numbers):

    } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
        /* decrement reader count */
        for (;;) {
            if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
                __get_user(state, &target_urwlock->rw_state);
                if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
                    unlock_user_struct(target_urwlock,
                        target_addr, 1);
                        return -TARGET_EPERM;
                 }
            } else {
                break;
            }
        }

This follows the structure of other tcmpset_32 use in the source file.

With this change poudriere-devel's bulk worked for graphics/poppler-qt5
as a amd64->armv7 cross-build (FreeBSD head -r341836 based, under Hyper-V,
with 28 logical-processors assigned):

[00:00:13] [01] [00:00:00] Building graphics/poppler-qt5 | poppler-qt5-0.72.0
[00:08:03] [01] [00:07:50] Finished graphics/poppler-qt5 | poppler-qt5-0.72.0: Success


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [a tested fix included]

Kyle Evans-3
On Wed, Jan 2, 2019 at 3:38 AM Mark Millard via freebsd-ports
<[hidden email]> wrote:

>
> On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:
>
> > The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
> > cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike
> > the last hang-up that I analyzed. I do not yet know how repeatable this is
> > but the original hang-up and the one experiment the below is from.
> >
> > From top:
> >
> >  PID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME     CPU COMMAND
> > 12789 root          4  52    0   166M    33M    0 uwait    6  36:06  97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
> >
> > Note: The vast margority of the 36:06 has been stuck in the uwait loop involved.
> >
> > From ps -auxd:
> >
> > root       94075    0.0  0.0  12932  3552  1  S+   10:42        0:01.21 |           `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> > root        1944    0.0  0.0  12932  3540  1  I+   10:42        0:00.00 |             |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> > root        1957    0.0  0.0  12932  3556  1  I    10:42        0:00.04 |             |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> > root       12328    0.0  0.0  12932  3548  1  I    10:49        0:00.00 |             | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> > root       12329    0.0  0.0  10328  1756  1  IJ   10:49        0:00.01 |             |   `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
> > root       12350    0.0  0.0   9860  1248  1  IJ   10:49        0:00.00 |             |     `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
> > root       12353    0.0  0.0  10236  1664  1  IJ   10:49        0:00.05 |             |       `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
> > root       12787    0.0  0.0   9856  1236  1  IJ   10:50        0:00.00 |             |         `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
> > root       12789  100.0  0.0 169868 33528  1  IJ   10:50       36:35.26 |             |           `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
> > root       94423    0.0  0.0  12932  3484  1  S+   10:42        0:12.91 |             `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> >
> >
> > (gdb) attach 12789
> > Attaching to process 12789
> > Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
> > [New LWP 101168 of process 12789]
> > [New LWP 101178 of process 12789]
> > [New LWP 101499 of process 12789]
> > [Switching to LWP 100304 of process 12789]
> > _umtx_op () at _umtx_op.S:3
> > 3     RSYSCALL(_umtx_op)
> > (gdb) info threads
> >  Id   Target Id                   Frame
> > * 1    LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
> >  2    LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> >  3    LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
> >  4    LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> > (gdb) thread 4
> > [Switching to thread 4 (LWP 101499 of process 12789)]
> > #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> > 220   ATOMIC_CMPSET(int);
> >
> > (gdb) bt
> > #0  0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> > #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> > #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> > #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
> >    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> > #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
> >    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> > #5  0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
> > #6  0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
> > #7  0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
> > #8  0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291
> > #9  0x0000000000000000 in ?? ()
> > Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
> >
> > 181   #define ATOMIC_CMPSET(TYPE)                             \
> > 182   static __inline int                                     \
> > 183   atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \
> > 184   {                                                       \
> > (gdb)
> > 185           u_char res;                                     \
> > 186                                                           \
> > 187           __asm __volatile(                               \
> > 188           "       " MPLOCKED "            "               \
> > 189           "       cmpxchg %3,%1 ; "                       \
> > 190           "       sete    %0 ;            "               \
> > 191           "# atomic_cmpset_" #TYPE "      "               \
> > 192           : "=q" (res),                   /* 0 */         \
> > 193             "+m" (*dst),                  /* 1 */         \
> > 194             "+a" (expect)                 /* 2 */         \
> > (gdb)
> > 195           : "r" (src)                     /* 3 */         \
> > 196           : "memory", "cc");                              \
> > 197           return (res);                                   \
> > 198   }                                                       \
> > . . .
> > 218   ATOMIC_CMPSET(char);
> > 219   ATOMIC_CMPSET(short);
> > 220   ATOMIC_CMPSET(int);
> > 221   ATOMIC_CMPSET(long);
> >
> > (gdb) up
> > #1  tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> > 178       return atomic_cmpset_acq_32(addr, current, new);
> > (gdb) list
> > 173   static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
> > 174   {
> > 175       uint32_t current = tswap32(a);
> > 176       uint32_t new = tswap32(b);
> > 177
> > 178       return atomic_cmpset_acq_32(addr, current, new);
> > 179   }
> >
> > (gdb) up
> > #2  freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> > 1264              if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
> > (gdb) list
> > 1259              }
> > 1260          }
> > 1261      } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> > 1262          /* decrement reader count */
> > 1263          for (;;) {
> > 1264              if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
> > 1265                  if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> > 1266                      unlock_user_struct(target_urwlock,
> > 1267                          target_addr, 1);
> > 1268                          return -TARGET_EPERM;
> > (gdb)
> > 1269                   }
> > 1270              } else {
> > 1271                  break;
> > 1272              }
> > 1273          }
> > 1274      } else {
> > 1275          unlock_user_struct(target_urwlock, target_addr, 1);
> > 1276          return -TARGET_EPERM;
> > 1277      }
> > 1278
> >
> > The above loop is where the code is stuck looping, using most of the capacity of
> > one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
> >
> > (gdb) print state
> > $2 = 536870913
> >
> > I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
> > is running.
> >
> >
> >
> > To show a little more context before the assembler for the routine that ends up
> > containing that loop:
> >
> > (gdb) up
> > #3  0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
> >    at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> > 474           ret = freebsd_rw_unlock(obj);
> > (gdb) list
> > 469               ret = freebsd_rw_wrlock(obj, val, 0, NULL);
> > 470           }
> > 471           break;
> > 472
> > 473       case TARGET_UMTX_OP_RW_UNLOCK:
> > 474           ret = freebsd_rw_unlock(obj);
> > 475           break;
> > 476
> > 477   #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
> > 478   #ifdef UMTX_OP_MUTEX_WAKE2
> >
> > (gdb) up
> > #4  0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
> >    arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> > 1364          ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> > (gdb) list
> > 1359          ret = do_freebsd__umtx_unlock(arg1);
> > 1360          break;
> > 1361  #endif
> > 1362
> > 1363      case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
> > 1364          ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> > 1365          break;
> > 1366
> > 1367          /*
> > 1368           * ioctl(2)
> >
> > As for the assembler level view for the routine containing the
> > loop (via inlining and such):
> >
> > Dump of assembler code for function freebsd_rw_unlock:
> >   0x0000000060051bd0 <+0>:    push   %rbp
> >   0x0000000060051bd1 <+1>:    mov    %rsp,%rbp
> >   0x0000000060051bd4 <+4>:    push   %rbx
> >   0x0000000060051bd5 <+5>:    push   %rax
> >   0x0000000060051bd6 <+6>:    mov    %edi,%ebx
> >   0x0000000060051bd8 <+8>:    mov    $0x20,%esi
> >   0x0000000060051bdd <+13>:   mov    $0x3,%edx
> >   0x0000000060051be2 <+18>:   callq  0x60036b70 <page_check_range>
> >   0x0000000060051be7 <+23>:   mov    %eax,%ecx
> >   0x0000000060051be9 <+25>:   mov    $0xfffffff2,%eax
> >   0x0000000060051bee <+30>:   test   %ecx,%ecx
> >   0x0000000060051bf0 <+32>:   jne    0x60051c50 <freebsd_rw_unlock+128>
> >   0x0000000060051bf2 <+34>:   mov    %ebx,%edi
> >   0x0000000060051bf4 <+36>:   add    0x26a6355(%rip),%rdi        # 0x626f7f50 <guest_base>
> >   0x0000000060051bfb <+43>:   je     0x60051c50 <freebsd_rw_unlock+128>
> >   0x0000000060051bfd <+45>:   mov    (%rdi),%ecx
> >   0x0000000060051bff <+47>:   mov    0x4(%rdi),%edx
> >   0x0000000060051c02 <+50>:   test   %ecx,%ecx
> >   0x0000000060051c04 <+52>:   js     0x60051c30 <freebsd_rw_unlock+96>
> >   0x0000000060051c06 <+54>:   test   $0x1fffffff,%ecx
> >   0x0000000060051c0c <+60>:   je     0x60051c4b <freebsd_rw_unlock+123>
> >   0x0000000060051c0e <+62>:   lea    -0x1(%rcx),%esi
> >   0x0000000060051c11 <+65>:   nopw   %cs:0x0(%rax,%rax,1)
> >   0x0000000060051c1b <+75>:   nopl   0x0(%rax,%rax,1)
> > NOTE: loop is below:
> >   0x0000000060051c20 <+80>:   mov    %ecx,%eax
> > => 0x0000000060051c22 <+82>:  lock cmpxchg %esi,(%rdi)
> >   0x0000000060051c26 <+86>:   sete   %bl
> >   0x0000000060051c29 <+89>:   test   %bl,%bl
> >   0x0000000060051c2b <+91>:   je     0x60051c20 <freebsd_rw_unlock+80>
> > Note: loop is above
> >   0x0000000060051c2d <+93>:   jmp    0x60051c57 <freebsd_rw_unlock+135>
> >   0x0000000060051c2f <+95>:   nop
> >   0x0000000060051c30 <+96>:   mov    %ecx,%esi
> >   0x0000000060051c32 <+98>:   and    $0x7fffffff,%esi
> >   0x0000000060051c38 <+104>:  mov    %ecx,%eax
> >   0x0000000060051c3a <+106>:  lock cmpxchg %esi,(%rdi)
> >   0x0000000060051c3e <+110>:  sete   %bl
> >   0x0000000060051c41 <+113>:  test   %bl,%bl
> >   0x0000000060051c43 <+115>:  jne    0x60051c57 <freebsd_rw_unlock+135>
> >   0x0000000060051c45 <+117>:  mov    (%rdi),%ecx
> >   0x0000000060051c47 <+119>:  test   %ecx,%ecx
> >   0x0000000060051c49 <+121>:  js     0x60051c30 <freebsd_rw_unlock+96>
> >   0x0000000060051c4b <+123>:  mov    $0xffffffff,%eax
> >   0x0000000060051c50 <+128>:  add    $0x8,%rsp
> >   0x0000000060051c54 <+132>:  pop    %rbx
> >   0x0000000060051c55 <+133>:  pop    %rbp
> >   0x0000000060051c56 <+134>:  retq
> >   0x0000000060051c57 <+135>:  test   $0x2,%dl
> >   0x0000000060051c5a <+138>:  jne    0x60051c70 <freebsd_rw_unlock+160>
> >   0x0000000060051c5c <+140>:  test   $0x40000000,%ecx
> >   0x0000000060051c62 <+146>:  jne    0x60051c82 <freebsd_rw_unlock+178>
> >   0x0000000060051c64 <+148>:  xor    %eax,%eax
> >   0x0000000060051c66 <+150>:  test   $0x20000000,%ecx
> >   0x0000000060051c6c <+156>:  je     0x60051c50 <freebsd_rw_unlock+128>
> >   0x0000000060051c6e <+158>:  jmp    0x60051c89 <freebsd_rw_unlock+185>
> >   0x0000000060051c70 <+160>:  test   $0x20000000,%ecx
> >   0x0000000060051c76 <+166>:  jne    0x60051c89 <freebsd_rw_unlock+185>
> >   0x0000000060051c78 <+168>:  xor    %eax,%eax
> >   0x0000000060051c7a <+170>:  test   $0x40000000,%ecx
> >   0x0000000060051c80 <+176>:  je     0x60051c50 <freebsd_rw_unlock+128>
> >   0x0000000060051c82 <+178>:  mov    $0x1,%edx
> >   0x0000000060051c87 <+183>:  jmp    0x60051c8e <freebsd_rw_unlock+190>
> >   0x0000000060051c89 <+185>:  mov    $0x7fffffff,%edx
> >   0x0000000060051c8e <+190>:  mov    $0x3,%esi
> >   0x0000000060051c93 <+195>:  xor    %ecx,%ecx
> >   0x0000000060051c95 <+197>:  xor    %r8d,%r8d
> >   0x0000000060051c98 <+200>:  callq  0x601b5970 <_umtx_op>
> >   0x0000000060051c9d <+205>:  mov    %eax,%edi
> >   0x0000000060051c9f <+207>:  add    $0x8,%rsp
> >   0x0000000060051ca3 <+211>:  pop    %rbx
> >   0x0000000060051ca4 <+212>:  pop    %rbp
> > --Type <RET> for more, q to quit, c to continue without paging--
> >   0x0000000060051ca5 <+213>:  jmpq   0x60040a40 <get_errno>
> > End of assembler dump.
> >
> > For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
> >
> > rax            0x20000001          536870913
> > rbx            0xf4dede00          4108246528
> > rcx            0x20000001          536870913
> > rdx            0x2                 2
> > rsi            0x20000000          536870912
> > rdi            0xf4dede80          4108246656
> >
> > (gdb) x 0xf4dede80
> > 0xf4dede80:   0x40000001
> >
> >   0x0000000060051c20 <+80>:   mov    %ecx,%eax
> > => 0x0000000060051c22 <+82>:  lock cmpxchg %esi,(%rdi)
> >   0x0000000060051c26 <+86>:   sete   %bl
> >   0x0000000060051c29 <+89>:   test   %bl,%bl
> >   0x0000000060051c2b <+91>:   je     0x60051c20 <freebsd_rw_unlock+80>
> >
> >
> > At this point I do not have interpretation of the details, not even
> > a comparison to the source code.
>
>
> Note: The problem was observed in building ports head -r488859 's graphics/poppler-qt5 .
>
> In the below code I've added a missing statement:
>
> > 1261      } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> > 1262          /* decrement reader count */
> > 1263          for (;;) {
> > 1264              if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
>                          __get_user(state, &target_urwlock->rw_state);
> > 1265                  if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> > 1266                      unlock_user_struct(target_urwlock,
> > 1267                          target_addr, 1);
> > 1268                          return -TARGET_EPERM;
> > (gdb)
> > 1269                   }
> > 1270              } else {
> > 1271                  break;
> > 1272              }
> > 1273          }
>
> The lack of the update to 'state' is why it was valid for the
> compiler to eliminate the code for:
>
> > 1265                  if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> > 1266                      unlock_user_struct(target_urwlock,
> > 1267                          target_addr, 1);
> > 1268                          return -TARGET_EPERM;
> > (gdb)
> > 1269                   }
>
> So (without old line numbers):
>
>     } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
>         /* decrement reader count */
>         for (;;) {
>             if (!tcmpset_32(&target_urwlock->rw_state, state, (state  - 1))) {
>                 __get_user(state, &target_urwlock->rw_state);
>                 if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
>                     unlock_user_struct(target_urwlock,
>                         target_addr, 1);
>                         return -TARGET_EPERM;
>                  }
>             } else {
>                 break;
>             }
>         }
>
> This follows the structure of other tcmpset_32 use in the source file.
>
> With this change poudriere-devel's bulk worked for graphics/poppler-qt5
> as a amd64->armv7 cross-build (FreeBSD head -r341836 based, under Hyper-V,
> with 28 logical-processors assigned):
>

Ah, thanks for that! I think your analysis is correct, and I've
created a pull request [1] for Sean. This should fix the apparent
hangs reported by many across armv7/aarch64.

[1] https://github.com/seanbruno/qemu-bsd-user/pull/72
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "[hidden email]"