Bug ID: 252579
Summary: fork() causes process to hang in rare circumstances.
Product: Base System
Severity: Affects Only Me
Assignee: [hidden email] Reporter: [hidden email]
This bug was discovered on FreeBSD 13-CURRENT but can also be reproduced on
12.2-RELEASE. It causes a process to hang when fork(2) is called and a specific
NSS (Name Service Switch) module is in use.
How to reproduce:
1) Download archive in the attachment.
2) Compile NSS stub module (do not forget .1 at the end of compiled module):
cc -shared -fPIC -pthread -o nss_stub.so.1 nss_stub.c
3) Copy nss_stub.so.1 to /usr/local/lib
4) Edit /etc/nsswitch.conf and replace 'hosts: files dns' with 'hosts: files
5) Compile test program: cc -o bug bug.c
6) Run it. It will hang, and even killall -9 bug won't kill it.
There is a small and unpleasant discussion on the freebsd-net mailing list with
Konstantin Belousov, who wanted me to reproduce this bug without editing
/etc/nsswitch.conf. I think that is either impossible, because the NSS system is
somehow interfering with fork, or beyond my competence. So the provided way to
reproduce the bug is as minimal as I can get.
What |Removed |Added
CC| |[hidden email]
--- Comment #1 from Jason A. Harmening <[hidden email]> ---
I can reproduce this 100% of the time on a -current VM using the supplied test
code. I noticed a few things:
--for me, the parent process seems to be hanging during fork(); I see no
evidence the child process is ever spawned.
--wmesg for the process is 'umtxn', and ddb shows what looks like the main
thread attempting to take a userspace lock, going through umtxq_lock(), and
sleeping in sleepq_wait_sig()
--I tried to write a smaller test program to reproduce the failure by
simulating the locking done by the NS dispatcher and the pthread_create()
issued by the stub, but this did not reproduce the hang.
--However, if I just link the original test program against libpthread ('cc -o
bug -pthread bug.c'), then I can no longer reproduce the hang. This tells me
the problem might have something to do with some bit of static umtx
initialization that happens when linking against libpthread/libthr. If this
initialization hasn't happened by the time the NS dispatcher (which loads the
stub through dlopen()) is invoked, then fork() ends up stuck in a umtx wait
that never gets signaled. It might also be related to the __isthreaded checks
made by lib/libc/net/nsdispatch.c, which smell fishy to me.
At the very least, it might be possible to make a smaller repro case by writing
a test program (that does not link libpthread) which dlopen()s a simple library
(which does link libpthread) and calls an entry point that spawns a thread.
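
The smaller repro suggested above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not a confirmed reproducer: the
helper name dlopen_call_then_fork, the library path, and the entry point name
are all hypothetical, and whether this sequence actually triggers the hang has
not been verified. The main program would be built without -pthread, while the
library it loads would be built with -pthread and would call pthread_create()
from its entry point.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical signature of the library's entry point. */
typedef void (*entry_fn)(void);

/*
 * Load a shared library, call one entry point in it, then fork().
 * Returns 0 if the child was forked and reaped, -1 on any failure.
 * Assumption: the library links libpthread and its entry point
 * spawns a thread; the calling program does not link libpthread.
 */
static int
dlopen_call_then_fork(const char *lib_path, const char *entry_name)
{
	assert(lib_path != NULL && entry_name != NULL);

	void *handle = dlopen(lib_path, RTLD_NOW);
	if (handle == NULL) {
		fprintf(stderr, "dlopen: %s\n", dlerror());
		return (-1);
	}

	entry_fn entry = (entry_fn)dlsym(handle, entry_name);
	if (entry == NULL) {
		fprintf(stderr, "dlsym: %s\n", dlerror());
		dlclose(handle);
		return (-1);
	}

	/* Expected to call pthread_create() inside the library. */
	entry();

	/* This is the call that hangs in the original report. */
	pid_t pid = fork();
	if (pid == -1) {
		perror("fork");
		return (-1);
	}
	if (pid == 0)
		_exit(0);

	int status;
	if (waitpid(pid, &status, 0) == -1) {
		perror("waitpid");
		return (-1);
	}
	return (0);
}
```

A main() would call something like dlopen_call_then_fork("./libspawn.so",
"spawn_thread"), where both names are placeholders for whatever the test
library provides. If the theory in this comment is right, the call would stop
inside fork() rather than return.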
Author: Konstantin Belousov <[hidden email]>
AuthorDate: 2021-01-12 09:02:37 +0000
Commit: Konstantin Belousov <[hidden email]>
CommitDate: 2021-01-12 10:45:44 +0000
libthr malloc: support recursion on thr_malloc_umtx.
One possible way the recursion can happen is during fork: suppose
that fork is called from early code that did not trigger
jemalloc(3) initialization yet. Then we lock thr_malloc lock, and
call malloc_prefork() that might require initialization of jemalloc
pthread_mutexes, calling into libthr malloc. It is safe to allow
recursion for this occurrence.
Reported by: Vasily Postnicov <[hidden email]>
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
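
To illustrate what "support recursion" on a lock means here, below is a minimal
sketch of a lock that its current owner may re-acquire without blocking. It is
not the libthr code from this commit: the names rec_lock, rec_lock_acquire,
rec_lock_release, and self_id are hypothetical, and a pthread mutex stands in
for the umtx. The idea is that a second acquisition by the owning thread only
bumps a counter, so a path such as fork -> malloc_prefork() -> libthr malloc
would not wait on a lock the same thread already holds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical recursion-tolerant lock; not the actual libthr structure. */
struct rec_lock {
	pthread_mutex_t   mtx;    /* stand-in for the underlying umtx */
	_Atomic uintptr_t owner;  /* id of the owning thread, 0 if unowned */
	unsigned          level;  /* extra acquisitions made by the owner */
};

/* The address of a thread-local object is unique per live thread. */
static _Thread_local char self_tag;

static uintptr_t
self_id(void)
{
	return ((uintptr_t)&self_tag);
}

static void
rec_lock_acquire(struct rec_lock *l)
{
	/* Only the owner can observe its own id here, so this is safe. */
	if (atomic_load(&l->owner) == self_id()) {
		l->level++;          /* recursive entry: do not block */
		return;
	}
	pthread_mutex_lock(&l->mtx);
	atomic_store(&l->owner, self_id());
}

static void
rec_lock_release(struct rec_lock *l)
{
	/* The caller must hold the lock. */
	assert(atomic_load(&l->owner) == self_id());
	if (l->level > 0) {
		l->level--;          /* leave one recursive level */
		return;
	}
	atomic_store(&l->owner, 0);
	pthread_mutex_unlock(&l->mtx);
}
```

With a plain non-recursive lock, the second acquisition by the same thread
would wait forever, which matches the umtx wait seen in the report.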