Recently, we attempted to get with the 2000's and try switching from
NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.

Any system we switch to NFSv4 mounts is functionally unusable: pegged
at 100% system CPU usage, load average 70+, largely from nfscl threads
and client processes using NFS.

Dmesg shows NFS-related messages:

$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
   1 nfsv4 err=10010
   4 nfsv4 client/server protocol prob err=10026
  29 nfscl: never fnd open

Nfsstat shows no client activity; "nfsstat -e -c 1" and "nfsstat -c 1"
both report:

GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0
     0      0      0      0      0      0      0      0

Meanwhile, tcpdump on the client shows an endless stream of getattr
requests at the exact same time nfsstat -c says nothing is happening:

$ sudo tcpdump -n -i net1 -c 10 port 2049 and src 172.20.200.39
14:47:27.037974 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [.], ack 72561, win 545, options [nop,nop,TS val 234259249 ecr 4155804100], length 0
14:47:27.046282 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 139940:140092, ack 72561, win 545, options [nop,nop,TS val 234259259 ecr 4155804100], length 152: NFS request xid 1544756021 148 getattr fh 0,5/0
14:47:27.051260 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 140092:140248, ack 72641, win 545, options [nop,nop,TS val 234259269 ecr 4155804104], length 156: NFS request xid 1544756022 152 getattr fh 0,5/0
14:47:27.063372 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 140248:140404, ack 72721, win 545, options [nop,nop,TS val 234259279 ecr 4155804106], length 156: NFS request xid 1544756023 152 getattr fh 0,5/0
14:47:27.068646 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 140404:140556, ack 72801, win 545, options [nop,nop,TS val 234259279 ecr 4155804108], length 152: NFS request xid 1544756024 148 getattr fh 0,5/0
14:47:27.080627 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 140556:140712, ack 72881, win 545, options [nop,nop,TS val 234259299 ecr 4155804110], length 156: NFS request xid 1544756025 152 getattr fh 0,5/0
14:47:27.085224 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 140712:140868, ack 72961, win 545, options [nop,nop,TS val 234259299 ecr 4155804112], length 156: NFS request xid 1544756026 152 getattr fh 0,5/0
14:47:27.096802 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 140868:141024, ack 73041, win 545, options [nop,nop,TS val 234259309 ecr 4155804114], length 156: NFS request xid 1544756027 152 getattr fh 0,5/0
14:47:27.101849 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 141024:141180, ack 73121, win 545, options [nop,nop,TS val 234259319 ecr 4155804116], length 156: NFS request xid 1544756028 152 getattr fh 0,5/0
14:47:27.112905 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.], seq 141180:141336, ack 73201, win 545, options [nop,nop,TS val 234259329 ecr 4155804118], length 156: NFS request xid 1544756029 152 getattr fh 0,5/0

Only 10 shown here for brevity, but:

$ sudo tcpdump -n -i net1 -c 10000 port 2049 and src 172.20.200.39 | fgrep getattr | wc -l
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
10000 packets captured
20060 packets received by filter
0 packets dropped by kernel
    9759

There are no dropped packets or network problems:

$ netstat -in -I net1
Name  Mtu Network       Address            Ipkts Ierrs Idrop    Opkts Oerrs Coll
net1 1500 <Link#2>      12:33:df:5f:79:d7 40988832 0 0 48760307 0 0
net1    - 172.20.0.0/16 172.20.200.39     40942065 - - 48756241 - -

The mount flags in fstab are:

ro,nfsv4,nosuid

The mount flags as reported by "nfsstat -m" are:

nfsv4,minorversion=0,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647

Today, I managed to kill everything down to one user process that was
exhibiting this behavior.  After a kill -9 on that process, it went to
"REsJ" but continued to burn the same amount of CPU (all system).
Oddly, the run state / wait channel was just "CPU1".  Running ktrace
did not produce any trace records.  That is probably predictable for a
process in E state; if the process had crossed the user/kernel
boundary in a way ktrace could detect, it would have exited.

At that point, I started unmounting filesystems.  Everything but the
NFS filesystem used by that process unmounted cleanly.  The umount for
that filesystem went to D state for about a minute and then kicked
back "Device busy."  That's fair, if awfully slow.

Meanwhile, that user process continued burning system CPU with the E
flag set, not doing anything whatsoever in userspace, still producing
300+ "getattr fh 0,5/0" per second according to tcpdump and 0
according to nfsstat.

Eventually, I rebooted with fstab set back to nfsv3.

This feels like the user process is in a system call that is stuck in
an endless loop repeating some operation that generates that getattr
request.  But that is a feeling, not a fact.

This is fairly easy to reproduce; it seems pretty consistent within a
few hours (a day at most) any time I switch the relevant mounts to
nfsv4.  Reverting to nfsv3 makes this issue completely disappear.

What on earth could be going on here?  What other information can I
provide that would help track this down?

Thanks for any advice!
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"
Any particular reason you chose to use NFSv4.0 and not NFSv4.1?
Also, it might be useful if you could show the configuration you are
using on the server and the clients.  Are the clients also FreeBSD
12.2, or (more common) some Linux variant?

We are using NFSv4.0 and 4.1 with great success here from our FreeBSD
12.1, 12.2 and 11.3 servers, from various Linux clients (and some
OmniOS clients - only 4.0 on those), with Kerberos.

With NFSv4 there are some additional things you need to set up
compared to NFSv3: for example the NFS domain name, which must be the
same on servers & clients; you must run the nfsuserd daemon; and you
need the V4 export line.

Our NFS server setup:

> root:/etc # egrep 'nfs|gss|sec' rc.conf rc.conf.d/* /boot/loader.conf /etc/sysctl.conf exports zfs/exports
>
> rc.conf:gssd_enable="YES"
> rc.conf:nfs_server_enable="YES"
> rc.conf:nfsv4_server_enable="YES"
> rc.conf:nfscbd_enable="YES"
>
> rc.conf.d/nfsuserd:nfsuserd_enable="YES"
> rc.conf.d/nfsuserd:nfsuserd_flags="-manage-gids -domain your.nfs.domain.id 16"
>
> exports:V4: /export -sec=krb5:krb5i:krb5p
>
> zfs/exports:/export/staff -sec=krb5:krb5i:krb5p

On a Linux client (Debian, for example) you need to configure the NFS
domain, make sure the idmap/gssd stuff is running, and make sure you
mount correctly:

/etc/default/nfs-common
    NEED_IDMAPD=yes
    NEED_GSSD=yes

/etc/idmapd.conf
    [general]
    Domain = your.nfs.domain.id
    Local-Realms = YOUR-KRB5-REALM

/etc/nfsmount.conf
    [NFSMount_Global_Options]
    Defaultvers = 4.1

Packages needed on Linux clients:

    keyutils nfs-kernel-server (on Debian 9)
    nfs-utils libnfsidmap nfs4-acl-tools rpcgssd (CentOS 7)

We use "fstype=nfs4,sec=krb5" when mounting on the Linux clients.  At
least on CentOS 7, if you use "fstype=nfs,vers=4,sec=krb5" then it
will use 4.0 instead of the highest supported NFS version...

- Peter

> On 10 Dec 2020, at 17:15, J David <[hidden email]> wrote:
>
> Recently, we attempted to get with the 2000's and try switching from
> NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.
>
> Any system we switch to NFSv4 mounts is functionally unusable, pegged
> at 100% system CPU usage, load average 70+, largely from nfscl threads
> and client processes using NFS.
>
> [stuff snipped]
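To make Peter's Linux client notes concrete, a client fstab entry
along those lines might look like this (hostname and export path are
placeholders; "vers=4.1" pins the minor version explicitly rather than
relying on the Defaultvers setting in nfsmount.conf):

```
nfs.example.com:/export/staff  /mnt/staff  nfs4  sec=krb5,vers=4.1  0  0
```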
Ah, oops. The "12.2 servers" referred to at the top of the message
are the NFS *clients* in this scenario.  They are application servers,
not NFS servers.  Sorry for the confusing overloaded usage of "server"
there!

Everything in the message (dmesg, tcpdump, nfsstat, etc.) is from the
perspective of a FreeBSD 12.2 NFS client, which is where the problems
are occurring.

Our Linux servers (machines? instances? hosts? nodes?) that are NFS
clients have been running NFSv4 against the same servers for many
years without incident.

Thanks!
In reply to this post by J David
On Thu, Dec 10, 2020 at 1:20 PM Konstantin Belousov <[hidden email]> wrote:
> Show procstat -kk -p <pid> output for it.

I will add this to the list of things to try the next time I provoke
this issue.  As you might expect, the people working on these machines
don't appreciate these issues, so my goal is to gather as much of a
strategy as I can before doing so again.

Thanks!
In reply to this post by J David
J. David wrote:
>Recently, we attempted to get with the 2000's and try switching from
>NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.
>
>Any system we switch to NFSv4 mounts is functionally unusable, pegged
>at 100% system CPU usage, load average 70+, largely from nfscl threads
>and client processes using NFS.
>
>Dmesg shows NFS-related messages:
>
>$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
> 1 nfsv4 err=10010
> 4 nfsv4 client/server protocol prob err=10026
> 29 nfscl: never fnd open
Add "minorversion=1" to your FreeBSD NFS client mount options and
error 10026 should go away (and I suspect that the 10010 will go away
too).

The semantics of the "seqid" field that serializes open/lock
operations in NFSv4.0 are difficult to get right (and might now be
broken in the client, since the original code written 20 years ago
depended on exclusive vnode locking and hasn't been updated or interop
tested with non-FreeBSD NFS servers for ages).
--> NFSv4.0 is close to 20 years old and has been fixed/superseded by
    NFSv4.1 for many years now.
--> NFSv4.1 (and NFSv4.2) replaced the "seqid" stuff with something
    called "sessions", which works better.

I have been tempted to make FreeBSD NFSv4 mounts use 4.1/4.2 by
default to avoid problems with NFSv4.0, but I've hesitated since the
change could be considered a POLA violation.

NFSv4.0 is like any .0 release.  There were significant issues with
the protocol that were fixed by NFSv4.1.  If you still have problems
when using NFSv4.1, post again.

Btw, "nfsstat -m" shows what the client mount options actually are.
rick

[stuff snipped]
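An NFSv4.1 client mount per the advice in this message would add one
option to the fstab flags from the original post (server name and
mount point here are placeholders; the post showed only the flags):

```
nfs.example.com:/export  /mnt/jobdata  nfs  ro,nfsv4,minorversion=1,nosuid  0  0
```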
In reply to this post by J David
J. David wrote:
>Ah, oops.  The "12.2 servers" referred to at the top of the message
>are the NFS *clients* in this scenario.  They are application servers,
>not NFS servers.  Sorry for the confusing overloaded usage of "server"
>there!
So what is your NFS server running?

Btw, if it happens to be a Linux system and you aren't using Kerberos,
it will expect users/groups as numbers in strings by default.  To get
that, do not start the nfsuserd(8) daemon on the client and instead
add the following line to the client's /etc/sysctl.conf file:

vfs.nfs.enable_uidtostring=1

When user/group mapping is broken, you'll see lots of files owned by
"nobody".

Also, if you do want to see what the NFS packets look like, you can
capture packets with tcpdump, but then look at them in wireshark:

# tcpdump -s 0 -w out.pcap host <nfs-server>

- then look at out.pcap in wireshark.  Unlike tcpdump, wireshark knows
how to parse NFS messages properly.

rick

ps: Once you have switched to NFSv4.1 and have user/group mapping
working, I suspect the NFS clients will be ok.  Using NFSv4.1 also
avoids FreeBSD NFS server issues w.r.t. tuning the DRC, since it is
not used by NFSv4.1 (again, fixed by sessions).

[stuff snipped]
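Pulled together as a config sketch, the no-Kerberos, Linux-server case
described above would look like this on the FreeBSD client:

```
# /etc/rc.conf - leave nfsuserd disabled (no nfsuserd_enable="YES" line)

# /etc/sysctl.conf - have the client send numeric uid/gid strings:
vfs.nfs.enable_uidtostring=1
```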
In reply to this post by Rick Macklem
On 12/10/2020 7:59 PM, Rick Macklem wrote:
> J. David wrote:
>> Recently, we attempted to get with the 2000's and try switching from
>> NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.
>>
>> Any system we switch to NFSv4 mounts is functionally unusable, pegged
>> at 100% system CPU usage, load average 70+, largely from nfscl threads
>> and client processes using NFS.
>>
>> Dmesg shows NFS-related messages:
>>
>> $ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
>> 1 nfsv4 err=10010
>> 4 nfsv4 client/server protocol prob err=10026
>> 29 nfscl: never fnd open
> Add "minorversion=1" to your FreeBSD NFS client mount options
> and error 10026 should go away (and I suspect that the 10010 will
> go away too.

Hi Rick,

I never knew there was such an important difference.  Is there a way
on the server side to force only v4.1 connections from clients when
they try an NFSv4.x mount?

    ---Mike
In reply to this post by J David
On Thu, Dec 10, 2020 at 1:20 PM Konstantin Belousov <[hidden email]> wrote:
> E means exiting process.  Is it multithreaded ?
> Show procstat -kk -p <pid> output for it.

To answer this separately, procstat -kk of an exiting process
generating huge volumes of getattr requests produces nothing but the
headers:

# ps Haxlww | fgrep DNE
     0 21281 18549   1  20   0 11196 2560 piperd S+     1    0:00.00 fgrep DNE
125428  9661     1   0  36  15     0   16 nfsreq DNE+J  3-   3:22.54 job_exec
# procstat -kk 9661
  PID    TID COMM                TDNAME              KSTACK

This happened while retesting on NFSv4.1.  Although I don't know if
the process was originally multithreaded, it appears it wasn't even
single-threaded by the time it got into this state.

Thanks!
In reply to this post by Rick Macklem
Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
resolve our issue.  But I've narrowed the problem down to a harmful
interaction between NFSv4 and nullfs.

These FreeBSD NFS clients form a pool of application servers that run
jobs for the application.  A given job needs read-write access to its
data and read-only access to the set of binaries it needs to run.

The job data is horizontally partitioned across a set of directory
trees spread over one set of NFS servers.  A separate set of NFS
servers stores the read-only binary roots.

The jobs are assigned to these machines by a scheduler.  A job might
take five milliseconds or five days.

Historically, we have mounted the job data trees and the various
binary roots on each application server over NFSv3.  When a job
starts, its setup binds the needed data and binaries into a jail via
nullfs, then runs the job in the jail.  This approach has worked
perfectly for 10+ years.

After I switched a server to NFSv4.1 to test that recommendation, it
started having the same load problems as NFSv4.0.  As a test, I
altered it to mount NFS directly in the jails for both the data and
the binaries.  As "nullfs-NFS" jobs finished and "direct NFS" jobs
started, the load and CPU usage started to fall dramatically.

The critical problem with this approach is that privileged TCP ports
are a finite resource.  At two per job, this creates two issues.

First, there's a hard limit on simultaneous jobs per server that is
inconsistent with the hardware's capabilities.  Second, due to
TIME_WAIT, it places a hard limit on job throughput.  In practice,
these limits also interfere with each other; the more simultaneous
long jobs are running, the more impact TIME_WAIT has on short-job
throughput.

While it's certainly possible to configure NFS not to require reserved
ports, the slightest possibility of a non-root user establishing a
session to the NFS server kills that as an option.
Turning down TIME_WAIT helps, though the ability to do that only on
the interface facing the NFS server would be more palatable than doing
it globally.

Adjusting net.inet.ip.portrange.lowlast does not seem to help.  The
code at sys/nfs/krpc_subr.c correctly uses ports between
IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
and ipport_lowlastauto.  But is that the correct place to look for
NFSv4.1?

How explosive would adding SO_REUSEADDR to the NFS client be?  It's
not a full solution, but it would handle the TIME_WAIT side of the
issue.

Even so, there may be no workaround for the simultaneous mount limit
as long as reserved ports are required.  Solving the negative
interaction with nullfs seems like the only long-term fix.

What would be a good next step there?

Thanks!
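To put rough numbers on those two limits, here is a back-of-the-
envelope sketch.  It assumes the port range named above (IPPORT_RESERVED/2
up to IPPORT_RESERVED, i.e. 512..1023), an assumed FreeBSD default of
2*MSL = 60 seconds in TIME_WAIT, and that no other services are eating
reserved ports; adjust to taste:

```shell
# Reserved ports available to the client RPC code: 512..1023.
ports=$(( 1023 - 512 + 1 ))                               # 512 ports
echo "max simultaneous jobs: $(( ports / 2 ))"            # 2 mounts per job
echo "max short-job rate:    ~$(( ports / (2 * 60) ))/s"  # 60 s TIME_WAIT
```

So on these assumptions the pool tops out around 256 concurrent jobs
per machine and, for very short jobs, only a handful of job setups per
second, which matches the complaint above.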
In reply to this post by mdtancsa
mike tancsa wrote:
[stuff snipped]
>Hi Rick,
>
> I never knew there was such an important difference.  Is there a way
>on the server side to force only v4.1 connections from the client when
>they try and v4.x mount ?
You can set the sysctl:

vfs.nfsd.server_min_minorversion4=1

if your server has it.  (I can't remember what versions of FreeBSD
have it.)

For Linux clients, they will usually use the highest minor version the
server supports.  FreeBSD clients will use 0 unless the
"minorversion=1" option is on the mount command.

To be honest, I have only heard of a couple of other sites having the
NFSERR_BADSEQID (10026) error problem, and it sounds like J David's
problem is related to nullfs and jails.

4.0->4.1 was a minor revision in name only.  RFC 5661 (the NFSv4.1
one) is over 500 pages.  Not a trivial update.  On the other hand,
4.1->4.2 is a minor update, made up of a bunch of additional optional
features like SEEK_HOLE/SEEK_DATA support and local copy_file_range()
support in the server.

rick
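As a config sketch, and assuming the sysctl exists on the server's
FreeBSD version (Rick's caveat above), that would persist as:

```
# /etc/sysctl.conf on the NFS server: refuse NFSv4.0 mounts, so
# NFSv4 clients must negotiate 4.1 or later.
vfs.nfsd.server_min_minorversion4=1
```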
In reply to this post by J David
On Fri, Dec 11, 2020 at 2:52 PM J David <[hidden email]> wrote:
> Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not > resolve our issue. But I've narrowed down the problem to a harmful > interaction between NFSv4 and nullfs. > > These FreeBSD NFS clients form a pool of application servers that run > jobs for the application. A given job needs read-write access to its > data and read-only access to the set of binaries it needs to run. > > The job data is horizontally partitioned across a set of directory > trees spread over one set of NFS servers. A separate set of NFS > servers store the read-only binary roots. > > The jobs are assigned to these machines by a scheduler. A job might > take five milliseconds or five days. > > Historically, we have mounted the job data trees and the various > binary roots on each application server over NFSv3. When a job > starts, its setup binds the needed data and binaries into a jail via > nullfs, then runs the job in the jail. This approach has worked > perfectly for 10+ years. > > After I switched a server to NFSv4.1 to test that recommendation, it > started having the same load problems as NFSv4. As a test, I altered > it to mount NFS directly in the jails for both the data and the > binaries. As "nullfs-NFS" jobs finished and "direct NFS" jobs > started, the load and CPU usage started to fall dramatically. > > The critical problem with this approach is that privileged TCP ports > are a finite resource. At two per job, this creates two issues. > > First, there's a hard limit on both simultaneous jobs per server > inconsistent with the hardware's capabilities. Second, due to > TIME_WAIT, it places a hard limit on job throughput. In practice, > these limits also interfere with each other; the more simultaneous > long jobs are running, the more impact TIME_WAIT has on short job > throughput. 
> While it's certainly possible to configure NFS not to require reserved
> ports, the slightest possibility of a non-root user establishing a
> session to the NFS server kills that as an option.
>
> Turning down TIME_WAIT helps, though the ability to do that only on
> the interface facing the NFS server would be more palatable than doing
> it globally.
>
> Adjusting net.inet.ip.portrange.lowlast does not seem to help. The
> code at sys/nfs/krpc_subr.c correctly uses ports between
> IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
> and ipport_lowlastauto. But is that the correct place to look for
> NFSv4.1?
>
> How explosive would adding SO_REUSEADDR to the NFS client be? It's
> not a full solution, but it would handle the TIME_WAIT side of the
> issue.
>
> Even so, there may be no workaround for the simultaneous mount limit
> as long as reserved ports are required. Solving the negative
> interaction with nullfs seems like the only long-term fix.
>
> What would be a good next step there?
>
> Thanks!

That's some good information. However, it must not be the whole story. I've been nullfs mounting my NFS mounts for years. For example, right now on a FreeBSD 12.2-RC2 machine:

> sudo nfsstat -m
Password:
192.168.0.2:/home on /usr/home
nfsv4,minorversion=1,tcp,resvport,soft,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
> mount | grep home
192.168.0.2:/home on /usr/home (nfs, nfsv4acls)
/usr/home on /iocage/jails/rustup2/root/usr/home (nullfs)

Are you using any mount options with nullfs? It might be worth trying to make the read-only mount into read-write, to see if that helps. And what does "jls -n" show?

-Alan
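[Editor's note] The throughput ceiling J David describes can be roughed out with a little arithmetic. This is a back-of-the-envelope sketch: it assumes the client draws source ports from the IPPORT_RESERVED/2..IPPORT_RESERVED-1 range mentioned later in the thread, and the conventional TIME_WAIT duration of 2*MSL = 60 seconds; the real numbers depend on your kernel's settings.

```shell
# Reserved source ports usable by the client (per sys/nfs/krpc_subr.c):
low=512; high=1023                  # IPPORT_RESERVED/2 .. IPPORT_RESERVED-1
ports=$((high - low + 1))           # 512 usable ports

per_job=2                           # one data mount + one binaries mount
max_jobs=$((ports / per_job))       # hard cap on simultaneous jobs

tw=60                               # TIME_WAIT = 2*MSL, commonly 60 s
# Sustained job starts per minute once every port must also sit out
# TIME_WAIT between uses:
starts_per_min=$((ports / per_job * 60 / tw))

echo "$max_jobs $starts_per_min"    # 256 256
```

So under these assumptions a server tops out around 256 concurrent jobs and roughly 256 short-job starts per minute, which matches the "hard limit on job throughput" complaint above.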
J David wrote:
>Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
>resolve our issue. But I've narrowed down the problem to a harmful
>interaction between NFSv4 and nullfs.
I am afraid I know nothing about nullfs and jails. I suspect it will be something related to when file descriptors in the NFS client mount get closed.

The NFSv4 Open is a Windows Open lock and has nothing to do with a POSIX open. Since only one of these can exist for each <client process, file> tuple, the NFSv4 Close must be delayed until all POSIX Opens on the file have been closed, including open file descriptors inherited by children processes.

Someone else recently reported problems using nullfs and vnet jails.

>These FreeBSD NFS clients form a pool of application servers that run
>jobs for the application. A given job needs read-write access to its
>data and read-only access to the set of binaries it needs to run.
>
>The job data is horizontally partitioned across a set of directory
>trees spread over one set of NFS servers. A separate set of NFS
>servers store the read-only binary roots.
>
>The jobs are assigned to these machines by a scheduler. A job might
>take five milliseconds or five days.
>
>Historically, we have mounted the job data trees and the various
>binary roots on each application server over NFSv3. When a job
>starts, its setup binds the needed data and binaries into a jail via
>nullfs, then runs the job in the jail. This approach has worked
>perfectly for 10+ years.
Well, NFSv3 is not going away any time soon, so if you don't need any of the additional features it offers...

>After I switched a server to NFSv4.1 to test that recommendation, it
>started having the same load problems as NFSv4. As a test, I altered
>it to mount NFS directly in the jails for both the data and the
>binaries. As "nullfs-NFS" jobs finished and "direct NFS" jobs
>started, the load and CPU usage started to fall dramatically.
Good work isolating the problem. I may try playing with NFSv4/nullfs someday soon and see if I can break it.
>The critical problem with this approach is that privileged TCP ports
>are a finite resource. At two per job, this creates two issues.
>
>First, there's a hard limit on simultaneous jobs per server
>inconsistent with the hardware's capabilities. Second, due to
>TIME_WAIT, it places a hard limit on job throughput. In practice,
>these limits also interfere with each other; the more simultaneous
>long jobs are running, the more impact TIME_WAIT has on short job
>throughput.
>
>While it's certainly possible to configure NFS not to require reserved
>ports, the slightest possibility of a non-root user establishing a
>session to the NFS server kills that as an option.
Personally, I've never thought the reserved port# requirement provided any real security for most situations. Unless you set "vfs.usermount=1" only root can do the mount. For non-root to mount the NFS server when "vfs.usermount=0", a user would have to run their own custom hacked userland NFS client. Although doable, I have never heard of it being done.

rick

>Turning down TIME_WAIT helps, though the ability to do that only on
>the interface facing the NFS server would be more palatable than doing
>it globally.
>
>Adjusting net.inet.ip.portrange.lowlast does not seem to help. The
>code at sys/nfs/krpc_subr.c correctly uses ports between
>IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
>and ipport_lowlastauto. But is that the correct place to look for
>NFSv4.1?
>
>How explosive would adding SO_REUSEADDR to the NFS client be? It's
>not a full solution, but it would handle the TIME_WAIT side of the
>issue.
>
>Even so, there may be no workaround for the simultaneous mount limit
>as long as reserved ports are required. Solving the negative
>interaction with nullfs seems like the only long-term fix.
>
>What would be a good next step there?
>
>Thanks!
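[Editor's note] Rick's point earlier in this message, that an NFSv4 Close must wait for every inherited descriptor, is easy to demonstrate at the shell level. This sketch is illustrative and not NFS-specific: any file opened before a fork stays open in the child, so an NFSv4 client holding Open state cannot send its Close until the last such descriptor is gone.

```shell
# Create a scratch file and open it on fd 3 in the parent shell.
tmp=$(mktemp)
exec 3<"$tmp"

# A background child inherits fd 3 across the fork.
( sleep 1 ) 3<&3 &

# The parent closes its copy, but the kernel (and hence an NFSv4
# client's Open state) still sees one open reference: the child's.
exec 3<&-

wait            # only after the child exits is the last reference gone
rm -f "$tmp"
echo "last reference closed"
```

A long-lived jailed job that forks workers behaves the same way, which is one plausible reason the Close (and its state on the server) could be held far longer than expected.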
On Fri, Dec 11, 2020 at 4:28 PM Rick Macklem <[hidden email]> wrote:
> J David wrote:
> >Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
> >resolve our issue. But I've narrowed down the problem to a harmful
> >interaction between NFSv4 and nullfs.
> I am afraid I know nothing about nullfs and jails. I suspect it will be
> something related to when file descriptors in the NFS client mount
> get closed.
>
> The NFSv4 Open is a Windows Open lock and has nothing to do with
> a POSIX open. Since only one of these can exist for each
> <client process, file> tuple, the NFSv4 Close must be delayed until
> all POSIX Opens on the file have been closed, including open file
> descriptors inherited by children processes.

Does it make a difference whether the files are opened read-only or read-write? My longstanding practice has been to never use NFS to store object files while compiling. I do that for performance reasons, and I didn't think that nullfs had anything to do with it (but maybe it does).

> Someone else recently reported problems using nullfs and vnet jails.
>
> >These FreeBSD NFS clients form a pool of application servers that run
> >jobs for the application. A given job needs read-write access to its
> >data and read-only access to the set of binaries it needs to run.
> >
> >The job data is horizontally partitioned across a set of directory
> >trees spread over one set of NFS servers. A separate set of NFS
> >servers store the read-only binary roots.
> >
> >The jobs are assigned to these machines by a scheduler. A job might
> >take five milliseconds or five days.
> >
> >Historically, we have mounted the job data trees and the various
> >binary roots on each application server over NFSv3. When a job
> >starts, its setup binds the needed data and binaries into a jail via
> >nullfs, then runs the job in the jail. This approach has worked
> >perfectly for 10+ years.
> Well, NFSv3 is not going away any time soon, so if you don't need
> any of the additional features it offers...
>
> >After I switched a server to NFSv4.1 to test that recommendation, it
> >started having the same load problems as NFSv4. As a test, I altered
> >it to mount NFS directly in the jails for both the data and the
> >binaries. As "nullfs-NFS" jobs finished and "direct NFS" jobs
> >started, the load and CPU usage started to fall dramatically.
> Good work isolating the problem. I may try playing with NFSv4/nullfs
> someday soon and see if I can break it.
>
> >The critical problem with this approach is that privileged TCP ports
> >are a finite resource. At two per job, this creates two issues.
> >
> >First, there's a hard limit on simultaneous jobs per server
> >inconsistent with the hardware's capabilities. Second, due to
> >TIME_WAIT, it places a hard limit on job throughput. In practice,
> >these limits also interfere with each other; the more simultaneous
> >long jobs are running, the more impact TIME_WAIT has on short job
> >throughput.
> >
> >While it's certainly possible to configure NFS not to require reserved
> >ports, the slightest possibility of a non-root user establishing a
> >session to the NFS server kills that as an option.
> Personally, I've never thought the reserved port# requirement provided
> any real security for most situations. Unless you set "vfs.usermount=1"
> only root can do the mount. For non-root to mount the NFS server
> when "vfs.usermount=0", a user would have to run their own custom hacked
> userland NFS client. Although doable, I have never heard of it being done.

There are a few out there. For example, https://github.com/sahlberg/libnfs .

> rick
>
> >Turning down TIME_WAIT helps, though the ability to do that only on
> >the interface facing the NFS server would be more palatable than doing
> >it globally.
> >
> >Adjusting net.inet.ip.portrange.lowlast does not seem to help. The
> >code at sys/nfs/krpc_subr.c correctly uses ports between
> >IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
> >and ipport_lowlastauto. But is that the correct place to look for
> >NFSv4.1?
> >
> >How explosive would adding SO_REUSEADDR to the NFS client be? It's
> >not a full solution, but it would handle the TIME_WAIT side of the
> >issue.
> >
> >Even so, there may be no workaround for the simultaneous mount limit
> >as long as reserved ports are required. Solving the negative
> >interaction with nullfs seems like the only long-term fix.
> >
> >What would be a good next step there?
> >
> >Thanks!
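[Editor's note] For anyone following along, the tunables being discussed in this exchange can be inspected with sysctl. The names below are as found on FreeBSD 12.x; this is an illustrative checklist, not a tuning recommendation:

```shell
# TIME_WAIT lasts 2*MSL; FreeBSD's default net.inet.tcp.msl is 30000 ms,
# so connections linger in TIME_WAIT for about 60 seconds.
sysctl net.inet.tcp.msl

# The "low" port range the kernel allocates reserved source ports from.
sysctl net.inet.ip.portrange.lowfirst net.inet.ip.portrange.lowlast

# Whether non-root users are allowed to mount filesystems at all.
sysctl vfs.usermount
```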
Alan Somers wrote:
[stuff snipped]
>That's some good information. However, it must not be the whole story. I've
>been nullfs mounting my NFS mounts for years. For example, right now on a
>FreeBSD 12.2-RC2 machine:
If I recall, you were one of the two people that needed to switch to "minorversion=1" to get rid of NFSERR_BADSEQID (10026) errors. Is that correct?

>> sudo nfsstat -m
>Password:
>192.168.0.2:/home on /usr/home
>nfsv4,minorversion=1,tcp,resvport,soft,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
Btw, using "soft" with NFSv4 mounts is a bad idea. (See the BUGS section of "man mount_nfs".)

If you have a hung NFSv4 mount, you can use
  # umount -N /usr/home
to dismount it. (It may take a couple of minutes.)

rick

>> mount | grep home
>192.168.0.2:/home on /usr/home (nfs, nfsv4acls)
>/usr/home on /iocage/jails/rustup2/root/usr/home (nullfs)
>
>Are you using any mount options with nullfs? It might be worth trying to
>make the read-only mount into read-write, to see if that helps. And what
>does "jls -n" show?
>
>-Alan
On Fri, Dec 11, 2020 at 4:39 PM Rick Macklem <[hidden email]> wrote:
> Alan Somers wrote:
> [stuff snipped]
> >That's some good information. However, it must not be the whole story.
> >I've been nullfs mounting my NFS mounts for years. For example, right now
> >on a FreeBSD 12.2-RC2 machine:
> If I recall, you were one of the two people that needed to switch to
> "minorversion=1" to get rid of NFSERR_BADSEQID (10026) errors.
> Is that correct?

In fact, yes. Though that case had nothing to do with nullfs or jails.

> >> sudo nfsstat -m
> >Password:
> >192.168.0.2:/home on /usr/home
> >nfsv4,minorversion=1,tcp,resvport,soft,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
> Btw, using "soft" with NFSv4 mounts is a bad idea. (See the BUGS section of
> "man mount_nfs".)

Grahh. I forgot that was in there. I can't remember why I put that there. These days I agree with you, and advise other people to use hard mounts, too. Thanks for pointing it out.

> If you have a hung NFSv4 mount, you can use
>   # umount -N /usr/home
> to dismount it. (It may take a couple of minutes.)
>
> rick
>
> >> mount | grep home
> >192.168.0.2:/home on /usr/home (nfs, nfsv4acls)
> >/usr/home on /iocage/jails/rustup2/root/usr/home (nullfs)
>
> Are you using any mount options with nullfs? It might be worth trying to
> make the read-only mount into read-write, to see if that helps. And what
> does "jls -n" show?
> -Alan
J David wrote:
[lots of stuff snipped]
>Even so, there may be no workaround for the simultaneous mount limit
>as long as reserved ports are required. Solving the negative
>interaction with nullfs seems like the only long-term fix.
>
>What would be a good next step there?
Well, if you have a test system you can break, doing
  # nfsstat -c -E
once it is constipated could be useful. Look for the numbers under
  OpenOwner Opens LockOwner ...
and see if any of them are getting very large.

rick

>Thanks!
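[Editor's note] If you want to watch those counters rather than eyeball them, something like the sketch below works. The exact column layout of "nfsstat -c -E" varies between releases, so the awk field positions here are an assumption, demonstrated against a mocked-up sample rather than live output; the 100000 threshold is likewise arbitrary and illustrative.

```shell
# Mocked-up fragment of "nfsstat -c -E" output (layout assumed).
sample='OpenOwner    Opens  LockOwner    Locks    Delegs
     1234   987654         17        3         0'

# Pull the Opens counter (second field of the data row).
opens=$(printf '%s\n' "$sample" | awk 'NR==2 { print $2 }')

# Flag it if it looks runaway.
if [ "$opens" -gt 100000 ]; then
    echo "Opens count is very large: $opens"
fi
```

On a live system you would replace the mock with `nfsstat -c -E` itself and run the check from cron or a watch loop while reproducing the hang.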
On Fri, Dec 11, 2020 at 03:30:29PM -0500, J David wrote:
> On Thu, Dec 10, 2020 at 1:20 PM Konstantin Belousov <[hidden email]> wrote:
> > E means exiting process. Is it multithreaded ?
> > Show procstat -kk -p <pid> output for it.
>
> To answer this separately, procstat -kk of an exiting process
> generating huge volumes of getattr requests produces nothing but the
> headers:
>
> # ps Haxlww | fgrep DNE
>      0 21281 18549  1  20  0 11196 2560 piperd S+    1   0:00.00 fgrep DNE
> 125428  9661      1  0  36 15     0   16 nfsreq DNE+J 3-  3:22.54 job_exec
> # procstat -kk 9661
>   PID    TID COMM                TDNAME              KSTACK
>
> This happened while retesting on NFSv4.1. Although I don't know if
> the process was originally multithreaded, it appears it wasn't even
> single-threaded by the time it got into this state.

Ok, do 'procstat -kk -a' instead. Exiting processes are not excluded from the kstack sysctl, might be you just raced with termination.

Or, if you have serial console, enter ddb, then do 'bt <pid>'.

Or, if you have a kernel built with symbols,
  # kgdb /boot/kernel/kernel /dev/mem
  (gdb) proc <pid>
  (gdb) bt
but this has low chances of working for a running process.

procstat -kk -a output might be the most informative anyway.
On Fri, Dec 11, 2020 at 8:09 PM Konstantin Belousov <[hidden email]> wrote:
> Ok, do 'procstat -kk -a' instead. Exiting processes are not excluded from
> the kstack sysctl, might be you just raced with termination.

No, it's not a race. When this is occurring, processes sit in "exiting" for several minutes like that, doing (apparently) nothing.

What's weird is that I was able to unmount the nullfs mount, but not the NFS mount, even though the process would have had to access the NFS mount through the nullfs mount.

Thanks!
On Fri, Dec 11, 2020 at 6:28 PM Rick Macklem <[hidden email]> wrote:
> I am afraid I know nothing about nullfs and jails. I suspect it will be
> something related to when file descriptors in the NFS client mount
> get closed.

What does NFSv4 do differently than NFSv3 that might upset a low-level consumer like nullfs?

> Well, NFSv3 is not going away any time soon, so if you don't need
> any of the additional features it offers...

If we did not want the additional features, we definitely would not be attempting this.

> a user would have to run their own custom hacked
> userland NFS client. Although doable, I have never heard of it being done.

Alex beat me to libnfs. What about this as a stopgap measure?

> How explosive would adding SO_REUSEADDR to the NFS client be? It's
> not a full solution, but it would handle the TIME_WAIT side of the
> issue.

The kernel NFS networking code is confusing to me. I can't even figure out where/how NFSv4 binds a client socket to know if it's possible. (Pretty sure the code in sys/nfs/krpc_subr.c is not it.)

Thanks!
On Fri, Dec 11, 2020 at 6:08 PM Alan Somers <[hidden email]> wrote:
> That's some good information. However, it must not be the whole story.

Indeed not. If it were, this would happen instantly every time. There must be some sort of trigger. But there are a lot of jobs that run and I didn't write any of them. So the search space is large.

> Are you using any mount options with nullfs?

nosuid and, on half the mounts, ro.

> It might be worth trying to make the read-only mount into read-write, to see if that helps.

It won't; the read-only mounts are exported read-only on the server side. And no one is going to sign off on changing that, not even for a minute.

> And what does "jls -n" show?

Here is an example, newlines added for readability:

devfs_ruleset=0
nodying
enforce_statfs=2
host=new
ip4=disable
ip6=disable
jid=1020
linux=new
name=job-1020
osreldate=1202000
osrelease=12.2-RELEASE
parent=0
path=/job/roots/job-1020
persist
securelevel=-1
sysvmsg=inherit
sysvsem=inherit
sysvshm=inherit
vnet=inherit
allow.nochflags
allow.nomlock
allow.nomount
allow.mount.nodevfs
allow.mount.nofdescfs
allow.mount.nofusefs
allow.mount.nonullfs
allow.mount.noprocfs
allow.mount.notmpfs
allow.noquotas
allow.noraw_sockets
allow.noread_msgbuf
allow.reserved_ports
allow.set_hostname
allow.nosocket_af
allow.sysvipc
children.cur=0
children.max=0
cpuset.id=87
host.domainname=/""}""
host.hostid=0
host.hostname=job1020.local
host.hostuuid=00000000-0000-0000-0000-000000000000
ip4.addr=10.0.3.252
ip4.saddrsel
ip6.addr=2001:db8::1
ip6.saddrsel
linux.osname=Linux
linux.osrelease=3.2.0
linux.oss_version=198144

Seems like the next step is to find a reproduction that doesn't involve people calling me asking angry questions about why things are broken again.

Thanks!