Major issues with nfsv4

Major issues with nfsv4

J David
Recently, we attempted to get with the 2000's and try switching from
NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.

Any system we switch to NFSv4 mounts is functionally unusable, pegged
at 100% system CPU usage, load average 70+, largely from nfscl threads
and client processes using NFS.

Dmesg shows NFS-related messages:

$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
   1 nfsv4 err=10010
   4 nfsv4 client/server protocol prob err=10026
  29 nfscl: never fnd open

Nfsstat shows no client activity; "nfsstat -e -c 1" and "nfsstat -c 1"
both report:

 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0

Meanwhile, tcpdump on the client shows an endless stream of getattr
requests at the exact same time nfsstat -c says nothing is happening:

$ sudo tcpdump -n -i net1 -c 10 port 2049 and src 172.20.200.39
14:47:27.037974 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [.],
ack 72561, win 545, options [nop,nop,TS val 234259249 ecr 4155804100],
length 0
14:47:27.046282 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 139940:140092, ack 72561, win 545, options [nop,nop,TS val
234259259 ecr 4155804100], length 152: NFS request xid 1544756021 148
getattr fh 0,5/0
14:47:27.051260 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140092:140248, ack 72641, win 545, options [nop,nop,TS val
234259269 ecr 4155804104], length 156: NFS request xid 1544756022 152
getattr fh 0,5/0
14:47:27.063372 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140248:140404, ack 72721, win 545, options [nop,nop,TS val
234259279 ecr 4155804106], length 156: NFS request xid 1544756023 152
getattr fh 0,5/0
14:47:27.068646 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140404:140556, ack 72801, win 545, options [nop,nop,TS val
234259279 ecr 4155804108], length 152: NFS request xid 1544756024 148
getattr fh 0,5/0
14:47:27.080627 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140556:140712, ack 72881, win 545, options [nop,nop,TS val
234259299 ecr 4155804110], length 156: NFS request xid 1544756025 152
getattr fh 0,5/0
14:47:27.085224 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140712:140868, ack 72961, win 545, options [nop,nop,TS val
234259299 ecr 4155804112], length 156: NFS request xid 1544756026 152
getattr fh 0,5/0
14:47:27.096802 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140868:141024, ack 73041, win 545, options [nop,nop,TS val
234259309 ecr 4155804114], length 156: NFS request xid 1544756027 152
getattr fh 0,5/0
14:47:27.101849 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 141024:141180, ack 73121, win 545, options [nop,nop,TS val
234259319 ecr 4155804116], length 156: NFS request xid 1544756028 152
getattr fh 0,5/0
14:47:27.112905 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 141180:141336, ack 73201, win 545, options [nop,nop,TS val
234259329 ecr 4155804118], length 156: NFS request xid 1544756029 152
getattr fh 0,5/0

Only 10 shown here for brevity, but:

$ sudo tcpdump -n -i net1 -c 10000 port 2049 and src 172.20.200.39 |
fgrep getattr | wc -l
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
10000 packets captured
20060 packets received by filter
0 packets dropped by kernel
    9759

There are no dropped packets or network problems:

$ netstat -in -I net1
Name    Mtu Network       Address              Ipkts Ierrs Idrop
Opkts Oerrs  Coll
net1   1500 <Link#2>      12:33:df:5f:79:d7 40988832     0     0
48760307     0     0
net1      - 172.20.0.0/16 172.20.200.39     40942065     -     -
48756241     -     -

The mount flags in fstab are:

ro,nfsv4,nosuid

The mount flags as reported by "nfsstat -m" are:

nfsv4,minorversion=0,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647

Today, I managed to kill everything down to one user process that was
exhibiting this behavior.  After a kill -9 on that process, it went to
"REsJ" but continued to burn the same amount of CPU (all system).
Oddly the run state / wait channel was just "CPU1."  Running "ktrace"
did not produce any trace records.  Probably that is predictable for a
process in E state; if the process had crossed the user/kernel
boundary in a way ktrace could detect, it would have exited.

At that point, I started unmounting filesystems.  Everything but the
NFS filesystem used by that process unmounted cleanly.  The umount for
that filesystem went to D state for about a minute and then kicked
back "Device busy."  That's fair, if awfully slow.

Meanwhile, that user process continued burning system CPU with the E
flag set, not doing anything whatsoever in userspace, still producing
300+ "getattr fh 0,5/0" per second according to tcpdump and 0
according to nfsstat.

Eventually, I rebooted with fstab set back to nfsv3.

This feels like the user process is in a system call that is stuck in
an endless loop repeating some operation that generates that getattr
request.  But that is a feeling, not a fact.

This is fairly easy to reproduce; it seems pretty consistent within a
few hours (a day at most) any time I switch the relevant mounts to
nfsv4.  Reverting to nfsv3 makes this issue completely disappear.

What on earth could be going on here?  What other information can I
provide that would help track this down?

Thanks for any advice!
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"

Re: Major issues with nfsv4

Peter Eriksson-2
Any particular reason you choose to use NFSv4.0 and not NFSv4.1?

Also, it might be useful if you could show the configuration you are using on the server and the clients. Are the clients also FreeBSD 12.2, or (more commonly) some Linux variant?

We are using NFS v4.0 and v4.1 with great success here from our FreeBSD 12.1, 12.2 and 11.3 servers from various Linux (and some OmniOS clients - only 4.0 on those) with
Kerberos.

With NFSv4 there are some additional things you need to set up compared to NFSv3. For example the NFS-Domain name which must be the same on servers & clients, and you must run the nfsuserd daemon, and have the V4 export line.


Our NFS server setup:

> root:/etc # egrep 'nfs|gss|sec' rc.conf rc.conf.d/* /boot/loader.conf /etc/sysctl.conf exports zfs/exports
>
> rc.conf:gssd_enable="YES"
> rc.conf:nfs_server_enable="YES"
> rc.conf:nfsv4_server_enable="YES"
> rc.conf:nfscbd_enable="YES"
>
> rc.conf.d/nfsuserd:nfsuserd_enable="YES"
> rc.conf.d/nfsuserd:nfsuserd_flags="-manage-gids -domain your.nfs.domain.id 16"
>
> exports:V4: /export -sec=krb5:krb5i:krb5p
>
> zfs/exports:/export/staff -sec=krb5:krb5i:krb5p


On a Linux client (Debian for example) you need to configure the NFS-domain, make sure the idmap/gssd stuff is running and make sure you nfsmount correctly…

/etc/default/nfs-common
        NEED_IDMAPD=yes
        NEED_GSSD=yes

/etc/idmapd.conf
        [general]
        Domain = your.nfs.domain.id
        Local-Realms = YOUR-KRB5-REALM

/etc/nfsmount.conf
        [NFSMount_Global_Options]
        Defaultvers = 4.1

Packages needed on Linux clients:
 keyutils, nfs-kernel-server (on Debian 9)
 nfs-utils, libnfsidmap, nfs4-acl-tools, rpcgssd (on CentOS 7)

We use “fstype=nfs4,sec=krb5” when mounting on the Linux clients. At least on CentOS 7, if you use “fstype=nfs,vers=4,sec=krb5” it will use 4.0 instead of the highest supported NFS version…

- Peter


Re: Major issues with nfsv4

J David
Ah, oops.  The "12.2 servers" referred to at the top of the message
are the NFS *clients* in this scenario.  They are application servers,
not NFS servers.  Sorry for the confusing overloaded usage of "server"
there!

Everything in the message (dmesg, tcpdump, nfsstat, etc.) is from the
perspective of a FreeBSD 12.2 NFS client, which is where the problems
are occurring.

Our Linux servers (machines? instances? hosts? nodes?) that are NFS
clients have been running NFSv4 against the same servers for many
years without incident.

Thanks!

Re: Major issues with nfsv4

J David
In reply to this post by J David
On Thu, Dec 10, 2020 at 1:20 PM Konstantin Belousov <[hidden email]> wrote:
> Show procstat -kk -p <pid> output for it.

I will add this to the list of things to try the next time I provoke
this issue.  As you might expect, the people working on these machines
don't appreciate these issues, so my goal is to gather as much of a
strategy as I can before doing so again.

Thanks!

Re: Major issues with nfsv4

Rick Macklem
In reply to this post by J David
J. David wrote:

>Recently, we attempted to get with the 2000's and try switching from
>NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.
>
>Any system we switch to NFSv4 mounts is functionally unusable, pegged
>at 100% system CPU usage, load average 70+, largely from nfscl threads
>and client processes using NFS.
>
>Dmesg shows NFS-related messages:
>
>$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
>   1 nfsv4 err=10010
>   4 nfsv4 client/server protocol prob err=10026
>  29 nfscl: never fnd open
Add "minorversion=1" to your FreeBSD NFS client mount options
and error 10026 should go away (and I suspect that the 10010 will
go away too.
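For concreteness, a minimal sketch of what that change might look like in the client's fstab. The server name and mount point here are hypothetical placeholders; keep the rest of your existing options as-is:

```shell
# /etc/fstab -- switch the NFSv4 mount to minor version 1
# (hypothetical server/path shown; only "minorversion=1" is new)
nfssrv:/export/data  /mnt/data  nfs  ro,nfsv4,minorversion=1,nosuid  0  0

# or test by hand before editing fstab
mount -t nfs -o ro,nfsv4,minorversion=1,nosuid nfssrv:/export/data /mnt/data
```

After remounting, "nfsstat -m" should report minorversion=1 rather than minorversion=0.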

The correct semantics for handling the "seqid" field that
serializes open/lock operations in NFSv4.0 are difficult to get
right (and might now be broken in the client, since the
original code written 20 years ago depended on exclusive
vnode locking and hasn't been updated or interop-tested with
non-FreeBSD NFS servers for ages).
--> NFSv4.0 is close to 20 years old and has been fixed/superseded
      by NFSv4.1 for many years now.
--> NFSv4.1 (and NFSv4.2) replaced the "seqid" stuff with something
      called "sessions", which works better.

I have been tempted to make FreeBSD NFSv4 mounts use 4.1/4.2
by default to avoid problems with NFSv4.0, but I've hesitated since
the change could be considered a POLA violation.

NFSv4.0 is like any .0 release. There were significant issues with the
protocol fixed by NFSv4.1.

If you still have problems when using NFSv4.1, post again.
Btw, "nfsstat -m" shows what the client mount options actually are.

rick

Re: Major issues with nfsv4

Rick Macklem
In reply to this post by J David
J. David wrote:
>Ah, oops.  The "12.2 servers" referred to at the top of the message
>are the NFS *clients* in this scenario.  They are application servers,
>not NFS servers.  Sorry for the confusing overloaded usage of "server"
>there!
So what is your NFS server running?

Btw, if it happens to be a Linux system and you aren't using Kerberos,
it will expect users/groups to be represented as numeric strings by
default.  To match that, do not start the nfsuserd(8) daemon on the
client and instead add the following line to the client's
/etc/sysctl.conf file:
vfs.nfs.enable_uidtostring=1

When User/Group mapping is broken, you'll see lots of files owned
by "nobody".

Also, if you do want to see what the NFS packets look like, you can
capture packets with tcpdump, but then look at them in wireshark.
# tcpdump -s 0 -w out.pcap host <nfs-server>
- then look at out.pcap in wireshark. Unlike tcpdump, wireshark
  knows how to parse NFS messages properly.

rick
ps: Once you have switched to NFSv4.1 and have User/Group
     mapping working, I suspect the NFS clients will be ok.
     Using NFSv4.1 also avoids FreeBSD NFS server issues w.r.t.
     tuning the DRC, since it is not used by NFSv4.1 (again, fixed
     by sessions).


forcing nfsv4 versions from the server? (was Re: Major issues with nfsv4

mdtancsa
In reply to this post by Rick Macklem
On 12/10/2020 7:59 PM, Rick Macklem wrote:

> J. David wrote:
>> Recently, we attempted to get with the 2000's and try switching from
>> NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.
>>
>> Any system we switch to NFSv4 mounts is functionally unusable, pegged
>> at 100% system CPU usage, load average 70+, largely from nfscl threads
>> and client processes using NFS.
>>
>> Dmesg shows NFS-related messages:
>>
>> $ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
>>   1 nfsv4 err=10010
>>   4 nfsv4 client/server protocol prob err=10026
>>  29 nfscl: never fnd open
> Add "minorversion=1" to your FreeBSD NFS client mount options
> and error 10026 should go away (and I suspect that the 10010 will
> go away too.

Hi Rick,

    I never knew there was such an important difference. Is there a way
on the server side to force only v4.1 connections when clients attempt
a v4.x mount?

    ---Mike


Re: Major issues with nfsv4

J David
In reply to this post by J David
On Thu, Dec 10, 2020 at 1:20 PM Konstantin Belousov <[hidden email]> wrote:
> E means exiting process.  Is it multithreaded ?
> Show procstat -kk -p <pid> output for it.

To answer this separately, procstat -kk of an exiting process
generating huge volumes of getattr requests produces nothing but the
headers:

# ps Haxlww | fgrep DNE
     0 21281 18549  1  20  0  11196  2560 piperd   S+     1      0:00.00 fgrep DNE
125428  9661     1  0  36 15      0    16 nfsreq   DNE+J   3-     3:22.54 job_exec
# procstat -kk 9661
  PID    TID COMM                TDNAME              KSTACK

This happened while retesting on NFSv4.1.  Although I don't know if
the process was originally multithreaded, it appears it wasn't even
single-threaded by the time it got into this state.

Thanks!

Re: Major issues with nfsv4

J David
In reply to this post by Rick Macklem
Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
resolve our issue.  But I've narrowed down the problem to a harmful
interaction between NFSv4 and nullfs.

These FreeBSD NFS clients form a pool of application servers that run
jobs for the application.  A given job needs read-write access to its
data and read-only access to the set of binaries it needs to run.

The job data is horizontally partitioned across a set of directory
trees spread over one set of NFS servers.  A separate set of NFS
servers store the read-only binary roots.

The jobs are assigned to these machines by a scheduler.  A job might
take five milliseconds or five days.

Historically, we have mounted the job data trees and the various
binary roots on each application server over NFSv3.  When a job
starts, its setup binds the needed data and binaries into a jail via
nullfs, then runs the job in the jail.  This approach has worked
perfectly for 10+ years.
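The arrangement described above might look roughly like this, with all paths and hostnames hypothetical (two shared NFS mounts made once per application server, then per-job nullfs binds into the jail):

```shell
# once per application server: the shared NFS mounts
mount -t nfs -o nfsv3,ro binsrv:/binroot   /nfs/binroot
mount -t nfs -o nfsv3    datasrv:/jobdata  /nfs/jobdata

# per job (here job 1234): nullfs-bind only what the jail needs
mount -t nullfs -o ro /nfs/binroot          /jails/job1234/bin
mount -t nullfs       /nfs/jobdata/job1234  /jails/job1234/data
```

The key property is that the NFS TCP connections exist only for the two shared mounts, not per job; the direct-NFS-in-jail workaround trades that away, which is what creates the reserved-port pressure discussed below.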

After I switched a server to NFSv4.1 to test that recommendation, it
started having the same load problems as NFSv4.  As a test, I altered
it to mount NFS directly in the jails for both the data and the
binaries.  As "nullfs-NFS" jobs finished and "direct NFS" jobs
started, the load and CPU usage started to fall dramatically.

The critical problem with this approach is that privileged TCP ports
are a finite resource.  At two per job, this creates two issues.

First, it imposes a hard limit on simultaneous jobs per server well
below the hardware's capabilities.  Second, due to TIME_WAIT, it
places a hard limit on job throughput.  In practice, these limits also
interfere with each other: the more simultaneous long jobs are
running, the more impact TIME_WAIT has on short-job throughput.

While it's certainly possible to configure NFS not to require reserved
ports, the slightest possibility of a non-root user establishing a
session to the NFS server kills that as an option.

Turning down TIME_WAIT helps, though the ability to do that only on
the interface facing the NFS server would be more palatable than doing
it globally.

Adjusting net.inet.ip.portrange.lowlast does not seem to help.  The
code at sys/nfs/krpc_subr.c correctly uses ports between
IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
and ipport_lowlastauto.  But is that the correct place to look for
NFSv4.1?

How explosive would adding SO_REUSEADDR to the NFS client be?  It's
not a full solution, but it would handle the TIME_WAIT side of the
issue.

Even so, there may be no workaround for the simultaneous mount limit
as long as reserved ports are required.  Solving the negative
interaction with nullfs seems like the only long-term fix.

What would be a good next step there?

Thanks!
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "[hidden email]"

Re: forcing nfsv4 versions from the server? (was Re: Major issues with nfsv4)

Rick Macklem
In reply to this post by mdtancsa
mike tancsa wrote:
[stuff snipped]
>Hi Rick,
>
>    I never knew there was such an important difference. Is there a way
>on the server side to force only v4.1 connections from the client when
>they try and v4.x mount ?
You can set the sysctl:
vfs.nfsd.server_min_minorversion4=1
if your server has it. (I can't remember what versions of FreeBSD have it.)

Linux clients will usually use the highest minor version the
server supports. FreeBSD clients will use 0 unless the "minorversion=1"
option is given on the mount command.

To be honest, I have only heard of a couple of other sites having the
NFSERR_BADSEQID (10026) error problem and it sounds like J David's
problem is related to nullfs and jails.

4.0->4.1 was a minor revision in name only. RFC5661 (the NFSv4.1 one)
is over 500 pages, so not a trivial update. On the other hand, 4.1->4.2 is
a minor update, made up of a bunch of additional optional features
like SEEK_HOLE/SEEK_DATA support and local copy_file_range() support
in the server.

rick


    ---Mike


Re: Major issues with nfsv4

Alan Somers-2
In reply to this post by J David
On Fri, Dec 11, 2020 at 2:52 PM J David <[hidden email]> wrote:

> Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
> resolve our issue.  But I've narrowed down the problem to a harmful
> interaction between NFSv4 and nullfs.
>
> These FreeBSD NFS clients form a pool of application servers that run
> jobs for the application.  A given job needs read-write access to its
> data and read-only access to the set of binaries it needs to run.
>
> The job data is horizontally partitioned across a set of directory
> trees spread over one set of NFS servers.  A separate set of NFS
> servers store the read-only binary roots.
>
> The jobs are assigned to these machines by a scheduler.  A job might
> take five milliseconds or five days.
>
> Historically, we have mounted the job data trees and the various
> binary roots on each application server over NFSv3.  When a job
> starts, its setup binds the needed data and binaries into a jail via
> nullfs, then runs the job in the jail.  This approach has worked
> perfectly for 10+ years.
>
> After I switched a server to NFSv4.1 to test that recommendation, it
> started having the same load problems as NFSv4.  As a test, I altered
> it to mount NFS directly in the jails for both the data and the
> binaries.  As "nullfs-NFS" jobs finished and "direct NFS" jobs
> started, the load and CPU usage started to fall dramatically.
>
> The critical problem with this approach is that privileged TCP ports
> are a finite resource.  At two per job, this creates two issues.
>
> First, there's a hard limit on both simultaneous jobs per server
> inconsistent with the hardware's capabilities.  Second, due to
> TIME_WAIT, it places a hard limit on job throughput.  In practice,
> these limits also interfere with each other; the more simultaneous
> long jobs are running, the more impact TIME_WAIT has on short job
> throughput.
>
> While it's certainly possible to configure NFS not to require reserved
> ports, the slightest possibility of a non-root user establishing a
> session to the NFS server kills that as an option.
>
> Turning down TIME_WAIT helps, though the ability to do that only on
> the interface facing the NFS server would be more palatable than doing
> it globally.
>
> Adjusting net.inet.ip.portrange.lowlast does not seem to help.  The
> code at sys/nfs/krpc_subr.c correctly uses ports between
> IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
> and ipport_lowlastauto.  But is that the correct place to look for
> NFSv4.1?
>
> How explosive would adding SO_REUSEADDR to the NFS client be?  It's
> not a full solution, but it would handle the TIME_WAIT side of the
> issue.
>
> Even so, there may be no workaround for the simultaneous mount limit
> as long as reserved ports are required.  Solving the negative
> interaction with nullfs seems like the only long-term fix.
>
> What would be a good next step there?
>
> Thanks!
>

That's some good information.  However, it must not be the whole story.
I've been nullfs mounting my NFS mounts for years.  For example, right now
on a FreeBSD 12.2-RC2 machine:

> sudo nfsstat -m
Password:
192.168.0.2:/home on /usr/home
nfsv4,minorversion=1,tcp,resvport,soft,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
> mount | grep home
192.168.0.2:/home on /usr/home (nfs, nfsv4acls)
/usr/home on /iocage/jails/rustup2/root/usr/home (nullfs)

Are you using any mount options with nullfs?  It might be worth trying to
make the read-only mount into read-write, to see if that helps.  And what
does "jls -n" show?
-Alan

Re: Major issues with nfsv4

Rick Macklem
In reply to this post by J David
J David wrote:
>Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
>resolve our issue.  But I've narrowed down the problem to a harmful
>interaction between NFSv4 and nullfs.
I am afraid I know nothing about nullfs and jails. I suspect it will be
something related to when file descriptors in the NFS client mount
get closed.

The NFSv4 Open is a Windows Open lock and has nothing to do with
a POSIX open. Since only one of these can exist for each
<client process, file> tuple, the NFSv4 Close must be delayed until
all POSIX Opens on the file have been closed, including open file
descriptors inherited by child processes.
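
A tiny userland illustration of that lifetime rule (my sketch, nothing
NFS-specific: it just shows the POSIX-side reference an NFSv4 Close has
to wait for):

```python
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "data")
with open(path, "w") as w:
    w.write("hello")

f = open(path, "rb")   # one POSIX open of the file
pid = os.fork()
if pid == 0:
    # Child: the inherited descriptor keeps the file open even after
    # the parent closes its copy, so an NFSv4 Close could only be
    # issued once this reference is gone too.
    os._exit(0 if f.read(5) == b"hello" else 1)

f.close()                        # parent's POSIX close: not the last reference
status = os.waitpid(pid, 0)[1]
print("child could still read:", os.WEXITSTATUS(status) == 0)
```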

Someone else recently reported problems using nullfs and vnet jails.

>These FreeBSD NFS clients form a pool of application servers that run
>jobs for the application.  A given job needs read-write access to its
>data and read-only access to the set of binaries it needs to run.
>
>The job data is horizontally partitioned across a set of directory
>trees spread over one set of NFS servers.  A separate set of NFS
>servers store the read-only binary roots.
>
>The jobs are assigned to these machines by a scheduler.  A job might
>take five milliseconds or five days.
>
>Historically, we have mounted the job data trees and the various
>binary roots on each application server over NFSv3.  When a job
>starts, its setup binds the needed data and binaries into a jail via
>nullfs, then runs the job in the jail.  This approach has worked
>perfectly for 10+ years.
Well, NFSv3 is not going away any time soon, so if you don't need
any of the additional features it offers...

>After I switched a server to NFSv4.1 to test that recommendation, it
>started having the same load problems as NFSv4.  As a test, I altered
>it to mount NFS directly in the jails for both the data and the
>binaries.  As "nullfs-NFS" jobs finished and "direct NFS" jobs
>started, the load and CPU usage started to fall dramatically.
Good work isolating the problem. I may try playing with NFSv4/nullfs
someday soon and see if I can break it.

>The critical problem with this approach is that privileged TCP ports
>are a finite resource.  At two per job, this creates two issues.
>
>First, there's a hard limit on both simultaneous jobs per server
>inconsistent with the hardware's capabilities.  Second, due to
>TIME_WAIT, it places a hard limit on job throughput.  In practice,
>these limits also interfere with each other; the more simultaneous
>long jobs are running, the more impact TIME_WAIT has on short job
>throughput.
>
>While it's certainly possible to configure NFS not to require reserved
>ports, the slightest possibility of a non-root user establishing a
>session to the NFS server kills that as an option.
Personally, I've never thought the reserved port# requirement provided
any real security for most situations. Unless you set "vfs.usermount=1"
only root can do the mount. For non-root to mount the NFS server
when "vfs.usermount=0", a user would have to run their own custom hacked
userland NFS client. Although doable, I have never heard of it being done.

rick

Turning down TIME_WAIT helps, though the ability to do that only on
the interface facing the NFS server would be more palatable than doing
it globally.

Adjusting net.inet.ip.portrange.lowlast does not seem to help.  The
code at sys/nfs/krpc_subr.c correctly uses ports between
IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
and ipport_lowlastauto.  But is that the correct place to look for
NFSv4.1?

How explosive would adding SO_REUSEADDR to the NFS client be?  It's
not a full solution, but it would handle the TIME_WAIT side of the
issue.

Even so, there may be no workaround for the simultaneous mount limit
as long as reserved ports are required.  Solving the negative
interaction with nullfs seems like the only long-term fix.

What would be a good next step there?

Thanks!

Re: Major issues with nfsv4

Alan Somers-2
On Fri, Dec 11, 2020 at 4:28 PM Rick Macklem <[hidden email]> wrote:

> J David wrote:
> >Unfortunately, switching the FreeBSD NFS clients to NFSv4.1 did not
> >resolve our issue.  But I've narrowed down the problem to a harmful
> >interaction between NFSv4 and nullfs.
> I am afraid I know nothing about nullfs and jails. I suspect it will be
> something related to when file descriptors in the NFS client mount
> get closed.
>
> The NFSv4 Open is a Windows Open lock and has nothing to do with
> a POSIX open. Since only one of these can exist for each
> <client process, file> tuple, the NFSv4 Close must be delayed until
> all POSIX Opens on the file have been closed, including open file
> descriptors inherited by children processes.
>

Does it make a difference whether the files are opened read-only or
read-write?  My longstanding practice has been to never use NFS to store
object files while compiling.  I do that for performance reasons, and I
didn't think that nullfs had anything to do with it (but maybe it does).


>
> Someone else recently reported problems using nullfs and vnet jails.
>
> >These FreeBSD NFS clients form a pool of application servers that run
> >jobs for the application.  A given job needs read-write access to its
> >data and read-only access to the set of binaries it needs to run.
> >
> >The job data is horizontally partitioned across a set of directory
> >trees spread over one set of NFS servers.  A separate set of NFS
> >servers store the read-only binary roots.
> >
> >The jobs are assigned to these machines by a scheduler.  A job might
> >take five milliseconds or five days.
> >
> >Historically, we have mounted the job data trees and the various
> >binary roots on each application server over NFSv3.  When a job
> >starts, its setup binds the needed data and binaries into a jail via
> >nullfs, then runs the job in the jail.  This approach has worked
> >perfectly for 10+ years.
> Well, NFSv3 is not going away any time soon, so if you don't need
> any of the additional features it offers...
>
> >After I switched a server to NFSv4.1 to test that recommendation, it
> >started having the same load problems as NFSv4.  As a test, I altered
> >it to mount NFS directly in the jails for both the data and the
> >binaries.  As "nullfs-NFS" jobs finished and "direct NFS" jobs
> >started, the load and CPU usage started to fall dramatically.
> Good work isolating the problem. I may try playing with NFSv4/nullfs
> someday soon and see if I can break it.
>
> >The critical problem with this approach is that privileged TCP ports
> >are a finite resource.  At two per job, this creates two issues.
> >
> >First, there's a hard limit on both simultaneous jobs per server
> >inconsistent with the hardware's capabilities.  Second, due to
> >TIME_WAIT, it places a hard limit on job throughput.  In practice,
> >these limits also interfere with each other; the more simultaneous
> >long jobs are running, the more impact TIME_WAIT has on short job
> >throughput.
> >
> >While it's certainly possible to configure NFS not to require reserved
> >ports, the slightest possibility of a non-root user establishing a
> >session to the NFS server kills that as an option.
> Personally, I've never thought the reserved port# requirement provided
> any real security for most situations. Unless you set "vfs.usermount=1"
> only root can do the mount. For non-root to mount the NFS server
> when "vfs.usermount=0", a user would have to run their own custom hacked
> userland NFS client. Although doable, I have never heard of it being done.
>

There are a few out there.  For example, https://github.com/sahlberg/libnfs.


>
> rick
>
> Turning down TIME_WAIT helps, though the ability to do that only on
> the interface facing the NFS server would be more palatable than doing
> it globally.
>
> Adjusting net.inet.ip.portrange.lowlast does not seem to help.  The
> code at sys/nfs/krpc_subr.c correctly uses ports between
> IPPORT_RESERVED and IPPORT_RESERVED/2 instead of ipport_lowfirstauto
> and ipport_lowlastauto.  But is that the correct place to look for
> NFSv4.1?
>
> How explosive would adding SO_REUSEADDR to the NFS client be?  It's
> not a full solution, but it would handle the TIME_WAIT side of the
> issue.
>
> Even so, there may be no workaround for the simultaneous mount limit
> as long as reserved ports are required.  Solving the negative
> interaction with nullfs seems like the only long-term fix.
>
> What would be a good next step there?
>
> Thanks!
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "[hidden email]"
>

Re: Major issues with nfsv4

Rick Macklem
In reply to this post by Alan Somers-2
Alan Somers wrote:
[stuff snipped]
>That's some good information.  However, it must not be the whole story.
>I've been nullfs mounting my NFS mounts for years.  For example, right now
>on a FreeBSD 12.2-RC2 machine:
If I recall, you were one of the two people that needed to switch to
"minorversion=1" to get rid of NFSERR_BADSEQID (10026) errors.
Is that correct?

>> sudo nfsstat -m
>Password:
>192.168.0.2:/home on /usr/home
>nfsv4,minorversion=1,tcp,resvport,soft,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
Btw, using "soft" with NFSv4 mounts is a bad idea. (See the BUGS section of
"man mount_nfs".)

If you have a hung NFSv4 mount, you can use
# umount -N /usr/home
to dismount it. (It may take a couple of minutes.)

rick

> mount | grep home
192.168.0.2:/home on /usr/home (nfs, nfsv4acls)
/usr/home on /iocage/jails/rustup2/root/usr/home (nullfs)

Are you using any mount options with nullfs?  It might be worth trying to make the read-only mount into read-write, to see if that helps.  And what does "jls -n" show?
-Alan

Re: Major issues with nfsv4

Alan Somers-2
On Fri, Dec 11, 2020 at 4:39 PM Rick Macklem <[hidden email]> wrote:

> Alan Somers wrote:
> [stuff snipped]
> >That's some good information.  However, it must not be the whole story.
> >I've been nullfs mounting my NFS mounts for years.  For example, right
> >now on a FreeBSD 12.2-RC2 machine:
> If I recall, you were one of the two people that needed to switch to
> "minorversion=1" to get rid of NFSERR_BADSEQID (10026) errors.
> Is that correct?
>

In fact, yes.  Though that case had nothing to do with nullfs or jails.


>
> >> sudo nfsstat -m
> >Password:
> >192.168.0.2:/home on /usr/home
>
> >nfsv4,minorversion=1,tcp,resvport,soft,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647
> Btw, using "soft" with NFSv4 mounts is a bad idea. (See the BUGS section of
> "man mount_nfs".)
>

Grahh.  I forgot that was in there.  I can't remember why I put that
there.  These days I agree with you, and advise other people to use hard
mounts, too.  Thanks for pointing it out.


>
> If you have a hung NFSv4 mount, you can use
> # umount -N /usr/home
> to dismount it. (It may take a couple of minutes.)
>
> rick
>
> > mount | grep home
> 192.168.0.2:/home on /usr/home (nfs, nfsv4acls)
> /usr/home on /iocage/jails/rustup2/root/usr/home (nullfs)
>
> Are you using any mount options with nullfs?  It might be worth trying to
> make the read-only mount into read-write, to see if that helps.  And what
> does "jls -n" show?
> -Alan
>

Re: Major issues with nfsv4

Rick Macklem
In reply to this post by J David
J David wrote:
[lots of stuff snipped]
>Even so, there may be no workaround for the simultaneous mount limit
>as long as reserved ports are required.  Solving the negative
>interaction with nullfs seems like the only long-term fix.
>
>What would be a good next step there?
Well, if you have a test system you can break, doing
# nfsstat -c -E
once it is constipated could be useful.

Look for the numbers under
OpenOwner   Opens  LockOwner ...
and see if any of them are getting very large.

rick

Thanks!

Re: Major issues with nfsv4

Konstantin Belousov
In reply to this post by J David
On Fri, Dec 11, 2020 at 03:30:29PM -0500, J David wrote:

> On Thu, Dec 10, 2020 at 1:20 PM Konstantin Belousov <[hidden email]> wrote:
> > E means exiting process.  Is it multithreaded ?
> > Show procstat -kk -p <pid> output for it.
>
> To answer this separately, procstat -kk of an exiting process
> generating huge volumes of getattr requests produces nothing but the
> headers:
>
> # ps Haxlww | fgrep DNE
>      0 21281 18549  1  20  0  11196  2560 piperd   S+     1
> 0:00.00 fgrep DNE
> 125428  9661     1  0  36 15      0    16 nfsreq   DNE+J   3-
> 3:22.54 job_exec
> # procstat -kk 9661
>   PID    TID COMM                TDNAME              KSTACK
>
> This happened while retesting on NFSv4.1.  Although I don't know if
> the process was originally multithreaded, it appears it wasn't even
> single-threaded by the time it got into this state.

Ok, do 'procstat -kk -a' instead.  Exiting processes are not excluded from
the kstack sysctl, might be you just raced with termination.

Or, if you have serial console, enter ddb, then do 'bt <pid>'.

Or if you have kernel built with symbols,
# kgdb /boot/kernel/kernel /dev/mem
(gdb) proc <pid>
(gdb) bt
but this has low chances of working for a running process.

procstat -kk -a output might be the most informative anyway.

Re: Major issues with nfsv4

J David
On Fri, Dec 11, 2020 at 8:09 PM Konstantin Belousov <[hidden email]> wrote:
> Ok, do 'procstat -kk -a' instead.  Exiting processes are not excluded from
> the kstack sysctl, might be you just raced with termination.

No, it's not a race.  When this is occurring, processes sit in
"exiting" for several minutes like that, doing (apparently) nothing.

What's weird is that I was able to unmount the nullfs mount, but not
the NFS mount, even though the process would have had to access the
NFS mount through the nullfs mount.

Thanks!

Re: Major issues with nfsv4

J David
In reply to this post by Rick Macklem
On Fri, Dec 11, 2020 at 6:28 PM Rick Macklem <[hidden email]> wrote:
> I am afraid I know nothing about nullfs and jails. I suspect it will be
> something related to when file descriptors in the NFS client mount
> get closed.

What does NFSv4 do differently than NFSv3 that might upset a low-level
consumer like nullfs?

> Well, NFSv3 is not going away any time soon, so if you don't need
> any of the additional features it offers...

If we did not want the additional features, we definitely would not be
attempting this.

> a user would have to run their own custom hacked
> userland NFS client. Although doable, I have never heard of it being done.

Alan beat me to libnfs.

What about this as a stopgap measure?

> How explosive would adding SO_REUSEADDR to the NFS client be?  It's
> not a full solution, but it would handle the TIME_WAIT side of the
> issue.

The kernel NFS networking code is confusing to me.  I can't even
figure out where/how NFSv4 binds a client socket to know if it's
possible.  (Pretty sure the code in sys/nfs/krpc_subr.c is not it.)

Thanks!

Re: Major issues with nfsv4

J David
In reply to this post by Alan Somers-2
On Fri, Dec 11, 2020 at 6:08 PM Alan Somers <[hidden email]> wrote:
> That's some good information.  However, it must not be the whole story.

Indeed not.  If it were, this would happen instantly every time.
There must be some sort of trigger.  But there are a lot of jobs that
run and I didn't write any of them.  So the search space is large.

> Are you using any mount options with nullfs?

nosuid and, on half the mounts, ro.

> It might be worth trying to make the read-only mount into read-write, to see if that helps.

It won't; the read-only mounts are exported read-only on the
server side.  And no one is going to sign off on changing that, not
even for a minute.

> And what does "jls -n" show?

Here is an example, newlines added for readability:

devfs_ruleset=0
nodying
enforce_statfs=2
host=new
ip4=disable
ip6=disable
jid=1020
linux=new
name=job-1020
osreldate=1202000
osrelease=12.2-RELEASE
parent=0
path=/job/roots/job-1020
persist
securelevel=-1
sysvmsg=inherit
sysvsem=inherit
sysvshm=inherit
vnet=inherit
allow.nochflags
allow.nomlock
allow.nomount
allow.mount.nodevfs
allow.mount.nofdescfs
allow.mount.nofusefs
allow.mount.nonullfs
allow.mount.noprocfs
allow.mount.notmpfs
allow.noquotas
allow.noraw_sockets
allow.noread_msgbuf
allow.reserved_ports
allow.set_hostname
allow.nosocket_af
allow.sysvipc
children.cur=0
children.max=0
cpuset.id=87
host.domainname=/""}""
host.hostid=0
host.hostname=job1020.local
host.hostuuid=00000000-0000-0000-0000-000000000000
ip4.addr=10.0.3.252
ip4.saddrsel
ip6.addr=2001:db8::1
ip6.saddrsel
linux.osname=Linux
linux.osrelease=3.2.0
linux.oss_version=198144

Seems like the next step is to find a reproduction that doesn't
involve people calling me asking angry questions about why things are
broken again.

Thanks!