Re: ZFS, NFS and Network tuning

Re: ZFS, NFS and Network tuning

Brent Jones-3
I'm reviving this, as I too am seeing something eerily similar. I have
made my own thread under freebsd-stable, so I will hopefully move that
discussion to this list.

I believe we are seeing performance problems when the FreeBSD NFS
client issues FSYNC writes instead of ASYNC writes, cutting throughput
to a small fraction of what the disks and network links are capable of.
Further testing tonight shows that other NFSv3 and v4 clients
do not issue FSYNC unless they modify attributes and close a file, or
append and close a file.
FreeBSD NFS client will issue FSYNCs anytime the write size (-w) is
reached, instead of when just closing the file.
This is not necessary, since NFSv3 and v4 TCP have provisions for safe
async writes that 'guarantee' state of NFS writes.

Here are the contents of what I wrote there, verbatim:

http://lists.freebsd.org/pipermail/freebsd-stable/2009-January/048063.html

-------


Hello FreeBSD users,
I am running into some performance problems with NFSv3/v4 mounts.
I have a Sun X4540 running OpenSolaris 2008.11 with ZFS, exporting NFS shares.
The NFS clients are a FreeBSD 6.3 32-bit machine (quad-core Xeon with 4GB RAM)
and a FreeBSD 7.1 32-bit machine with the same hardware.

The issue I am seeing is that, depending on the file type, the FreeBSD NFS
client will issue either ASYNC or FSYNC writes.
However, NFSv3 and v4 both support "safe" ASYNC writes in the TCP
versions of the protocol, so that should be the default.
Issuing an FSYNC for every complete block transmitted adds substantial
overhead and slows everything down.
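
As background for the "safe" ASYNC writes mentioned above: in NFSv3 (RFC 1813) a client may send WRITE with stable_how set to UNSTABLE and later issue a COMMIT, comparing the write verifier in the replies to detect a server restart. The ASYNC and FSYNC labels in the snoop output below appear to correspond to UNSTABLE and FILE_SYNC. The following is a minimal sketch of that client-side check only; the struct and function names are hypothetical and are not taken from the FreeBSD client.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stability levels for an NFSv3 WRITE, as defined in RFC 1813. */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

#define NFS3_WRITEVERFSIZE 8

/* Hypothetical per-file bookkeeping on the client (illustration only). */
struct pending_file {
    uint8_t verf[NFS3_WRITEVERFSIZE]; /* verifier seen in the first UNSTABLE reply */
    bool    have_verf;
    bool    must_resend;              /* verifier changed: server may have lost data */
};

/* Record the verifier carried by a WRITE reply. */
void note_write_reply(struct pending_file *f, enum stable_how committed,
                      const uint8_t verf[NFS3_WRITEVERFSIZE])
{
    if (committed != UNSTABLE)
        return;                       /* data is already on stable storage */
    if (!f->have_verf) {
        memcpy(f->verf, verf, NFS3_WRITEVERFSIZE);
        f->have_verf = true;
    } else if (memcmp(f->verf, verf, NFS3_WRITEVERFSIZE) != 0) {
        f->must_resend = true;
    }
}

/* Check the verifier carried by the COMMIT reply.
 * Returns true when the client may discard its cached copy of the data. */
bool commit_succeeded(struct pending_file *f,
                      const uint8_t verf[NFS3_WRITEVERFSIZE])
{
    if (f->have_verf && memcmp(f->verf, verf, NFS3_WRITEVERFSIZE) != 0)
        f->must_resend = true;
    return !f->must_resend;
}
```

With this scheme the server only has to reach stable storage once per COMMIT rather than once per WRITE, which is the saving the paragraph above is asking for.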

The two test files I have that can reproduce this data are a file
created by 'dump' which is just binary data:

$ file testbinery
testbinery: data

ASCII text file from a Maildir format:

$ file ascittest
ascittest: ASCII mail text

These are the NFS mount command lines I have tried in order to get all data written ASYNC:

$ mount_nfs -3T -o async 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
$ mount_nfs -3T 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
$ mount_nfs -4TL 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/

Here is an excerpt from a snoop from the binary data file:

$ snoop rpc nfs

obsmtp02.local -> pdxfilu01    NFS C ACCESS3 FH=57D3 (read,lookup,modify,extend,delete,execute)
   pdxfilu01 -> obsmtp02.local NFS R ACCESS3 OK (read,modify,extend)
obsmtp02.local -> pdxfilu01    NFS C LOOKUP3 FH=BB85 testbinery
   pdxfilu01 -> obsmtp02.local NFS R LOOKUP3 OK FH=57D3
obsmtp02.local -> pdxfilu01    NFS C ACCESS3 FH=57D3 (read,lookup,modify,extend,delete,execute)
   pdxfilu01 -> obsmtp02.local NFS R ACCESS3 OK (read,modify,extend)
obsmtp02.local -> pdxfilu01    NFS C SETATTR3 FH=57D3
   pdxfilu01 -> obsmtp02.local NFS R SETATTR3 OK
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=57D3 at 0 for 32768 (ASYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=57D3 at 582647808 for 32768 (ASYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=57D3 at 592871424 for 32768 (ASYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=57D3 at 605421568 for 32768 (ASYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (ASYNC)


And on and on. It achieves near full wire speed, about 110MB/sec,
during the copy.


Here is the same snoop, only copying the ASCII mail file:

$ snoop rpc nfs

obsmtp02.local -> pdxfilu01    NFS C LOOKUP3 FH=BB85 ascittest
   pdxfilu01 -> obsmtp02.local NFS R LOOKUP3 No such file or directory
obsmtp02.local -> pdxfilu01    NFS C LOOKUP3 FH=BB85 ascittest
   pdxfilu01 -> obsmtp02.local NFS R LOOKUP3 No such file or directory
obsmtp02.local -> pdxfilu01    NFS C CREATE3 FH=BB85 (UNCHECKED) ascittest
   pdxfilu01 -> obsmtp02.local NFS R CREATE3 OK FH=69D3
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=69D3 at 0 for 32768 (FSYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (FSYNC)
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=69D3 at 32768 for 32768 (FSYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (FSYNC)
obsmtp02.local -> pdxfilu01    NFS C WRITE3 FH=69D3 at 65536 for 32768 (FSYNC)
   pdxfilu01 -> obsmtp02.local NFS R WRITE3 OK 32768 (FSYNC)


And so on. I've reproduced this with several files, and the only
difference between tests is the file type.
Is the FreeBSD NFS client requesting FSYNC or ASYNC depending on the
file type/contents?
If so, is there a tunable setting to make all writes ASYNC?
Otherwise, FSYNC'ing every block written over NFS will cause so
many IOPS on the NFS server that performance will degrade severely.
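
To put a rough number on the IOPS concern, the arithmetic can be sketched as below. It assumes the 32768-byte write size seen in the snoop output and treats every FSYNC write as one synchronous commit on the server; the function names are made up for this illustration, and wsize is assumed to be non-zero.

```c
#include <stdint.h>

/* Number of WRITE RPCs needed to send file_bytes in chunks of wsize bytes. */
uint64_t write_rpc_count(uint64_t file_bytes, uint32_t wsize)
{
    return (file_bytes + wsize - 1) / wsize;   /* round up to whole RPCs */
}

/* Synchronous commits per second the server must sustain to keep up with
 * bytes_per_sec when every WRITE is sent FSYNC. */
double sync_commits_per_sec(double bytes_per_sec, uint32_t wsize)
{
    return bytes_per_sec / (double)wsize;
}
```

Taking 110MB/sec as 110,000,000 bytes per second, sync_commits_per_sec(110e6, 32768) comes to roughly 3,357 synchronous commits per second, whereas the ASYNC case needs far fewer because many writes can be committed together.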

An OpenSolaris 2008.11 client, by contrast, issues ASYNC writes for
any file type when mounted with NFSv3 or NFSv4 (TCP).

Any ideas?

Thanks in advance!




--
Brent Jones
[hidden email]
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[hidden email]"

Re: ZFS, NFS and Network tuning

Brent Jones-3
On Wed, Jan 28, 2009 at 11:21 PM, Brent Jones <[hidden email]> wrote:

> ...

I have found a 4 year old bug, which may be related to this. cp uses
mmap for small files (and I imagine lots of things use mmap for file
operations) and causes slowdowns via NFS, due to the fsync data
provided above.

http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/87792

That bug report accurately describes the issue. Is there any way to attach
more 'interested parties' or additional details to that bug?
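
For readers unfamiliar with the pattern, the following is a minimal sketch of copying a file by mapping the source with mmap and writing the mapping out. It is not the code from cp(1), the function name is hypothetical, and it makes no claim about which step triggers the FSYNC writes; it only shows what "uses mmap for file operations" can look like.

```c
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the whole of from_fd to to_fd by mapping the source file.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int copy_via_mmap(int from_fd, int to_fd)
{
    struct stat st;

    if (fstat(from_fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;                       /* nothing to map or copy */

    size_t len = (size_t)st.st_size;
    char *src = mmap(NULL, len, PROT_READ, MAP_SHARED, from_fd, 0);
    if (src == MAP_FAILED)
        return -1;

    size_t done = 0;
    int rc = 0;
    while (done < len) {
        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t n = write(to_fd, src + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted, retry */
            rc = -1;
            break;
        }
        done += (size_t)n;
    }
    munmap(src, len);
    return rc;
}
```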


--
Brent Jones
[hidden email]

Re: ZFS, NFS and Network tuning

Bruce Evans-4
On Thu, 29 Jan 2009, Brent Jones wrote:

> On Wed, Jan 28, 2009 at 11:21 PM, Brent Jones <[hidden email]> wrote:

>> ...
>> The issue I am seeing, is that for certain file types, the FreeBSD NFS
>> client will either issue an ASYNC write, or an FSYNC.
>> However, NFSv3 and v4 both support "safe" ASYNC writes in the TCP
>> versions of the protocol, so that should be the default.
>> Issuing FSYNC's for every complete block transmitted adds substantial
>> overhead and slows everything down.

I use some patches (mainly for nfs write clustering on the server) by
Bjorn Gronwall and some local fixes (mainly for vfs write clustering
on the server, and turning off excessive nfs[io]d daemons which get in
each other's way due to poor scheduling, and things that only help for
lots of small files), and see reasonable performance in all cases (~90%
of disk bandwidth with all-async mounts, and half that with the client
mounted noasync on an old version of FreeBSD.  The client in -current
is faster.)  Writing is actually faster than reading here.

>> ...
>> My NFS mount command lines I have tried to get all data to ASYNC write:
>>
>> $ mount_nfs -3T -o async 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
>> $ mount_nfs -3T 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/
>> $ mount_nfs -4TL 192.168.0.19:/pdxfilu01/obsmtp /mnt/obsmtp/

Also try -r16384 -w16384, and udp, and async on the server.  I think
block sizes default to 8K for udp and 32K for tcp.  8K is too small,
and 32K may be too large (it increases latency for little benefit
if the server fs block size is 16K).  udp gives lower latency.  async
on the server makes little difference provided the server block size
is not too small.
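
The latency side of the block-size trade-off can be estimated with a small calculation. This sketch assumes a 1 Gbit/s link (suggested by the ~110MB/sec figure earlier in the thread, but not stated there) and ignores protocol headers, propagation delay and server processing; the function name is made up for this illustration.

```c
#include <stdint.h>

/* Microseconds needed to serialize payload_bytes onto a link running at
 * link_bits_per_sec. Only the raw transmission time is modeled. */
double wire_time_us(uint32_t payload_bytes, double link_bits_per_sec)
{
    return (double)payload_bytes * 8.0 * 1e6 / link_bits_per_sec;
}
```

Under those assumptions an 8K block takes about 66 microseconds on the wire, a 16K block about 131, and a 32K block about 262, so each doubling of the block size doubles the minimum time before the server has a complete request to act on.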

> I have found a 4 year old bug, which may be related to this. cp uses
> mmap for small files (and I imagine lots of things use mmap for file
> operations) and causes slowdowns via NFS, due to the fsync data
> provided above.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/87792

mmap apparently breaks the async mount preference in the following code
from vnode_pager.c:

%     /*
%      * pageouts are already clustered, use IO_ASYNC to force a bawrite()
%      * rather then a bdwrite() to prevent paging I/O from saturating
%      * the buffer cache.  Dummy-up the sequential heuristic to cause
%      * large ranges to cluster.  If neither IO_SYNC or IO_ASYNC is set,
%      * the system decides how to cluster.
%      */
%     ioflags = IO_VMIO;
%     if (flags & (VM_PAGER_PUT_SYNC | VM_PAGER_PUT_INVAL))
%         ioflags |= IO_SYNC;

This apparently gives lots of sync writes.  (Sync writes are the default for
nfs, but we mount with async to try to get async writes.)

%     else if ((flags & VM_PAGER_CLUSTER_OK) == 0)
%         ioflags |= IO_ASYNC;

nfs doesn't even support this flag.  In fact, ffs is the only file
system that supports it, and here is the only place that sets it.  This
might explain some slowness.

One of the bugs in vfs clustering that I don't have is related to this.
IIRC, mounting the server with -o async doesn't work as well as it
should because the buffer cache becomes congested with i/o that should
have been sent to the disk.  Some writes must be done async as explained
above, but one place in vfs_cache.c is too aggressive in delaying async
writes for file systems that are mounted async.  This problem is more
noticeable for nfs, at least with networks not much faster than disks,
since it results in the client and server taking turns waiting for
each other.  (The names here are very confusing -- the async mount
flag normally delays both sync and async writes for as long as possible,
except for nfs it doesn't affect delays but asks for async writes
instead of sync writes on the server, while the IO_ASYNC flag asks for
async writes and thus often has the opposite sense to the async mount
flag.)

%     ioflags |= (flags & VM_PAGER_PUT_INVAL) ? IO_INVAL : 0;
%     ioflags |= IO_SEQMAX << IO_SEQSHIFT;
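
The fragments quoted above can be reassembled into one small function to make the decision easier to follow. This is only a sketch: the numeric values of the flag macros are placeholders picked for the example, not the kernel's definitions, and the function name is hypothetical.

```c
/* Placeholder bit values chosen for this sketch; the kernel's real
 * definitions live in its own headers and may differ. */
#define VM_PAGER_PUT_SYNC   0x1
#define VM_PAGER_PUT_INVAL  0x2
#define VM_PAGER_CLUSTER_OK 0x4

#define IO_VMIO     0x01
#define IO_SYNC     0x02
#define IO_ASYNC    0x04
#define IO_INVAL    0x08
#define IO_SEQMAX   0x7F
#define IO_SEQSHIFT 16

/* Map pager flags to write I/O flags, following the quoted excerpt. */
int pageout_ioflags(int flags)
{
    int ioflags = IO_VMIO;

    if (flags & (VM_PAGER_PUT_SYNC | VM_PAGER_PUT_INVAL))
        ioflags |= IO_SYNC;             /* caller asked for a sync pageout */
    else if ((flags & VM_PAGER_CLUSTER_OK) == 0)
        ioflags |= IO_ASYNC;            /* no clustering allowed: start the write now */
    ioflags |= (flags & VM_PAGER_PUT_INVAL) ? IO_INVAL : 0;
    ioflags |= IO_SEQMAX << IO_SEQSHIFT; /* pretend the access is fully sequential */
    return ioflags;
}
```

Read this way, a pageout carrying VM_PAGER_PUT_SYNC or VM_PAGER_PUT_INVAL always ends up with IO_SYNC, regardless of how the file system was mounted, which matches the observation that mmap-driven writes bypass the async mount preference.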

Bruce