cmp(1) has a bottleneck, but where?


Dieter BSD
Task: cp(1) a several-GB file from one drive to another,
then run cmp(1) to verify.  Cp runs as expected, but
cmp runs slower than expected.  Neither the disks
nor the cpu is maxed out.  Local drives, no network
involved.  Machine is otherwise idle.

FreeBSD 8.2
amd64
4 GiB main memory
FFS soft updates
SATA 2TB drives
Doesn't matter which drives/controllers, the example below uses
3132 (siis) with 3726 PM, and JMB363 (ahci), both on PCIe x1 cards

Since this ML seems obsessed with the scheduler:
kern.sched.preemption: 1
kern.sched.idlespinthresh: 4
kern.sched.idlespins: 10000
kern.sched.static_boost: 160
kern.sched.preempt_thresh: 64
kern.sched.interact: 30
kern.sched.slice: 13
kern.sched.name: ULE

cp:

extended device statistics                                            
device    r/s    w/s     kr/s     kw/s wait  svc_t  %b  controller    
  ada3  610.9    0.0  77968.6      0.0    1    1.0  64  siisch0_pm3 NCQ
 ada12    0.5  607.4      8.0  77736.7    6    4.4  60  ahcich1     NCQ


    6 users    Load  0.54  0.46  0.33                  Jan  2 13:16

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  276728   42264  3615536   100420  117572  count
All  335744   43452 1077460k   135596          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        cow    3921 total
  1         153      8076  191 1277 1920 1341             zfod        atkbd0 1
                                                          ozfod       uart0 irq4
32.7%Sys   3.4%Intr  0.7%User  0.0%Nice 63.1%Idle        %ozfod    66 psm0 irq12
|    |    |    |    |    |    |    |    |    |    |       daefr       cx23880mpe
================++                                        prcfr   619 siis0 puc0
                                       120 dtbuf          totfr  1235 cx23880mpe
Namei     Name-cache   Dir-cache    135408 desvn          react     1 fwohci1 bg
   Calls    hits   %    hits   %     51442 numvn        2 pdwak       ohci0+ 21
      25      25 100                 33843 frevn    36283 pdpgs       ehci0+ 22
                                                          intrn       pcm0 nfe0
Disks  ada3 ada12                                  322804 wire   2000 cpu0: time
KB/t    128   127                                 1441764 act
tps     619   620                                 1670132 inact
MB/s  77.07 77.01                                  107684 cache
%busy    65    63                                    9888 free
                                                   380800 buf

cmp:

extended device statistics                                            
device    r/s    w/s     kr/s     kw/s wait  svc_t  %b  controller    
  ada3  705.2    0.0  45042.3      0.0    0    0.6  40  siisch0_pm3 NCQ
 ada12  706.2    0.0  45042.3      0.0    0    0.5  33  ahcich1     NCQ


    6 users    Load  0.00  0.07  0.17                  Jan  2 13:32

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act 1970756   45312 37243128   100420  145224  count  1399            
All 2029948   46696 1111086k   135596          pages 22348
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        cow    4187 total
  2   1     153       10k 7161 1312 2187 1524 7024        zfod        atkbd0 1
                                                          ozfod       uart0 irq4
10.6%Sys   2.5%Intr 19.3%User  0.0%Nice 67.5%Idle        %ozfod    88 psm0 irq12
|    |    |    |    |    |    |    |    |    |    |       daefr       cx23880mpe
=====++>>>>>>>>>                                          prcfr   700 siis0 puc0
                                        11 dtbuf     1321 totfr  1399 cx23880mpe
Namei     Name-cache   Dir-cache    135408 desvn          react       fwohci1 bg
   Calls    hits   %    hits   %     51442 numvn          pdwak       ohci0+ 21
                                     33639 frevn          pdpgs       ehci0+ 22
                                                          intrn       pcm0 nfe0
Disks  ada3 ada12                                  312052 wire   2000 cpu0: time
KB/t  63.88 63.88                                 1764068 act
tps     700   700                                 1330864 inact
MB/s  43.65 43.65                                  134912 cache
%busy    38    33                                   10312 free
                                                   380800 buf

The disks can read much faster:
cat file > /dev/null:

extended device statistics                                            
device    r/s    w/s     kr/s     kw/s wait  svc_t  %b  controller    
  ada3  781.2    0.0  99663.3      0.0    0    1.1  83  siisch0_pm3 NCQ
 ada12  712.3    0.0  90642.3      0.0    1    1.1  78  ahcich1     NCQ

    6 users    Load  0.16  0.14  0.20                  Jan  2 13:49

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  276604   41864  3630548   100420  191588  count
All  335608   43064 1077474k   135596          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        cow    4270 total
  3         153       12k 1599  95k 2271 1582             zfod        atkbd0 1
                                                          ozfod       uart0 irq4
37.2%Sys   3.7%Intr  0.9%User  0.0%Nice 58.2%Idle        %ozfod    82 psm0 irq12
|    |    |    |    |    |    |    |    |    |    |       daefr       cx23880mpe
===================+>                                     prcfr   757 siis0 puc0
                                         4 dtbuf          totfr  1431 cx23880mpe
Namei     Name-cache   Dir-cache    135408 desvn          react       fwohci1 bg
   Calls    hits   %    hits   %     51441 numvn        2 pdwak       ohci0+ 21
      25      25 100                 33826 frevn    49879 pdpgs       ehci0+ 22
                                                          intrn       pcm0 nfe0
Disks  ada3 ada12                                  305216 wire   2000 cpu0: time
KB/t    128   128                                 1442164 act
tps     757   716                                 1613304 inact
MB/s  94.31 89.20                                  181700 cache
%busy    80    79                                    9888 free
                                                   380800 buf

So reading from the disks isn't the bottleneck, and systat reports that the
cpu is 67% idle, so the cpu isn't the bottleneck either.  What *is* cmp's
bottleneck?  What else is there that wouldn't show up as one or the other?

Re: cmp(1) has a bottleneck, but where?

Garrett Cooper
On Mon, Jan 2, 2012 at 2:19 PM, Dieter BSD <[hidden email]> wrote:
> Task: cp(1) a several-GB file from one drive to another,
> then run cmp(1) to verify.  Cp runs as expected, but
> cmp runs slower than expected.  Neither the disks
> nor the cpu is maxed out.  Local drives, no network
> involved.  Machine is otherwise idle.

    1. How are you running cmp?
    2. Why do you claim cmp is the bottleneck? Is it spinning the CPU?
Thanks,
-Garrett

Re: cmp(1) has a bottleneck, but where?

Dieter BSD
In reply to this post by Dieter BSD
>> Task: cp(1) a several-GB file from one drive to another,
>> then run cmp(1) to verify.  Cp runs as expected, but
>> cmp runs slower than expected.  Neither the disks
>> nor the cpu is maxed out.  Local drives, no network
>> involved.  Machine is otherwise idle.
>
>    1. How are you running cmp?
>    2. Why do you claim cmp is the bottleneck? Is it spinning the CPU?

cmp big_file /other_disk/big_file

Cmp is running slower than it should.  It isn't cpu bound ( 67.5%Idle )
but it isn't disk bound either.  Seems like it should be one or the
other.

Re: cmp(1) has a bottleneck, but where?

Garrett Cooper
On Mon, Jan 2, 2012 at 5:29 PM, Dieter BSD <[hidden email]> wrote:

>>> Task: cp(1) a several-GB file from one drive to another,
>>> then run cmp(1) to verify.  Cp runs as expected, but
>>> cmp runs slower than expected.  Neither the disks
>>> nor the cpu is maxed out.  Local drives, no network
>>> involved.  Machine is otherwise idle.
>>
>>    1. How are you running cmp?
>>    2. Why do you claim cmp is the bottleneck? Is it spinning the CPU?
>
> cmp big_file /other_disk/big_file
>
> Cmp is running slower than it should.  It isn't cpu bound ( 67.5%Idle )
> but it isn't disk bound either.  Seems like it should be one or the
> other.

    What gets output on the console when you do CTRL-T?
Thanks,
-Garrett

Re: cmp(1) has a bottleneck, but where?

Dieter BSD
In reply to this post by Dieter BSD
>>>> Task: cp(1) a several-GB file from one drive to another,
>>>> then run cmp(1) to verify. Cp runs as expected, but
>>>> cmp runs slower than expected. Neither the disks
>>>> nor the cpu is maxed out. Local drives, no network
>>>> involved. Machine is otherwise idle.
>>>
>>> 1. How are you running cmp?
>>> 2. Why do you claim cmp is the bottleneck? Is it spinning the CPU?
>>
>> cmp big_file /other_disk/big_file
>>
>> Cmp is running slower than it should. It isn't cpu bound ( 67.5%Idle )
>> but it isn't disk bound either. Seems like it should be one or the
>> other.
>
>    What gets output on the console when you do CTRL-T?

load: 0.59  cmd: cmp 93304 [vnread] 56.99r 8.50u 3.80s 23% 720k
load: 0.59  cmd: cmp 93304 [vnread] 57.68r 8.60u 3.85s 22% 720k
load: 0.54  cmd: cmp 93304 [vnread] 60.69r 9.03u 4.12s 22% 780k
load: 0.54  cmd: cmp 93304 [runnable] 63.79r 9.58u 4.26s 22% 720k
load: 0.58  cmd: cmp 93304 [runnable] 68.33r 10.28u 4.62s 21% 788k
load: 0.53  cmd: cmp 93304 [runnable] 71.92r 10.78u 4.94s 23% 720k
load: 0.53  cmd: cmp 93304 [vnread] 72.31r 10.84u 4.96s 21% 780k
load: 0.44  cmd: cmp 93304 [vnread] 198.84r 30.64u 14.36s 23% 720k

Re: cmp(1) has a bottleneck, but where?

Garrett Cooper
On Mon, Jan 2, 2012 at 11:37 PM, Dieter BSD <[hidden email]> wrote:

>>>>> Task: cp(1) a several-GB file from one drive to another,
>>>>> then run cmp(1) to verify. Cp runs as expected, but
>>>>> cmp runs slower than expected. Neither the disks
>>>>> nor the cpu is maxed out. Local drives, no network
>>>>> involved. Machine is otherwise idle.
>>>>
>>>> 1. How are you running cmp?
>>>> 2. Why do you claim cmp is the bottleneck? Is it spinning the CPU?
>>>
>>> cmp big_file /other_disk/big_file
>>>
>>> Cmp is running slower than it should. It isn't cpu bound ( 67.5%Idle )
>>> but it isn't disk bound either. Seems like it should be one or the
>>> other.
>>
>>    What gets output on the console when you do CTRL-T?
>
> load: 0.59  cmd: cmp 93304 [vnread] 56.99r 8.50u 3.80s 23% 720k
> load: 0.59  cmd: cmp 93304 [vnread] 57.68r 8.60u 3.85s 22% 720k
> load: 0.54  cmd: cmp 93304 [vnread] 60.69r 9.03u 4.12s 22% 780k
> load: 0.54  cmd: cmp 93304 [runnable] 63.79r 9.58u 4.26s 22% 720k
> load: 0.58  cmd: cmp 93304 [runnable] 68.33r 10.28u 4.62s 21% 788k
> load: 0.53  cmd: cmp 93304 [runnable] 71.92r 10.78u 4.94s 23% 720k
> load: 0.53  cmd: cmp 93304 [vnread] 72.31r 10.84u 4.96s 21% 780k
> load: 0.44  cmd: cmp 93304 [vnread] 198.84r 30.64u 14.36s 23% 720k

    Here's a pastebin of the gprof output for cmp of two almost
identical files (I added a byte at the end of the file):
http://pastebin.com/Rw355d8G .
    Here's the time output of the process:

$ /usr/bin/time -l /usr/obj/usr/src/usr.bin/cmp/cmp /scratch/foo.iso*
       99.48 real        27.35 user        33.32 sys
      5820  maximum resident set size
       251  average shared memory size
      2083  average unshared data size
       127  average unshared stack size
      1310  page reclaims
   1569083  page faults
         0  swaps
     49325  block input operations
        88  block output operations
         0  messages sent
         0  messages received
         0  signals received
       396  voluntary context switches
     13514  involuntary context switches
$ uname -a
FreeBSD streetfighter.ixsystems.com 10.0-CURRENT FreeBSD 10.0-CURRENT
#0 r227801: Mon Nov 21 14:04:39 PST 2011
[hidden email]:/usr/obj/usr/src/sys/STREETFIGHTER
amd64

    The file is 3.0GB in size. Look at all those page faults though!
Thanks!
-Garrett

Re: cmp(1) has a bottleneck, but where?

Marc Olzheim-2
On Tue, Jan 03, 2012 at 12:21:10AM -0800, Garrett Cooper wrote:
>     The file is 3.0GB in size. Look at all those page faults though!
> Thanks!
> -Garrett

From usr.bin/cmp/c_regular.c:

#define MMAP_CHUNK (8*1024*1024)
...
for (..) {
        mmap() chunk of size MMAP_CHUNK.
        compare
        munmap()
}

That 8 MB chunk size sounds like a bad plan to me. I can imagine
something needed to be done to compare files larger than X GB on a 32bit
system, but 8MB is pretty small...
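
Fleshed out, the loop is roughly this shape (a sketch for illustration,
not the actual c_regular.c; the real code also handles options and errors
and walks the bytes so it can report the differing offset and line):

#include <sys/param.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MMAP_CHUNK (8 * 1024 * 1024)

/* Compare two regular files chunk by chunk; returns 0 iff identical. */
static int
compare_files(const char *path1, const char *path2)
{
    struct stat sb1, sb2;
    int fd1 = open(path1, O_RDONLY);
    int fd2 = open(path2, O_RDONLY);

    if (fd1 < 0 || fd2 < 0)
        err(1, "open");
    if (fstat(fd1, &sb1) < 0 || fstat(fd2, &sb2) < 0)
        err(1, "fstat");
    if (sb1.st_size != sb2.st_size)
        return (1);

    for (off_t off = 0; off < sb1.st_size; off += MMAP_CHUNK) {
        size_t len = MIN((off_t)MMAP_CHUNK, sb1.st_size - off);
        char *p1 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd1, off);
        char *p2 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd2, off);

        if (p1 == MAP_FAILED || p2 == MAP_FAILED)
            err(1, "mmap");
        int diff = memcmp(p1, p2, len);
        munmap(p1, len);
        munmap(p2, len);
        if (diff != 0)
            return (1);
    }
    close(fd1);
    close(fd2);
    return (0);
}

Each chunk gets mapped, faulted in page by page, compared, and unmapped
again, which is presumably where Garrett's page-fault numbers come from.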

Marc


Re: cmp(1) has a bottleneck, but where?

Adrian Chadd-2
On 3 January 2012 00:34, Marc Olzheim <[hidden email]> wrote:

> On Tue, Jan 03, 2012 at 12:21:10AM -0800, Garrett Cooper wrote:
>>     The file is 3.0GB in size. Look at all those page faults though!
>> Thanks!
>> -Garrett
>
> From usr.bin/cmp/c_regular.c:
>
> #define MMAP_CHUNK (8*1024*1024)
> ...
> for (..) {
>        mmap() chunk of size MMAP_CHUNK.
>        compare
>        munmap()k
> }
>
> That 8 MB chunk size sounds like a bad plan to me. I can imagine
> something needed to be done to compare files larger than X GB on a 32bit
> system, but 8MB is pretty small...

Er, hint: look at the average IO size in the cmp versus cp stats above?

Something/somehow it's issuing smaller IOs when using mmap?


Adrian

Re: cmp(1) has a bottleneck, but where?

Bruce Evans-4
In reply to this post by Marc Olzheim-2
On Tue, 3 Jan 2012, Marc Olzheim wrote:

> On Tue, Jan 03, 2012 at 12:21:10AM -0800, Garrett Cooper wrote:
>>     The file is 3.0GB in size. Look at all those page faults though!
>> Thanks!
>> -Garrett
>
> From usr.bin/cmp/c_regular.c:
>
> #define MMAP_CHUNK (8*1024*1024)
> ...
> for (..) {
> mmap() chunk of size MMAP_CHUNK.
> compare
> munmap()k
> }
>
> That 8 MB chunk size sounds like a bad plan to me. I can imagine
> something needed to be done to compare files larger than X GB on a 32bit
> system, but 8MB is pretty small...

8MB is more than large enough.  It works at disk speed in my tests.  cp
still uses this value.  Old versions of cmp used the bogus value of
SIZE_T_MAX and aborted on large regular files when mmap() failed.
SIZE_T_MAX is bogus because it is larger than can possibly be mmapped
on 32-bit machines (except certain unsupported segmented ones), yet it
is not large enough for all files on 32-bit machines.  On 64-bit machines,
it is still more than can be mmapped (except...), but effectively infinity
since it is larger than all files.

cmp was changed to be more like cp.  Both are still remarkably defective.
cp is also remarkably ugly, especially in its fallback for when mmap()
fails.  The fallback for cmp is missing the ugliness, but it uses
getc() so it is very slow.  This might be the problem here.  The
fallback is to use c_special(), and c_special() is also used
unconditionally for "special" files, and special files are detected
badly:
- there is no way to force a file to be special (or not special).  This
   would be useful for testing the mmap() method and the non-mmap() method
   on the same file
- if one of the files is named "-", then this is an alias for stdin and
   the file is considered special.  I see no good reason to force
   specialness here.  It can be used to avoid the mmap() method.
- otherwise, one of the files is special if it is not regular according
   to fstat() on it.  For some reason, the fstat()s are not done if
   specialness was forced by one of the file names being "-".

In my tests, using "-" for one of the files mainly takes lots more user
time.  It only reduces the real time by 25%.  This is on a core2.  On
a system with a slow CPU, it is easy for getc() to be much slower than
the disk.

Bruce

Re: cmp(1) has a bottleneck, but where?

Dieter BSD
In reply to this post by Dieter BSD
> Something/somehow it's issuing smaller IOs when using mmap?

On my box, 64K reads.  Using the '-' to avoid mmap it uses
128K.

The big difference I found was that the default mmap case isn't
using read-ahead. So it has to wait on the disk every time.  :-(

Using the '-' to avoid mmap it benefits from read-ahead, but the
default of 8 isn't large enough.  Crank up vfs.read_max and it
becomes cpu bound.  (assuming using 2 disks and not limited by
both disks being on the same wimpy controller)

A) Should the default vfs.read_max be increased?

B) Can the mmap case be fixed?  What is the alleged benefit of
using mmap anyway?  All I've ever seen are problems.
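
(For reference, vfs.read_max can be read and raised with sysctl(8), or
from a test program with sysctlbyname(3).  A rough sketch; 32 is just an
example value, not a recommendation:)

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
    int oldval, newval = 32;    /* example value only */
    size_t oldlen = sizeof(oldval);

    /* Read the current read-ahead limit, then raise it (needs root). */
    if (sysctlbyname("vfs.read_max", &oldval, &oldlen, NULL, 0) == -1)
        err(1, "sysctlbyname(vfs.read_max)");
    printf("vfs.read_max was %d\n", oldval);
    if (sysctlbyname("vfs.read_max", NULL, NULL, &newval, sizeof(newval)) == -1)
        err(1, "sysctlbyname(set vfs.read_max)");
    return (0);
}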

Re: cmp(1) has a bottleneck, but where?

Bruce Evans-4
In reply to this post by Bruce Evans-4
On Wed, 4 Jan 2012, Bruce Evans wrote:

> On Tue, 3 Jan 2012, Marc Olzheim wrote:
>
>> On Tue, Jan 03, 2012 at 12:21:10AM -0800, Garrett Cooper wrote:
>>>     The file is 3.0GB in size. Look at all those page faults though!
>>> Thanks!
>>> -Garrett
>>
>> From usr.bin/cmp/c_regular.c:
>>
>> #define MMAP_CHUNK (8*1024*1024)
>> ...
>> for (..) {
>> mmap() chunk of size MMAP_CHUNK.
>> compare
>> munmap()k
>> }
>>
>> That 8 MB chunk size sounds like a bad plan to me. I can imagine
>> something needed to be done to compare files larger than X GB on a 32bit
>> system, but 8MB is pretty small...
>
> 8MB is more than large enough.  It works at disk speed in my tests.  cp
> still uses this value.  Old versions of cmp used the bogus value of
> ...
> In my tests, using "-" for one of the files mainly takes lots more user
> time.  It only reduces the real time by 25%.  This is on a core2.  On
> a system with a slow CPU, it is easy for getc() to be much slower than
> the disk.

More careful tests showed serious slowness when the combined file sizes
exceeded the cache size.  cmp takes an enormous amount of CPU (see another
reply), and this seems to be done mostly in series with i/o, so the total
time increases too much.  A smaller mmap() size or not using mmap() at
all might improve parallelism.

Bruce

Re: cmp(1) has a bottleneck, but where?

Bruce Evans-4
In reply to this post by Dieter BSD

Re: cmp(1) has a bottleneck, but where?

Dieter BSD
In reply to this post by Dieter BSD
> The hard \xc2\xa0 certainly deserves a :-(.

Agreed. Brain-damaged guilty-until-proven-innocent anti-spam measures
force the use of webmail for outgoing email, which among other problems
inserts garbage. Sorry.

>> A) Should the default vfs.read_max be increased?
>
> Maybe, but I don't buy most claims that larger block sizes are better.

I didn't say anything about block sizes. There needs to be enough
data in memory so that the CPU doesn't run out while the disk is
seeking.

>> B) Can the mmap case be fixed? What is the aledged benefit of
>> using mmap anyway? All I've even seen are problems.
>
> It is much faster for cases where the file is already in memory. It
> is unclear whether this case is common enough to matter. I guess it
> isn't.

Is there a reasonably efficient way to tell if a file is already
in memory or not? If not, then we have to guess.
If the file is larger than memory it cannot already be in memory.
For real world uses, there are 2 files, and not all memory can be
used for buffering files. So cmp could check the file sizes and
if larger than x% of main memory then assume not in memory.
There could be a command line argument specifying which method to
use, or providing a guess whether the files are in memory or not.

I wrote a prototype no-features cmp using read(2) and memcmp(3).
For large files it is faster than the base cmp and uses less cpu.
It is I/O bound rather than CPU bound.
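
Such a comparator looks roughly like this (a sketch, not the actual
prototype; the 1 MB buffer size is arbitrary, and it assumes full reads
on regular files):

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFSZ   (1024 * 1024)   /* arbitrary 1 MB chunks */

int
main(int argc, char *argv[])
{
    if (argc != 3)
        errx(2, "usage: my_cmp file1 file2");

    int fd1 = open(argv[1], O_RDONLY);
    int fd2 = open(argv[2], O_RDONLY);
    if (fd1 < 0 || fd2 < 0)
        err(2, "open");

    char *b1 = malloc(BUFSZ), *b2 = malloc(BUFSZ);
    if (b1 == NULL || b2 == NULL)
        err(2, "malloc");

    for (;;) {
        ssize_t n1 = read(fd1, b1, BUFSZ);
        ssize_t n2 = read(fd2, b2, BUFSZ);

        if (n1 < 0 || n2 < 0)
            err(2, "read");
        if (n1 != n2 || memcmp(b1, b2, (size_t)n1) != 0)
            errx(1, "files differ");    /* no offset/line reporting */
        if (n1 == 0)
            break;                      /* EOF on both: identical */
    }
    return (0);
}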

So perhaps use memcmp when possible and decide between read and mmap
based on (something)?

Assuming the added performance justifies the added complexity?

Re: cmp(1) has a bottleneck, but where?

Bruce Evans-4
On Thu, 12 Jan 2012, Dieter BSD wrote:

>>> A) Should the default vfs.read_max be increased?
>>
>> Maybe, but I don't buy most claims that larger block sizes are better.
>
> I didn't say anything about block sizes. There needs to be enough
> data in memory so that the CPU doesn't run out while the disk is
> seeking.

Oops.  I was thinking of read-ahead essentially extending the block
size.  It (or rather clustering) does exactly that for file systems
with small block sizes, provided the blocks are contiguous.  But
too much of it gives latency and resource wastage problems.  Reads
by other processes may be queued behind read-ahead that is never
used.

>>> B) Can the mmap case be fixed? What is the aledged benefit of
>>> using mmap anyway? All I've even seen are problems.
>>
>> It is much faster for cases where the file is already in memory. It
>> is unclear whether this case is common enough to matter. I guess it
>> isn't.
>
> Is there a reasonably efficient way to tell if a file is already
> in memory or not? If not, then we have to guess.

Not that I know of.  You would want to know how much of it is in
memory.

> If the file is larger than memory it cannot already be in memory.
> For real world uses, there are 2 files, and not all memory can be
> used for buffering files. So cmp could check the file sizes and
> if larger than x% of main memory then assume not in memory.
> There could be a command line argument specifying which method to
> use, or providing a guess whether the files are in memory or not.

I think the 8MB value does that well enough, especially now that
everyone has a GB or 16 of memory.

posix_fadvise() should probably be used for large files to tell the
system not to cache the data.  Its man page reminded me of the O_DIRECT
flag.  Certainly if the combined size exceeds the size of main memory,
O_DIRECT would be good (even for benchmarks that cmp the same files :-).
But cmp and cp are too old to use it.
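
Roughly what that could look like (a sketch only; posix_fadvise() needs
a system new enough to have it, and whether O_DIRECT actually helps here
would have to be measured):

#include <fcntl.h>

/*
 * Hypothetical helper: open a file for a single sequential pass and
 * advise the kernel that the data will not be reused.  With use_direct
 * set, O_DIRECT bypasses the cache entirely instead.
 */
static int
open_for_one_pass(const char *path, int use_direct)
{
    int fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));

    if (fd >= 0 && !use_direct) {
        /* offset 0, len 0 = the whole file: sequential, used once */
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    }
    return (fd);
}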

> I wrote a prototype no-features cmp using read(2) and memcmp(3).
> For large files it is faster than the base cmp and uses less cpu.
> It is I/O bound rather than CPU bound.

What about using mmap() and memcmp()?  mmap() shouldn't be inherently
much worse than read().  I think it shouldn't, and doesn't, read
ahead the whole mmap()ed size (8MB here), since that would be bad for
latency.  So it must page it in when it is accessed, and read ahead
for that.

There is another thread about how bad mmap() and sendfile() are with
zfs, because zfs is not merged with the buffer cache so using mmap()
with it wastes about a factor of 2 of memory; sendfile() uses mmap()
so using it with zfs is bad too.  Apparently no one uses cp or cmp
with zfs :-), or they would notice its slowness there too.

> So perhaps use memcmp when possible and decide between read and mmap
> based on (something)?
>
> Assuming the added performance justifies the added complexity?

I think memcmp() instead of byte comparison for cmp -lx is not very
complex.  More interesting is memcmp() for the general case.  For
small files (<= mmap()ed size), mmap() followed by memcmp(), then
go back to a byte comp to count the line number when memcmp() fails
seems good.  Going back is messier and slower for large files.  In
the worst case of files larger than memory with a difference at the
end, it involves reading everything twice, so it is twice as slow
if it is i/o bound.
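
For the small-file case the idea is roughly this (a sketch, not a patch;
it assumes both files are already fully mapped and the same length):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/*
 * memcmp() handles the common "identical" case at full speed; only on a
 * mismatch do we go back and walk the bytes to recover the offset and
 * line number that cmp's default output has to report.
 */
static int
cmp_small(const char *p1, const char *p2, size_t len)
{
    if (memcmp(p1, p2, len) == 0)
        return (0);                     /* identical */

    off_t line = 1;
    for (size_t i = 0; i < len; i++) {
        if (p1[i] != p2[i]) {
            printf("files differ: char %ju, line %jd\n",
                (uintmax_t)(i + 1), (intmax_t)line);
            return (1);
        }
        if (p1[i] == '\n')
            line++;
    }
    return (1);         /* not reached: memcmp() said they differ */
}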

Bruce

Re: cmp(1) has a bottleneck, but where?

Dieter BSD
In reply to this post by Dieter BSD
> posix_fadvise() should probably be used for large files to tell the
> system not to cache the data. Its man page reminded me of the O_DIRECT
> flag. Certainly if the combined size exceeds the size of main memory,
> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
> But cmp and cp are too old to use it.

8.2 says:
man -k posix_fadvise
posix_fadvise: nothing appropriate

The FreeBSD man pages web page says it is not in 9.0 either.

google found:
http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html

So what is this posix_fadvise() man page you mention?

O_DIRECT looked interesting, but I haven't found an explanation of
exactly what it does, and
find /usr/src/sys | xargs grep O_DIRECT | wc -l
188
was a bit much to wade through, so I didn't try O_DIRECT.

>> I wrote a prototype no-features cmp using read(2) and memcmp(3).
>> For large files it is faster than the base cmp and uses less cpu.
>> It is I/O bound rather than CPU bound.
>
> What about using mmap() and memcmp()? mmap() shouldn't be inherently
> much worse than read(). I think it shouldn't and doesn't not read
> ahead the whole mmap()ed size (8MB here), since that would be bad for
> latency. So it must page it in when it is accessed, and read ahead
> for that.

cmp 4GB 4GB
52.06 real 14.68 user 5.26 sys

cmp 4GB - < 4GB
44.37 real 33.87 user 5.53 sys

my_cmp 4GB 4GB
41.22 real 5.26 user 5.09 sys

> there is another thread about how bad mmap() and sendfile() are with
> zfs, because zfs is not merged with the buffer cache so using mmap()
> with it wastes about a factor of 2 of memory; sendfile() uses mmap()
> so using it with zfs is bad too. Apparently no one uses cp or cmp
> with zfs :-), or they would notice its slowness there too.

I recently read somewhere that zfs needs 5 GB memory for each 1 TB of disk.
People that run zfs obviously don't care about using lots of memory.

I only noticed the problem because cmp wasn't reading as fast as expected,
but wasn't cpu bound either.

> I think memcmp() instead of byte comparision for cmp -lx is not very
> complex. More interesting is memcmp() for the general case. For
> small files (<= mmap()ed size), mmap() followed by memcmp(), then
> go back to a byte comp to count the line number when memcmp() fails
> seems good. Going back is messier and slower for large files. In
> the worst case of files larger than memory with a difference at the
> end, it involves reading everything twice, so it is twice as slow
> if it is i/o bound.

Studying the cmp man page, it is... unfortunate. The default
prints the byte and line number if the files differ, so it needs
that info. The -l and -x options just keep going after the first
difference. If you want the first byte to be indexed 0 or 1 you can't
choose the radix independently.

If we only needed the byte count it wouldn't be so bad, but needing
the line count really throws a wrench in the works if we want to use
memcmp(). The only way to avoid needing the line count is -s.

Re: cmp(1) has a bottleneck, but where?

Bruce Evans-4
On Sun, 15 Jan 2012, Dieter BSD wrote:

>> posix_fadvise() should probably be used for large files to tell the
>> system not to cache the data. Its man page reminded me of the O_DIRECT
>> flag. Certainly if the combined size exceeds the size of main memory,
>> O_DIRECT would be good (even for benchmarks that cmp the same files :-).
>> But cmp and cp are too old to use it.
>
> 8.2 says:
> man -k posix_fadvise
> posix_fadvise: nothing appropriate
>
> The FreeBSD man pages web page says it is not in 9.0 either.
>
> google found:
> http://lists.freebsd.org/pipermail/freebsd-hackers/2011-May/035333.html
>
> So what is this posix_fadvise() man page you mention?

Standard in 10.0-current.  Not that I normally run that.  I thought I
remembered an older feature that gave this, and didn't notice that
the man page was so new.  Now I remember that the older feature is
madvise(), which is spelled posix_madvise() in POSIX-speak.  So
mmap() may be good for large files after all, but only with use of
madvise() for large files and complications to determine what is a
large file.

Recent mail about this was whether the primary syscall for the new
API should be spelled correctly (as fadvise(), corresponding to
madvise()).  Currently, there is only the verbose posix_fadvise().
The options for posix_fadvise() are a large subset of the ones for
madvise(), but spelled with F instead of M and a verbose POSIX prefix
(e.g., MADV_NORMAL for madvise() and even for posix_madvise() becomes
POSIX_FADV_NORMAL for posix_fadvise()).

> O_DIRECT looked interesting, but I haven't found an explaination of
> exactly what it does, and
> find /usr/src/sys | xargs grep O_DIRECT | wc -l
> 188
> was a bit much to wade through, so I didn't try O_DIRECT.

I have no experience using it, but think it is safe to try to see if
it helps.

>> I think memcmp() instead of byte comparision for cmp -lx is not very
>> complex. More interesting is memcmp() for the general case. For
>> small files (<= mmap()ed size), mmap() followed by memcmp(), then
>> go back to a byte comp to count the line number when memcmp() fails
>> seems good. Going back is messier and slower for large files. In
>> the worst case of files larger than memory with a difference at the
>> end, it involves reading everything twice, so it is twice as slow
>> if it is i/o bound.
>
> Studying the cmp man page, it is... unfortunate. The default
> prints the byte and line number if the files differ, so it needs
> that info. The -l and -x options just keep going after the first
> difference. If you want the first byte to be indexed 0 or 1 you can't
> choose the radix independantly.
>
> If we only needed the byte count it wouldn't be so bad, but needing
> the line count really throws a wrench in the works if we want to use
> memcpy(). The only way to avoid needing the line count is -s.

-l or -x also.  The FreeBSD man page isn't clear about when the line
number is printed.  It doesn't say that -l and -x cancel the general
requirement of printing the line number, but they do in practice.
POSIX doesn't have -x, at least in 2001, but it gives the precise
format for -l and there is no line number in it.

Maybe line counting is supposed to be pessimized further by supporting
wide characters.  wc is already fully pessimized for this, but it has
a not-quite-so-slow mode in which it doesn't call mbrtowc() and checks
for '\n' instead of L\'n'.  It also has an extremely fast mode for
wc -c and wc -m, in which for regular files, it just stats the file.

This is another indication that cmp is completely unsuitable for
comparing files for equality.  I couldn't find where POSIX says that
either wc or cmp must support wide characters or multi-byte characters,
but for cmp it says that if the file is not a text file then the line
count is simply the number of <newline> characters.  Clearly non-text
files consist of just bytes, so the <newline>s in them must be simply
'\n' characters which we don't want to count anyway.

Bruce

Re: cmp(1) has a bottleneck, but where?

Tom Evans-3
In reply to this post by Dieter BSD
On Sun, Jan 15, 2012 at 11:32 PM, Dieter BSD <[hidden email]> wrote:
> I recently read somewhere that zfs needs 5 GB memory for each 1 TB of disk.
> People that run zfs obviously don't care about using lots of memory.

You read incorrectly. Running ZFS with dedup needs ~5 GB of RAM per TB,
but this depends upon file size.

However, the majority of ZFS users do not use dedup. My pool is 18 TB
with 8 GB of RAM, of which ZFS can only access 4 GB.

Cheers

Tom