FreeBSD 7.1 disk performance issue on ESXi 3.5

Sebastiaan van Erk
Hi,

I want to deploy a production FreeBSD web site (database cluster, Apache
cluster, IP failover using CARP, etc.). However, I'm experiencing painful
disk I/O throughput problems that currently make the above project
unviable. I've done some rudimentary benchmarking of two
identically configured virtual machines (2 VCPUs, 512MB memory, 8GB
disk) and installed one with FreeBSD 7.1-amd64 and one with Linux Ubuntu
8.10-amd64. These are the results I'm getting with dbench <n>:

<n>             1               2               4
freebsd         12.0009         13.6348         12.9402         (MB/s)
linux           376.145         651.314         634.649         (MB/s)

Both virtual machines run dbench 3.04 and the results are extremely
stable over repeated runs.
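
As a rough illustration, the size of the gap in the table above can be
worked out with a few lines of Python. This is only a sketch: the numbers
are copied from the table, and the names DBENCH_MBPS and slowdown_factors
are made up for this example.

```python
# dbench throughput in MB/s, copied from the table above
# (columns correspond to "dbench 1", "dbench 2" and "dbench 4").
CLIENTS = [1, 2, 4]
DBENCH_MBPS = {
    "freebsd": [12.0009, 13.6348, 12.9402],
    "linux": [376.145, 651.314, 634.649],
}


def slowdown_factors(slow, fast):
    """Return fast/slow for each pair of measurements (hypothetical helper)."""
    return [f / s for s, f in zip(slow, fast)]


factors = slowdown_factors(DBENCH_MBPS["freebsd"], DBENCH_MBPS["linux"])

if __name__ == "__main__":
    for n, factor in zip(CLIENTS, factors):
        print(f"dbench {n}: Linux reports {factor:.1f}x the FreeBSD throughput")
```

On these figures the Linux guest reports somewhere between about 31 and 49
times the FreeBSD throughput, depending on the number of clients.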

The virtual hardware detected by the FreeBSD machine is as follows:

mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x1080-0x10ff mem
0xf4810000-0xf4810fff irq 17 at device 16.0 on pci0
mpt0: [ITHREAD]
mpt0: MPI Version=1.2.0.0

And:

da0 at mpt0 bus 0 target 0 lun 0
da0: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
da0: 3.300MB/s transfers
da0: 8192MB (16777216 512 byte sectors: 255H 63S/T 1044C)

I've also run unixbench (4.1 and 5.1.2), and the performance of the
FreeBSD machine is horrible compared to Linux on many of the tests,
though my first guess is that it all comes back down to disk
performance (on the CPU-only tests the results are about the same).

When I find dmesg logs of da0 online via Google, they look more like
this (much higher transfer rate, and SCSI-n with n > 2):

da0: <ATA GB0500C8046 HPG1> Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers

Does anybody know how I can get proper performance for the drive under ESXi?

Regards,
Sebastiaan


Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Ivan Voras
Sebastiaan van Erk wrote:

> Hi,
>
> I want to deploy a production FreeBSD web site (database cluster, Apache
> cluster, IP failover using CARP, etc.). However, I'm experiencing painful
> disk I/O throughput problems that currently make the above project
> unviable. I've done some rudimentary benchmarking of two
> identically configured virtual machines (2 VCPUs, 512MB memory, 8GB
> disk) and installed one with FreeBSD 7.1-amd64 and one with Linux Ubuntu
> 8.10-amd64. These are the results I'm getting with dbench <n>:
>
> <n>             1               2               4
> freebsd         12.0009         13.6348         12.9402         (MB/s)
> linux           376.145         651.314         634.649         (MB/s)
>
> Both virtual machines run dbench 3.04 and the results are extremely
> stable over repeated runs.
>
> The virtual hardware detected by the FreeBSD machine is as follows:

> Does anybody know how I can get proper performance for the drive under
> ESXi?

VMWare has many optimizations for Linux that are not used with FreeBSD.
VMI, for example, makes the Linux guest paravirtualized, and then there
are special drivers for networking, its vmotion driver (this one
probably doesn't contribute to performance much), etc. and Linux is in
any case much better tested and supported.

If VMWare allows, you may try changing the type of the controller (I
don't know about ESXi but VMWare Server supports LSI or Buslogic SCSI
emulation) or switch to ATA emulation and try again.

A generic optimization is to reduce kern.hz to something like 50 but it
probably won't help your disk performance.

As for unixbench, you need to examine and compare each microbenchmark
result individually before drawing a conclusion.


Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Sebastiaan van Erk
Ivan Voras wrote:

Hi,

Thanks for the reply.

> Sebastiaan van Erk wrote:
>> Hi,

[snip]

>> <n>             1               2               4
>> freebsd         12.0009         13.6348         12.9402         (MB/s)
>> linux           376.145         651.314         634.649         (MB/s)
>>
>> Both virtual machines run dbench 3.04 and the results are extremely
>> stable over repeated runs.

> VMWare has many optimizations for Linux that are not used with FreeBSD.
> VMI, for example, makes the Linux guest paravirtualized, and then there
> are special drivers for networking, its vmotion driver (this one
> probably doesn't contribute to performance much), etc. and Linux is in
> any case much better tested and supported.

VMI/paravirtualization is not enabled for this Linux host. Neither is
VMotion. Networking is performing extremely well (see also below).

> If VMWare allows, you may try changing the type of the controller (I
> don't know about ESXi but VMWare Server supports LSI or Buslogic SCSI
> emulation) or switch to ATA emulation and try again.

I tried this, and it has no significant effect. Just for completeness,
here's the relevant output of dmesg:

bt0: <Buslogic Multi-Master SCSI Host Adapter> port 0x1060-0x107f mem
0xf4810000-0xf481001f irq 17 at device 16.0 on pci0
bt0: BT-958 FW Rev. 5.07B Ultra Wide SCSI Host Adapter, SCSI ID 7, 192 CCBs
bt0: [GIANT-LOCKED]
bt0: [ITHREAD]

da0 at bt0 bus 0 target 0 lun 0
da0: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
da0: 40.000MB/s transfers (20.000MHz DT, offset 15, 16bit)
da0: 8192MB (16777216 512 byte sectors: 255H 63S/T 1044C)

The transfer rate for dbench 1 is 15.0118 MB/s.

> A generic optimization is to reduce kern.hz to something like 50 but it
> probably won't help your disk performance.

I already had this (not 50, but 100), but this doesn't do anything for
the disk performance.

> As for unixbench, you need to examine and compare each microbenchmark
> result individually before drawing a conclusion.

Yes, I realize that. However, the dbench result is my first priority, and
when (if) that is fixed, I'll run the unixbench again and see what my
next priority is.

(However, just to give you an idea, I attached the basic 5.1.2 unixbench
outputs. The CPU info for FreeBSD is "fake": unixbench does a cat
/proc/cpuinfo, so I removed the /proc/ part and copied the output under
Linux to the "procinfo" file.)

Finally, I also ran some network benchmarks such as netio, and tested VM
to VM communication on *different* ESXi machines connected via Gigabit
ethernet, and it achieved more than 100MB/s throughput.

Since CPU speed + Network IO are doing just fine, I'm guessing this is a
pure disk (driver?) related issue. However, to go into production with
FreeBSD I *must* be able to fix it.

Note also the discrepancy: 12 MB/s vs 350 MB/s on disk access! Even my
lousy home machine (FreeBSD) is 5 times faster at 60 MB/s. This machine
(the ESXi host) has extremely fast disks in a RAID10 configuration.

Any ideas are welcome!

Regards,
Sebastiaan

Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Sebastiaan van Erk
Sebastiaan van Erk wrote:
> (However, just to give you an idea I attached the basic 5.1.2 unixbench
> outputs (the CPU info for FreeBSD is "fake", since unixbench does a cat
> /proc/cpuinfo, so I removed the /proc/ part and copied the output under
> linux to the "procinfo" file.)

Of course I forgot to attach them... :-(

Here they are.

Regards,
Sebastiaan


gmake all
gmake[1]: Entering directory `/root/tmp/unixbench-5.1.2'
Checking distribution of files
./pgms  exists
./src  exists
./testdir  exists
./tmp  exists
./results  exists
gmake[1]: Leaving directory `/root/tmp/unixbench-5.1.2'

   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

   Version 5.1.2                      Based on the Byte Magazine Unix Benchmark

   Multi-CPU version                  Version 5 revisions by Ian Smith,
                                      Sunnyvale, CA, USA
   December 22, 2007                  johantheghost at yahoo period com


1 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

1 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

1 x Execl Throughput  1 2 3

1 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

1 x File Copy 256 bufsize 500 maxblocks  1 2 3

1 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

1 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

1 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

1 x Process Creation  1 2 3

1 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

1 x Shell Scripts (1 concurrent)  1 2 3

1 x Shell Scripts (8 concurrent)  1 2 3

2 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

2 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

2 x Execl Throughput  1 2 3

2 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

2 x File Copy 256 bufsize 500 maxblocks  1 2 3

2 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

2 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

2 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

2 x Process Creation  1 2 3

2 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

2 x Shell Scripts (1 concurrent)  1 2 3

2 x Shell Scripts (8 concurrent)  1 2 3

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.2)

   System: test-fbsd.vpn1.sebster.com: FreeBSD
   OS: FreeBSD -- 7.1-RELEASE -- FreeBSD 7.1-RELEASE #1: Mon Feb  9 18:26:19 CET 2009     [hidden email]:/usr/obj/usr/src/sys/VMWARE
   Machine: amd64 (VMWARE)
   Language: en_US.utf8 (charmap=, collate=)
   CPU 0: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (4999.9 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (5000.8 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   6:25AM  up  6:54, 1 user, load averages: 0.08, 0.02, 0.01; runlevel

------------------------------------------------------------------------
Benchmark Run: Tue Feb 10 2009 06:25:49 - 06:54:08
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       14144383.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3238.7 MWIPS (9.9 s, 7 samples)
Execl Throughput                                630.0 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         28793.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           33410.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks         33536.8 KBps  (30.1 s, 2 samples)
Pipe Throughput                             1146784.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  36203.6 lps   (10.0 s, 7 samples)
Process Creation                                783.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                    645.1 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                    115.4 lpm   (60.1 s, 2 samples)
System Call Overhead                         939647.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   14144383.9   1212.0
Double-Precision Whetstone                       55.0       3238.7    588.9
Execl Throughput                                 43.0        630.0    146.5
File Copy 1024 bufsize 2000 maxblocks          3960.0      28793.2     72.7
File Copy 256 bufsize 500 maxblocks            1655.0      33410.0    201.9
File Copy 4096 bufsize 8000 maxblocks          5800.0      33536.8     57.8
Pipe Throughput                               12440.0    1146784.7    921.9
Pipe-based Context Switching                   4000.0      36203.6     90.5
Process Creation                                126.0        783.3     62.2
Shell Scripts (1 concurrent)                     42.4        645.1    152.2
Shell Scripts (8 concurrent)                      6.0        115.4    192.3
System Call Overhead                          15000.0     939647.5    626.4
                                                                   ========
System Benchmarks Index Score                                         212.4

------------------------------------------------------------------------
Benchmark Run: Tue Feb 10 2009 06:54:08 - 07:22:31
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       28392958.1 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     6478.9 MWIPS (9.9 s, 7 samples)
Execl Throughput                                685.4 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks         13436.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           12444.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks         18346.1 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2284342.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 135677.4 lps   (10.0 s, 7 samples)
Process Creation                               1423.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                    758.1 lpm   (60.2 s, 2 samples)
Shell Scripts (8 concurrent)                    110.2 lpm   (60.3 s, 2 samples)
System Call Overhead                        1472342.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   28392958.1   2433.0
Double-Precision Whetstone                       55.0       6478.9   1178.0
Execl Throughput                                 43.0        685.4    159.4
File Copy 1024 bufsize 2000 maxblocks          3960.0      13436.5     33.9
File Copy 256 bufsize 500 maxblocks            1655.0      12444.8     75.2
File Copy 4096 bufsize 8000 maxblocks          5800.0      18346.1     31.6
Pipe Throughput                               12440.0    2284342.6   1836.3
Pipe-based Context Switching                   4000.0     135677.4    339.2
Process Creation                                126.0       1423.3    113.0
Shell Scripts (1 concurrent)                     42.4        758.1    178.8
Shell Scripts (8 concurrent)                      6.0        110.2    183.7
System Call Overhead                          15000.0    1472342.3    981.6
                                                                   ========
System Benchmarks Index Score                                         257.2


make all
make[1]: Entering directory `/root/tmp/unixbench-5.1.2'
Checking distribution of files
./pgms  exists
./src  exists
./testdir  exists
./tmp  exists
./results  exists
make[1]: Leaving directory `/root/tmp/unixbench-5.1.2'

   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

   Version 5.1.2                      Based on the Byte Magazine Unix Benchmark

   Multi-CPU version                  Version 5 revisions by Ian Smith,
                                      Sunnyvale, CA, USA
   December 22, 2007                  johantheghost at yahoo period com


1 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

1 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

1 x Execl Throughput  1 2 3

1 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

1 x File Copy 256 bufsize 500 maxblocks  1 2 3

1 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

1 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

1 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

1 x Process Creation  1 2 3

1 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

1 x Shell Scripts (1 concurrent)  1 2 3

1 x Shell Scripts (8 concurrent)  1 2 3

2 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

2 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

2 x Execl Throughput  1 2 3

2 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

2 x File Copy 256 bufsize 500 maxblocks  1 2 3

2 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

2 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

2 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

2 x Process Creation  1 2 3

2 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

2 x Shell Scripts (1 concurrent)  1 2 3

2 x Shell Scripts (8 concurrent)  1 2 3

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.2)

   System: test-ubuntu: GNU/Linux
   OS: GNU/Linux -- 2.6.27-7-server -- #1 SMP Fri Oct 24 07:20:47 UTC 2008
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (4999.9 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (5000.8 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   15:15:06 up  2:19,  1 user,  load average: 0.00, 0.00, 0.00; runlevel 2

------------------------------------------------------------------------
Benchmark Run: Mon Feb 09 2009 15:15:06 - 15:43:20
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       18610575.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2990.1 MWIPS (10.0 s, 7 samples)
Execl Throughput                               1058.6 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        468973.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          132022.2 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        921448.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1132933.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  93429.0 lps   (10.0 s, 7 samples)
Process Creation                               1744.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2566.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    518.4 lpm   (60.1 s, 2 samples)
System Call Overhead                        1935577.0 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   18610575.3   1594.7
Double-Precision Whetstone                       55.0       2990.1    543.7
Execl Throughput                                 43.0       1058.6    246.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     468973.2   1184.3
File Copy 256 bufsize 500 maxblocks            1655.0     132022.2    797.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     921448.5   1588.7
Pipe Throughput                               12440.0    1132933.6    910.7
Pipe-based Context Switching                   4000.0      93429.0    233.6
Process Creation                                126.0       1744.3    138.4
Shell Scripts (1 concurrent)                     42.4       2566.9    605.4
Shell Scripts (8 concurrent)                      6.0        518.4    864.0
System Call Overhead                          15000.0    1935577.0   1290.4
                                                                   ========
System Benchmarks Index Score                                         656.1

------------------------------------------------------------------------
Benchmark Run: Mon Feb 09 2009 15:43:20 - 16:11:33
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       37293015.8 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     5980.1 MWIPS (10.0 s, 7 samples)
Execl Throughput                               2235.9 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        204300.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           68566.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        443410.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2259067.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 193899.7 lps   (10.0 s, 7 samples)
Process Creation                               3594.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   4145.7 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    528.6 lpm   (60.1 s, 2 samples)
System Call Overhead                        2673215.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   37293015.8   3195.6
Double-Precision Whetstone                       55.0       5980.1   1087.3
Execl Throughput                                 43.0       2235.9    520.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     204300.9    515.9
File Copy 256 bufsize 500 maxblocks            1655.0      68566.9    414.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     443410.3    764.5
Pipe Throughput                               12440.0    2259067.1   1816.0
Pipe-based Context Switching                   4000.0     193899.7    484.7
Process Creation                                126.0       3594.8    285.3
Shell Scripts (1 concurrent)                     42.4       4145.7    977.8
Shell Scripts (8 concurrent)                      6.0        528.6    881.1
System Call Overhead                          15000.0    2673215.9   1782.1
                                                                   ========
System Benchmarks Index Score                                         834.4
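
For readers unfamiliar with the layout of these reports: each INDEX value
is the RESULT divided by the BASELINE, times 10, and the final score is the
geometric mean of the individual indices. A minimal sketch that re-derives
the FreeBSD single-copy score from the figures in the log above (the names
FREEBSD_1COPY, index and overall_score are made up for this example):

```python
import math

# (BASELINE, RESULT) pairs from the FreeBSD run with 1 parallel copy.
FREEBSD_1COPY = {
    "Dhrystone 2 using register variables": (116700.0, 14144383.9),
    "Double-Precision Whetstone": (55.0, 3238.7),
    "Execl Throughput": (43.0, 630.0),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 28793.2),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 33410.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 33536.8),
    "Pipe Throughput": (12440.0, 1146784.7),
    "Pipe-based Context Switching": (4000.0, 36203.6),
    "Process Creation": (126.0, 783.3),
    "Shell Scripts (1 concurrent)": (42.4, 645.1),
    "Shell Scripts (8 concurrent)": (6.0, 115.4),
    "System Call Overhead": (15000.0, 939647.5),
}


def index(baseline, result):
    """Per-test index: result relative to the baseline system, scaled by 10."""
    return result / baseline * 10.0


def overall_score(tests):
    """Geometric mean of the per-test indices."""
    logs = [math.log(index(b, r)) for b, r in tests.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print(f"overall score: {overall_score(FREEBSD_1COPY):.1f}")
```

Because the score is a geometric mean, a handful of very low indices (here
the File Copy tests) pull the total down much more than they would in an
arithmetic average.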


Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Ivan Voras
Sebastiaan van Erk wrote:
> Sebastiaan van Erk wrote:
>> (However, just to give you an idea I attached the basic 5.1.2
>> unixbench outputs (the CPU info for FreeBSD is "fake", since unixbench
>> does a cat /proc/cpuinfo, so I removed the /proc/ part and copied the
>> output under linux to the "procinfo" file.)


   System: test-fbsd.vpn1.sebster.com: FreeBSD
------------------------------------------------------------------------
Benchmark Run: Tue Feb 10 2009 06:25:49 - 06:54:08
2 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   14144383.9   1212.0
Double-Precision Whetstone                       55.0       3238.7    588.9
Execl Throughput                                 43.0        630.0    146.5
File Copy 1024 bufsize 2000 maxblocks          3960.0      28793.2     72.7
File Copy 256 bufsize 500 maxblocks            1655.0      33410.0    201.9
File Copy 4096 bufsize 8000 maxblocks          5800.0      33536.8     57.8
Pipe Throughput                               12440.0    1146784.7    921.9
Pipe-based Context Switching                   4000.0      36203.6     90.5
Process Creation                                126.0        783.3     62.2
Shell Scripts (1 concurrent)                     42.4        645.1    152.2
Shell Scripts (8 concurrent)                      6.0        115.4    192.3
System Call Overhead                          15000.0     939647.5    626.4
                                                                   ========
System Benchmarks Index Score                                         212.4


   System: test-ubuntu: GNU/Linux
------------------------------------------------------------------------
Benchmark Run: Mon Feb 09 2009 15:15:06 - 15:43:20
2 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   18610575.3   1594.7
Double-Precision Whetstone                       55.0       2990.1    543.7
Execl Throughput                                 43.0       1058.6    246.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     468973.2   1184.3
File Copy 256 bufsize 500 maxblocks            1655.0     132022.2    797.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     921448.5   1588.7
Pipe Throughput                               12440.0    1132933.6    910.7
Pipe-based Context Switching                   4000.0      93429.0    233.6
Process Creation                                126.0       1744.3    138.4
Shell Scripts (1 concurrent)                     42.4       2566.9    605.4
Shell Scripts (8 concurrent)                      6.0        518.4    864.0
System Call Overhead                          15000.0    1935577.0   1290.4
                                                                   ========
System Benchmarks Index Score                                         656.1


The results are ... interesting. It seems that FreeBSD simply dies in
any test having a high context switch rate. Hmmm, this looks familiar.
Either I or a colleague of mine had a similar situation some time ago,
with the same discrepancy in disk speeds and the same difference in
context switches. Unfortunately, there was no solution.
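
To see where the two single-copy runs differ most, the INDEX columns
quoted above can be divided test by test. The following is only a sketch
using the numbers from the two tables; the names INDEX and rank_by_gap are
made up for this example.

```python
# INDEX columns from the two single-copy runs quoted above: (FreeBSD, Linux).
INDEX = {
    "Dhrystone 2 using register variables": (1212.0, 1594.7),
    "Double-Precision Whetstone": (588.9, 543.7),
    "Execl Throughput": (146.5, 246.2),
    "File Copy 1024 bufsize 2000 maxblocks": (72.7, 1184.3),
    "File Copy 256 bufsize 500 maxblocks": (201.9, 797.7),
    "File Copy 4096 bufsize 8000 maxblocks": (57.8, 1588.7),
    "Pipe Throughput": (921.9, 910.7),
    "Pipe-based Context Switching": (90.5, 233.6),
    "Process Creation": (62.2, 138.4),
    "Shell Scripts (1 concurrent)": (152.2, 605.4),
    "Shell Scripts (8 concurrent)": (192.3, 864.0),
    "System Call Overhead": (626.4, 1290.4),
}


def rank_by_gap(index):
    """Return (test, linux/freebsd ratio) pairs, largest gap first."""
    ratios = [(name, linux / freebsd) for name, (freebsd, linux) in index.items()]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


ranked = rank_by_gap(INDEX)

if __name__ == "__main__":
    for name, ratio in ranked:
        print(f"{ratio:6.2f}x  {name}")
```

On these numbers the two largest ratios belong to the File Copy tests with
4096 and 1024 byte buffers, while Whetstone and Pipe Throughput come out
slightly below 1, i.e. marginally in FreeBSD's favour.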



Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Antony Mawer
Ivan Voras wrote:
> Sebastiaan van Erk wrote:
>> Sebastiaan van Erk wrote:
>>> (However, just to give you an idea I attached the basic 5.1.2
>>> unixbench outputs (the CPU info for FreeBSD is "fake", since unixbench
>>> does a cat /proc/cpuinfo, so I removed the /proc/ part and copied the
>>> output under linux to the "procinfo" file.)
>
... benchmark results snipped ...
>
> The results are ... interesting. It seems that FreeBSD simply dies in
> any test having a high context switch rate. Hmmm, this looks familiar.
> Either I or a colleague of mine had a similar situation some time ago,
> with the same discrepancy in disk speeds and the same difference in
> context switches. Unfortunately, there was no solution.

How would one go about gathering data on such a scenario to help improve
this? We were planning a project involving VMware deployments with
FreeBSD 7.1 systems in the near future, but if performance is that bad
it is likely to be a show stopper.

Where do we start looking and who should we be talking to?

-- Antony
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[hidden email]"

Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Ivan Voras
Antony Mawer wrote:

> Ivan Voras wrote:
>> Sebastiaan van Erk wrote:
>>> Sebastiaan van Erk wrote:
>>>> (However, just to give you an idea I attached the basic 5.1.2
>>>> unixbench outputs (the CPU info for FreeBSD is "fake", since unixbench
>>>> does a cat /proc/cpuinfo, so I removed the /proc/ part and copied the
>>>> output under linux to the "procinfo" file.)
>>
> ... benchmark results snipped ...
>>
>> The results are ... interesting. It seems that FreeBSD simply dies in
>> any test having a high context switch rate. Hmmm, this looks familiar.
>> Either I or a colleague of mine had a similar situation some time ago,
>> with the same discrepancy in disk speeds and the same difference in
>> context switches. Unfortunately, there was no solution.
>
> How would one go about gathering data on such a scenario to help improve
> this? We were planning a project involving VMware deployments with
> FreeBSD 7.1 systems in the near future, but if performance is that bad
> it is likely to be a show stopper.
>
> Where do we start looking and who should we be talking to?

Relax, we haven't actually established yet that it isn't a local problem.
I've talked a little with the OP, but nothing we've tried so far has made
it better.

For what it's worth, my experience is that on VMWare Server, whose
emulated SCSI hardware is detected exactly the same as on ESX, the
performance is "normal" (i.e. as expected), and on an ESX 3.0 on a slow
array (and sharing it with other active machines) I get results around
20 MB/s in dbench - which is better than the OP gets on a fast array.
All this is with 32-bit guests, FreeBSD 7.0 or 7.1.

If I get the chance, I'll try ESXi within the next few days and try to
replicate the problem.



Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Ivan Voras
2009/2/11 Antony Mawer <[hidden email]>:

> How would one go about gathering data on such a scenario to help improve
> this? We were planning a project involving VMware deployments with FreeBSD
> 7.1 systems in the near future, but if performance is that bad it is likely
> to be a show stopper.

I have now tested it under ESXi 3.5, and here's what I find:

In FreeBSD 7.1 amd64 with 4 vCPUs, dbench performance is:
1 proc: 155 MB/s, 2 proc: 175 MB/s, 4 proc: 188 MB/s
The same performance *as reported by VMWare's Infrastructure Client*
("performance" tab): around 50 MB/s in all cases
Visual inspection of drives' LED indicators (2 drive 10k RPM RAID0 hw
array) confirms constant activity.

In Ubuntu 8.10 amd64 with 4 vCPUs, dbench performance is:
1 proc: 375 MB/s, 2 proc: 660 MB/s, 4 proc: 1055 MB/s (sic!)
The same performance *as reported by VMWare Infrastructure Client*:
around 25 MB/s in all cases (sic!)
Visual inspection of drives: very sporadic activity

The maximum performance expected from this array is around 150 MB/s
*at peaks* - there is physically no way it can go above this, so I
judge the above measurements bogus.
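
The plausibility check described above is simple enough to write down. A
minimal sketch, using the dbench figures and the 150 MB/s peak estimate
from this message; the names ARRAY_PEAK_MBPS, REPORTED_MBPS and implausible
are made up for this example.

```python
ARRAY_PEAK_MBPS = 150.0  # estimated peak of the 2-drive RAID0 array (from above)

# dbench results in MB/s, keyed by number of dbench processes.
REPORTED_MBPS = {
    "FreeBSD 7.1 amd64": {1: 155.0, 2: 175.0, 4: 188.0},
    "Ubuntu 8.10 amd64": {1: 375.0, 2: 660.0, 4: 1055.0},
}


def implausible(results, ceiling):
    """Return the measurements that exceed what the hardware can deliver."""
    return {procs: mbps for procs, mbps in results.items() if mbps > ceiling}


if __name__ == "__main__":
    for guest, results in REPORTED_MBPS.items():
        for procs, mbps in implausible(results, ARRAY_PEAK_MBPS).items():
            print(f"{guest}, {procs} proc: {mbps:.0f} MB/s exceeds "
                  f"the {ARRAY_PEAK_MBPS:.0f} MB/s ceiling")
```

Every dbench figure for both guests ends up above the ceiling, so none of
them can be a measurement of what actually reached the disks.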

This is all very strange. Someone here is caching more than it should
be, and it looks like it's VMWare. It doesn't look like clock skew in
the guests, since "iostat 1" et al. work at about 1 sec wall-clock time.
The "visual inspection" oddity inspired me to do another benchmark:

Bonnie++ reports:
For FreeBSD: write: 52 MB/s, rewrite: 21 MB/s, read: 45 MB/s

For Linux: write: 141 MB/s, rewrite: 55 MB/s, read: 168 MB/s

VMWare's Infrastructure Client agrees with these performance
measurements in both cases, and drives are blinking as expected.
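
One way to get a feel for how much a cache can inflate a write benchmark
is to time the same write with and without forcing the data out of the
guest's page cache. The sketch below is an illustration only, not how
dbench or Bonnie++ work; the function name write_throughput and the file
name probe.bin are made up, and fsync inside a guest says nothing about
what the hypervisor or the RAID controller may still be caching.

```python
import os
import tempfile
import time


def write_throughput(path, total_bytes, block_size=1 << 20, sync=False):
    """Write total_bytes to path and return the throughput in MB/s.

    With sync=True the timed region includes an fsync, so the data has at
    least been handed from the guest's page cache to the (virtual) device
    before the clock stops.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        if sync:
            f.flush()
            os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # small on purpose; real tests should exceed RAM
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        print(f"buffered: {write_throughput(target, size):.0f} MB/s")
        print(f"fsync'd:  {write_throughput(target, size, sync=True):.0f} MB/s")
```

If the buffered figure is far above what the array can physically sustain
while the fsync'd one is not, the difference is cache and not disk.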

As previously demonstrated by me and others, Linux usually has
significantly better file system performance in the non-virtualized
case, so virtualization could simply be amplifying an existing
difference.

Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Josh Paetzel

On Feb 11, 2009, at 1:02 PM, Ivan Voras wrote:

<snip>
>
>
> As previously demonstrated by me and others, Linux usually has
> significantly better file system performance in the non-virtualized
> case, so the difference could be simply increased by the
> virtualization.


In my limited experience with VMWare, Linux seems to have near-bare-metal
disk performance. FreeBSD seems to incur a significant performance
penalty. For instance, on my laptop, running OS X and VMWare Fusion,
FreeBSD virtual machines can't saturate 100TX off the disk; raw dd
manages about 7 Megs/sec, which is in line with what I
get shovelling big files around.   Disk is a 7200 RPM SATA2.

Thanks,

Josh Paetzel


Re: Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Ivan Voras
On Feb 13, 2009 8:27pm, Josh Paetzel <[hidden email]> wrote:

> In my limited experience with VMware Linux seems to have near bare metal
> disk performance. FreeBSD seems to incur a significant performance penalty.
> For instance on my laptop, running OSX and VMware Fusion, FreeBSD virtual
> machines can't saturate 100TX off the disk, raw dd manages about 7
> Megs/sec, which is in line with what I get shovelling big files around.
> Disk is a 7200 RPM SATA2.
>

You might want to try the patch Scott Long made recently
(http://svn.freebsd.org/changeset/base/188570). I found it doubles
performance in some cases under VMware (mostly for writes, though sequential
reads can be improved similarly by combining this patch with a larger
vfs.read_max), but it's still worse than with Linux (100 MB/s vs
150 MB/s).

As for the original thread topic: I've communicated with the OP and it  
appears his method of benchmarking had an error so the problems that appear  
in his post are bogus.

Re: FreeBSD 7.1 disk performance issue on ESXi 3.5

Sebastiaan van Erk
Hi,

[hidden email] wrote:

> As for the original thread topic: I've communicated with the OP and it
> appears his method of benchmarking had an error so the problems that
> appear in his post are bogus.

It is not quite true that the "method" is bogus; there just seems to be
a huge difference between a file system with soft updates and one without.

These are the results I get now:

dbench -D <dir> -t 60 1

on / (ufs, local):
Throughput 13.4561 MB/sec 1 procs

on /tmp (ufs, local, soft-updates):
Throughput 92.299 MB/sec 1 procs

However, whether it is caching or not, Linux gets 350 MB/s using 1
process and even 650 MB/s using 2. As I understand it, this shouldn't be
possible on the physical disks, but still, the *virtual* disk seems to
get this performance.

When I benchmark Linux against FreeBSD using UnixBench 4.1/5.1 (I
tried both), I also get ***HUGE*** differences:

    System: test-fbsd.vpn1.sebster.com: FreeBSD
------------------------------------------------------------------------
Benchmark Run: Tue Feb 10 2009 06:25:49 - 06:54:08
2 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   14144383.9   1212.0
Double-Precision Whetstone                       55.0       3238.7    588.9
Execl Throughput                                 43.0        630.0    146.5
File Copy 1024 bufsize 2000 maxblocks          3960.0      28793.2     72.7
File Copy 256 bufsize 500 maxblocks            1655.0      33410.0    201.9
File Copy 4096 bufsize 8000 maxblocks          5800.0      33536.8     57.8
Pipe Throughput                               12440.0    1146784.7    921.9
Pipe-based Context Switching                   4000.0      36203.6     90.5
Process Creation                                126.0        783.3     62.2
Shell Scripts (1 concurrent)                     42.4        645.1    152.2
Shell Scripts (8 concurrent)                      6.0        115.4    192.3
System Call Overhead                          15000.0     939647.5    626.4
                                                                    ========
System Benchmarks Index Score                                         212.4


    System: test-ubuntu: GNU/Linux
------------------------------------------------------------------------
Benchmark Run: Mon Feb 09 2009 15:15:06 - 15:43:20
2 CPUs in system; running 1 parallel copy of tests

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   18610575.3   1594.7
Double-Precision Whetstone                       55.0       2990.1    543.7
Execl Throughput                                 43.0       1058.6    246.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     468973.2   1184.3
File Copy 256 bufsize 500 maxblocks            1655.0     132022.2    797.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     921448.5   1588.7
Pipe Throughput                               12440.0    1132933.6    910.7
Pipe-based Context Switching                   4000.0      93429.0    233.6
Process Creation                                126.0       1744.3    138.4
Shell Scripts (1 concurrent)                     42.4       2566.9    605.4
Shell Scripts (8 concurrent)                      6.0        518.4    864.0
System Call Overhead                          15000.0    1935577.0   1290.4
                                                                    ========
System Benchmarks Index Score                                         656.1

Here the disk-intensive tests (file copy) and the context switch/process
creation tests do terribly on FreeBSD.
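
For anyone who wants to compare the two runs line by line, here is a small
sketch. It assumes each INDEX is RESULT / BASELINE * 10 and that the final
score is the geometric mean of the indices; both assumptions reproduce the
numbers in the tables above, but I have not checked them against the
UnixBench source. The short test names are my own abbreviations.

```python
import math

# (BASELINE, FreeBSD RESULT, Linux RESULT), copied from the tables above.
RESULTS = {
    "Dhrystone 2":            (116700.0, 14144383.9, 18610575.3),
    "Whetstone":              (55.0, 3238.7, 2990.1),
    "Execl Throughput":       (43.0, 630.0, 1058.6),
    "File Copy 1024":         (3960.0, 28793.2, 468973.2),
    "File Copy 256":          (1655.0, 33410.0, 132022.2),
    "File Copy 4096":         (5800.0, 33536.8, 921448.5),
    "Pipe Throughput":        (12440.0, 1146784.7, 1132933.6),
    "Pipe Context Switching": (4000.0, 36203.6, 93429.0),
    "Process Creation":       (126.0, 783.3, 1744.3),
    "Shell Scripts (1)":      (42.4, 645.1, 2566.9),
    "Shell Scripts (8)":      (6.0, 115.4, 518.4),
    "System Call Overhead":   (15000.0, 939647.5, 1935577.0),
}


def index(baseline, result):
    """Assumed UnixBench index: result relative to baseline, times 10."""
    return result / baseline * 10.0


def score(indices):
    """Geometric mean of the per-test indices (assumed overall score)."""
    return math.exp(sum(math.log(i) for i in indices) / len(indices))


fbsd = [index(b, f) for b, f, _ in RESULTS.values()]
linux = [index(b, l) for b, _, l in RESULTS.values()]

# Per-test ratio: how many times faster the Linux guest is.
for name, (_, f, l) in RESULTS.items():
    print(f"{name:24s} Linux/FreeBSD = {l / f:6.2f}x")

print(f"FreeBSD score: {score(fbsd):.1f}")
print(f"Linux score:   {score(linux):.1f}")
```

The per-test ratios make it easier to see that the gap is concentrated in
the file copy tests rather than in the CPU-bound ones.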

For all my personal servers this is not an issue for me at all. But for
a big high traffic web site I'm building, I'm afraid I'm going to have
to go for Linux. :-(

Regards,
Sebastiaan