System performance 4.x vs. 5.x and 6.x

Brett Bump

I've recently upgraded a mailserver from a 4.x version to 6.2.
This server had been upgraded a few years ago to 5.x but the
performance was so bad that we only let it run a few days before
moving it back to 4.x.  Years pass and it seemed time once again
to move forward.

What is the magic bullet in getting the same kind of performance
out of a 5.x or 6.x version that I've just come to expect from
FreeBSD ever since version 1?

I'm seeing signal 6's on apache and imapd (never happened before),
network errors, serious response time errors and generally poor
performance during peak activity (same box, same people).

ufs memory looks exactly like it did before and doesn't max:

vfs.ufs.dirhash_minsize: 2560
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_mem: 1923157
vfs.ufs.dirhash_docheck: 0
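
As a rough check on how much headroom the directory hash has, the two values above can be compared directly. This is only a sketch: the sysctl text is pasted in from the listing above, and parse_sysctl is a hypothetical helper written for illustration, not something FreeBSD ships.

```python
# Sketch: how full is the UFS dirhash cache? Values pasted from the sysctl
# output above; parse_sysctl is a hypothetical helper for illustration.
SYSCTL_TEXT = """\
vfs.ufs.dirhash_minsize: 2560
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_mem: 1923157
vfs.ufs.dirhash_docheck: 0
"""

def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        values[name.strip()] = int(value)
    return values

stats = parse_sysctl(SYSCTL_TEXT)
used_pct = 100.0 * stats["vfs.ufs.dirhash_mem"] / stats["vfs.ufs.dirhash_maxmem"]
print(f"dirhash in use: {used_pct:.1f}% of dirhash_maxmem")
```

With these figures the cache sits a little over 90% of its limit, so it is not pinned at the maximum but has little slack. Whether that matters for this workload is not something these numbers alone can show.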

mbufs hasn't changed:

536/604/1140 mbufs in use (current/cache/total)

and disk performance is very good EXCEPT during peak activity:

--------------------------------------------------------------------
Mail Server (Dual Xeon P4 3GHz, 2GB memory, [PERC] U320):

-bash-2.05b$ time dd if=/dev/zero bs=1024k of=tstfile2 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 47.037099 secs (22827552 bytes/sec)

real    0m47.041s
user    0m0.000s
sys     0m5.444s
-bash-2.05b$ time dd if=tstfile2 bs=1024k of=/dev/null
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 2213.643946 secs (485056 bytes/sec)

real    36m53.647s  <---Check it out.
user    0m0.008s
sys     0m3.619s

--------------------------------------------------------------------
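
For scale, the two dd runs above can be converted to comparable units. A minimal sketch using only the byte count and elapsed times printed by dd; the helper name mib_per_sec is made up for illustration.

```python
# Sketch: convert the dd timings above into MiB/s for easier comparison.
# All three figures are copied from the dd output; nothing is measured here.
BYTES = 1073741824          # 1024 blocks of 1024k
WRITE_SECS = 47.037099      # dd if=/dev/zero of=tstfile2
READ_SECS = 2213.643946     # dd if=tstfile2 of=/dev/null

def mib_per_sec(nbytes, secs):
    """Throughput in MiB/s (1 MiB = 1048576 bytes)."""
    return nbytes / secs / 1048576

write_rate = mib_per_sec(BYTES, WRITE_SECS)
read_rate = mib_per_sec(BYTES, READ_SECS)
slowdown = READ_SECS / WRITE_SECS

print(f"write: {write_rate:.2f} MiB/s")
print(f"read:  {read_rate:.2f} MiB/s")
print(f"read took about {slowdown:.0f} times as long as write")
```

The write comes out a little under 22 MiB/s and the read at under half a MiB/s, so the read of the same file took roughly 47 times as long as writing it.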

I've changed the order of php extensions, disabled autonegotiation,
moved mail queues and large volume directory folders to separate
drives and set noatime.  Nothing seems to make much of an impact.
My next idea was to set up my kernel for device polling, but none of
this is really diagnosing what the real problem is.  Any clues?

Brett
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[hidden email]"

Re: System performance 4.x vs. 5.x and 6.x

mdtancsa
At 02:22 PM 2/14/2008, Brett Bump wrote:

>I've recently upgraded a mailserver from a 4.x version to 6.2.

I would say move to 6.3R, as it's a better release with a lot of bug
fixes.  In terms of your general performance issues, choice of
hardware really makes a difference, as quality of drivers can be an
issue.  You might have a really awesome controller that works well on
Windows or Linux, but does not do so well under FreeBSD because there
isn't any good driver support for it.


>I'm seeing signal 6's on apache and imapd (never happened before)

Did you do a fresh install, or did you try to migrate from RELENG_4
to RELENG_6?  What network card are you using?  What are the errors
(CRC)?  How about a dmesg from the box?

         ---Mike

Re: System performance 4.x vs. 5.x and 6.x

Kris Kennaway-3
In reply to this post by Brett Bump
Brett Bump wrote:

> I've changed the order of php extensions, disabled autonegotiation,
> moved mail queues and large volume directory folders to separate
> drives and set noatime.  Nothing seems to make much of an impact.
> My next idea was to set up my kernel for device polling, but none of
> this is really diagnosing what the real problem is.  Any clues?

We are going to need more information about your system.  What do you
mean by "peak activity"?  What is running on the system when it performs
badly (check top -S, ps, gstat, vmstat -w, vmstat -i).  What is your
kernel configuration, dmesg and relevant aspects of the system
configuration?

Kris

Re: System performance 4.x vs. 5.x and 6.x

Brett Bump
In reply to this post by mdtancsa


On Thu, 14 Feb 2008, Mike Tancsa wrote:

> At 02:22 PM 2/14/2008, Brett Bump wrote:
>
> >I've recently upgraded a mailserver from a 4.x version to 6.2.
>
> I would say move to 6.3R, as it's a better release with a lot of bug
> fixes.  In terms of your general performance issues, choice of
> hardware really makes a difference, as quality of drivers can be an
> issue.  You might have a really awesome controller that works well on
> Windows or Linux, but does not do so well under FreeBSD because there
> isn't any good driver support for it.

Again, that isn't diagnosing the problem so much as just saying that 5.0
through 6.2 were all bad releases?  I doubt that can be the case.  Why
would the driver support for this machine (working FLAWLESSLY on 4.10)
now be bad (this machine has been running 4.x for 4 years)?

> >I'm seeing signal 6's on apache and imapd (never happened before)
>
> Did you do a fresh install or did you try and migrate from RELENG_4
> to RELENG_6 ?  What network card are you using ? What are the errors
> (CRC?).  How about a dmesg from the box.
>
>          ---Mike

Fresh install ALWAYS (no migrate, I never go that route).

bge0: Broadcom BCM5704 A2, ASIC rev. 0x2002
bge1: Broadcom BCM5704 A2, ASIC rev. 0x2002

-bash-2.05b$ dmesg
pid 31611 (milter-greylist), uid 25: exited on signal 3
pid 43464 (httpd), uid 80: exited on signal 6
pid 86995 (imapd), uid 2151: exited on signal 6
pid 85706 (httpd), uid 80: exited on signal 6
pid 87600 (imapd), uid 1376: exited on signal 6
pid 45621 (httpd), uid 80: exited on signal 6
pid 45617 (httpd), uid 80: exited on signal 6

The greylist entry is a standard 3am cron restart.

Brett

Re: System performance 4.x vs. 5.x and 6.x

Bill Moran-2
In reply to this post by Brett Bump
In response to Brett Bump <[hidden email]>:
>
> I'm seeing signal 6's on apache and imapd (never happened before),
> network errors, serious response time errors and generally poor
> performance during peak activity (same box, same people).

IIRC, signal 6 is an indicator that you've compiled binaries that are
almost, but not quite compatible with your CPU.

If this machine has been 4.X for a while, it's probably old hardware.
Make sure you're using the correct CPU definition in your kernel
config and in your make configuration.

What _is_ the hardware?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[hidden email]
Phone: 412-422-3463x4023

Re: System performance 4.x vs. 5.x and 6.x

Kris Kennaway-3
In reply to this post by Brett Bump
Brett Bump wrote:

>
> On Thu, 14 Feb 2008, Mike Tancsa wrote:
>
>> At 02:22 PM 2/14/2008, Brett Bump wrote:
>>
>>> I've recently upgraded a mailserver from a 4.x version to 6.2.
>> I would say move to 6.3R, as it's a better release with a lot of bug
>> fixes.  In terms of your general performance issues, choice of
>> hardware really makes a difference, as quality of drivers can be an
>> issue.  You might have a really awesome controller that works well on
>> Windows or Linux, but does not do so well under FreeBSD because there
>> isn't any good driver support for it.
>
> Again, that isn't diagnosing the problem as much as just saying that 5.0
> through 6.2 were all bad releases???  I doubt that can be the case.  Why
> would the driver support for this machine (working FLAWLESSLY on 4.10)
> now have bad drivers (this machine has been running 4.x for 4 years).

All it takes is a single bug (e.g. in a driver) to affect performance on
a certain specific configuration.  However, bugs tend to get fixed over
time.  Maybe that is the case for you.  It is well worth verifying
whether the problem persists on the most up-to-date sources, so that
everyone's time is not wasted in tracking down a problem that is already
fixed.  You can just do a source upgrade from 6.2, which will be quite
straightforward.

> bge0: Broadcom BCM5704 A2, ASIC rev. 0x2002
> bge1: Broadcom BCM5704 A2, ASIC rev. 0x2002
>
> -bash-2.05b$ dmesg
> pid 31611 (milter-greylist), uid 25: exited on signal 3
> pid 43464 (httpd), uid 80: exited on signal 6
> pid 86995 (imapd), uid 2151: exited on signal 6
> pid 85706 (httpd), uid 80: exited on signal 6
> pid 87600 (imapd), uid 1376: exited on signal 6
> pid 45621 (httpd), uid 80: exited on signal 6
> pid 45617 (httpd), uid 80: exited on signal 6
>
> The greylist entry is a standard 3am cron restart.

It is pretty unusual for applications to be aborting, but usually they
do it because they fail an application-specific run-time check.  What
diagnostics are logged by the applications?  You may need to increase
their respective verbosity/debug levels.

Kris

Re: System performance 4.x vs. 5.x and 6.x

mdtancsa
In reply to this post by Brett Bump
At 03:09 PM 2/14/2008, Brett Bump wrote:


>On Thu, 14 Feb 2008, Mike Tancsa wrote:
>
> > At 02:22 PM 2/14/2008, Brett Bump wrote:
> >
> > >I've recently upgraded a mailserver from a 4.x version to 6.2.
> >
> > I would say move to 6.3R, as it's a better release with a lot of bug
> > fixes.  In terms of your general performance issues, choice of
> > hardware really makes a difference, as quality of drivers can be an
> > issue.  You might have a really awesome controller that works well on
> > Windows or Linux, but does not do so well under FreeBSD because there
> > isn't any good driver support for it.
>
>Again, that isn't diagnosing the problem as much as just saying that 5.0
>through 6.2 were all bad releases???


No, but you haven't given the list much to go on as to what the
problems are or what hardware you are using, or really quantified the
issue.  By "slow", is the disk blocking on I/O, or are processes
blocking on network I/O, etc.?  6.2 was not a "bad" release, but 6.3
is better than 6.2.  By starting with a more contemporary release,
less effort needs to be exerted by developers and other users in
figuring out if the problem(s) you are running into have already been fixed.


>I doubt that can be the case.  Why
>would the driver support for this machine (working FLAWLESSLY on 4.10)
>now have bad drivers (this machine has been running 4.x for 4 years).


Because the drivers have changed since 4.10.  "improvements" could
have introduced regressions... Change in the driver to support newer
versions of a chipset might break older chipsets.


> > >I'm seeing signal 6's on apache and imapd (never happened before)
> >
> > Did you do a fresh install or did you try and migrate from RELENG_4
> > to RELENG_6 ?  What network card are you using ? What are the errors
> > (CRC?).  How about a dmesg from the box.
>
>bge0: Broadcom BCM5704 A2, ASIC rev. 0x2002
>bge1: Broadcom BCM5704 A2, ASIC rev. 0x2002

bge is a good example of a driver that has had a lot of changes and
hasn't worked all that well at times, hence the suggestion to try
6.3, as there have been many bug fixes.  Whether or not it fixes your
problem is hard to say, but start there to see if things are faster
and more stable for you.
e.g.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c

You should also post a full dmesg of the box, as well as the kernel config, etc.

What does netstat -ni give, and what options do you have on ifconfig?  Are
the errors seen on your switch port as well, or just in netstat -ni?



>-bash-2.05b$ dmesg
>pid 31611 (milter-greylist), uid 25: exited on signal 3
>pid 43464 (httpd), uid 80: exited on signal 6
>pid 86995 (imapd), uid 2151: exited on signal 6
>pid 85706 (httpd), uid 80: exited on signal 6
>pid 87600 (imapd), uid 1376: exited on signal 6
>pid 45621 (httpd), uid 80: exited on signal 6
>pid 45617 (httpd), uid 80: exited on signal 6
>
>The greylist entry is a standard 3am cron restart.



Why are the processes sigabrting?  Is there anything in the
application logs to indicate why they are exiting ?

         ---Mike

Re: System performance 4.x vs. 5.x and 6.x

Brett Bump
In reply to this post by Kris Kennaway-3


On Thu, 14 Feb 2008, Kris Kennaway wrote:

> We are going to need more information about your system.  What do you
> mean by "peak activity"?  What is running on the system when it performs
> badly (check top -S, ps, gstat, vmstat -w, vmstat -i).  What is your
> kernel configuration, dmesg and relevant aspects of the system
> configuration?
>
> Kris
>

I would call 120 processes with a load average of 0.03 and 99.9% idle
with 10-20 sendmail processes and 30 apache jobs nothing to write home
about.  But when that jumps to 250 processes and a load average of 30 with
50% idle (5-10 second waits on single-character ssh echo), I'd call it a
bit busy.  That usually means my heavy pop3 users are checking in at the
same time someone (or 2 or 3) has sent email to the large-volume
listservs.  Proc stat doesn't show as much as gstat and iostat.  Gstat
always shows my drive with /var/mail being 97-100% busy, and iostat will
always show high tps rates, but never anything above 8MB/s (4.10 gave me
30MB/s+).

Kernel is generic with ipfirewall, quota and smp (no ipfw rules yet).

On Thu, 14 Feb 2008, Bill Moran wrote:

> What _is_ the hardware?

Dell PowerEdge 1750 1U, 146Gig U320s.  The Broadcoms seem to be a change
from the earlier 1550s with Intel PRO/100s (I prefer the Intels).

On Thu, 14 Feb 2008, Kris Kennaway wrote:

> All it takes is a single bug (e.g. in a driver) to affect performance on
> a certain specific configuration.  However, bugs tend to get fixed over
> time.  Maybe that is the case for you.  It is well worth verifying
> whether the problem persists on the most up-to-date sources, so that
> everyone's time is not wasted in tracking down a problem that is already
> fixed.  You can just do a source upgrade from 6.2, which will be quite
> straightforward.

Agreed.  I have a 2nd machine that is identical to this one I could put
6.3 on to test this.

> It is pretty unusual for applications to be aborting, but usually they
> do it because they fail an application-specific run-time check.  What
> diagnostics are logged by the applications?  You may need to increase
> their respective verbosity/debug levels.
>
> Kris
>

I was suspicious that maybe we needed more memory but swap has barely even
been touched (232k used...with 1400meg inactive).

On Thu, 14 Feb 2008, Mike Tancsa wrote:

> No, but you haven't given the list much to go on as to what the
> problems are or what hardware you are using, or really quantified the
> issue.  By "slow", is the disk blocking on I/O, or are processes
> blocking on network I/O, etc.?  6.2 was not a "bad" release, but 6.3
> is better than 6.2.  By starting with a more contemporary release,
> less effort needs to be exerted by developers and other users in
> figuring out if the problem(s) you are running into have already been
> fixed.

It appears to me that disk access is extremely slow.  I can transfer
large files between the machines faster than making a duplicate copy
on disk.

> Because the drivers have changed since 4.10.  "improvements" could
> have introduced regressions... Change in the driver to support newer
> versions of a chipset might break older chipsets.

Any known issues with the Dell PERC RAID driver that anyone is aware
of?  I can start there.

> bge is a good example of a driver that has had a lot of changes and
> hasn't worked all that well at times, hence the suggestion to try
> 6.3, as there have been many bug fixes.  Whether or not it fixes your
> problem is hard to say, but start there to see if things are faster
> and more stable for you.
> e.g.
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/bge/if_bge.c
>
> You should also post a full dmesg of the box as well as kernel config
> etc...

The kernel is generic with ipfirewall, quota and smp.

Feb 14 02:53:37 mail sm-mta[33143]: m1E9qKLZ033143: SYSERR(root): collect: I/O error on connection from astro.pryor.com, from=<[hidden email]>
pid 31611 (milter-greylist), uid 25: exited on signal 3
Feb 14 03:17:08 mail sshd[34844]: warning: /etc/hosts.allow, line 45: can't verify hostname: getaddrinfo(host-200-6-102-230.iia.cl, AF_INET) failed
Feb 14 03:17:08 mail sshd[34844]: refused connect from 200.6.102.230 (200.6.102.230)
Feb 14 03:36:30 mail sshd[35944]: refused connect from 202.129.44.218 (202.129.44.218)
Feb 14 03:45:21 mail sshd[36667]: refused connect from 202.129.44.218 (202.129.44.218)
Feb 14 03:52:01 mail sm-mta[33092]: m1E9peX3033092: SYSERR(root): collect: read timeout on connection from astro.pryor.com, from=<[hidden email]>
Feb 14 07:24:01 mail sshd[52723]: warning: /etc/hosts.allow, line 45: can't verify hostname: getaddrinfo(42.215.6.200.intelnet.net.gt, AF_INET) failed
Feb 14 07:24:01 mail sshd[52723]: refused connect from 200.6.215.42 (200.6.215.42)
Feb 14 07:28:56 mail sm-mta[52866]: m1EEPPLC052866: SYSERR(root): collect: I/O error on connection from astro.pryor.com, from=<[hidden email]>
Feb 14 07:29:15 mail sshd[53465]: warning: /etc/hosts.allow, line 45: can't verify hostname: getaddrinfo(42.215.6.200.intelnet.net.gt, AF_INET) failed
Feb 14 07:29:15 mail sshd[53465]: refused connect from 200.6.215.42 (200.6.215.42)
Feb 14 08:01:57 mail sshd[58183]: refused connect from mail.rsib.net (12.46.46.98)
Feb 14 08:07:22 mail sshd[59017]: refused connect from mail.rsib.net (12.46.46.98)
Feb 14 09:50:00 mail su: bbump to root on /dev/ttyp0
pid 43464 (httpd), uid 80: exited on signal 6
pid 86995 (imapd), uid 2151: exited on signal 6
pid 85706 (httpd), uid 80: exited on signal 6
pid 87600 (imapd), uid 1376: exited on signal 6
pid 45621 (httpd), uid 80: exited on signal 6
pid 45617 (httpd), uid 80: exited on signal 6
Feb 14 11:28:36 mail inetd[48076]: imap4 from 208.107.161.82 exceeded counts/min (limit 60/min)
Feb 14 11:28:38 mail last message repeated 2 times
Feb 14 11:52:34 mail sm-mta[99563]: m1EHqX9u099563: SYSERR(root): collect: read timeout on connection from fulltimeconsult.com, from=<[hidden email]>
Feb 14 13:06:27 mail su: bbump to root on /dev/ttyp0
pid 45995 (imapd), uid 3115: exited on signal 6
pid 46407 (imapd), uid 1873: exited on signal 6
pid 46418 (imapd), uid 2769: exited on signal 6
pid 46402 (imapd), uid 1873: exited on signal 6
pid 46651 (imapd), uid 2769: exited on signal 6
pid 46653 (imapd), uid 2769: exited on signal 6
pid 44499 (httpd), uid 80: exited on signal 6
pid 47035 (imapd), uid 1873: exited on signal 6
pid 46083 (httpd), uid 80: exited on signal 6
pid 46395 (httpd), uid 80: exited on signal 6
pid 46604 (httpd), uid 80: exited on signal 6
pid 46603 (httpd), uid 80: exited on signal 6
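
To see where the aborts cluster, the "exited on signal" lines in the log above can be tallied by program and uid. A sketch only: the lines are pasted from the log excerpt, and the function name tally_exits is made up for illustration.

```python
import re
from collections import Counter

# Sketch: tally "exited on signal 6" lines by program and by (program, uid).
# The lines are copied from the log excerpt above.
DMESG = """\
pid 43464 (httpd), uid 80: exited on signal 6
pid 86995 (imapd), uid 2151: exited on signal 6
pid 85706 (httpd), uid 80: exited on signal 6
pid 87600 (imapd), uid 1376: exited on signal 6
pid 45621 (httpd), uid 80: exited on signal 6
pid 45617 (httpd), uid 80: exited on signal 6
pid 45995 (imapd), uid 3115: exited on signal 6
pid 46407 (imapd), uid 1873: exited on signal 6
pid 46418 (imapd), uid 2769: exited on signal 6
pid 46402 (imapd), uid 1873: exited on signal 6
pid 46651 (imapd), uid 2769: exited on signal 6
pid 46653 (imapd), uid 2769: exited on signal 6
pid 44499 (httpd), uid 80: exited on signal 6
pid 47035 (imapd), uid 1873: exited on signal 6
pid 46083 (httpd), uid 80: exited on signal 6
pid 46395 (httpd), uid 80: exited on signal 6
pid 46604 (httpd), uid 80: exited on signal 6
pid 46603 (httpd), uid 80: exited on signal 6
"""

EXIT_RE = re.compile(r"pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)")

def tally_exits(text, signal="6"):
    """Count exits on the given signal per program and per (program, uid)."""
    by_prog, by_uid = Counter(), Counter()
    for _pid, prog, uid, sig in EXIT_RE.findall(text):
        if sig == signal:
            by_prog[prog] += 1
            by_uid[(prog, uid)] += 1
    return by_prog, by_uid

by_prog, by_uid = tally_exits(DMESG)
for (prog, uid), count in by_uid.most_common():
    print(f"{prog} uid {uid}: {count}")
```

In this excerpt two uids account for six of the nine imapd aborts, which might suggest which mailboxes to look at first.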

> what does
> netstat -ni
> give

-bash-2.05b$ netstat -ni
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
bge0   1500 <Link#1>      00:0f:1f:66:0e:e6 12511748   902 12025487     0     0
bge0   1500 208.107.160/2 208.107.161.82    17011211     - 16533277     -     -
bge1   1500 <Link#2>      00:0f:1f:66:0e:e8  3523091   586  4089056     0     0
bge1   1500 10.1.1/24     10.1.1.1           3516790     -  4087415     -     -
lo0   16384 <Link#3>                         4659734     0  4659733     0     0
lo0   16384 fe80:3::1/64  fe80:3::1                0     -        0     -     -
lo0   16384 ::1/128       ::1                   2772     -     2772     -     -
lo0   16384 127           127.0.0.1           147255     -   147255     -     -
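
To put the Ierrs counts in proportion, they can be expressed as a share of received packets. A minimal sketch over the link-level rows pasted from the table above; input_error_rates is an illustrative helper, not part of netstat.

```python
# Sketch: input errors as a percentage of input packets, per interface.
# Only the <Link#N> rows from the netstat -ni output above are used.
NETSTAT = """\
bge0   1500 <Link#1>      00:0f:1f:66:0e:e6 12511748   902 12025487     0     0
bge1   1500 <Link#2>      00:0f:1f:66:0e:e8  3523091   586  4089056     0     0
lo0   16384 <Link#3>                         4659734     0  4659733     0     0
"""

def input_error_rates(text):
    """Return {interface: Ierrs as a percentage of Ipkts} for link rows."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[2].startswith("<Link#"):
            continue
        # lo0 has no address column, so index the counters from the right.
        ipkts, ierrs = int(fields[-5]), int(fields[-4])
        rates[fields[0]] = 100.0 * ierrs / ipkts if ipkts else 0.0
    return rates

rates = input_error_rates(NETSTAT)
for name, pct in rates.items():
    print(f"{name}: {pct:.4f}% input errors")
```

Both bge ports come out well under 0.1% of received packets.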

> and what options do you have on ifconfig?  Are the errors seen on
> your switch port as well or just in netstat -ni?

ifconfig_bge0="inet 208.107.161.82  netmask 255.255.254.0 media 100baseTX mediaopt full-duplex"
ifconfig_bge1="inet 10.1.1.1        netmask 255.255.255.0 media 100baseTX mediaopt full-duplex"

No, the switch shows clear, they only show up as input errors on this box.
The box sitting under this one has an uptime of 621 days with 1 Oerr.

> Why are the processes sigabrting?  Is there anything in the
> application logs to indicate why they are exiting ?
>
>          ---Mike
>

[Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
httpd in malloc(): error: recursive call
[Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
httpd in free(): error: recursive call
[Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
httpd in free(): error: recursive call

Memory.  This is why I was willing to throw another 2gig of memory in it,
but why am I only seeing 268K of swap used?

Brett

Re: System performance 4.x vs. 5.x and 6.x

Guy Helmer
Brett Bump wrote:
> [Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
> httpd in malloc(): error: recursive call
> [Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
> httpd in free(): error: recursive call
> [Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
> httpd in free(): error: recursive call
>  
Do you have a mix of modules that are both multi-threaded and
single-threaded loaded in Apache?

I'm not sure what else could be a root cause for this particularly nasty
problem.

Guy

--
Guy Helmer, Ph.D.
Chief System Architect
Palisade Systems, Inc.

Re: System performance 4.x vs. 5.x and 6.x

Steven Hartland
In reply to this post by Brett Bump
----- Original Message -----
From: "Brett Bump" <[hidden email]>
> I would call 120 processes with a load average of 0.03 and 99.9 idle
> with 10-20 sendmail processes and 30 apache jobs nothing to write home
> about.  But when that jumps to 250 processes, a load average of 30 with
> 50% idle (5-10 second waits on single character ssh echo) a bit busy.
> That usually means my heavy pop3 users are checking in at the same time
> someone (or 2 or 3) have sent email to the large volume listservs.  Proc
> stat doesn't show as much as gstat and iostat.  Gstat always shows my
> drive with /var/mail being 97-100% busy and iostat will always show high
> tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).

Are you running any PHP on your machine?  If so, did you upgrade PHP as
well, from say 5.1.x => 5.2.x, and do you make use of open_basedir?  If so,
the following thread may be of interest:
PHP with open_basedir performance problem

    Regards
    Steve


Re: System performance 4.x vs. 5.x and 6.x

Brett Bump
In reply to this post by Guy Helmer


On Thu, 14 Feb 2008, Guy Helmer wrote:

> Brett Bump wrote:
> > [Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
> > httpd in malloc(): error: recursive call
> > [Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
> > httpd in free(): error: recursive call
> > [Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
> > httpd in free(): error: recursive call
> >
> Do you have a mix of modules that are both multi-threaded and
> single-threaded loaded in Apache?
>
> I'm not sure what else could be a root cause for this particularly nasty
> problem.
>
> Guy
>
> --
> Guy Helmer, Ph.D.
> Chief System Architect
> Palisade Systems, Inc.
>

Running apache_1.3.37 with php5 (about as generic as I can get).  Here is
the php extensions.ini file:

extension=ctype.so
extension=dom.so
extension=ftp.so
extension=gd.so
extension=gettext.so
extension=iconv.so
extension=pcre.so
extension=zlib.so
extension=pdo.so
extension=posix.so
extension=session.so
extension=simplexml.so
extension=sqlite.so
extension=tokenizer.so
extension=xml.so
extension=xmlreader.so
extension=xmlwriter.so
extension=mysql.so
extension=imap.so
extension=sockets.so

On Thu, 14 Feb 2008, Steven Hartland wrote:

> Are you running any PHP on your machine?  If so, did you upgrade PHP as
> well, from say 5.1.x => 5.2.x, and do you make use of open_basedir?  If so,
> the following thread may be of interest:
> PHP with open_basedir performance problem
>
>     Regards
>     Steve

NO---That I did not do.  I installed 5.1 directly from 6.2 ports.  I also
have another server running php5.1; however, it is running apache2.2, so
this might be something I can check (heading there now...thanks).


Brett

Re: System performance 4.x vs. 5.x and 6.x

Kris Kennaway-3
In reply to this post by Brett Bump
Brett Bump wrote:

>
> On Thu, 14 Feb 2008, Kris Kennaway wrote:
>
>> We are going to need more information about your system.  What do you
>> mean by "peak activity"?  What is running on the system when it performs
>> badly (check top -S, ps, gstat, vmstat -w, vmstat -i).  What is your
>> kernel configuration, dmesg and relevant aspects of the system
>> configuration?
>>
>> Kris
>>
>
> I would call 120 processes with a load average of 0.03 and 99.9 idle
> with 10-20 sendmail processes and 30 apache jobs nothing to write home
> about.  But when that jumps to 250 processes, a load average of 30 with
> 50% idle (5-10 second waits on single character ssh echo) a bit busy.
> That usually means my heavy pop3 users are checking in at the same time
> someone (or 2 or 3) have sent email to the large volume listservs.  Proc
> stat doesn't show as much as gstat and iostat.  Gstat always shows my
> drive with /var/mail being 97-100% busy and iostat will always show high
> tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).
>
> Kernel is generic with ipfirewall quota and smp (no ipfw rules yet).

OK, then you definitely need to update to 6.3; quota support in older
releases had performance problems.

> [Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
> httpd in malloc(): error: recursive call
> [Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
> httpd in free(): error: recursive call
> [Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
> httpd in free(): error: recursive call

These typically indicate application errors, or errors in how the
applications are compiled (e.g. linked to inconsistent sets of libraries).

Kris

Re: System performance 4.x vs. 5.x and 6.x

Kris Kennaway-3
Kris Kennaway wrote:

> Brett Bump wrote:
>>
>> On Thu, 14 Feb 2008, Kris Kennaway wrote:
>>
>>> We are going to need more information about your system.  What do you
>>> mean by "peak activity"?  What is running on the system when it performs
>>> badly (check top -S, ps, gstat, vmstat -w, vmstat -i).  What is your
>>> kernel configuration, dmesg and relevant aspects of the system
>>> configuration?
>>>
>>> Kris
>>>
>>
>> I would call 120 processes with a load average of 0.03 and 99.9 idle
>> with 10-20 sendmail processes and 30 apache jobs nothing to write home
>> about.  But when that jumps to 250 processes, a load average of 30 with
>> 50% idle (5-10 second waits on single character ssh echo) a bit busy.
>> That usually means my heavy pop3 users are checking in at the same time
>> someone (or 2 or 3) have sent email to the large volume listservs.  Proc
>> stat doesn't show as much as gstat and iostat.  Gstat always shows my
>> drive with /var/mail being 97-100% busy and iostat will always show high
>> tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).
>>
>> Kernel is generic with ipfirewall quota and smp (no ipfw rules yet).
>
> OK, then you definitely need to update to 6.3, quota support in older
> releases had performance problems.

Actually, I am not sure it was possible to merge the quota fix to 6.x; it
is definitely in 7.0, though.

Kris

Re: System performance 4.x vs. 5.x and 6.x

Julian Elischer
In reply to this post by Kris Kennaway-3
Kris Kennaway wrote:

> Brett Bump wrote:
>>
>> On Thu, 14 Feb 2008, Kris Kennaway wrote:
>>
>>> We are going to need more information about your system.  What do you
>>> mean by "peak activity"?  What is running on the system when it performs
>>> badly (check top -S, ps, gstat, vmstat -w, vmstat -i).  What is your
>>> kernel configuration, dmesg and relevant aspects of the system
>>> configuration?
>>>
>>> Kris
>>>
>>
>> I would call 120 processes with a load average of 0.03 and 99.9 idle
>> with 10-20 sendmail processes and 30 apache jobs nothing to write home
>> about.  But when that jumps to 250 processes, a load average of 30 with
>> 50% idle (5-10 second waits on single character ssh echo) a bit busy.
>> That usually means my heavy pop3 users are checking in at the same time
>> someone (or 2 or 3) have sent email to the large volume listservs.  Proc
>> stat doesn't show as much as gstat and iostat.  Gstat always shows my
>> drive with /var/mail being 97-100% busy and iostat will always show high
>> tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).
>>
>> Kernel is generic with ipfirewall quota and smp (no ipfw rules yet).
>
> OK, then you definitely need to update to 6.3, quota support in older
> releases had performance problems.
>
>> [Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
>> httpd in malloc(): error: recursive call
>> [Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
>> httpd in free(): error: recursive call
>> [Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
>> httpd in free(): error: recursive call
>

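Typically this is a printf() in a signal handler: printf() may call
malloc(), and if the signal arrives while the process is already inside
malloc() or free(), the allocator is re-entered.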

> These typically indicate application errors, or errors in how the
> applications are compiled (e.g. linked to inconsistent sets of libraries).
>
> Kris
>


Re: System perforamance 4.x vs. 5.x and 6.x

Uwe Doering
In reply to this post by Brett Bump
Brett Bump wrote:

>
> On Thu, 14 Feb 2008, Guy Helmer wrote:
>
>> Brett Bump wrote:
>>> [Thu Feb 14 09:59:23 2008] [notice] child pid 43464 exit signal Abort trap (6)
>>> httpd in malloc(): error: recursive call
>>> [Thu Feb 14 10:07:34 2008] [notice] child pid 85706 exit signal Abort trap (6)
>>> httpd in free(): error: recursive call
>>> [Thu Feb 14 10:48:39 2008] [notice] child pid 45621 exit signal Abort trap (6)
>>> httpd in free(): error: recursive call
>>>
>> Do you have a mix of modules that are both multi-threaded and
>> single-threaded loaded in Apache?
>>
>> I'm not sure what else could be a root cause for this particularly nasty
>> problem.
>>
>> Guy
>
> Running apache_1.3.37 with php5 (about as generic as I can get).  Here is
> the php extensions.ini file:
>
> extension=ctype.so
> extension=dom.so
> extension=ftp.so
> extension=gd.so
> extension=gettext.so
> extension=iconv.so
> extension=pcre.so
> extension=zlib.so
> extension=pdo.so
> extension=posix.so
> extension=session.so
> extension=simplexml.so
> extension=sqlite.so
> extension=tokenizer.so
> extension=xml.so
> extension=xmlreader.so
> extension=xmlwriter.so
> extension=mysql.so
> extension=imap.so
> extension=sockets.so

Have you tried sorting this list alphabetically?  Believe it or not,
when I tried to use Apache 1.3.x with PHP 5.2.x with extensions in
arbitrary order I got inexplicable crashes, too.

Of course, alphabetical order as such is not the fix; that it worked for
me after sorting was a coincidence.  What it does point to is that the
load order of the extensions in that list can matter, for whatever
reason.  Sorting worked for me, but YMMV.  Might be worth a try, though.
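
A minimal sketch of that sorting step follows. The function name
sort_extensions and the default path are assumptions made for
illustration (the path is where a ports-built PHP might keep the file);
the sketch only reorders the extension= lines and does not explain why
the order matters.

```shell
#!/bin/sh
# Sketch only: print the extension= lines of a PHP extensions file in
# alphabetical order. Comments and any other lines are left out.
# The default path is an assumption; pass the real file as argument 1.
sort_extensions() {
    file="${1:-/usr/local/etc/php/extensions.ini}"
    # LC_ALL=C gives a locale-independent byte order; -u drops duplicates.
    grep '^extension=' "$file" | LC_ALL=C sort -u
}

# Example use (kept as a comment so nothing is modified by accident):
#   cp extensions.ini extensions.ini.bak
#   sort_extensions extensions.ini.bak > extensions.ini
```

Keeping the backup copy makes it easy to go back to the old order if the
sorted list behaves no better.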

Regards,

    Uwe
--
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
[hidden email]  |  http://www.escapebox.net

Re: System perforamance 4.x vs. 5.x and 6.x

Adrian Chadd-2
On 15/02/2008, Uwe Doering <[hidden email]> wrote:
> Have you tried sorting this list alphabetically?  Believe it or not,
>  when I tried to use Apache 1.3.x with PHP 5.2.x with extensions in
>  arbitrary order I got inexplicable crashes, too.

Ah, stuff like "apache-ssl inits the SSL library, then php + ssl
inits the SSL library, and stuff gets funny."


--
Adrian Chadd - [hidden email]

Re: System perforamance 4.x vs. 5.x and 6.x

Uwe Doering
Adrian Chadd wrote:
> On 15/02/2008, Uwe Doering <[hidden email]> wrote:
>> Have you tried sorting this list alphabetically?  Believe it or not,
>>  when I tried to use Apache 1.3.x with PHP 5.2.x with extensions in
>>  arbitrary order I got inexplicable crashes, too.
>
> Ah, stuff like "apache-ssl inits the SSL library, then php + ssl
> inits the SSL library, and stuff gets funny."

Right.  It probably has to do with some linking and initialization
details of the dynamic libraries involved.  However, in my case the
offending interaction seemed to be just between the PHP extension
modules.  To fix the problem I didn't have to change the load order of
the Apache modules.

Since then I have had a line in my PHP upgrade notes that reminds me to
sort the extension list as a last step.  This is certainly a pragmatic
approach, but for lack of time I didn't bother getting acquainted with
the PHP internals, and those of the libs involved, to find the root
cause.

Regards,

    Uwe
--
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
[hidden email]  |  http://www.escapebox.net

Re: System perforamance 4.x vs. 5.x and 6.x

Kris Kennaway-3
In reply to this post by Kris Kennaway-3
Kris Kennaway wrote:

> Kris Kennaway wrote:
>> Brett Bump wrote:
>>>
>>> On Thu, 14 Feb 2008, Kris Kennaway wrote:
>>>
>>>> We are going to need more information about your system.  What do you
>>>> mean by "peak activity"?  What is running on the system when it
>>>> performs
>>>> badly (check top -S, ps, gstat, vmstat -w, vmstat -i).  What is your
>>>> kernel configuration, dmesg and relevant aspects of the system
>>>> configuration?
>>>>
>>>> Kris
>>>>
>>>
>>> I would call 120 processes with a load average of 0.03 and 99.9 idle
>>> with 10-20 sendmail processes and 30 apache jobs nothing to write home
>>> about.  But when that jumps to 250 processes, a load average of 30 with
>>> 50% idle (5-10 second waits on single character ssh echo) a bit busy.
>>> That usually means my heavy pop3 users are checking in at the same time
>>> someone (or 2 or 3) have sent email to the large volume listservs.  Proc
>>> stat doesn't show as much as gstat and iostat.  Gstat always shows my
>>> drive with /var/mail being 97-100% busy and iostat will always show high
>>> tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).
>>>
>>> Kernel is generic with ipfirewall quota and smp (no ipfw rules yet).
>>
>> OK, then you definitely need to update to 6.3, quota support in older
>> releases had performance problems.
>
> Actually, I am not sure it was possible to merge the fix to 6.x; it is
> definitely in 7.0, though.

I checked with the developer, and no-one running 6.x and quotas ever
replied to multiple requests to test the patch.  It can be found here if
you want to resolve this performance problem without upgrading to 7.0:

http://people.freebsd.org/~kib/quotagiant/quotas-RELENG_6-20070623-1455.patch

Kris


Re: System perforamance 4.x vs. 5.x and 6.x

Aleksey Perov
Kris Kennaway wrote:

> I checked with the developer, and no-one running 6.x and quotas ever
> replied to multiple requests to test the patch.  It can be found here if
> you want to resolve this performance problem without upgrading to 7.0:
>
> http://people.freebsd.org/~kib/quotagiant/quotas-RELENG_6-20070623-1455.patch

Well, I have been running a RELENG_6 SMP system with this patch since
September 2007 (last updated 2008-02-08), and I haven't seen any
quota-related problems. The quota-enabled partition is a 60 GB UFS2
filesystem containing 25,000+ customers' home directories with
Maildir-style mailboxes, and it survives 100,000+ deliveries (file
creations) and approximately the same number of retrievals (file
deletions) each day.

mail:~# df -i /disk1
Filesystem  1K-blocks     Used    Avail Capacity iused   ifree %iused  Mounted on
/dev/da1s1d  66343254 25904146 35131648    42% 1479119 7117359   17%   /disk1
mail:~# mount
/dev/da1s1d on /disk1 (ufs, local, noexec, nosuid, with quotas, soft-updates)

If I can run some specific (stress?) tests that might provide useful
information, let me know. Output of dmesg, vmstat, iostat, top, etc --
just ask.



--
Aleksey


Re: System perforamance 4.x vs. 5.x and 6.x

mdtancsa
In reply to this post by Brett Bump
At 05:27 PM 2/14/2008, Brett Bump wrote:

>stat doesn't show as much as gstat and iostat.  Gstat always shows my
>drive with /var/mail being 97-100% busy and iostat will always show high
>tps rates, but never anything above 8MB/s (4.10 gave me 30MB/s+).

If a lot of users are checking mail at once, the disk might be
busy seeking.



>Kernel is generic with ipfirewall quota and smp (no ipfw rules yet).

I would change to pf instead of ipf, as it is better supported. Or just use ipfw.

>On Thu, 14 Feb 2008, Bill Moran wrote:
>
> > What _is_ the hardware?
>
>Dell PowerEdge 1750 1U, 146Gig U320s.  The Broadcoms seem to be a change
>from the earlier 1550s with intel pro/100s (I prefer the intel's).

So this is not the same hardware that was running RELENG_4 before?


>I was suspicious that maybe we needed more memory but swap has barely even
>been touched (232k used...with 1400meg inactive).

Still, it might help by allowing more caching... Also, I would still
increase dirhash, as you are getting close to the limit.  Also, if you
have a large master.passwd file (e.g. > 1000 entries), try changing
nsswitch.conf as instructed in

http://www.freebsd.org/cgi/query-pr.cgi?pr=75855

We had to do this on our pop server otherwise just doing an ls in
/var/mail would take several minutes.
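
To put "close to the limit" in numbers, a small sketch like the one
below reads sysctl output and prints what percentage of
vfs.ufs.dirhash_maxmem is in use. The function name dirhash_usage is
made up for illustration, and the 4 MB figure in the comments is only
an example value, not a tuned recommendation.

```shell
#!/bin/sh
# Sketch only: read "name: value" lines as printed by sysctl on stdin
# and print dirhash memory use as a percentage of the configured limit.
dirhash_usage() {
    awk -F': ' '
        $1 == "vfs.ufs.dirhash_maxmem" { max = $2 }
        $1 == "vfs.ufs.dirhash_mem"    { mem = $2 }
        END { if (max > 0) printf "%.1f\n", 100 * mem / max }
    '
}

# Usage on a live system:
#   sysctl vfs.ufs | dirhash_usage
# Raising the limit (example value only), at runtime and persistently:
#   sysctl vfs.ufs.dirhash_maxmem=4194304
#   echo 'vfs.ufs.dirhash_maxmem=4194304' >> /etc/sysctl.conf
```

With the values posted at the start of the thread (1923157 bytes used
out of 2097152) this prints 91.7.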


> > what does
> > netstat -ni
> > give
>
>-bash-2.05b$ netstat -ni
>Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
>bge0   1500 <Link#1>      00:0f:1f:66:0e:e6 12511748   902 12025487     0     0
>bge0   1500 208.107.160/2 208.107.161.82    17011211     - 16533277     -     -
>bge1   1500 <Link#2>      00:0f:1f:66:0e:e8  3523091   586  4089056     0     0
>bge1   1500 10.1.1/24     10.1.1.1           3516790     -  4087415     -     -
>lo0   16384 <Link#3>                         4659734     0  4659733     0     0
>lo0   16384 fe80:3::1/64  fe80:3::1                0     -        0     -     -
>lo0   16384 ::1/128       ::1                   2772     -     2772     -     -
>lo0   16384 127           127.0.0.1           147255     -   147255     -     -
>
> > and what options do you have on ifconfig ?  Are the errors seen on
> > your switch port as well or just in netstat -ni ?
>
>ifconfig_bge0="inet 208.107.161.82  netmask 255.255.254.0 media 100baseTX mediaopt full-duplex"
>ifconfig_bge1="inet 10.1.1.1        netmask 255.255.255.0 media 100baseTX mediaopt full-duplex"
>
>No, the switch shows clear, they only show up as input errors on this box.
>The box sitting under this one has an uptime of 621 days with 1 Oerr.
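
For scale, the input-error counts in the netstat -ni output quoted
above can be expressed as a percentage of received packets. This is
only a sketch: the function name ierr_rate is invented here, and it
assumes the column layout shown above, reading Ipkts and Ierrs by
counting from the end of each <Link#> row so that the empty Address
column on lo0 does not shift them.

```shell
#!/bin/sh
# Sketch only: read `netstat -ni` output on stdin and print, for each
# link-level (<Link#N>) row, the interface name and input errors as a
# percentage of input packets.
ierr_rate() {
    awk '
        $3 ~ /^<Link#/ {
            ipkts = $(NF - 4)   # Ipkts is the 5th column from the end
            ierrs = $(NF - 3)   # Ierrs is the 4th column from the end
            if (ipkts > 0)
                printf "%s %.4f%%\n", $1, 100 * ierrs / ipkts
        }
    '
}

# Usage on a live system:
#   netstat -ni | ierr_rate
```

For the figures quoted above this gives about 0.0072% for bge0 and
0.0166% for bge1, so the errors are a small fraction of total traffic.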


I seem to recall people having issues with the media selection using
bge based nics.  e.g.
http://www.freebsd.org/cgi/query-pr.cgi?pr=112570

I would try using autoneg instead. There are other options that might
not be getting set right (e.g. FC).  autoneg might take care of it
for you, but as I said before there have been a number of bug fixes
to the driver since 6.2.  Similarly, the ports you are using have
known security issues from 6.2, so you are better off starting from
6.3 and its ports, as you will have less patching to do.
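
As a sketch of the autoneg suggestion, the two rc.conf lines quoted
above could drop the forced media settings so that the interfaces
negotiate speed and duplex themselves. The addresses and netmasks are
copied from the quoted configuration; leaving out the media keywords is
assumed here to fall back to the driver's default of autoselection, and
the switch ports would need to be set to autonegotiate as well.

```shell
# /etc/rc.conf fragment (sketch): same addresses as before, but without
# "media 100baseTX mediaopt full-duplex", so the link is autonegotiated.
ifconfig_bge0="inet 208.107.161.82  netmask 255.255.254.0"
ifconfig_bge1="inet 10.1.1.1        netmask 255.255.255.0"

# To switch a running interface back by hand:
#   ifconfig bge0 media autoselect
```

Checking the Ierrs column of netstat -ni again after the change would
show whether the forced media setting was contributing to the errors.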

         ---Mike
