rrdtool / mtr causing stalling on 7.0

rrdtool / mtr causing stalling on 7.0

Steven Hartland
We've been suffering on our stats box for some time now,
whereby the machine will just stall for several seconds,
preventing everything from tab completion to vi newfile.txt.

I was hoping an upgrade to 7.0 and ULE might help the situation
but unfortunately it hasn't.

I've attached both dmesg and output from lock profiling during
a 5 minute period where I know the stall happened at least
once.

Any advice / pointers would be gratefully received.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to [hidden email].
_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[hidden email]"

Re: rrdtool / mtr causing stalling on 7.0

Robert N. M. Watson
On Sat, 8 Mar 2008, Steven Hartland wrote:

> We've been suffering on our stats box for some time now where by the machine
> will just stall for several seconds preventing everything from tab
> completion to vi newfile.txt.
>
> I was hoping an upgrade to 7.0 and ULE may help the situation but
> unfortunately it hasn't.
>
> I've attached both dmesg and output from lock profiling during a 5 minute
> period where I know the stall happened at least once.
>
> Any advice / pointers would be gratefully received.

It looks like the attachment got lost on the way through the mailing list.

I think the first starting point is: what sort of stall is this?  Is it, for
example, all network communication stalling, all disk I/O stalling, or the
entire kernel and all processes stalling?  The usual diagnostics are:

- Does the machine stop responding to pings while stalled, and/or possibly
   "catch up" all at once when it recovers?

- If you run the following loop on the machine without any network or console
   I/O, do you see gaps in time stamps:

  while true; do
      sleep 1
      date >> date.log
  done

- If you write a short C program that looks a lot like the above loop, but
   logs time stamps into an in-memory buffer, and have it look for gaps in the
   sequence of >3 seconds, does it run across the stall?

Robert N M Watson
Computer Laboratory
University of Cambridge

Re: rrdtool / mtr causing stalling on 7.0

Steven Hartland

----- Original Message -----
From: "Robert Watson" <[hidden email]>

> It looks like the attachment got lost on the way through the mailing list.
>
> I think the first starting point is: what sort of stall is this?  Is it, for
> example, all network communication stalling, all disk I/O stalling, or the
> entire kernel and all processes stalling?  The usual diagnostics are:
>
> - Does the machine stop responding to pings while stalled, and/or possibly
>   "catch up" all at once when it recovers?
>
> - If you run the following loop on the machine without any network or console
>   I/O, do you see gaps in time stamps:
>
>  while (1) {
>  sleep 1
>  date >> date.log
>  }
>
> - If you write a short C program that looks a lot like the above loop, but
>   logs time stamps into an in-memory buffer, and have it look for gaps in the
>   sequence of >3 seconds, does it run across the stall?

Thanks for the ideas, Robert. The output from the shell script
shows significant gaps:
Sun Mar  9 00:20:33 GMT 2008
Sun Mar  9 00:20:34 GMT 2008 <== Stall
Sun Mar  9 00:21:09 GMT 2008
Sun Mar  9 00:21:10 GMT 2008
...
Sun Mar  9 00:25:23 GMT 2008
Sun Mar  9 00:25:24 GMT 2008
Sun Mar  9 00:25:25 GMT 2008
Sun Mar  9 00:25:27 GMT 2008 <== Stall
Sun Mar  9 00:25:53 GMT 2008
Sun Mar  9 00:25:59 GMT 2008
Sun Mar  9 00:26:00 GMT 2008

Running a ping alongside shows no missed responses.

Enabling lock profiling for the period changes the behaviour somewhat,
producing multiple shorter stalls.

Sun Mar  9 00:30:31 GMT 2008
Sun Mar  9 00:30:32 GMT 2008
Sun Mar  9 00:30:34 GMT 2008
Sun Mar  9 00:30:35 GMT 2008
Sun Mar  9 00:30:36 GMT 2008
Sun Mar  9 00:30:37 GMT 2008
Sun Mar  9 00:30:38 GMT 2008
Sun Mar  9 00:30:41 GMT 2008
Sun Mar  9 00:30:42 GMT 2008 <== Stall
Sun Mar  9 00:30:44 GMT 2008
Sun Mar  9 00:30:45 GMT 2008 <== Stall
Sun Mar  9 00:30:47 GMT 2008 <== Stall
Sun Mar  9 00:30:49 GMT 2008
Sun Mar  9 00:30:50 GMT 2008 <== Stall
Sun Mar  9 00:30:52 GMT 2008 <== Stall
Sun Mar  9 00:30:54 GMT 2008
Sun Mar  9 00:30:55 GMT 2008
Sun Mar  9 00:30:56 GMT 2008
Sun Mar  9 00:30:57 GMT 2008 <== Stall
Sun Mar  9 00:31:03 GMT 2008 <== Stall
Sun Mar  9 00:31:05 GMT 2008
Sun Mar  9 00:31:06 GMT 2008 <== Stall
Sun Mar  9 00:31:08 GMT 2008
Sun Mar  9 00:31:09 GMT 2008
Sun Mar  9 00:31:10 GMT 2008
Sun Mar  9 00:31:11 GMT 2008 <== Stall
Sun Mar  9 00:31:14 GMT 2008
Sun Mar  9 00:31:15 GMT 2008
Sun Mar  9 00:31:16 GMT 2008 <== Stall
Sun Mar  9 00:31:20 GMT 2008
Sun Mar  9 00:31:21 GMT 2008
Sun Mar  9 00:31:22 GMT 2008

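Using the following C code we also see stalls:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>    /* sleep() */

int main( int argc, char **argv )
{
    time_t last = time( NULL );
    while ( 1 )
    {
        time_t now = time( NULL );
        time_t diff = now - last;
        /* sleep( 1 ) should give a step of about 1s; 2s or more means we were held up */
        if ( diff >= 2 )
        {
            fprintf( stderr, "stalled for %ld seconds\n", (long)diff );
        }
        /* ctime() output already ends in a newline; never pass it as the format string */
        fprintf( stderr, "%s", ctime( &now ) );
        last = now;
        sleep( 1 );
    }

    exit( 0 ); /* not reached */
}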

[date.log]
Sun Mar  9 00:55:40 GMT 2008
Sun Mar  9 00:55:43 GMT 2008 <== Stall
Sun Mar  9 00:56:11 GMT 2008
Sun Mar  9 00:56:12 GMT 2008
Sun Mar  9 00:56:13 GMT 2008
Sun Mar  9 00:56:14 GMT 2008
Sun Mar  9 00:56:15 GMT 2008
[/date.log]

[timec output]
Sun Mar  9 00:55:40 2008
Sun Mar  9 00:55:41 2008
Sun Mar  9 00:55:42 2008
stalled for 2 seconds
Sun Mar  9 00:55:44 2008
stalled for 5 seconds
Sun Mar  9 00:55:49 2008
stalled for 2 seconds
Sun Mar  9 00:55:51 2008
stalled for 2 seconds
Sun Mar  9 00:55:53 2008
Sun Mar  9 00:55:54 2008
Sun Mar  9 00:55:55 2008
Sun Mar  9 00:55:56 2008
Sun Mar  9 00:55:57 2008
Sun Mar  9 00:55:58 2008
Sun Mar  9 00:55:59 2008
Sun Mar  9 00:56:00 2008
Sun Mar  9 00:56:01 2008
Sun Mar  9 00:56:02 2008
Sun Mar  9 00:56:03 2008
Sun Mar  9 00:56:04 2008
Sun Mar  9 00:56:05 2008
Sun Mar  9 00:56:06 2008
Sun Mar  9 00:56:07 2008
Sun Mar  9 00:56:08 2008
Sun Mar  9 00:56:09 2008
Sun Mar  9 00:56:10 2008
Sun Mar  9 00:56:11 2008
Sun Mar  9 00:56:12 2008
Sun Mar  9 00:56:13 2008
Sun Mar  9 00:56:14 2008
Sun Mar  9 00:56:15 2008
[/timec output]


As the list ate the attachment, the output from the lock profile
can be found here:-
ftp://ftp1.multiplay.co.uk/pub/other/freebsd-7.0-rrdtool-stall.zip

    Regards
    Steve


Re: rrdtool / mtr causing stalling on 7.0

Steven Hartland
Hi Kris, I was wondering if you would be so kind as to take a look
at the results below to see if they highlight anything that might be the
cause of this performance issue.

I've raised this on the rrdtool list, and quite a few people seem able to
run 10 times the number of updates on non-FreeBSD systems without these
disruptive system-wide stalls.

Given this, and the statement in one of your papers that you would be
interested in any loads that don't run well on FreeBSD, I hoped you might be
able to have a look at this and suggest areas for us to focus on.

    Regards
    Steve
----- Original Message -----
From: "Steven Hartland"

> ----- Original Message -----
> From: "Robert Watson" <[hidden email]>
>> It looks like the attachment got lost on the way through the mailing list.
>>
>> I think the first starting point is: what sort of stall is this?  Is it, for
>> example, all network communication stalling, all disk I/O stalling, or the
>> entire kernel and all processes stalling?  The usual diagnostics are:
>>
>> - Does the machine stop responding to pings while stalled, and/or possibly
>>   "catch up" all at once when it recovers?
>>
>> - If you run the following loop on the machine without any network or console
>>   I/O, do you see gaps in time stamps:
>>
>>  while (1) {
>>  sleep 1
>>  date >> date.log
>>  }
>>
>> - If you write a short C program that looks a lot like the above loop, but
>>   logs time stamps into an in-memory buffer, and have it look for gaps in the
>>   sequence of >3 seconds, does it run across the stall?
>
> Thanks for the ideas Robert the output from the shell script
> this shows significant gaps:-
> Sun Mar  9 00:20:33 GMT 2008
> Sun Mar  9 00:20:34 GMT 2008 <== Stall
> Sun Mar  9 00:21:09 GMT 2008
> Sun Mar  9 00:21:10 GMT 2008
> ...
> Sun Mar  9 00:25:23 GMT 2008
> Sun Mar  9 00:25:24 GMT 2008
> Sun Mar  9 00:25:25 GMT 2008
> Sun Mar  9 00:25:27 GMT 2008 <== Stall
> Sun Mar  9 00:25:53 GMT 2008
> Sun Mar  9 00:25:59 GMT 2008
> Sun Mar  9 00:26:00 GMT 2008
>
> Running a ping along side shows no missed responses.
>
> Enabling lock profiling for the period changes the behaviour somewhat,
> producing shorter but multiple stalls.
>
> Sun Mar  9 00:30:31 GMT 2008
> Sun Mar  9 00:30:32 GMT 2008
> Sun Mar  9 00:30:34 GMT 2008
> Sun Mar  9 00:30:35 GMT 2008
> Sun Mar  9 00:30:36 GMT 2008
> Sun Mar  9 00:30:37 GMT 2008
> Sun Mar  9 00:30:38 GMT 2008
> Sun Mar  9 00:30:41 GMT 2008
> Sun Mar  9 00:30:42 GMT 2008 <== Stall
> Sun Mar  9 00:30:44 GMT 2008
> Sun Mar  9 00:30:45 GMT 2008 <== Stall
> Sun Mar  9 00:30:47 GMT 2008 <== Stall
> Sun Mar  9 00:30:49 GMT 2008
> Sun Mar  9 00:30:50 GMT 2008 <== Stall
> Sun Mar  9 00:30:52 GMT 2008 <== Stall
> Sun Mar  9 00:30:54 GMT 2008
> Sun Mar  9 00:30:55 GMT 2008
> Sun Mar  9 00:30:56 GMT 2008
> Sun Mar  9 00:30:57 GMT 2008 <== Stall
> Sun Mar  9 00:31:03 GMT 2008 <== Stall
> Sun Mar  9 00:31:05 GMT 2008
> Sun Mar  9 00:31:06 GMT 2008 <== Stall
> Sun Mar  9 00:31:08 GMT 2008
> Sun Mar  9 00:31:09 GMT 2008
> Sun Mar  9 00:31:10 GMT 2008
> Sun Mar  9 00:31:11 GMT 2008 <== Stall
> Sun Mar  9 00:31:14 GMT 2008
> Sun Mar  9 00:31:15 GMT 2008
> Sun Mar  9 00:31:16 GMT 2008 <== Stall
> Sun Mar  9 00:31:20 GMT 2008
> Sun Mar  9 00:31:21 GMT 2008
> Sun Mar  9 00:31:22 GMT 2008
>
> Using the following c code we also see stalls:
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> int main( char **argv, int argc )
> {
>    time_t last = time( NULL );
>    while ( 1 )
>    {
>        time_t now = time( NULL );
>        time_t diff = now - last;
>        if ( diff >= 2 )
>        {
>            fprintf( stderr, "stalled for %d seconds\n", diff );
>        }
>        fprintf( stderr, ctime( &now ) );
>        last = now;
>        sleep( 1 );
>    }
>
>    exit( 0 );
> }
>
> [date.log]
> Sun Mar  9 00:55:40 GMT 2008
> Sun Mar  9 00:55:43 GMT 2008 <== Stall
> Sun Mar  9 00:56:11 GMT 2008
> Sun Mar  9 00:56:12 GMT 2008
> Sun Mar  9 00:56:13 GMT 2008
> Sun Mar  9 00:56:14 GMT 2008
> Sun Mar  9 00:56:15 GMT 2008
> [/date.log]
>
> [timec output]
> Sun Mar  9 00:55:40 2008
> Sun Mar  9 00:55:41 2008
> Sun Mar  9 00:55:42 2008
> stalled for 2 seconds
> Sun Mar  9 00:55:44 2008
> stalled for 5 seconds
> Sun Mar  9 00:55:49 2008
> stalled for 2 seconds
> Sun Mar  9 00:55:51 2008
> stalled for 2 seconds
> Sun Mar  9 00:55:53 2008
> Sun Mar  9 00:55:54 2008
> Sun Mar  9 00:55:55 2008
> Sun Mar  9 00:55:56 2008
> Sun Mar  9 00:55:57 2008
> Sun Mar  9 00:55:58 2008
> Sun Mar  9 00:55:59 2008
> Sun Mar  9 00:56:00 2008
> Sun Mar  9 00:56:01 2008
> Sun Mar  9 00:56:02 2008
> Sun Mar  9 00:56:03 2008
> Sun Mar  9 00:56:04 2008
> Sun Mar  9 00:56:05 2008
> Sun Mar  9 00:56:06 2008
> Sun Mar  9 00:56:07 2008
> Sun Mar  9 00:56:08 2008
> Sun Mar  9 00:56:09 2008
> Sun Mar  9 00:56:10 2008
> Sun Mar  9 00:56:11 2008
> Sun Mar  9 00:56:12 2008
> Sun Mar  9 00:56:13 2008
> Sun Mar  9 00:56:14 2008
> Sun Mar  9 00:56:15 2008
> [/timec output]
>
>
> As the list ate the attachment, the output from the lock profile
> can be found here:-
> ftp://ftp1.multiplay.co.uk/pub/other/freebsd-7.0-rrdtool-stall.zip
>
>    Regards
>    Steve

