high CPU usage in FreeBSD for a PCIe card driver

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

high CPU usage in FreeBSD for a PCIe card driver

Steevan Rodrigues
Hi

I  come from Linux background and new to FreeBSD.

I am investigating a high CPU usage issue on FreeBSD for PCIe driver of a
card . The card can process data at a rate of 30 to 50 Gbps . However, I
see very high CPU usage even when I run a single threaded user side
application which uses this card .

I ran hotkernel script from Dtrace toolkit.   A portion of the output is
given in the attached file  dtraceDiff.txt. Here I see that lock_delay
seems to be the reason for high CPU usage. Looks like it comes from mutex
locks

This driver uses  mutex locks ( mtx_lock).

The CPU usage is very high  on a multicore (Xeon CPU) server node than a
dual core (Intel i3) Desktop .

It shows 30 to 40 % CPU usage on a Intel i3 ( 2 core 4 threads) and 60 to
70% CPU usage  on a Xeon  CPU with 12 core (24 threads) for same output
data rate .

I wonder whether usage of mtx_lock is the cause of high CPU usage ? Any
pointers or suggestions welcome.

Using  FreeBSD 11.1 Release

Thanks

Steev

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

dtraceDiff.txt (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Warner Losh
No. It's not from mtx_lock. It's likely from lock contention. Since the
list ate the attachment, I can't check for sure, but high CPU usage in
locking primitives is a classic sign of poor lock design leading to
contention.

Warner

On Wed, Jun 27, 2018 at 6:28 AM, Steevan Rodrigues <[hidden email]>
wrote:

> Hi
>
> I  come from Linux background and new to FreeBSD.
>
> I am investigating a high CPU usage issue on FreeBSD for PCIe driver of a
> card . The card can process data at a rate of 30 to 50 Gbps . However, I
> see very high CPU usage even when I run a single threaded user side
> application which uses this card .
>
> I ran hotkernel script from Dtrace toolkit.   A portion of the output is
> given in the attached file  dtraceDiff.txt. Here I see that lock_delay
> seems to be the reason for high CPU usage. Looks like it comes from mutex
> locks
>
> This driver uses  mutex locks ( mtx_lock).
>
> The CPU usage is very high  on a multicore (Xeon CPU) server node than a
> dual core (Intel i3) Desktop .
>
> It shows 30 to 40 % CPU usage on a Intel i3 ( 2 core 4 threads) and 60 to
> 70% CPU usage  on a Xeon  CPU with 12 core (24 threads) for same output
> data rate .
>
> I wonder whether usage of mtx_lock is the cause of high CPU usage ? Any
> pointers or suggestions welcome.
>
> Using  FreeBSD 11.1 Release
>
> Thanks
>
> Steev
>
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "[hidden email]"
>
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Steevan Rodrigues
Thank you for the quickly reply.     The same driver gives pretty good
performance with low CPU usage in kernel mode.
I see high CPU usage only for user mode application.

Here is the Dtrace hotkernel output details provided in the earlier
attachment.

CPU usage details on a  12 core 24 thread server PC.

part of the perl hotkernel  output

kernel`free                                              1314   0.2%
kernel`_sx_slock                                         1415   0.3%
kernel`dev_relthread                                     1435   0.3%
kernel`devvn_refthread                                   1842   0.3%
kernel`_mtx_lock_spin_cookie                             2059   0.4%
kernel`copyin                                            2164   0.4%
kernel`amd64_syscall                                     2235   0.4%
kernel`__mtx_lock_spin_flags                             3566   0.6%
kernel`bzero                                             4889   0.9%
kernel`__mtx_lock_sleep                                  9323   1.7%
kernel`acpi_cpu_idle                                    11166   2.0%
kernel`pmap_extract                                     17491   3.1%
kernel`spinlock_exit                                    64850  11.5%
kernel`cpu_idle                                        134617  23.9%
kernel`lock_delay                                      263330  46.8%

-------------------------------------------------

CPU usage details on a  2 core 4 thread Intel i3 based Desktop PC.

part of the perl hotkernel  output


kernel`0xffffffff80                                       558   0.6%
kernel`copyout                                            592   0.6%
0xffffffff822368a3                                        639   0.7%
kernel`amd64_syscall                                      683   0.7%
kernel`__mtx_unlock_flags                                 728   0.8%
kernel`copyin                                            1311   1.4%
kernel`bzero                                             1434   1.5%
kernel`pmap_extract                                      1615   1.7%
kernel`lock_delay                                        2329   2.4%
0xffffffff822b0b5e                                       2528   2.6%
kernel`spinlock_exit                                    20950  21.8%
kernel`cpu_idle                                         48124  50.2%

------------------------------------------------------

Thanks
Steevan

On Wed, Jun 27, 2018 at 7:41 PM, Warner Losh <[hidden email]> wrote:

> No. It's not from mtx_lock. It's likely from lock contention. Since the
> list ate the attachment, I can't check for sure, but high CPU usage in
> locking primitives is a classic sign of poor lock design leading to
> contention.
>
> Warner
>
> On Wed, Jun 27, 2018 at 6:28 AM, Steevan Rodrigues <
> [hidden email]> wrote:
>
>> Hi
>>
>> I  come from Linux background and new to FreeBSD.
>>
>> I am investigating a high CPU usage issue on FreeBSD for PCIe driver of a
>> card . The card can process data at a rate of 30 to 50 Gbps . However, I
>> see very high CPU usage even when I run a single threaded user side
>> application which uses this card .
>>
>> I ran hotkernel script from Dtrace toolkit.   A portion of the output is
>> given in the attached file  dtraceDiff.txt. Here I see that lock_delay
>> seems to be the reason for high CPU usage. Looks like it comes from mutex
>> locks
>>
>> This driver uses  mutex locks ( mtx_lock).
>>
>> The CPU usage is very high  on a multicore (Xeon CPU) server node than a
>> dual core (Intel i3) Desktop .
>>
>> It shows 30 to 40 % CPU usage on a Intel i3 ( 2 core 4 threads) and 60 to
>> 70% CPU usage  on a Xeon  CPU with 12 core (24 threads) for same output
>> data rate .
>>
>> I wonder whether usage of mtx_lock is the cause of high CPU usage ? Any
>> pointers or suggestions welcome.
>>
>> Using  FreeBSD 11.1 Release
>>
>> Thanks
>>
>> Steev
>>
>> _______________________________________________
>> [hidden email] mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "[hidden email]
>> "
>>
>>
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Andrew Gallatin
In reply to this post by Steevan Rodrigues
On 06/27/18 08:28, Steevan Rodrigues wrote:
> Hi
>
> I  come from Linux background and new to FreeBSD.
>
> I am investigating a high CPU usage issue on FreeBSD for PCIe driver of a

<...>

>
> I wonder whether usage of mtx_lock is the cause of high CPU usage ? Any
> pointers or suggestions welcome.


It seems like you're experiencing lock contention.  The next step is to
determine which lock is contended and why.

I would suggest using the lockstat program to identify the lock which
is contended.  Try this:  lockstat -x aggsize=4m sleep 10 > out

And this:  lockstat -x aggsize=4m -s 10 sleep 10 > out-stacks

The first will show you contended locks broken down by type.  Eg:


Adaptive mutex spin: 61262 events in 10.007 seconds (6122 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
15958  26%  26% 0.00     2960 tcp_hpts_lck           __tcp_hpts_remove+0xb5
11723  19%  45% 0.00     2608 tcp_hpts_lck
tcp_hpts_insert_diag+0xaf
  6903  11%  56% 0.00     4605 tcp_hpts_lck           tcp_hpts_thread+0x11e
  6834  11%  68% 0.00   125868 sleep mtxpool          _sleep+0x2e3
  2756   4%  72% 0.00     1530 vm page
sendfile_free_page+0x1c1


The second will show you the stacks associated with the lock contention,
and may help you identify the source of contention if, for example, your
driver is causing contention on a kernel lock.

Drew
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Steevan Rodrigues
Thank you so much for the suggestions .  I will use these commands.
Yes, I am already working on to identify lock contentions.

Re-built FreeBSD kernel by enabling lock profiling and  now I am able to
see some issues with contention.

Thanks
Steevan

On Thu, Jun 28, 2018 at 6:34 PM, Andrew Gallatin <[hidden email]>
wrote:

> On 06/27/18 08:28, Steevan Rodrigues wrote:
>
>> Hi
>>
>> I  come from Linux background and new to FreeBSD.
>>
>> I am investigating a high CPU usage issue on FreeBSD for PCIe driver of a
>>
>
> <...>
>
>
>> I wonder whether usage of mtx_lock is the cause of high CPU usage ? Any
>> pointers or suggestions welcome.
>>
>
>
> It seems like you're experiencing lock contention.  The next step is to
> determine which lock is contended and why.
>
> I would suggest using the lockstat program to identify the lock which
> is contended.  Try this:  lockstat -x aggsize=4m sleep 10 > out
>
> And this:  lockstat -x aggsize=4m -s 10 sleep 10 > out-stacks
>
> The first will show you contended locks broken down by type.  Eg:
>
>
> Adaptive mutex spin: 61262 events in 10.007 seconds (6122 events/sec)
>
> Count indv cuml rcnt     nsec Lock                   Caller
> ------------------------------------------------------------
> -------------------
> 15958  26%  26% 0.00     2960 tcp_hpts_lck           __tcp_hpts_remove+0xb5
> 11723  19%  45% 0.00     2608 tcp_hpts_lck tcp_hpts_insert_diag+0xaf
>  6903  11%  56% 0.00     4605 tcp_hpts_lck           tcp_hpts_thread+0x11e
>  6834  11%  68% 0.00   125868 sleep mtxpool          _sleep+0x2e3
>  2756   4%  72% 0.00     1530 vm page sendfile_free_page+0x1c1
>
>
> The second will show you the stacks associated with the lock contention,
> and may help you identify the source of contention if, for example, your
> driver is causing contention on a kernel lock.
>
> Drew
>
> _______________________________________________
> [hidden email] mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "[hidden email]"
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Andrew Gallatin
On 06/28/18 12:44, Steevan Rodrigues wrote:
 > Thank you so much for the suggestions .  I will use these commands.
 > Yes, I am already working on to identify lock contentions.
 >
 > Re-built FreeBSD kernel by enabling lock profiling and  now I am able to
 > see some issues with contention.
 >
 > Thanks
 > Steevan


You need-not rebuild freebsd.  lockstat is based on dtrace..

The lock profiling you're referring to is a different mechanism.

Drew
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Steevan Rodrigues
Hello Andrew,

Sorry to bother you.  I am working on this problem again after a break of
few days.
I ran following command  to get lock statistics  when I run my userspace
application on a 12 core 24 thread server PC.
From this data below  to looks like my driver is causing a contention on a
kernel lock  (pmap ).  Am I right ?

 lockstat -x aggsize=4m -D 20 sleep 10

Adaptive mutex spin: 1122679 events in 10.013 seconds (112121 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
1089662  97%  97% 0.00    65375 pmap                   pmap_extract+0x1d2
31805   3% 100% 0.00    14881 cdev                   devvn_refthread+0x142
  651   0% 100% 0.00   180642 umtxql                 umtxq_sleep+0x1f5
  254   0% 100% 0.00    30212 umtxql
__umtx_op_sem2_wake+0x3b6
   68   0% 100% 0.00    14218 umtxql                 umtxq_busy+0x17e
   63   0% 100% 0.00     3441 umtxql
__umtx_op_sem2_wake+0x410
   59   0% 100% 0.00     1067 swi                    taskqueue_enqueue+0xba
   42   0% 100% 0.00     1632 swi
taskqueue_run_locked+0x4f
   21   0% 100% 0.00    65453 umtxql                 do_lock_umutex+0x5ec
   13   0% 100% 0.00     3693 umtxql                 do_sem2_wait+0x948
    8   0% 100% 0.00    49129 USB device mutex       usb_callback_proc+0x114
    7   0% 100% 0.00     2248 swi                    taskqueue_run+0x117
    5   0% 100% 0.00    11294 umtxql                 do_lock_umutex+0x632
    5   0% 100% 0.00    67423 umtxql                 _sleep+0x2e4
    3   0% 100% 0.00      697 umtxql                 do_cv_wait+0x619
    3   0% 100% 0.00    58600 Giant                  _cv_wait+0x1d1
    2   0% 100% 0.00    21866 umtxql
__umtx_op_wake2_umutex+0x419
    2   0% 100% 0.00     6097 cdev                   count_dev+0xe
    2   0% 100% 0.00     4695 cdev                   vputx+0xe0
    1   0% 100% 0.00      754 umtxql                 do_unlock_umutex+0x1027
-------------------------------------------------------------------------------
Adaptive mutex block: 10182 events in 10.013 seconds (1017 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
 9590  94%  94% 0.00   283901 pmap                   pmap_extract+0x1d2
  570   6% 100% 0.00   280204 cdev                   devvn_refthread+0x142
   13   0% 100% 0.00    36414 umtxql                 umtxq_sleep+0x1f5
    8   0% 100% 0.00   156849 umtxql
__umtx_op_sem2_wake+0x3b6
    1   0% 100% 0.00   169422 cdev                   vputx+0xe0
-------------------------------------------------------------------------------

Spin lock spin: 683399 events in 10.013 seconds (68250 events/sec)

Count indv cuml rcnt     nsec Lock                   Caller
-------------------------------------------------------------------------------
466417  68%  68% 0.00   137603 sleepq chain
MXD_ioctlFifoGetData.constprop.27+0x309
117135  17%  85% 0.00    64922 sleepq chain
MXD_kUsrPutData2Fifo+0xff
16013   2%  88% 0.00   107380 turnstile chain        turnstile_trywait+0x85
11157   2%  89% 0.00     9110 mos.memLock            MXD_mosMemFree+0x2d
 9534   1%  91% 0.00     4598 spinlock
MXD_mosPerCpuCacheMemFree+0x5f
 7084   1%  92% 0.00     6277 turnstile lock         turnstile_trywait+0x16b
 6689   1%  93% 0.00     5404 mos.memLock            MXD_mosMemAlloc+0xcd
 4298   1%  93% 0.00    15026 ACPI lock (0xfffff800086c0c00)
AcpiOsAcquireLock+0x80
 2201   0%  94% 0.00    17123 turnstile chain        __mtx_unlock_sleep+0x5d
 1420   0%  94% 0.00     4821 sleepq chain           wakeup+0xf
 1152   0%  94% 0.00     1353 spinlock
MXD_mosPerCpuCacheMemAlloc+0x3a
 1087   0%  94% 0.00     4561 turnstile lock         turnstile_lookup+0xa3
  959   0%  94% 0.00     6673 sched lock 20          sched_add+0xe5
  901   0%  95% 0.00     7814 sched lock 18          sched_add+0xe5
  884   0%  95% 0.00     7031 sched lock 22          sched_add+0xe5
  881   0%  95% 0.00     7597 sched lock 14          sched_add+0xe5
  879   0%  95% 0.00     6127 sched lock 23          sched_add+0xe5
  867   0%  95% 0.00     7511 sched lock 1           sched_add+0xe5
  862   0%  95% 0.00     6403 sched lock 15          sched_add+0xe5
  856   0%  95% 0.00     9624 sched lock 2           sched_add+0xe5
-------------------------------------------------------------------------------

Thanks
Steevan

On Thu, Jun 28, 2018 at 10:17 PM, Andrew Gallatin <[hidden email]>
wrote:

> On 06/28/18 12:44, Steevan Rodrigues wrote:
> > Thank you so much for the suggestions .  I will use these commands.
> > Yes, I am already working on to identify lock contentions.
> >
> > Re-built FreeBSD kernel by enabling lock profiling and  now I am able to
> > see some issues with contention.
> >
> > Thanks
> > Steevan
>
>
> You need-not rebuild freebsd.  lockstat is based on dtrace..
>
> The lock profiling you're referring to is a different mechanism.
>
> Drew
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Andrew Gallatin
On 07/11/18 07:46, Steevan Rodrigues wrote:

> Hello Andrew,
>
> Sorry to bother you.  I am working on this problem again after a break
> of few days.
> I ran following command  to get lock statistics  when I run my userspace
> application on a 12 core 24 thread server PC.
>  From this data below  to looks like my driver is causing a contention
> on a kernel lock  (pmap ).  Am I right ?
>
>   lockstat -x aggsize=4m -D 20 sleep 10
>
> Adaptive mutex spin: 1122679 events in 10.013 seconds (112121 events/sec)
>
> Count indv cuml rcnt     nsec Lock                   Caller
> -------------------------------------------------------------------------------
> 1089662  97%  97% 0.00    65375 pmap                   pmap_extract+0x1d2
> 31805   3% 100% 0.00    14881 cdev                   devvn_refthread+0x142


Yes.  You can get more information if you use the -s 10 argument to
lockstat.  That way, you'll see what's calling pmap_extract.

Are you doing frequent device ioctls?


Drew
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: high CPU usage in FreeBSD for a PCIe card driver

Steevan Rodrigues
Thanks for confirming it.  Yes I do very frequent device ioctl calls .

Regards,
Steevan

On Wed, Jul 11, 2018 at 11:52 PM, Andrew Gallatin <[hidden email]>
wrote:

> On 07/11/18 07:46, Steevan Rodrigues wrote:
>
>> Hello Andrew,
>>
>> Sorry to bother you.  I am working on this problem again after a break of
>> few days.
>> I ran following command  to get lock statistics  when I run my userspace
>> application on a 12 core 24 thread server PC.
>>  From this data below  to looks like my driver is causing a contention on
>> a kernel lock  (pmap ).  Am I right ?
>>
>>   lockstat -x aggsize=4m -D 20 sleep 10
>>
>> Adaptive mutex spin: 1122679 events in 10.013 seconds (112121 events/sec)
>>
>> Count indv cuml rcnt     nsec Lock                   Caller
>> ------------------------------------------------------------
>> -------------------
>> 1089662  97%  97% 0.00    65375 pmap                   pmap_extract+0x1d2
>> 31805   3% 100% 0.00    14881 cdev                   devvn_refthread+0x142
>>
>
>
> Yes.  You can get more information if you use the -s 10 argument to
> lockstat.  That way, you'll see what's calling pmap_extract.
>
> Are you doing frequent device ioctls?
>
>
> Drew
>
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"