Network card interrupt handling

Network card interrupt handling

Sean Bruno-7

We've been diagnosing what appeared to be out of order processing in
the network stack this week only to find out that the network card
driver was shoveling bits to us out of order (em).

This *seems* to be due to a design choice where the driver is allowed
to assert a "soft interrupt" to the h/w device while real interrupts
are disabled.  This allows a fake "em_msix_rx" to be started *while*
"em_handle_que" is running from the taskqueue.  We've isolated and
worked around this by setting our processing_limit in the driver to
-1.  This means that *most* packet processing is now handled in the
MSI-X handler instead of being deferred.  Some periodic interference
is still detectable via em_local_timer() which causes one of these
"fake" interrupt assertions in the normal, card is *not* hung case.

Both functions use identical code for a start.  Both end up down
inside of em_rxeof() to process packets.  Both drop the RX lock prior
to handing the data up the network stack.

This means that the em_handle_que running from the taskqueue will be
preempted.  Dtrace confirms that this allows out of order processing
to occur at times and generates a lot of resets.

The reason I'm bringing this up on -arch and not on -net is that this
is a common design pattern in some of the Ethernet drivers.  We've
done preliminary tests on a patch that moves *all* processing of RX
packets to the rx_task taskqueue, which means that em_handle_que is
now the only path to get packets processed.

<stable10 diff>
https://people.freebsd.org/~sbruno/em_interupt_to_taskqueue.diff

My sense is that this is a slightly "better" method to handle the
packets but removes some immediacy from packet processing since all
processing is deferred.  However, all packet processing is now
serialized per queue, which I think is the proper implementation.

Am I smoking "le dope" here or is this the way forward?

sean
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "[hidden email]"

Re: Network card interrupt handling

Jack Vogel
I recall actually trying something like this once myself Sean, but if
memory serves the performance was poor enough that I decided against
pursuing it. Still, maybe it deserves further investigation.

Jack



Re: Network card interrupt handling

John Baldwin
In reply to this post by Sean Bruno-7
On Wednesday, August 26, 2015 09:30:48 AM Sean Bruno wrote:

> The reason I'm bringing this up on -arch and not on -net is that this
> is a common design pattern in some of the Ethernet drivers.  We've
> done preliminary tests on a patch that moves *all* processing of RX
> packets to the rx_task taskqueue, which means that em_handle_que is
> now the only path to get packets processed.

It is only a common pattern in the Intel drivers. :-/  We (collectively)
spent quite a while fixing this in ixgbe and igb.  Longer (hopefully more
like medium) term I have an update to the interrupt API I want to push in
that allows drivers to manually schedule interrupt handlers using an
'hwi' API to replace the manual taskqueues.  This also ensures that
the handler that dequeues packets is only ever running in an ithread
context and never concurrently.

--
John Baldwin

Re: Network card interrupt handling

John-Mark Gurney-2
In reply to this post by Sean Bruno-7
Sean Bruno wrote this message on Wed, Aug 26, 2015 at 09:30 -0700:

> This *seems* to be due to a design choice where the driver is allowed
> to assert a "soft interrupt" to the h/w device while real interrupts
> are disabled.  This allows a fake "em_msix_rx" to be started *while*
> "em_handle_que" is running from the taskqueue.  We've isolated and
> worked around this by setting our processing_limit in the driver to
> -1.  This means that *most* packet processing is now handled in the
> MSI-X handler instead of being deferred.  Some periodic interference
> is still detectable via em_local_timer() which causes one of these
> "fake" interrupt assertions in the normal, card is *not* hung case.

I have a better question: for MSI-X we have a dedicated interrupt
thread to do the processing, so why are we even doing any moderation
in this case?  It's no different from spinning in the taskqueue...

How about the attached patch that just disables taskqueue processing
for MSI-X RX interrupts, and does all processing in the interrupt
thread?

Do you need to add the rx_task to em_local_timer?  If so, I would
look at setting a flag in _rxeof that says processing is happening.
When the taskqueue sees that flag set, it just exits; for the
interrupt filter case we'd need to be more careful (possibly set a
flag that the taskqueue will inspect, causing it to stop processing
the rx queue)...

> Am I smoking "le dope" here or is this the way forward?
I think you discovered an interesting issue.

By the way, since you're hacking on em a lot, are you interested in
fixing em's jumbo frames so it doesn't use 9k clusters, but
page-sized clusters instead?

--
  John-Mark Gurney Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


em.patch (685 bytes) Download Attachment

Re: Network card interrupt handling

Jack Vogel
The extra handling in the local timer was added while chasing hangs
in the past; the thought was that an interrupt may have been missed.
Flags sound like a nice idea, but there is the possibility of a race
condition where something still gets missed.

It was quite a few years ago, but there was a time when the em
driver was having very intermittent hangs (in fact, Sean may have
been one of the victims), and this stuff was an attempt to solve that.

Every time I looked at the em driver it cried out to be thoroughly
cleaned up or rewritten, but regression testing for that would be a
pain as well.

In any case, it's no longer my job, and I'm glad Sean is giving it
the attention he is :)

Jack



Re: Network card interrupt handling

Adrian Chadd-4
[snip]

Well, the other big reason for doing it deferred like this is to avoid
network based deadlocks because you're being fed packets faster than
you can handle them. If you never yield, you stop other NIC
processing.

People used to do run-to-completion and then complained when this
happened, so polling was a thing.

So I'm all for doing it with a fast interrupt handler and a fast
taskqueue. As long as we don't run things to completion, and instead
re-schedule the taskqueue (so other things on that core get network
processing), I'm okay.

(I kinda want us to have NAPI at some point...)



-adrian

Re: Network card interrupt handling

K. Macy
In reply to this post by John Baldwin
On Aug 28, 2015 12:59 PM, "John Baldwin" <[hidden email]> wrote:

> It is only a common pattern in the Intel drivers. :-/  We (collectively)
> spent quite a while fixing this in ixgbe and igb.  Longer (hopefully more
> like medium) term I have an update to the interrupt API I want to push in
> that allows drivers to manually schedule interrupt handlers using an
> 'hwi' API to replace the manual taskqueues.  This also ensures that
> the handler that dequeues packets is only ever running in an ithread
> context and never concurrently.
>

Jeff has a generalization of the net_task infrastructure used at
Nokia, called grouptaskq, which I've used for iflib. It does
essentially what you refer to. I've converted ixl and am about to
test an ixgbe conversion. I anticipate converting mlxen and all the
Intel drivers, as well as the remaining drivers with device-specific
code in netmap. The one catch is finding someone who will publicly
admit to owning re hardware so that I can buy it from them and test
my changes.

Cheers.

Re: Network card interrupt handling

Enji Cooper

> On Aug 28, 2015, at 18:25, K. Macy <[hidden email]> wrote:
>
> Jeff has a generalization of the net_task infrastructure used at Nokia
> called grouptaskq that I've used for iflib. That does essentially what you
> refer to. I've converted ixl and am currently about to test an ixgbe
> conversion. I anticipate converting mlxen, all Intel drivers as well as the
> remaining drivers with device specific code in netmap. The one catch is
> finding someone who will publicly admit to owning re hardware so that I can
> buy it from him and test my changes.
>
> Cheers.

I have 2 re NICs in my fileserver at home (Asus went cheap on some of their MBs a while back), but the cards shouldn't cost more than $15 + shipping (look for "Realtek 8169" on Google).

HTH!
-NGie

Re: Network card interrupt handling

Enji Cooper

> On Aug 28, 2015, at 18:52, Garrett Cooper <[hidden email]> wrote:



> I have 2 re NICs in my fileserver at home (Asus went cheap on some of their MBs a while back), but the cards shouldn't cost more than $15 + shipping (look for "Realtek 8169" on Google).

QEMU also emulates re(4), depending on which NIC you ask for at boot.
Cheers,
-NGie

Re: Network card interrupt handling

Hooman Fazaeli-3
In reply to this post by Sean Bruno-7
On 8/26/2015 9:00 PM, Sean Bruno wrote:

Which versions of the driver have this problem?

--
Best regards
Hooman Fazaeli


Re: Network card interrupt handling

John-Mark Gurney-2
In reply to this post by Adrian Chadd-4
Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700:
> [snip]
>
> Well, the other big reason for doing it deferred like this is to avoid
> network based deadlocks because you're being fed packets faster than
> you can handle them. If you never yield, you stop other NIC
> processing.

You snipped the part where I asked: isn't the interrupt thread just
the same interruptible context as the taskqueue?  Maybe the priority
is different, but that can be adjusted to be the same and still save
the context switch...

There is no break/moderation in the taskqueue: it just re-enqueues
itself, and when it breaks out, it immediately runs again, since it
has a dedicated thread to itself...  So you get the same spinning
behavior...

> People used to do run-to-completion and then complained when this
> happened, so polling was a thing.

Maybe when using PCI shared interrupts, but we are talking about PCIe
MSI-X unshared interrupts.

> So - I'm all for doing it with a fast interrupt handler and a fast
> taskqueue. As long as we don't run things to completion and
> re-schedule the taskqueue (so other things on that core get network
> processing) then I'm okay.
>
> (I kinda want us to have NAPI at some point...)

--
  John-Mark Gurney Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

Re: Network card interrupt handling

Adrian Chadd-4
On 30 August 2015 at 17:00, John-Mark Gurney <[hidden email]> wrote:

>> People used to do run-to-completion and then complained when this
>> happened, so polling was a thing.
>
> Maybe when using PCI shared interrupts, but we are talking about PCIe
> MSI-X unshared interrupts.

Well, try it and see what happens. You can still get network livelock
and starvation of other interfaces with ridiculously high pps if you
never yield. :P



-adrian

Re: Network card interrupt handling

Sean Bruno-7
In reply to this post by John-Mark Gurney-2

> I have a better question, for MSI-X, we have a dedicated interrupt
> thread to do the processing, so why are we even doing any
> moderation in this case?  It's not any different than spinning in
> the task queue..
>
> How about the attached patch that just disables taskqueue
> processing for MSI-X RX interrupts, and does all processing in the
> interrupt thread?

This is another design that I had thought of.  For em(4), when using
separate ISR threads for *each* rx queue and *each* tx queue, I think
that doing processing in the interrupt thread is the right thing to do.

I'm unsure what the correct thing to do is when tx/rx are combined
into a single handler, though (igb/ix, for example).  That could lead
to the kind of starvation Adrian has pointed out.  There is nothing
stopping us from breaking the queues apart into separate tx/rx
threads of execution for these drivers; em(4) was my little science
project to see what the behavior would be.

>
> Do you need to add the rx_task to the em_local_timer?  If so, then
> I would look at setting a flag in the _rxeof that says that
> processing is happening... and in the case of the taskqueue, when
> it sees this flag set, it just exits, while for the interrupt
> filter case, we'd need to be more careful (possibly set a flag that
> the taskqueue will inspect, and cause it to stop processing the rx
> queue)...
>

^^ I'll ponder this a bit further today and comment after coffee.

> btw, since you're hacking on em a lot, interested in fixing em's
> jumbo frames so it doesn't use 9k clusters, but instead page-sized
> clusters?
>
>

Uh ... hrm.   I can look into it, but I'd need more details, as I'm
pretty ignorant of what you're referring to.  Ping me off list and
I'll take a look (jumbo frames is out of scope for $dayjob).

sean

Re: Network card interrupt handling

John Baldwin
In reply to this post by John-Mark Gurney-2
On Sunday, August 30, 2015 05:00:03 PM John-Mark Gurney wrote:

> Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700:
> > [snip]
> >
> > Well, the other big reason for doing it deferred like this is to avoid
> > network based deadlocks because you're being fed packets faster than
> > you can handle them. If you never yield, you stop other NIC
> > processing.
>
> You snipped the part of me asking isn't the interrupt thread just the
> same interruptable context as the task queue?  Maybe the priority is
> different, but that can be adjusted to be the same and still save the
> context switch...
>
> There is no break/moderation in the taskqueue, as it'll just enqueue
> itself, and when the task queue breaks out, it'll just immediately run
> itself, since it has a dedicated thread to itself... So, looks like
> you get the same spinning behavior...

Yes, that is true, and it is why all the interrupt moderation stuff in
the NIC drivers that I've seen has always been pointless.  All it does
is add extra overhead, since you waste time on extra context switches
back to yourself in between servicing packets.  It does not permit any
other NICs to run at all.  (One of the goals of my other patches that I
mentioned is to make it possible for multiple devices to share ithreads
even when using discrete interrupts (e.g. MSI), so that the yielding
would actually give other devices a chance to run; currently it is all
just a waste of CPU cycles.)

If you think this actually helps, I challenge you to capture a KTR_SCHED
trace of it ever working as intended.

--
John Baldwin

Re: Network card interrupt handling

John Baldwin
In reply to this post by K. Macy
On Friday, August 28, 2015 06:25:53 PM K. Macy wrote:

> On Aug 28, 2015 12:59 PM, "John Baldwin" <[hidden email]> wrote:
> >
> > On Wednesday, August 26, 2015 09:30:48 AM Sean Bruno wrote:
> > > We've been diagnosing what appeared to be out of order processing in
> > > the network stack this week only to find out that the network card
> > > driver was shoveling bits to us out of order (em).
> > >
> > > This *seems* to be due to a design choice where the driver is allowed
> > > to assert a "soft interrupt" to the h/w device while real interrupts
> > > are disabled.  This allows a fake "em_msix_rx" to be started *while*
> > > "em_handle_que" is running from the taskqueue.  We've isolated and
> > > worked around this by setting our processing_limit in the driver to
> > > -1.  This means that *most* packet processing is now handled in the
> > > MSI-X handler instead of being deferred.  Some periodic interference
> > > is still detectable via em_local_timer() which causes one of these
> > > "fake" interrupt assertions in the normal, card is *not* hung case.
> > >
> > > Both functions use identical code for a start.  Both end up down
> > > inside of em_rxeof() to process packets.  Both drop the RX lock prior
> > > to handing the data up the network stack.
> > >
> > > This means that the em_handle_que running from the taskqueue will be
> > > preempted.  Dtrace confirms that this allows out of order processing
> > > to occur at times and generates a lot of resets.
> > >
> > > The reason I'm bringing this up on -arch and not on -net is that this
> > > is a common design pattern in some of the Ethernet drivers.  We've
> > > done preliminary tests on a patch that moves *all* processing of RX
> > > packets to the rx_task taskqueue, which means that em_handle_que is
> > > now the only path to get packets processed.
> >
> > It is only a common pattern in the Intel drivers. :-/  We (collectively)
> > spent quite a while fixing this in ixgbe and igb.  Longer (hopefully more
> > like medium) term I have an update to the interrupt API I want to push in
> > that allows drivers to manually schedule interrupt handlers using an
> > 'hwi' API to replace the manual taskqueues.  This also ensures that
> > the handler that dequeues packets is only ever running in an ithread
> > context and never concurrently.
> >
>
> Jeff has a generalization of the net_task infrastructure used at Nokia
> called grouptaskq that I've used for iflib. That does essentially what you
> refer to. I've converted ixl and am currently about to test an ixgbe
> conversion. I anticipate converting mlxen, all Intel drivers as well as the
> remaining drivers with device specific code in netmap. The one catch is
> finding someone who will publicly admit to owning re hardware so that I can
> buy it from him and test my changes.

Note that the ithread changes I refer to are for all devices (not just
network interfaces) and fix some other issues as well: INTR_FILTER is
always enabled, races with tearing down filters are closed, idle
ithreads use a more thread_lock()-friendly state, and it becomes
possible to experiment with sharing ithreads among devices as well as
having multiple threads service a queue of interrupt handlers if
desired.  This may make your life easier, since you might be able to
reuse the new primitives more directly rather than bypassing ithreads.
I've posted the changes to arch@ a few different times over the past
several years but just haven't pushed them in.  (They aren't perfect in
that I don't yet have APIs for changing the plumbing around, due to a
lack of use cases to build the APIs from.)

--
John Baldwin