USB stack getting confused


USB stack getting confused

O'Connor, Daniel-2
Hi,
I'm developing a data acquisition system on FreeBSD using a USB3 interface (the OrangeTree ZestSC3) and I find that the USB stack appears to 'lose' the device after a while.

My program normally runs continually, doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 one-minute acquisitions or ~8 thirty-minute ones) my program can't 'see' the device any more (it uses libusb10, and it reconnects each acquisition for $REASONS). Also, pretty weirdly, usbconfig can't see it either(!).

If I stop my program the device reappears in usbconfig. If I restart my program it works.

I did some GDB'ing and it appears that ugen20_enumerate (the libusb10 interface is implemented by calling libusb20 functions) can't open /dev/ugenX.Y and errno is 12 (ENOMEM).
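
For anyone wanting to reproduce the check outside GDB, here is a minimal sketch that opens a device node and reports the errno from open(2). The helper name probe_open is made up for illustration, and the path passed in is whatever node is failing (the /dev/ugenX.Y node above is left elided on purpose).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a device node read/write.
 * Returns 0 on success, or the errno value set by open(2) on failure.
 */
static int
probe_open(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDWR);
	if (fd < 0) {
		int err = errno;	/* save before any other libc call */

		fprintf(stderr, "open(%s): %s (errno %d)\n", path,
		    strerror(err), err);
		return (err);
	}
	close(fd);
	return (0);
}
```

Calling it once per acquisition and logging the result would show at which iteration the open starts failing, without needing a debugger attached.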

After digging with dtrace I have seen a different open method being used for this device. I have also seen cases where opening the device doesn't call usb_fifo_open (I'm not sure what it *does* call; I see userland call openat but haven't traced through what gets called).

I'm still digging but am somewhat hopeful someone can suggest some things to look at :)

This is on 11.2 if it matters.

Thanks.

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

Re: USB stack getting confused

Hans Petter Selasky-6
On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).

What is printed in dmesg? Maybe the device has a problem.

--HPS

Re: USB stack getting confused

O'Connor, Daniel-2


> On 9 Mar 2019, at 19:30, Hans Petter Selasky <[hidden email]> wrote:
> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
>
> What is printed in dmesg? Maybe the device has a problem.

There is nothing in dmesg - no disconnect / reconnect etc.

If I hold the user space process in gdb 'forever' (e.g. overnight) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum



Re: USB stack getting confused

Hans Petter Selasky-6
On 3/9/19 11:29 AM, O'Connor, Daniel wrote:
> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

Check the output from "procstat -ak". Likely your application is not
closing the USB handle during device detach and so a deadlock happens.

Also see:
libusb20_dev_check_connected(). Poll this function regularly to figure
out if disconnect is needed.
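
A minimal sketch of such a polling loop follows. To keep it independent of any particular library it takes the connection check and the close routine as callbacks; on FreeBSD the check callback could wrap libusb20_dev_check_connected(), whose exact return convention should be taken from libusb20(3) rather than from this sketch. The names poll_until_gone, fake_dev, fake_check and fake_close are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 while the device is connected, non-zero once it is gone. */
typedef int (*check_fn)(void *handle);
/* Releases the handle. */
typedef void (*close_fn)(void *handle);

/*
 * Hypothetical helper: perform up to max_polls connection checks and close
 * the handle as soon as a check reports the device gone. Returns the number
 * of checks performed when the disconnect was seen, or -1 if the device
 * stayed connected for all max_polls checks.
 */
static int
poll_until_gone(void *handle, check_fn check, close_fn do_close, int max_polls)
{
	assert(check != NULL && do_close != NULL);
	for (int i = 0; i < max_polls; i++) {
		if (check(handle) != 0) {
			do_close(handle);
			return (i + 1);
		}
		/* A real loop would sleep or do acquisition work here. */
	}
	return (-1);
}

/* Stand-in device for trying the loop without hardware. */
struct fake_dev {
	int polls_left;	/* checks remaining before the "unplug" */
	int closed;	/* how many times the handle was closed */
};

static int
fake_check(void *handle)
{
	struct fake_dev *d = handle;

	if (d->polls_left > 0)
		d->polls_left--;
	return (d->polls_left == 0);
}

static void
fake_close(void *handle)
{
	((struct fake_dev *)handle)->closed++;
}
```

The point of the callback shape is only that the handle gets closed promptly once the check fails, so the kernel side is not left waiting on the descriptor.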

--HPS

Re: USB stack getting confused

Konstantin Belousov
In reply to this post by O'Connor, Daniel-2
On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:

>
>
> > On 9 Mar 2019, at 19:30, Hans Petter Selasky <[hidden email]> wrote:
> > On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> >> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
> >
> > What is printed in dmesg? Maybe the device has a problem.
>
> There is nothing in dmesg - no disconnect / reconnect etc.
>
> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.

Does it mean that the file descriptor opened for ugen has a chance to
be closed ?

I suspect that the usb subsystem tried to destroy the device but some internal
refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
the issue.

Re: USB stack getting confused

Hans Petter Selasky-6
On 3/9/19 4:26 PM, Konstantin Belousov wrote:

> On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
>>
>>
>>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <[hidden email]> wrote:
>>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
>>>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
>>>
>>> What is printed in dmesg? Maybe the device has a problem.
>>
>> There is nothing in dmesg - no disconnect / reconnect etc.
>>
>> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.
>
> Does it mean that the file descriptor opened for ugen has a chance to
> be closed ?

The USB stack will wait for all FDs to be closed during detach also via
destroy_dev().

>
> I suspect that usb subsystem tried to destroy the device but some internal
> refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
> the issue.

--HPS


Re: USB stack getting confused

Konstantin Belousov
On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:

> On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> >>
> >>
> >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <[hidden email]> wrote:
> >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> >>>> My program normally runs continually doing acquisitions of data for N seconds, doing some checks and restarting. After a while (~30 1 minute acquisitions or ~8 30 minute ones) my program can't 'see' the device (it uses libusb10) any more (it reconnects each acquisition for $REASONS). Also pretty weirdly usbconfig can't see it either(!).
> >>>
> >>> What is printed in dmesg? Maybe the device has a problem.
> >>
> >> There is nothing in dmesg - no disconnect / reconnect etc.
> >>
> >> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.
> >
> > Does it mean that the file descriptor opened for ugen has a chance to
> > be closed ?
>
> The USB stack will wait for all FDs to be closed during detach also via
> destroy_dev().
So my guess was correct.  Do you agree that this behaviour is wrong ?

In fact I saw something similar with apcupsd and either usb/com adapters
or the native usb control card for APC UPSes.  For reasons I do not understand,
these devices are often disconnected.  For older versions of apcupsd,
a restart was required for the newly reattached device to be recreated in /dev.
Sometimes it hangs the whole usb stack.

Newer apcupsd seems to open /dev/ugen only for the duration of the query,
which makes the erratic behaviour much less likely, but could still cause
breakage when the device disappears while apcupsd has it opened.

>
> >
> > I suspect that usb subsystem tried to destroy the device but some internal
> > refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
> > the issue.
>
> --HPS

Re: USB stack getting confused

Warner Losh
On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov <[hidden email]>
wrote:

> On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> > On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> > >>
> > >>
> > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <[hidden email]>
> wrote:
> > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> > >>>> My program normally runs continually doing acquisitions of data for
> N seconds, doing some checks and restarting. After a while (~30 1 minute
> acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
> uses libusb10) any more (it reconnects each acquisition for $REASONS). Also
> pretty weirdly usbconfig can't see it either(!).
> > >>>
> > >>> What is printed in dmesg? Maybe the device has a problem.
> > >>
> > >> There is nothing in dmesg - no disconnect / reconnect etc.
> > >>
> > >> If I hold the user space process in gdb 'forever' (eg over night)
> usbconfig doesn't see the device, but the moment I quit the user space
> process it can be seen again.
> > >
> > > Does it mean that the file descriptor opened for ugen has a chance to
> > > be closed ?
> >
> > The USB stack will wait for all FDs to be closed during detach also via
> > destroy_dev().
> So my guess was correct.  Do you agree that this behaviour is wrong ?
>
> In fact I saw something similar with apcupsd and either usb/com adapters
> or native usb control card for APC UPSes.  For reasons I do not understand,
> these devices are often disconnected.  For older versions of apcupsd,
> it required restart for newly reattached device to be recreated in /dev.
> Sometimes it hangs whole usb stack.
>
> Newer apcupsd seems to open /dev/ugen only for the duration of the query,
> which makes the erratic behaviour is much less likely, but could still
> cause
> breakage when device disappear while apcupsd has it opened.
>

Is there a form of destroy_dev() that does a revoke on all open instances?
E.g., this is gone, you can't use it anymore, and all further attempts to use
the device will generate an error, but in the meantime we destroy the
device and let the detach routine get on with life. Waiting may make sense
when you are merely unloading the driver (and getting to the detach routine
that way), but when the device is gone, I've come around to the point of
view that we should just destroy it w/o waiting for closes and anybody that
touches it afterwards gets an error and has to cope with the error. But
even in the unload case, maybe we shouldn't get to the detach routine
unless we're forcing and/or the detach routine just returns EBUSY since the
only one that knows what dev_t's are associated with the device_t is the
driver itself.

Warner

>
> > >
> > > I suspect that usb subsystem tried to destroy the device but some
> internal
> > > refcounting prevents it.  Proper use of destroy_dev(_cb)(9) avoids
> > > the issue.
> >
> > --HPS

Re: USB stack getting confused

Konstantin Belousov
On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:

> On Sat, Mar 9, 2019 at 11:25 AM Konstantin Belousov <[hidden email]>
> wrote:
>
> > On Sat, Mar 09, 2019 at 04:42:50PM +0100, Hans Petter Selasky wrote:
> > > On 3/9/19 4:26 PM, Konstantin Belousov wrote:
> > > > On Sat, Mar 09, 2019 at 08:59:30PM +1030, O'Connor, Daniel wrote:
> > > >>
> > > >>
> > > >>> On 9 Mar 2019, at 19:30, Hans Petter Selasky <[hidden email]>
> > wrote:
> > > >>> On 3/9/19 12:08 AM, O'Connor, Daniel wrote:
> > > >>>> My program normally runs continually doing acquisitions of data for
> > N seconds, doing some checks and restarting. After a while (~30 1 minute
> > acquisitions or ~8 30 minute ones) my program can't 'see' the device (it
> > uses libusb10) any more (it reconnects each acquisition for $REASONS). Also
> > pretty weirdly usbconfig can't see it either(!).
> > > >>>
> > > >>> What is printed in dmesg? Maybe the device has a problem.
> > > >>
> > > >> There is nothing in dmesg - no disconnect / reconnect etc.
> > > >>
> > > >> If I hold the user space process in gdb 'forever' (eg over night)
> > usbconfig doesn't see the device, but the moment I quit the user space
> > process it can be seen again.
> > > >
> > > > Does it mean that the file descriptor opened for ugen has a chance to
> > > > be closed ?
> > >
> > > The USB stack will wait for all FDs to be closed during detach also via
> > > destroy_dev().
> > So my guess was correct.  Do you agree that this behaviour is wrong ?
> >
> > In fact I saw something similar with apcupsd and either usb/com adapters
> > or native usb control card for APC UPSes.  For reasons I do not understand,
> > these devices are often disconnected.  For older versions of apcupsd,
> > it required restart for newly reattached device to be recreated in /dev.
> > Sometimes it hangs whole usb stack.
> >
> > Newer apcupsd seems to open /dev/ugen only for the duration of the query,
> > which makes the erratic behaviour is much less likely, but could still
> > cause
> > breakage when device disappear while apcupsd has it opened.
> >
>
> Is there a form of destroy_dev() that does a revoke on all open instances?
> Eg, this is gone, you can't use it anymore, and all further attempts to use
> the device will generate an error, but in the mean time we destroy the
> device and let the detach routine get on with life. waiting may make sense
> when you are merely unloading the driver (and getting to the detach routine
> that way), but when the device is gone, I've come around to the point of
> view that we should just destroy it w/o waiting for closes and anybody that
> touches it afterwards gets an error and has to cope with the error. But
> even in the unload case, we maybe we shouldn't get to the detach routine
> unless we're forcing and/or the detach routine just returns EBUSY since the
> only one that knows what dev_t's are associated with the device_t is the
> driver itself.
You are asking very basic questions about devfs there.

destroy_dev(9) waits for two things:
- that all threads left the cdevsw methods for the given device;
- that all cdevpriv destructors finished running.
To facilitate waking up threads potentially sleeping inside the cdevsw
methods, drivers might implement the d_purge method, which must weed out
sleeping threads from inside the code in bounded time.

After that we return from destroy_dev(9) and guarantee that no new calls
into cdevsw are made for this device.  devfs magic consumes the fo_ and
VOP_ calls and does not allow them to reach into the driver.
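
The wait condition described above can be illustrated with a small userspace model. This is not kernel code and none of the names are kernel KPI; it only encodes the two conditions listed, and it assumes that calls arriving after destruction has started are refused with ENXIO. Note that the number of descriptors still open in userland is tracked but plays no part in the condition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model only; field and function names are invented. */
struct cdev_model {
	int  threads_in_methods;	/* threads inside cdevsw methods */
	int  dtors_running;		/* cdevpriv destructors not finished */
	int  open_fds;			/* descriptors userland still holds */
	bool destroying;		/* destruction has started */
};

/* Entering a driver method: refused once destruction has started. */
static int
method_enter(struct cdev_model *d)
{
	if (d->destroying)
		return (ENXIO);		/* assumed error code for the model */
	d->threads_in_methods++;
	return (0);
}

static void
method_leave(struct cdev_model *d)
{
	assert(d->threads_in_methods > 0);
	d->threads_in_methods--;
}

static void
destroy_begin(struct cdev_model *d)
{
	d->destroying = true;
}

/* The two listed conditions; open_fds is deliberately not consulted. */
static bool
destroy_may_return(const struct cdev_model *d)
{
	return (d->destroying && d->threads_in_methods == 0 &&
	    d->dtors_running == 0);
}
```

In this model a process that merely keeps a descriptor open, without a thread sitting inside a driver method, does not hold up destruction.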

So what usb does there is actively defeating the existing mechanism by
keeping an internal refcount on opens and refusing to call destroy_dev()
until the count goes to zero (I did not read the usb code, but I believe
that I am not too wrong).  If the usb core just called destroy_dev() when
the physical device goes away, then at worst the existing file descriptors
opened against the lost devices would become dead (not the same dead as
terminals after revoke(2), but very similar).

If the problem is due to keeping some instance data for the opened device,
then cdevpriv might be the better fit (at least the KPI was designed
to be) than blocking destroy until all users are gone.

Re: USB stack getting confused

Rozhuk Ivan-2
In reply to this post by Konstantin Belousov
On Sat, 9 Mar 2019 18:26:40 +0200
Konstantin Belousov <[hidden email]> wrote:

> In fact I saw something similar with apcupsd and either usb/com
> adapters or native usb control card for APC UPSes.  For reasons I do
> not understand, these devices are often disconnected.  For older
> versions of apcupsd, it required restart for newly reattached device
> to be recreated in /dev. Sometimes it hangs whole usb stack.
>
> Newer apcupsd seems to open /dev/ugen only for the duration of the
> query, which makes the erratic behaviour is much less likely, but
> could still cause breakage when device disappear while apcupsd has it
> opened.
>

Same problem with usb sound cards.
I tried to fix it but failed with dsp; only the mixer can be fixed with a small code change.
https://reviews.freebsd.org/D11140

Re: USB stack getting confused

Hans Petter Selasky-6
In reply to this post by Warner Losh
On 3/9/19 7:41 PM, Warner Losh wrote:

>> Newer apcupsd seems to open /dev/ugen only for the duration of the query,
>> which makes the erratic behaviour is much less likely, but could still
>> cause
>> breakage when device disappear while apcupsd has it opened.
>>
> Is there a form of destroy_dev() that does a revoke on all open instances?
> Eg, this is gone, you can't use it anymore, and all further attempts to use
> the device will generate an error, but in the mean time we destroy the
> device and let the detach routine get on with life. waiting may make sense
> when you are merely unloading the driver (and getting to the detach routine
> that way), but when the device is gone, I've come around to the point of
> view that we should just destroy it w/o waiting for closes and anybody that
> touches it afterwards gets an error and has to cope with the error. But
> even in the unload case, we maybe we shouldn't get to the detach routine
> unless we're forcing and/or the detach routine just returns EBUSY since the
> only one that knows what dev_t's are associated with the device_t is the
> driver itself.

Hi,

There are multiple issues here:

1) The USB stack uses device numbers from device_get_unit() when creating
character devices. That means it must wait at least until the VNODE in
/dev is removed, so that the same device name can be re-used.

2) When disconnecting the "struct file" from the USB, lost memory might
pile up if these daemons, which are typically created by devd, don't get
killed.

Many of these applications are using libusb. We can add a heartbeat
thread inside there to simply close the ugen device handle when we
understand the device is gone. That will close 99% of these issues.

--HPS



Re: USB stack getting confused

Hans Petter Selasky-6
In reply to this post by Konstantin Belousov
On 3/9/19 8:23 PM, Konstantin Belousov wrote:

> On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
>>
>> Is there a form of destroy_dev() that does a revoke on all open instances?
>> Eg, this is gone, you can't use it anymore, and all further attempts to use
>> the device will generate an error, but in the mean time we destroy the
>> device and let the detach routine get on with life. waiting may make sense
>> when you are merely unloading the driver (and getting to the detach routine
>> that way), but when the device is gone, I've come around to the point of
>> view that we should just destroy it w/o waiting for closes and anybody that
>> touches it afterwards gets an error and has to cope with the error. But
>> even in the unload case, we maybe we shouldn't get to the detach routine
>> unless we're forcing and/or the detach routine just returns EBUSY since the
>> only one that knows what dev_t's are associated with the device_t is the
>> driver itself.
> You are asking very basic questions about devfs there.
>
> destroy_dev(9) waits for two things:
> - that all threads left the cdevsw methods for the given device;
> - that all cdevpriv destructors finished running.

Hi,

> To facilitate waking up threads potentially sleeping inside the cdevsw
> methods, drivers might implement d_purge method which must weed out sleeping
> threads from inside the code in the bound time.

USB will purge all callers before calling destroy_dev(). This is not the
problem.

> After that we return from destroy_dev(9) and guarantee that no new calls
> into cdevsw is done for this device.  devfs magic consumes  the fo_ and
> VOP_ calls and does not allow them to reach into the driver.

When I designed the current USB devfs it was important to me to keep
open() and close() calls balanced to avoid situations where an open call
may set up some resource and then close(), which frees this resource
again, never gets called. destroy_dev(9) makes no such guarantee, and I
think that is a failure of destroy_dev(9). That's when I started using
the cdev's destructor callback function.

> So what usb does there is actively defeating existing mechanism by
> keeping internal refcount on opens and refusing to call destroy_dev()
> until the count goes to zero

The FreeBSD USB stack is also used in environments w/o DEVFS and needs
its own refcounts.

> (I did not read the usb code, but I believe
> that I am not too wrong).  
> Would usb core just destroy_dev() when the
> physical device goes away, then at worst the existing file descriptors
> opened against the lost devices would become dead (not same dead as
> terminals after revoke(2), but very similar).

Yes, I can do that if destroy_dev() ensures that d_close is called for
all open file handles once and only once before it returns. I think this
is where the problem comes from.

>
> If the problem is due to keeping some instance data for the opened device,
> then cdevpriv might be the better fit (at least the KPI was designed
> to be) than blocking destroy until all users are gone.
>

The USB stack does not use MMAP, so this is not a problem.

--HPS

Re: USB stack getting confused

Hans Petter Selasky-6
In reply to this post by Rozhuk Ivan-2
On 3/9/19 8:28 PM, Rozhuk Ivan wrote:

> On Sat, 9 Mar 2019 18:26:40 +0200
> Konstantin Belousov <[hidden email]> wrote:
>
>> In fact I saw something similar with apcupsd and either usb/com
>> adapters or native usb control card for APC UPSes.  For reasons I do
>> not understand, these devices are often disconnected.  For older
>> versions of apcupsd, it required restart for newly reattached device
>> to be recreated in /dev. Sometimes it hangs whole usb stack.
>>
>> Newer apcupsd seems to open /dev/ugen only for the duration of the
>> query, which makes the erratic behaviour is much less likely, but
>> could still cause breakage when device disappear while apcupsd has it
>> opened.
>>
>
> Same problem with usb sound cards.
> I try to fix it, but fail with dsp, only mixer can be fixed with small code change.
> https://reviews.freebsd.org/D11140
>

Hi,

How will these apps detect that they need to open the new /dev/mixer node?

I mean, after the hang is fixed, will the mixer app still try to query the old
file handle forever?

--HPS

Re: USB stack getting confused

Rozhuk Ivan-2
On Sat, 9 Mar 2019 22:40:02 +0100
Hans Petter Selasky <[hidden email]> wrote:

> > Same problem with usb sound cards.
> > I try to fix it, but fail with dsp, only mixer can be fixed with
> > small code change. https://reviews.freebsd.org/D11140
> >  
>
> Hi,
>
> How will these apps detect that they need to open the new /dev/mixer
> node?
>
> I mean, after hang is fixed, mixer app will still try to query the
> old file handle forever?
>

The main problem for me is: a usb device is lost/reconnected, or a new device is connected,
but FreeBSD does nothing because the USB stack hangs - it waits for all fds for mixer and dsp to be closed.

Apps can be rewritten/patched: on device loss, get an error on operations with the fd, then try to reopen it.
I don't remember now how that works in the patch; it is unfinished.
Another OSS issue - apps do not react to hw.snd.default_unit changes.
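
The "get an error, then reopen" idea could look roughly like the sketch below on the application side. It assumes that a read on a descriptor whose device has gone away fails with some errno other than EINTR/EAGAIN; which errno that is in practice is not something this sketch claims to know. The helper name read_or_reopen is invented.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from *fdp; if the read fails with an error
 * other than EINTR/EAGAIN, assume the node went away, drop the stale
 * descriptor and try once to reopen path. Returns the byte count from
 * read(2), or -1 if the read or the reopen failed.
 */
static ssize_t
read_or_reopen(int *fdp, const char *path, void *buf, size_t len)
{
	ssize_t n;

	assert(fdp != NULL && path != NULL);
	n = read(*fdp, buf, len);
	if (n >= 0 || errno == EINTR || errno == EAGAIN)
		return (n);

	/* Treat any other error as "device lost": release the old fd. */
	if (*fdp >= 0)
		close(*fdp);
	*fdp = open(path, O_RDONLY);
	if (*fdp < 0)
		return (-1);	/* node not back yet; caller may retry later */
	return (read(*fdp, buf, len));
}
```

Closing the stale descriptor is the important half: it is what lets the kernel side finish tearing down the old device node.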

I mitigated the reconnect issue in hardware:
- switched to sound via HDMI
- added a real LC filter to the home power line: I have a long USB link from the PC to the USB hub at my work place with
kb, mouse, usb sound ..., and every time the refrigerator started/stopped I lost the USB link to the hub;
the LC filter fixes this. After such a loss, the kb, mouse and other usb devices do not replug until I close
all apps that have open fds for mixer and dsp.



Re: USB stack getting confused

O'Connor, Daniel-2
In reply to this post by Hans Petter Selasky-6


> On 10 Mar 2019, at 01:55, Hans Petter Selasky <[hidden email]> wrote:
> On 3/9/19 11:29 AM, O'Connor, Daniel wrote:
>> If I hold the user space process in gdb 'forever' (eg over night) usbconfig doesn't see the device, but the moment I quit the user space process it can be seen again.
>
> Check the output from "procstat -ak". Likely your application is not closing the USB handle during device detach and so a deadlock happens.

I ran it while stopped in the debugger..
[maarsytest 23:34] ~> procstat -k 20033
  PID    TID COMM                TDNAME              KSTACK
20033 100135 tclsh8.6            -                   mi_switch thread_suspend_switch ptracestop cursig ast doreti_ast

Then continued it and ran it a few more times..
[maarsytest 23:34] ~> procstat -k 20033
  PID    TID COMM                TDNAME              KSTACK
20033 100135 tclsh8.6            -                   mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_read dofileread kern_readv sys_read amd64_syscall fast_syscall_common
[maarsytest 23:34] ~> procstat -k 20033
  PID    TID COMM                TDNAME              KSTACK
20033 100135 tclsh8.6            -                   mi_switch sleepq_catch_signals sleepq_timedwait_sig _cv_timedwait_sig_sbt seltdwait kern_select sys_select amd64_syscall fast_syscall_common

> Also see:
> libusb20_dev_check_connected() . Poll this function regularly to figure out if disconnect is needed.

Hmm, is this exposed in the libusb10 interface? The code I am using uses that to talk to the device (although I have the source for it so can modify it)

--
Daniel O'Connor
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum



Re: USB stack getting confused

Hans Petter Selasky-6
On 3/10/19 1:37 AM, O'Connor, Daniel wrote:
> Hmm, is this exposed in the libusb10 interface? The code I am using uses that to talk to the device (although I have the source for it so can modify it)

See libusb_check_connected().

--HPS

Re: USB stack getting confused

Konstantin Belousov
In reply to this post by Hans Petter Selasky-6
On Sat, Mar 09, 2019 at 10:35:28PM +0100, Hans Petter Selasky wrote:

> On 3/9/19 8:23 PM, Konstantin Belousov wrote:
> > On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
> >>
> >> Is there a form of destroy_dev() that does a revoke on all open instances?
> >> Eg, this is gone, you can't use it anymore, and all further attempts to use
> >> the device will generate an error, but in the mean time we destroy the
> >> device and let the detach routine get on with life. waiting may make sense
> >> when you are merely unloading the driver (and getting to the detach routine
> >> that way), but when the device is gone, I've come around to the point of
> >> view that we should just destroy it w/o waiting for closes and anybody that
> >> touches it afterwards gets an error and has to cope with the error. But
> >> even in the unload case, we maybe we shouldn't get to the detach routine
> >> unless we're forcing and/or the detach routine just returns EBUSY since the
> >> only one that knows what dev_t's are associated with the device_t is the
> >> driver itself.
> > You are asking very basic questions about devfs there.
> >
> > destroy_dev(9) waits for two things:
> > - that all threads left the cdevsw methods for the given device;
> > - that all cdevpriv destructors finished running.
>
> Hi,
>
> > To facilitate waking up threads potentially sleeping inside the cdevsw
> > methods, drivers might implement d_purge method which must weed out sleeping
> > threads from inside the code in the bound time.
>
> USB will purge all callers before calling destroy_dev(). This is not the
> problem.
>
> > After that we return from destroy_dev(9) and guarantee that no new calls
> > into cdevsw is done for this device.  devfs magic consumes  the fo_ and
> > VOP_ calls and does not allow them to reach into the driver.
>
> When I designed the current USB devfs it was important to me to keep
> open() and close() calls balanced to avoid situations where an open call
> may setup some resource and then close(), which free this resource
> again, never gets called. destroy_dev(9) makes no such guarantee, and I
> think that is a failure of destroy_dev(9). That's when I started using
> the cdev's destructor callback function.
Let's correct the terminology first.
Are you referring to the d_open/d_close pairing?

Without D_TRACKCLOSE, d_close() is only called on the last close of
the device.  With D_TRACKCLOSE, devfs _tries_ to call d_close() each time
it sees the VOP_CLOSE() operation from VFS, but due to the way VFS works
VOP_CLOSE() could be missed.  Also, d_open and d_close are not synchronized,
so a driver might get a call to d_open in parallel with the last d_close.

What do you mean by cdev destructor callback function?  Do you mean the
callback from destroy_dev_cb(), or do you actually mean the
destructors from devfs_set_cdevpriv(9)?

If the latter, then destroy_dev() guarantees that all cdevpriv destructors
for all file descriptors opened against the destroyed cdev have finished
before destroy_dev() returns.  In other words, if you use cdevpriv, you
can remove the drain for your refcount and everything should just work.

>
> > So what usb does there is actively defeating existing mechanism by
> > keeping internal refcount on opens and refusing to call destroy_dev()
> > until the count goes to zero
>
> The FreeBSD USB stack also is used in environments w/o DEVFS and need
> own refcounts.
I completely disagree with the use of code sharing as an excuse for FreeBSD bugs.

>
> > (I did not read the usb code, but I believe
> > that I am not too wrong).
> > Would usb core just destroy_dev() when the
> > physical device goes away, then at worst the existing file descriptors
> > opened against the lost devices would become dead (not same dead as
> > terminals after revoke(2), but very similar).
>
> Yes, I can do that if destroy_dev() ensures that d_close is called for
> all open file handles once and only once before it returns. I think this
> is where the problem comes from.
See above.  For d_close it is impossible; for the cdevpriv dtr it is already
there by design.

>
> >
> > If the problem is due to keeping some instance data for the opened device,
> > then cdevpriv might be the better fit (at least the KPI was designed
> > to be) than blocking destroy until all users are gone.
> >
>
> The USB stack does not use MMAP, so this is not a problem.
I do not follow; why does it matter?

On Sat, Mar 09, 2019 at 10:40:02PM +0100, Hans Petter Selasky wrote:

> On 3/9/19 8:28 PM, Rozhuk Ivan wrote:
> > On Sat, 9 Mar 2019 18:26:40 +0200
> > Konstantin Belousov <[hidden email]> wrote:
> >
> >> In fact I saw something similar with apcupsd and either usb/com
> >> adapters or native usb control card for APC UPSes.  For reasons I do
> >> not understand, these devices are often disconnected.  For older
> >> versions of apcupsd, it required restart for newly reattached device
> >> to be recreated in /dev. Sometimes it hangs whole usb stack.
> >>
> >> Newer apcupsd seems to open /dev/ugen only for the duration of the
> >> query, which makes the erratic behaviour is much less likely, but
> >> could still cause breakage when device disappear while apcupsd has it
> >> opened.
> >>
> >
> > Same problem with usb sound cards.
> > I try to fix it, but fail with dsp, only mixer can be fixed with small code change.
> > https://reviews.freebsd.org/D11140
> >
>
> Hi,
>
> How will these apps detect that they need to open the new /dev/mixer node?
>
> I mean, after hang is fixed, mixer app will still try to query the old
> file handle forever?
Userspace gets either ENXIO or EIO from syscalls.  For polls, it gets
POLLIN | POLLHUP immediately.
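
A minimal user-space sketch of the poll() side of this, assuming the
application keeps the device descriptor open across a detach. The helper
name fd_hung_up() is hypothetical; it relies only on standard poll(2)
behaviour and is not taken from any existing mixer or USB application:

```c
#include <poll.h>

/*
 * Hypothetical helper: report whether a descriptor looks dead.
 * Returns 1 if poll() flags hangup/error/invalid on fd,
 * 0 if none of those flags are set, and -1 if poll() itself failed.
 */
int
fd_hung_up(int fd)
{
	struct pollfd pfd;
	int n;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Zero timeout: only sample the current state, never block. */
	n = poll(&pfd, 1, 0);
	if (n < 0)
		return (-1);

	return ((pfd.revents & (POLLHUP | POLLERR | POLLNVAL)) != 0);
}
```

An application that gets a non-zero result here would presumably close the
stale descriptor and look for the re-created /dev node.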

Re: USB stack getting confused

Hans Petter Selasky-6
On 3/10/19 10:47 AM, Konstantin Belousov wrote:

> On Sat, Mar 09, 2019 at 10:35:28PM +0100, Hans Petter Selasky wrote:
>> On 3/9/19 8:23 PM, Konstantin Belousov wrote:
>>> On Sat, Mar 09, 2019 at 11:41:31AM -0700, Warner Losh wrote:
>>>>
>>>> Is there a form of destroy_dev() that does a revoke on all open instances?
>>>> Eg, this is gone, you can't use it anymore, and all further attempts to use
>>>> the device will generate an error, but in the mean time we destroy the
>>>> device and let the detach routine get on with life. waiting may make sense
>>>> when you are merely unloading the driver (and getting to the detach routine
>>>> that way), but when the device is gone, I've come around to the point of
>>>> view that we should just destroy it w/o waiting for closes and anybody that
>>>> touches it afterwards gets an error and has to cope with the error. But
>>>> even in the unload case, we maybe we shouldn't get to the detach routine
>>>> unless we're forcing and/or the detach routine just returns EBUSY since the
>>>> only one that knows what dev_t's are associated with the device_t is the
>>>> driver itself.
>>> You are asking very basic questions about devfs there.
>>>
>>> destroy_dev(9) waits for two things:
>>> - that all threads left the cdevsw methods for the given device;
>>> - that all cdevpriv destructors finished running.
>>
>> Hi,
>>
>>> To facilitate waking up threads potentially sleeping inside the cdevsw
>>> methods, drivers might implement d_purge method which must weed out sleeping
>>> threads from inside the code in the bound time.
>>
>> USB will purge all callers before calling destroy_dev(). This is not the
>> problem.
>>
>>> After that we return from destroy_dev(9) and guarantee that no new calls
>>> into cdevsw is done for this device.  devfs magic consumes  the fo_ and
>>> VOP_ calls and does not allow them to reach into the driver.
>>
>> When I designed the current USB devfs it was important to me to keep
>> open() and close() calls balanced to avoid situations where an open call
>> may setup some resource and then close(), which free this resource
>> again, never gets called. destroy_dev(9) makes no such guarantee, and I
>> think that is a failure of destroy_dev(9). That's when I started using
>> the cdev's destructor callback function.
> Lets correct the terminology first.
> Are you referring to the d_open/d_close pairing ?
>
> Without D_TRACKCLOSE, d_close() is only called on the last close of
> the device.  With D_TRACKCLOSE, devfs _tries_ to call d_close each time
> it sees the VOP_CLOSE() operation from VFS, but due to way VFS works
> VOP_CLOSE() could be missed.  Also, d_open vs d_close are not synchronized,
> so a driver might get call to d_open in parallel to last d_close.

Hi,

I'm using D_TRACKCLOSE.

>
> What do you mean by cdev destructor callback function ?  Do you mean
> callback from destroy_dev_cb(), or do you actually reference the
> destructors from devfs_set_cdevpriv(9) ?

Yes, I mean the use of devfs_set_cdevpriv(9).

> If the later, then destroy_dev() guarantees that all cdevpriv destructors
> for all file descriptors opened against the destroyed cdev are finished
> before destroy_dev() returns.  In other words, if you use cdevpriv, you
> can remove the drain for your refcount and everything should just work.

Using devfs_set_cdevpriv(9) means destroy_dev() will be called after
the last close() on the file descriptor in question. Is this strictly
needed? You can see in the code that devfs_close_f() calls devfs_fpdrop().

>
>>
>>> So what usb does there is actively defeating existing mechanism by
>>> keeping internal refcount on opens and refusing to call destroy_dev()
>>> until the count goes to zero
>>
>> The FreeBSD USB stack also is used in environments w/o DEVFS and need
>> own refcounts.

>
> I completely disagree with use of code sharing as excuse for FreeBSD bugs.
>

As I said, using devfs_set_cdevpriv(9), which the USB stack needs,
basically means we are waiting for the final user-space close() or
"struct file" refcount drop. This behaviour is not as announced. I
would like to have the cdevpriv's destructor executed before
destroy_dev() returns.

Again, the USB stack needs paired operation. An open() call must always
be followed by a close() call on the same FD. Otherwise memory resources
will simply leak.

--HPS

Re: USB stack getting confused

Hans Petter Selasky-6
In reply to this post by Konstantin Belousov
On 3/10/19 10:47 AM, Konstantin Belousov wrote:
>> Hi,
>>
>> How will these apps detect that they need to open the new /dev/mixer node?
>>
>> I mean, after hang is fixed, mixer app will still try to query the old
>> file handle forever?
> Userspace gets either ENXIO or EIO from syscalls.  For polls, it gets
> POLLIN | POLLHUP immediately.
>

It is likely that the app doesn't check the return value from the mixer
IOCTL at all.
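
A hedged sketch of what checking that return value could look like in an
application. The names errno_means_device_gone() and checked_ioctl() are
hypothetical; the only assumption taken from this thread is that a vanished
device shows up to userspace as ENXIO or EIO:

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical: errno values the thread says a dead device returns. */
int
errno_means_device_gone(int err)
{
	return (err == ENXIO || err == EIO);
}

/*
 * Hypothetical wrapper around ioctl(2).
 * Returns 0 on success, 1 if the device appears to be gone (the caller
 * should close fd and reopen the device node), -1 on any other error.
 */
int
checked_ioctl(int fd, unsigned long request, void *arg)
{
	if (ioctl(fd, request, arg) != -1)
		return (0);

	return (errno_means_device_gone(errno) ? 1 : -1);
}
```

Whether a given mixer application can recover this way depends on it being
able to find and reopen the new device node, which the sketch does not cover.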

--HPS

Re: USB stack getting confused

Hans Petter Selasky-6
In reply to this post by Konstantin Belousov
On 3/10/19 10:47 AM, Konstantin Belousov wrote:
>> Yes, I can do that if destroy_dev() ensures that d_close is called for
>> all open file handles once and only once before it returns. I think this
>> is where the problem comes from.
> See above.  For d_close it is impossible, for cdevpriv dtr it is already
> there by design.
>

Yes, but unfortunately cdevpriv_dtr will wait for the final close() from
user-space. Or am I mistaken?

--HPS