FreeBSD 11 not sending repeated TURs until good status returned?

Rebecca Cran
Since I used to be a FreeBSD developer, I've been asked at $work about a
bug report for a SAS drive that fails to be detected when hot-plugged
under FreeBSD 11. Not being familiar with the *scsi* code, however, I
thought people on this mailing list might have a better idea!

From the report:

"[the affected drive] is more closely conforming to the T10 spec as a
Mode Sense command does not require media access."

"The proper thing for the system to do is send repeated TURs to the
drive until good status is returned. After that moment, the drive can
properly do I/O."



The log file contains:

   READ(10). CDB: 28 00 2e 93 90 af 00 00 01 00

   CAM status: SCSI Status Error

   SCSI status: Check Condition

   SCSI sense: NOT READY asc:4,1 (Logical unit is in process of becoming
ready)

   Progress: 0% (2/65536) complete

   Polling device for readiness

   mpssas_prepare_remove: Sending reset for target ID 123

da12 at mps0 bus 0 scbus0 target 123 lun 0

...
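
For readers following the log, the failing command can be unpacked by hand. The sketch below assumes the standard 10-byte READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte big-endian transfer length in bytes 7-8); the helper name `decode_read10` is made up for illustration and is not part of any FreeBSD tool.

```python
import struct


def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Big-endian fields: opcode, flags, LBA, group number, transfer length, control
    _opcode, _flags, lba, _group, blocks, control = struct.unpack(">BBIBHB", cdb)
    return {"lba": lba, "blocks": blocks, "control": control}


# CDB copied from the log above
fields = decode_read10("28 00 2e 93 90 af 00 00 01 00")
print(hex(fields["lba"]), fields["blocks"])
```

So the command that drew the NOT READY sense data was a single-block read.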


Does this sound like a valid bug?


--
Rebecca Cran
[hidden email]

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "[hidden email]"

Re: FreeBSD 11 not sending repeated TURs until good status returned?

Alan Somers-2
On Tue, Dec 19, 2017 at 12:30 PM, Rebecca Cran <[hidden email]> wrote:

> Since I used to be a FreeBSD developer, I've been asked at $work about a
> bug report for a SAS drive that fails to be detected when hot-plugged under
> FreeBSD 11. Not being familiar with the *scsi* code, however, I thought
> people on this mailing list might have a better idea!
>
> From the report:
>
> "[the affected drive] is more closely conforming to the T10 spec as a Mode
> Sense command does not require media access."
>
> "The proper thing for the system to do is send repeated TURs to the drive
> until good status is returned. After that moment, the drive can properly do
> I/O."
>
>
>
> The log file contains:
>
>   READ(10). CDB: 28 00 2e 93 90 af 00 00 01 00
>
>   CAM status: SCSI Status Error
>
>   SCSI status: Check Condition
>
>   SCSI sense: NOT READY asc:4,1 (Logical unit is in process of becoming
> ready)
>
>   Progress: 0% (2/65536) complete
>
>   Polling device for readiness
>
>   mpssas_prepare_remove: Sending reset for target ID 123
>
> da12 at mps0 bus 0 scbus0 target 123 lun 0
>
> ...
>
>
> Does this sound like a valid bug?
>

What's the problem exactly?  Does FreeBSD poll the device or not?   Does
FreeBSD give up too soon, or poll with the wrong command, or what?  And if
you don't mind me asking, what sort of drive is this that takes so long to
come ready?

-Alan

Re: FreeBSD 11 not sending repeated TURs until good status returned?

Rebecca Cran
On 12/19/2017 05:29 PM, Alan Somers wrote:

>
> What's the problem exactly?  Does FreeBSD poll the device or not?  
> Does FreeBSD give up too soon, or poll with the wrong command, or
> what?  And if you don't mind me asking, what sort of drive is this
> that takes so long to come ready?

FreeBSD thinks the device is ready before it really is, and ends up
issuing read commands that fail, resulting in the device being removed.
The drive is a SAS SSD, and I don't know why it takes longer than most
to become ready.

--
Rebecca Cran

Re: FreeBSD 11 not sending repeated TURs until good status returned?

Ken Merry

> On Dec 19, 2017, at 7:53 PM, Rebecca Cran <[hidden email]> wrote:
>
> On 12/19/2017 05:29 PM, Alan Somers wrote:
>
>>
>> What's the problem exactly?  Does FreeBSD poll the device or not?   Does FreeBSD give up too soon, or poll with the wrong command, or what?  And if you don't mind me asking, what sort of drive is this that takes so long to come ready?
>
> FreeBSD thinks the device is ready before it really is, and ends up issuing read commands that fail, resulting in the device being removed.
> The drive is a SAS SSD, and I don't know why it takes longer than most to become ready.
>

I have seen this behavior on some HGST SSDs.  I haven’t had a chance to fully chase it down.

The polling code is in there and is active in this case.  You can tell because of this message:

>
>   Polling device for readiness


It will send a TUR every half second for a minute to wait for the device to become ready, and then retry the read if the TURs succeeded.  I *think* (I’d have to look more closely) it’ll retry the READ four more times, going through the one-minute TUR sequence each time.
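
As a rough illustration of that sequence, here is a small simulation. It is not the CAM code: the function names and the `FakeDrive` class are hypothetical, the retry count of four is the unverified recollection above, and giving up when no TUR succeeds within the window is an assumption.

```python
TUR_INTERVAL_S = 0.5   # "a TUR every half second"
TUR_WINDOW_S = 60.0    # "for a minute"
READ_RETRIES = 4       # "four more times": unverified recollection


def poll_until_ready(send_tur, sleep):
    """Send TURs at a fixed interval; True as soon as one returns good status."""
    for _ in range(int(TUR_WINDOW_S / TUR_INTERVAL_S)):
        if send_tur():
            return True
        sleep(TUR_INTERVAL_S)
    return False


def read_with_polling(send_read, send_tur, sleep):
    """Try the READ; after each failure, poll with TURs before retrying."""
    for _ in range(1 + READ_RETRIES):
        if send_read():
            return True
        # Assumption: if no TUR succeeds within the window, give up.
        if not poll_until_ready(send_tur, sleep):
            return False
    return False


class FakeDrive:
    """Hypothetical drive that becomes ready at a fixed simulated time."""

    def __init__(self, ready_at_s):
        self.now = 0.0
        self.ready_at_s = ready_at_s

    def sleep(self, seconds):
        self.now += seconds  # advance simulated time instead of blocking

    def tur(self):
        return self.now >= self.ready_at_s

    def read(self):
        return self.now >= self.ready_at_s


drive = FakeDrive(ready_at_s=3.0)
print(read_with_polling(drive.read, drive.tur, drive.sleep), drive.now)
```

Note that a drive which answers TUR with good status but still fails the READ would exhaust all the retries almost immediately in this model, which is one way to end up with the behavior reported in this thread.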

But the mpssas_prepare_remove message indicates that this disk (or another one) is getting removed by the controller.

IMO, the sense data probably means the SSD is doing something wrong.  A drive should become ready before it turns on the SAS port, because the initiator is going to try sending commands as soon as the port comes active.  And if an SSD can’t come ready in a minute (spinning drives take ~10 seconds to spin up), something is wrong.

We’ll probably need full logs to get a better idea of what is going on.

Ken

Ken Merry
[hidden email]


Re: FreeBSD 11 not sending repeated TURs until good status returned?

Rebecca Cran
On 12/20/2017 09:09 AM, Ken Merry wrote:

> I have seen this behavior on some HGST SSDs.  I haven’t had a chance to fully chase it down.

You're right, this is one of those drives.

> We’ll probably need full logs to get a better idea of what is going on.

Thanks, I've forwarded your email to the people working on it.

--
Rebecca Cran

Re: FreeBSD 11 not sending repeated TURs until good status returned?

Ken Merry

> On Dec 20, 2017, at 11:37 AM, Rebecca Cran <[hidden email]> wrote:
>
> On 12/20/2017 09:09 AM, Ken Merry wrote:
>
>> I have seen this behavior on some HGST SSDs.  I haven’t had a chance to fully chase it down.
>
> You're right, this is one of those drives.

Looks like you have it plugged in to a 6Gb controller.  Try plugging it in to a 12Gb expander and 12Gb controller and see what happens.  We have some of those that refuse to probe at all when they’re attached to a 12Gb topology.  6Gb works ok, I think, but we do get those messages.  One of my co-workers is doing the debugging on that one, so I’m not sure whether some or all of the drives are having issues.

>
>> We’ll probably need full logs to get a better idea of what is going on.
>
> Thanks, I've forwarded your email to the people working on it.
>

Ok.

Ken

Ken Merry
[hidden email]
