CAM Target Layer and dev/isp

CAM Target Layer and dev/isp

Trent Nelson-2
Hi Ken,

So, first off, nice work on CTL!  I accidentally stumbled onto this little
gem yesterday and was shocked I hadn't heard of it before.  (Although it
seems I'm not alone, even Google only seems to know about your original
mailing list posts -- `+freebsd +ctladm` returns surprisingly little.)

(Somewhat related question before I get into my main issue: `ctladm`
should remove the need for share/examples/scsi_target, right?  The CAM
Target Layer stuff encompasses all the original functionality exposed in
scsi_target?)

I'm not sure how to expose LUNs via my isp devices.  Here's what I've got
so far:

[root@s24/ttypts/1(../misc/isp/bin)#] for i in {0..3}; do; ./isp_getrole /dev/isp$i; done
/dev/isp0 chan 0: role target
/dev/isp1 chan 0: role target
/dev/isp2 chan 0: role target
/dev/isp3 chan 0: role target

[root@s24/ttypts/1(../misc/isp/bin)#] ctladm devlist -v
LUN Backend       Size (Blocks)   BS Serial Number    Device ID
  0 ramdisk             2097152  512 MYSERIAL   0     MYDEVID   0
      lun_type=0

[root@s24/ttypts/1(../misc/isp/bin)#] ctladm port -l
Port Online Type     Name         pp vp WWNN               WWPN
0    YES    IOCTL    CTL ioctl    0  0  0                  0
1    YES    INTERNAL ctl2cam      0  0  0x5000000357375b00 0x5000000357375b02
2    YES    INTERNAL CTL internal 0  0  0                  0
3    YES    FC       isp0         0  0  0x200000e08b146178 0x210000e08b146178
4    YES    FC       isp1         1  0  0x400000007f000009 0x400000007f000009
5    YES    FC       isp2         2  0  0x200000e08b102f5b 0x210000e08b102f5b
6    YES    FC       isp3         3  0  0x200000e08b302f5b 0x210100e08b302f5b


It's not obvious from the docs how to export CTL LUNs through my isp
devices.  I tried passing -D /dev/isp[n] to `ctladm create`, but that
returns: 'ctladm: cctl_create_lun: error issuing CTL_LUN_REQ ioctl:
Inappropriate ioctl for device'.

Sample output from one of the switches on my fabric (s24's /dev/isp1 is
plugged into port 8 of sf1):

sf1:admin> switchshow
switchName: sf1
switchType: 17.2
switchState: Online
switchMode: Native
switchRole: Principal
switchDomain: 1
switchId: fffc01
switchWwn: 10:00:00:60:69:5a:1a:40
switchBeacon: OFF
Zoning:       ON (cfg_2012_07_16)
port  0: id N2 Online         F-Port 50:06:0b:00:00:13:18:72
port  1: id N2 No_Light
port  2: id N2 Online         E-Port 10:00:00:60:69:5a:09:c0 "sf2" (downstream)
port  3: id N2 No_Light
port  4: id N2 No_Light
port  5: id N2 Online         F-Port 21:01:00:e0:8b:ab:cf:be
port  6: id N2 Online         F-Port 21:01:00:e0:8b:a6:98:ca
port  7: id N2 Online         F-Port 21:01:00:e0:8b:30:2f:5b
port  8: id N1 Online         L-Port 1 public
port  9: id N2 No_Light
port 10: id N2 No_Light
port 11: id N2 Online         L-Port 8 public
port 12: id N2 Online         L-Port 8 public
port 13: id N2 Online         L-Port 8 public
port 14: id N2 Online         L-Port 8 public
port 15: id N2 No_Light
sf1:admin> portshow 8
portName:  
portHealth: No License
Authentication: None
portFlags:  0x223806b portLbMod:  0x0 PRESENT ACTIVE F_PORT L_PORT U_PORT LOGIN NOELP LED ACCEPT WAS_EPORT
portType:   4.1
portState:  1 Online
portPhys:   6 In_Sync
portScn:    6 F_Port
portRegs:   0x81100000
portData:   0x102b8f40
portId:     010800
portWwn:    20:08:00:60:69:5a:1a:40
portWwn of device(s) connected: 40:00:00:00:7f:00:00:09
Distance:   normal
Speed:      N1Gbps

Interrupts:        10126      Link_failure: 0          Frjt:         0
Unknown:           51         Loss_of_sync: 10058      Fbsy:         0
Lli:               10084      Loss_of_sig:  2          Lip_in:       1
Proc_rqrd:         21         Protocol_err: 0          Lip_out:      2
Timed_out:         0          Invalid_word: 0          Lip_rx:       F7,F7
Rx_flushed:        0          Invalid_crc:  0
Tx_unavail:        0          Delim_err:    0
Free_buffer:       0          Address_err:  1
Overrun:           0          Lr_in:        0
Suspended:         0          Lr_out:       0
Parity_err:        0          Ols_in:       0
                              Ols_out:      0
sf1:admin>



Ideally I'd like to be able to use CTL to export multiple ZFS zvols as
separate targets (i.e. all with unique WWNN/WWPNs), such that, from the
fabric's point of view, the port would look like just another FC-AL JBOD,
like, say, port 11:

sf1:admin> portshow 11
portName:  
portHealth: No License
Authentication: None
portFlags:  0x223806b portLbMod:  0x0 PRESENT ACTIVE F_PORT L_PORT U_PORT LOGIN NOELP LED ACCEPT WAS_EPORT
portType:   4.1
portState:  1 Online
portPhys:   6 In_Sync
portScn:    6 F_Port
portRegs:   0x81130000
portData:   0x11dc6770
portId:     010b00
portWwn:    20:0b:00:60:69:5a:1a:40
portWwn of device(s) connected: 21:00:00:14:c3:ca:23:ca
        21:00:00:14:c3:c1:9c:90
        21:00:00:14:c3:c4:41:9a
        21:00:00:14:c3:c1:23:6b
        21:00:00:14:c3:ca:23:df
        21:00:00:14:c3:c4:47:97
        21:00:00:14:c3:ca:20:0e
        21:00:00:14:c3:c4:40:ca


That'll allow me to zone the zvols just as if they were actual JBOD disks,
which will be awesome.

Is that possible?

Also, the HA stuff sounds bad-ass.  What's the best way to stay up to date
with CTL development?  Watch commits to sys/cam/ctl?  (There's no separate
list or anything for this stuff, right?)

Regards,

        Trent.


_______________________________________________
[hidden email] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "[hidden email]"

Re: CAM Target Layer and dev/isp

Ken Merry
On Mon, Jul 16, 2012 at 20:14:31 -0700, Trent Nelson wrote:
> Hi Ken,
>
> So, first off, nice work on CTL!  I accidentally stumbled onto this little
> gem yesterday and was shocked I hadn't heard of it before.  (Although it
> seems I'm not alone, even Google only seems to know about your original
> mailing list posts -- `+freebsd +ctladm` returns surprisingly little.)

Glad you like it!

> (Somewhat related question before I get into my main issue: `ctladm`
> should remove the need for share/examples/scsi_target, right?  The CAM
> Target Layer stuff encompasses all the original functionality exposed in
> scsi_target?)

Well, scsi_target is still useful as an example of how to create a userland
target application.

> I'm not sure how to expose LUNs via my isp devices.  Here's what I've got
> so far:
>
> [root@s24/ttypts/1(../misc/isp/bin)#] for i in {0..3}; do; ./isp_getrole
> /dev/isp$i; done
> /dev/isp0 chan 0: role target
> /dev/isp1 chan 0: role target
> /dev/isp2 chan 0: role target
> /dev/isp3 chan 0: role target

Looks good.  What model Qlogic board is it?  I know the driver works pretty
well in target mode with 4Gb and 8Gb boards, but I don't know how well 2Gb
and older boards work.

> [root@s24/ttypts/1(../misc/isp/bin)#] ctladm devlist -v
> LUN Backend       Size (Blocks)   BS Serial Number    Device ID
>   0 ramdisk             2097152  512 MYSERIAL   0     MYDEVID   0
>       lun_type=0

That looks fine.  Note that it is a fake ramdisk that is only backed by 1MB
of memory no matter how large the reported size is.  So you can make it as
large as you want.

> [root@s24/ttypts/1(../misc/isp/bin)#] ctladm port -l
> Port Online Type     Name         pp vp WWNN               WWPN
> 0    YES    IOCTL    CTL ioctl    0  0  0                  0
> 1    YES    INTERNAL ctl2cam      0  0  0x5000000357375b00 0x5000000357375b02
> 2    YES    INTERNAL CTL internal 0  0  0                  0
> 3    YES    FC       isp0         0  0  0x200000e08b146178 0x210000e08b146178
> 4    YES    FC       isp1         1  0  0x400000007f000009 0x400000007f000009
> 5    YES    FC       isp2         2  0  0x200000e08b102f5b 0x210000e08b102f5b
> 6    YES    FC       isp3         3  0  0x200000e08b302f5b 0x210100e08b302f5b

All of the ports are online, so that looks good.
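
To compare these WWPNs with what the switch reports, it can help to put them
in the same notation.  The following is only a small sketch (the helper name
wwn_to_colon is made up for illustration) that turns the hex values from
`ctladm port -l` into the colon-separated form used by `switchshow` and
`portshow`:

```python
# Sketch: render the 64-bit WWNs that `ctladm port -l` prints as hex
# integers in the colon-separated form a switch's `switchshow` uses.
# The function name is hypothetical, not part of any FreeBSD tool.

def wwn_to_colon(wwn_hex: str) -> str:
    """Convert e.g. '0x210100e08b302f5b' to '21:01:00:e0:8b:30:2f:5b'."""
    value = int(wwn_hex, 16)            # accepts an optional 0x prefix
    digits = f"{value:016x}"            # a WWN is 8 bytes = 16 hex digits
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))

# WWPNs copied from the `ctladm port -l` output quoted above.
ctl_wwpns = {
    "isp1": "0x400000007f000009",
    "isp3": "0x210100e08b302f5b",
}

for name, wwpn in ctl_wwpns.items():
    print(name, wwn_to_colon(wwpn))
```

With the values from this thread, isp1 comes out as the device WWN that
`portshow 8` lists, and isp3 as the F-Port WWN shown on port 7.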

> It's not obvious from the docs how to export CTL LUNs through my isp
> devices.  I tried passing -D /dev/isp[n] to `ctladm create`, but that
> returns: 'ctladm: cctl_create_lun: error issuing CTL_LUN_REQ ioctl:
> Inappropriate ioctl for device'.

The block backend only works on block devices or files.  e.g.:

ctladm create -b block -o file=/path/to/my/file
ctladm create -b block -o file=/dev/da5

If you use a block device (like a zvol) as the backing store, you'll want
to disable sending cache syncs to the disk, since sending them will trigger
a GEOM assertion.

ctladm realsync off

(Do that before putting the ports online.)
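
As a rough illustration of that ordering, here is a sketch that only builds
and prints the commands rather than running them.  The helper name and the
zvol path /dev/zvol/tank/vol0 are assumptions for the example; the ctladm
invocations themselves are the ones shown above.

```python
# Sketch: build (but do not run) the ctladm command sequence described
# above for exporting a block device or file as a CTL LUN.
# The backing path used below is a hypothetical example.
import shlex

def ctl_export_commands(backing_path: str, is_block_device: bool) -> list[list[str]]:
    """Return argv lists in the order they would need to be issued."""
    cmds = []
    if is_block_device:
        # Turn off real cache syncs first, before any port goes online.
        cmds.append(["ctladm", "realsync", "off"])
    # Create the LUN on the block backend, pointing it at the backing store.
    cmds.append(["ctladm", "create", "-b", "block", "-o", f"file={backing_path}"])
    return cmds

commands = ctl_export_commands("/dev/zvol/tank/vol0", is_block_device=True)
for argv in commands:
    print(shlex.join(argv))
```

Printing first makes it easy to check the order by eye; each argv list could
then be handed to subprocess.run() as root on the target host.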

> Sample output from one of the switches on my fabric (s24's /dev/isp1 is
> plugged into port 8 of sf1):
>
> sf1:admin> switchshow
> switchName: sf1
> switchType: 17.2
> switchState: Online
> switchMode: Native
> switchRole: Principal
> switchDomain: 1
> switchId: fffc01
> switchWwn: 10:00:00:60:69:5a:1a:40
> switchBeacon: OFF
> Zoning:       ON (cfg_2012_07_16)
> port  0: id N2 Online         F-Port 50:06:0b:00:00:13:18:72
> port  1: id N2 No_Light
> port  2: id N2 Online         E-Port 10:00:00:60:69:5a:09:c0 "sf2" (downstream)
> port  3: id N2 No_Light
> port  4: id N2 No_Light
> port  5: id N2 Online         F-Port 21:01:00:e0:8b:ab:cf:be
> port  6: id N2 Online         F-Port 21:01:00:e0:8b:a6:98:ca
> port  7: id N2 Online         F-Port 21:01:00:e0:8b:30:2f:5b
> port  8: id N1 Online         L-Port 1 public

Looks like it is in loop mode.  Can your switch make a loop mode device
visible on another port?  What are you using for an initiator?  Does it
work if you connect the initiator directly to the FreeBSD target?

> Ideally I'd like to be able to use CTL to export multiple ZFS zvols as
> separate targets (i.e. all with unique WWNN/WWPNs), such that, from the
> fabric's point of view, the port would look like just another FC-AL JBOD,
> like, say, port 11:

[ ... ]

> That'll allow me to zone the zvols just as if they were actual JBOD disks.
>  Which will be awesome.
>
> Is that possible?

CTL will just create multiple LUNs, not multiple targets.  Each LUN will
show up on all of the ports.

If your switch has NPIV support, you can also try creating multiple
virtual ports with the isp(4) driver by setting the hint.isp.0.vports=N
loader tunable, where N is the number of virtual ports.  I haven't tried
that in several years, though, and Matt Jacob has indicated it needs more
testing.

As for using zvols, the code that is in FreeBSD right now will lead to very
slow performance with zvols.  Justin Gibbs and Will Andrews gave a talk at
BSDCan that explained their work to eliminate COW (Copy On Write) faults
for files used as block devices and zvols.  Their slides are here:

http://www.bsdcan.org/2012/schedule/events/316.en.html

And the talk itself is here:

http://www.youtube.com/watch?v=LtY3vpX-cdM

It's fine to use zvols now, but you may want to wait until their code goes
into FreeBSD/head at least to use zvols for anything that requires
reasonable performance.

Until their code goes in the tree, you'll probably get somewhat faster
performance by using files on top of ZFS, or on top of UFS.  (UFS will be
much faster at the moment, but you don't get the software RAID
functionality of ZFS.)

The first write pass through on a zvol or a ZFS file will go very quickly,
but subsequent writes will be pretty slow, especially if they are not on
exact ZFS record boundaries.
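
To make "exact record boundaries" concrete, here is a small sketch of the
alignment check.  The 128 KiB record size is an assumption (it is the usual
ZFS recordsize default; a given dataset or zvol may use something else), and
the function names are made up for the example.

```python
# Sketch: test whether a write lands on exact record boundaries.
# RECORD_SIZE is an assumption (128 KiB is the usual ZFS recordsize
# default); a real dataset or zvol may use a different block size.
RECORD_SIZE = 128 * 1024

def is_record_aligned(offset: int, length: int, record_size: int = RECORD_SIZE) -> bool:
    """True if the write starts on a record boundary and covers whole records."""
    return length > 0 and offset % record_size == 0 and length % record_size == 0

def records_touched(offset: int, length: int, record_size: int = RECORD_SIZE) -> int:
    """Number of records a write of `length` bytes at `offset` overlaps."""
    first = offset // record_size
    last = (offset + length - 1) // record_size
    return last - first + 1

print(is_record_aligned(0, 128 * 1024))        # one full, aligned record
print(is_record_aligned(4096, 8192))           # small write inside a record
print(records_touched(126 * 1024, 4096))       # small write straddling a boundary
```

An initiator filesystem that issues 4 KB or 8 KB writes will almost never
pass the first check, which is the case to expect slow rewrites for.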

Also, Matt Jacob and I are chasing down a possible data corruption bug.
We don't know exactly where it is, and it might not be in the code that
is in FreeBSD right now.  The point is, run some data integrity tests
before using this in production.

> Also, the HA stuff sounds bad-ass.  What's the best way to stay up to date
> with CTL development?  Watch commits to sys/cam/ctl?  (There's no separate
> list or anything for this stuff, right?)

That's pretty much the best way to keep up with it; there's no separate
mailing list.

I don't think I'm going to have time to do anything with the HA hooks in
the near future.  Hopefully other folks will be interested and do some
development in that area.  It would be nice to have a fully HA block stack,
but that will take a lot of effort.

Ken
--
Kenneth Merry
[hidden email]

Re: CAM Target Layer and dev/isp

Trent Nelson-2

On 7/17/12 12:46 PM, "Kenneth D. Merry" <[hidden email]> wrote:

>On Mon, Jul 16, 2012 at 20:14:31 -0700, Trent Nelson wrote:
>> I'm not sure how to expose LUNs via my isp devices.  Here's what I've got
>> so far:
>>
>> [root@s24/ttypts/1(../misc/isp/bin)#] for i in {0..3}; do; ./isp_getrole
>> /dev/isp$i; done
>> /dev/isp0 chan 0: role target
>> /dev/isp1 chan 0: role target
>> /dev/isp2 chan 0: role target
>> /dev/isp3 chan 0: role target
>
>Looks good.  What model Qlogic board is it?  I know the driver works pretty
>well in target mode with 4Gb and 8Gb boards, but I don't know how well 2Gb
>and older boards work.

I'm still frolicking in 2Gb land, SAN-wise.  All my HBAs are Qlogic 2Gb
(single and dual port) models.  Sample dmesg from one of the dual port
ones:

isp0: Board Type 2312, Chip Revision 0x2, loaded F/W Revision 3.3.26
isp0: Attributes: TargetMode SCC-Lun Fabric 2K-Login
isp0: 876 max I/O command limit set


>
>> [root@s24/ttypts/1(../misc/isp/bin)#] ctladm devlist -v
>> LUN Backend       Size (Blocks)   BS Serial Number    Device ID
>>   0 ramdisk             2097152  512 MYSERIAL   0     MYDEVID   0
>>       lun_type=0
>
>That looks fine.  Note that it is a fake ramdisk that is only backed by 1MB
>of memory no matter how large the reported size is.  So you can make it as
>large as you want.

Haha, yeah I was wondering about the "10485760000000000" you used in the
examples.  You should definitely make mention of that 1MB fact in the docs
somewhere if you're going to use 9536TB as the size in the example ;-)

(My interaction went like this: created the crazy huge one used in the
examples, was thoroughly confused when I realized it was 9536TB, deleted
it, then created a 1GB one instead.)
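
For anyone else puzzled by the numbers, a quick sketch of the arithmetic.
It assumes the size given in the examples is in bytes, which is what the
9536TB figure implies.

```python
# Sketch: the size arithmetic behind the two ramdisk LUNs mentioned above.
TIB = 2 ** 40
GIB = 2 ** 30

def lun_bytes(blocks: int, block_size: int) -> int:
    """Total LUN size in bytes from a block count and block size."""
    return blocks * block_size

# Block count and block size as reported by `ctladm devlist -v`.
small = lun_bytes(2097152, 512)

# Size used in the examples, assumed here to be a byte count.
example_size = 10485760000000000

print(small // GIB, "GiB")
print(example_size // TIB, "TiB")
```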

>All of the ports are online, so that looks good.
>
>> It's not obvious from the docs how to export CTL LUNs through my isp
>> devices.  I tried passing -D /dev/isp[n] to `ctladm create`, but that
>> returns: 'ctladm: cctl_create_lun: error issuing CTL_LUN_REQ ioctl:
>> Inappropriate ioctl for device'.
>
>The block backend only works on block devices or files.  e.g.:
>
>ctladm create -b block -o file=/path/to/my/file
>ctladm create -b block -o file=/dev/da5
>
>If you use a block device (like a zvol) as the backing store, you'll want
>to disable sending cache syncs to the disk, since that will trigger a GEOM
>assertion.
>
>ctladm realsync off
>
>(Do that before putting the ports online.)

Ah, ok, I played a bit more with block devices instead of ramdisk, and it
cleared up some stuff.

>
>> Sample output from one of the switches on my fabric (s24's /dev/isp1 is
>> plugged into port 8 of sf1):
>>
>> sf1:admin> switchshow
>> switchName: sf1
>> switchType: 17.2
>> switchState: Online
>> switchMode: Native
>> switchRole: Principal
>> switchDomain: 1
>> switchId: fffc01
>> switchWwn: 10:00:00:60:69:5a:1a:40
>> switchBeacon: OFF
>> Zoning:       ON (cfg_2012_07_16)
>> port  0: id N2 Online         F-Port 50:06:0b:00:00:13:18:72
>> port  1: id N2 No_Light
>> port  2: id N2 Online         E-Port 10:00:00:60:69:5a:09:c0 "sf2" (downstream)
>> port  3: id N2 No_Light
>> port  4: id N2 No_Light
>> port  5: id N2 Online         F-Port 21:01:00:e0:8b:ab:cf:be
>> port  6: id N2 Online         F-Port 21:01:00:e0:8b:a6:98:ca
>> port  7: id N2 Online         F-Port 21:01:00:e0:8b:30:2f:5b
>> port  8: id N1 Online         L-Port 1 public
>
>Looks like it is in loop mode.  Can your switch make a loop mode device
>visible on another port?

You know what, I have no idea what was going on there.  I can't replicate
that behavior (getting the port to present itself to the fabric as an
L-port) on another (almost identical) box.  And I managed to panic that
box whilst composing this e-mail, to the point where it doesn't even get
past the PCI BIOS routines.  (It happened when I unplugged the HBA, I'll
paste a backtrace and CC Matt in a separate e-mail.)

>  What are you using for an initiator?  Does it
>work if you connect the initiator directly to the FreeBSD target?

Ah, ok, those questions (and various other comments you made) have cleared
up a couple of things.  What I was expecting to see from the switch when I
did `portshow [n]` was a list of all the LUNs (via their WWNs) being made
available by camctl/isp/target-mode.  This is what I'm alluding to when I
say this:

> > Ideally I'd like to be able to use CTL to export multiple ZFS zvols as
> > separate targets (i.e. all with unique WWNN/WWPNs), such that, from the
> > fabric's point of view, the port would look like just another FC-AL JBOD,
> > like, say, port 11:
> >
> > sf1:admin> portshow 11
[snip]
> > portWwn:    20:0b:00:60:69:5a:1a:40
> > portWwn of device(s) connected: 21:00:00:14:c3:ca:23:ca
> > 21:00:00:14:c3:c1:9c:90
> > 21:00:00:14:c3:c4:41:9a
> > 21:00:00:14:c3:c1:23:6b
> > 21:00:00:14:c3:ca:23:df
> > 21:00:00:14:c3:c4:47:97
> > 21:00:00:14:c3:ca:20:0e
> > 21:00:00:14:c3:c4:40:ca

However, it just occurred to me that in order for that to happen, the
block LUN I'm exporting would essentially have to mimic/implement an
FC-AL-ported disk (that is public-loop aware) -- i.e. it would have to
support fabric login and all the other fancy cruft my physical dual-ported
FC SCSI drives in the FC-AL JBODs implement.


I reckon it's safe to assume none of that is in place, no? ;-)

(Would that even be possible?  Seems like it would need a lot of
underlying driver support in dev/isp, at the very least.  As well as a
bucket-load of ctl support.  Perhaps better suited to a new backend type?
I.e. `ctladm create -b fc-al ...`.)

Now, with that being said, back to your questions...

> What are you using for an initiator?
> Does it work if you connect the initiator directly to the FreeBSD target?


Right, so, I haven't played around with any initiators yet as I wasn't
seeing the expected output from the SAN switch (i.e. the WWNs of "devices
connected to this port").  Now that I know this isn't going to happen,
I'll try to see if I can forcibly connect initiators.  I was happy to see
that when I created a block device via ctladm, it was assigned WWPN/WWNNs
automatically.

I'm not sure if this means I'll still be able to use a fabric (with zoning
enabled) or not.  If I have no luck with that approach, I'll try plugging
the initiator's HBA port directly into the target HBA and see if that helps.

>If your switch has NP-IV support, you can also try creating multiple
>virtual ports with the isp(4) driver if you set the hint.isp.0.vports=N
>loader tunable, where N is the number of virtual ports.  I haven't tried
>that in several years, though, and Matt Jacob has indicated it needs more
>testing.

Yeah unfortunately neither my switches nor my HBAs have support for VSAN
stuff.

>As for using zvols, the code that is in FreeBSD right now will lead to very
>slow performance with zvols.  Justin Gibbs and Will Andrews gave a talk at
>BSDCan that explained their work to eliminate COW (Copy On Write) faults
>for files used as block devices and zvols.  Their slides are here:
>
>http://www.bsdcan.org/2012/schedule/events/316.en.html
>
>And the talk itself is here:
>
>http://www.youtube.com/watch?v=LtY3vpX-cdM
>
>It's fine to use zvols now, but you may want to wait until their code goes
>into FreeBSD/head at least to use zvols for anything that requires
>reasonable performance.
>
>Until their code goes in the tree, you'll probably get somewhat faster
>performance by using files on top of ZFS, or on top of UFS.  (UFS will be
>much faster at the moment, but you don't get the software RAID
>functionality of ZFS.)
>
>The first write pass through on a zvol or a ZFS file will go very quickly,
>but subsequent writes will be pretty slow, especially if they are not on
>exact ZFS record boundaries.

Gotcha.  Super info, thanks.  The "perfect" solution down the track would
be zvol, but there's an enormous amount of other stuff I'd need to flesh
out before that.

(What I'm aiming to do is present zvols to either virtual or physical
boxes for the OS install disk.  That would allow me to snapshot/clone
entire OS instances (i.e. AIX/HP-UX/IRIX/Tru64-UNIX et al) and export them
as a new disk that could be picked up by another box, which would be
incredibly cool for provisioning, tinkering, dev stuff, etc.)

>Also, Matt Jacob and I are chasing down a possible data corruption bug.
>We don't know exactly where it is, and it might not be in the code that
>is in FreeBSD right now.  The point is, run some data integrity tests
>before using this in production.

Heh, roger.  (FWIW, this is all pie-in-the-sky level at the moment --
it'll be a while before I can put it into production.)

>
>> Also, the HA stuff sounds bad-ass.  What's the best way to stay up to date
>> with CTL development?  Watch commits to sys/cam/ctl?  (There's no separate
>> list or anything for this stuff, right?)
>
>That's pretty much the best way to keep up with it, there's not a separate
>mailing list.
>
>I don't think I'm going to have time to do anything with the HA hooks in
>the near future.  Hopefully other folks will be interested and do some
>development in that area.  It would be nice to have a fully HA block stack,
>but that will take a lot of effort.

Nod.  It's certainly not a deal breaker for me, and I can get around it by
other (clunky) means, but it certainly would be awesome.  (I'd essentially
be able to build a ZFS-backed storage controller indistinguishable from
the (proprietary) HA RAID controllers that I originally intended to
use with my disk arrays instead of just straight JBOD.)

Thanks for such thorough information.  Very useful.  I've got a few things
to play around with now.

Regards,

        Trent.



dev/isp panic (was Re: CAM Target Layer and dev/isp)

Trent Nelson-2


On 7/18/12 9:13 AM, "Trent Nelson" <[hidden email]> wrote:

>
>On 7/17/12 12:46 PM, "Kenneth D. Merry" <[hidden email]> wrote:
>
>>> port  8: id N1 Online         L-Port 1 public
>>
>>Looks like it is in loop mode.  Can your switch make a loop mode device
>>visible on another port?
>
>You know what, I have no idea what was going on there.  I can't replicate
>that behavior (getting the port to present itself to the fabric as an
>L-port) on another (almost identical) box.  And I managed to panic that
>box whilst composing this e-mail, to the point where it doesn't even get
>past the PCI BIOS routines.  (It happened when I unplugged the HBA, I'll
>paste a backtrace and CC Matt in a separate e-mail.)

db> bt
Tracing pid 12 tid 100062 td 0xfffffe001c00f470
acpi_timer_get_timecount() at 0xffffffff803ccfc6 = acpi_timer_get_timecount+0x16
DELAY() at 0xffffffff80ce68c3 = DELAY+0x83
ns8250_putc() at 0xffffffff807a17ba = ns8250_putc+0x9a
uart_cnputc() at 0xffffffff807a3b85 = uart_cnputc+0x75
cnputc() at 0xffffffff808e9cbc = cnputc+0x4c
cnputs() at 0xffffffff808ea0f5 = cnputs+0x35
putbuf() at 0xffffffff8097400c = putbuf+0xac
kvprintf() at 0xffffffff80972643 = kvprintf+0x83
vprintf() at 0xffffffff80973b15 = vprintf+0x85
printf() at 0xffffffff80973be7 = printf+0x67
isp_prt() at 0xffffffff805c15a0 = isp_prt+0xd0
isp_async() at 0xffffffff805c6736 = isp_async+0x356
isp_intr() at 0xffffffff805b854c = isp_intr+0x13ac
isp_platform_intr() at 0xffffffff805c1fd9 = isp_platform_intr+0x99
intr_event_execute_handlers() at 0xffffffff80907214 = intr_event_execute_handlers+0x104
ithread_loop() at 0xffffffff809089a6 = ithread_loop+0xa6
fork_exit() at 0xffffffff809038ef = fork_exit+0x11f
fork_trampoline() at 0xffffffff80c5139e = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff9178bd3cf0, rbp = 0 ---
db> x/s panicstr
0xffffffff8130fd08 = panicstr:


No idea what the panic string was.  The box stopped responding via ssh so
I logged into the serial port (via a sio mux over telnet) and simply got a
'db>' prompt -- no idea what came before that (and the vidconsole was
strangely blank).

Not sure how useful the backtrace is.  It's a Qlogic 2313 board, it was in
target mode, and the panic happened after I unplugged the HBA.  (Could be
completely unrelated -- I unplugged the HBA, waltzed back to my laptop,
and my ssh sessions were all stuck -- i.e. I didn't witness it panic
immediately after unplugging.)

I've got a few more commands like show intr|ps|thread etc in my buffer,
I'll paste if that's of any use.  I'll see if I can replicate it later
today (I've also fixed it so that I've got a proper dump device now).  Let
me know if there are any ddb commands that would be of use to you if it
happens again.

Box is running: FreeBSD s16.snakebite.net 9.1-PRERELEASE FreeBSD
9.1-PRERELEASE #0 r0: Mon Jul 16 06:28:19 UTC 2012
[hidden email]:/usr/obj/src/freebsd/9/r238513m/sys/AMD64
amd64

Also worth mentioning, heh, I manually svn merged available changes
reported in head/dev/isp to my local stable/9 branch.  The branches aren't
identical (seems there were some head changes that svn doesn't think are
eligible for merging back), but I haven't looked into the specifics.
Relevant thread:
http://lists.freebsd.org/pipermail/freebsd-stable/2012-July/068828.html.

If I can reliably reproduce the panic, I'll revert back to a clean
stable/9/dev/isp first to see if my cowboy merging is to blame ;-)


        Trent.


Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Matthew Jacob-2
--------------

db> bt
Tracing pid 12 tid 100062 td 0xfffffe001c00f470
acpi_timer_get_timecount() at 0xffffffff803ccfc6 =
acpi_timer_get_timecount+0x16
DELAY() at 0xffffffff80ce68c3 = DELAY+0x83
ns8250_putc() at 0xffffffff807a17ba = ns8250_putc+0x9a
uart_cnputc() at 0xffffffff807a3b85 = uart_cnputc+0x75
cnputc() at 0xffffffff808e9cbc = cnputc+0x4c
cnputs() at 0xffffffff808ea0f5 = cnputs+0x35
putbuf() at 0xffffffff8097400c = putbuf+0xac
kvprintf() at 0xffffffff80972643 = kvprintf+0x83
vprintf() at 0xffffffff80973b15 = vprintf+0x85
printf() at 0xffffffff80973be7 = printf+0x67
isp_prt() at 0xffffffff805c15a0 = isp_prt+0xd0
isp_async() at 0xffffffff805c6736 = isp_async+0x356
isp_intr() at 0xffffffff805b854c = isp_intr+0x13ac
isp_platform_intr() at 0xffffffff805c1fd9 = isp_platform_intr+0x99
intr_event_execute_handlers() at 0xffffffff80907214 =
intr_event_execute_handlers+0x104
ithread_loop() at 0xffffffff809089a6 = ithread_loop+0xa6
fork_exit() at 0xffffffff809038ef = fork_exit+0x11f
fork_trampoline() at 0xffffffff80c5139e = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff9178bd3cf0, rbp = 0 ---
db> x/s panicstr
0xffffffff8130fd08 = panicstr:


-------------

Hmm. No panic string because it wasn't a panic. isp driver is trying to
(successfully) print something and this blew up in the ACPI code. If
there was a bad string it would have blown up in kvprintf.  At least
that's my read of this.

Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Trent Nelson-2


On 7/18/12 10:00 AM, "Matthew Jacob" <[hidden email]> wrote:

>--------------
>
>db> bt
>Tracing pid 12 tid 100062 td 0xfffffe001c00f470
>acpi_timer_get_timecount() at 0xffffffff803ccfc6 =
>acpi_timer_get_timecount+0x16
>DELAY() at 0xffffffff80ce68c3 = DELAY+0x83
>ns8250_putc() at 0xffffffff807a17ba = ns8250_putc+0x9a
>uart_cnputc() at 0xffffffff807a3b85 = uart_cnputc+0x75
>cnputc() at 0xffffffff808e9cbc = cnputc+0x4c
>cnputs() at 0xffffffff808ea0f5 = cnputs+0x35
>putbuf() at 0xffffffff8097400c = putbuf+0xac
>kvprintf() at 0xffffffff80972643 = kvprintf+0x83
>vprintf() at 0xffffffff80973b15 = vprintf+0x85
>printf() at 0xffffffff80973be7 = printf+0x67
>isp_prt() at 0xffffffff805c15a0 = isp_prt+0xd0
>isp_async() at 0xffffffff805c6736 = isp_async+0x356
>isp_intr() at 0xffffffff805b854c = isp_intr+0x13ac
>isp_platform_intr() at 0xffffffff805c1fd9 = isp_platform_intr+0x99
>intr_event_execute_handlers() at 0xffffffff80907214 =
>intr_event_execute_handlers+0x104
>ithread_loop() at 0xffffffff809089a6 = ithread_loop+0xa6
>fork_exit() at 0xffffffff809038ef = fork_exit+0x11f
>fork_trampoline() at 0xffffffff80c5139e = fork_trampoline+0xe
>--- trap 0, rip = 0, rsp = 0xffffff9178bd3cf0, rbp = 0 ---
>db> x/s panicstr
>0xffffffff8130fd08 = panicstr:
>
>
>-------------
>
>Hmm. No panic string because it wasn't a panic. isp driver is trying to
>(successfully) print something and this blew up in the ACPI code.

Hmm, just to clarify, do you mean "the isp tried to print something that
blew up ACPI", or "ACPI blew up whilst isp just happened to be printing
something"?

Would knowing what isp was trying to print be of any help?  (I can poke
around the *putc buffers if it happens again.)

I'm not surprised that ACPI is involved, though.  This box has always
seemed to run into ACPI issues (HP ProLiant DL585 G1, quad dual-core
Opteron, 64GB RAM), like hanging on boot during the pci->bios probe stuff
from a kernel circa 2-3 months ago.


        Trent.


Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Matthew Jacob-2
On 7/18/2012 8:58 AM, Trent Nelson wrote:
> Hmm, just to clarify, do you mean "the isp tried to print something
> that blew up ACPI", or "ACPI blew up whilst isp just happened to be
> printing something"? Would knowing what isp was trying to print be of
> any help? (I can poke around the *putc buffers if it happens again.)
> I'm not surprised that ACPI is involved, though. This box has always
> seemed to run into ACPI issues (HP ProLiant DL585 G1, quad dual-core
> Opteron, 64GB RAM), like hanging on boot during the pci->bios probe
> stuff from a kernel circa 2-3 months ago. Trent.

I wouldn't know. Since there are a limited number of printfs from
isp_async you could probably narrow it down. Try booting with ACPI
disabled. Or upgrade f/w.


Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Andriy Gapon
on 18/07/2012 17:00 Matthew Jacob said the following:

> --------------
>
> db> bt
> Tracing pid 12 tid 100062 td 0xfffffe001c00f470
> acpi_timer_get_timecount() at 0xffffffff803ccfc6 =
> acpi_timer_get_timecount+0x16
> DELAY() at 0xffffffff80ce68c3 = DELAY+0x83
> ns8250_putc() at 0xffffffff807a17ba = ns8250_putc+0x9a
> uart_cnputc() at 0xffffffff807a3b85 = uart_cnputc+0x75
> cnputc() at 0xffffffff808e9cbc = cnputc+0x4c
> cnputs() at 0xffffffff808ea0f5 = cnputs+0x35
> putbuf() at 0xffffffff8097400c = putbuf+0xac
> kvprintf() at 0xffffffff80972643 = kvprintf+0x83
> vprintf() at 0xffffffff80973b15 = vprintf+0x85
> printf() at 0xffffffff80973be7 = printf+0x67
> isp_prt() at 0xffffffff805c15a0 = isp_prt+0xd0
> isp_async() at 0xffffffff805c6736 = isp_async+0x356
> isp_intr() at 0xffffffff805b854c = isp_intr+0x13ac
> isp_platform_intr() at 0xffffffff805c1fd9 = isp_platform_intr+0x99
> intr_event_execute_handlers() at 0xffffffff80907214 =
> intr_event_execute_handlers+0x104
> ithread_loop() at 0xffffffff809089a6 = ithread_loop+0xa6
> fork_exit() at 0xffffffff809038ef = fork_exit+0x11f
> fork_trampoline() at 0xffffffff80c5139e = fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff9178bd3cf0, rbp = 0 ---
> db> x/s panicstr
> 0xffffffff8130fd08 = panicstr:
>
>
> -------------
>
> Hmm. No panic string because it wasn't a panic. isp driver is trying to
> (successfully) print something and this blew up in the ACPI code. If there was a
> bad string it would have blown up in kvprintf.  At least that's my read of this.

I think that the vague reference to ACPI is unnecessarily too vague, given the
quite obvious stack trace (hint: DELAY) and both simplicity and utility of
acpi_timer_get_timecount (essentially an I/O read operation).
But there is no indication in the above stack trace that something blew up at
all (no magic words like "panic", "trap").
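
The check described above (looking for "magic words" among the frames) can be sketched as a small filter over a saved backtrace. This is only an illustration: the function name debugger_entry_frames and the list of frame names it looks for are assumptions chosen for the example, not an exhaustive or authoritative list of ways into the debugger.

```sh
# Hypothetical helper: read a ddb backtrace on stdin and print any frame
# whose function name commonly shows up when the kernel panicked or trapped.
# The name list below is an illustrative assumption, not a complete one.
debugger_entry_frames() {
    awk '
        # Drop mail quoting ("> ", ">> ") so quoted traces can be fed in directly.
        { sub(/^[> ]+/, "") }
        # Frame lines look like: name() at 0xADDR = name+0xOFF
        /^[A-Za-z_][A-Za-z0-9_]*\(\) at / {
            name = $1
            sub(/\(\).*/, "", name)
            if (name ~ /^(panic|vpanic|trap|trap_fatal|kdb_enter)$/)
                print name
        }
    '
}

# Usage, with the trace saved to a file:
#   debugger_entry_frames < bt.txt
```

Only whole function names are compared, so fork_trampoline and the closing "--- trap 0, ..." marker line are not counted as trap frames.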

--
Andriy Gapon


Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Trent Nelson-2
On 7/19/12 1:55 AM, "Andriy Gapon" <[hidden email]> wrote:

>on 18/07/2012 17:00 Matthew Jacob said the following:
>> --------------
>>
>> db> bt
>> Tracing pid 12 tid 100062 td 0xfffffe001c00f470
>> acpi_timer_get_timecount() at 0xffffffff803ccfc6 =
>> acpi_timer_get_timecount+0x16
>> DELAY() at 0xffffffff80ce68c3 = DELAY+0x83
>> ns8250_putc() at 0xffffffff807a17ba = ns8250_putc+0x9a
>> uart_cnputc() at 0xffffffff807a3b85 = uart_cnputc+0x75
>> cnputc() at 0xffffffff808e9cbc = cnputc+0x4c
>> cnputs() at 0xffffffff808ea0f5 = cnputs+0x35
>> putbuf() at 0xffffffff8097400c = putbuf+0xac
>> kvprintf() at 0xffffffff80972643 = kvprintf+0x83
>> vprintf() at 0xffffffff80973b15 = vprintf+0x85
>> printf() at 0xffffffff80973be7 = printf+0x67
>> isp_prt() at 0xffffffff805c15a0 = isp_prt+0xd0
>> isp_async() at 0xffffffff805c6736 = isp_async+0x356
>> isp_intr() at 0xffffffff805b854c = isp_intr+0x13ac
>> isp_platform_intr() at 0xffffffff805c1fd9 = isp_platform_intr+0x99
>> intr_event_execute_handlers() at 0xffffffff80907214 =
>> intr_event_execute_handlers+0x104
>> ithread_loop() at 0xffffffff809089a6 = ithread_loop+0xa6
>> fork_exit() at 0xffffffff809038ef = fork_exit+0x11f
>> fork_trampoline() at 0xffffffff80c5139e = fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xffffff9178bd3cf0, rbp = 0 ---
>> db> x/s panicstr
>> 0xffffffff8130fd08 = panicstr:
>>
>>
>> -------------
>>
>> Hmm. No panic string because it wasn't a panic. isp driver is trying to
>> (successfully) print something and this blew up in the ACPI code. If there was a
>> bad string it would have blown up in kvprintf.  At least that's my read of this.
>
>I think that the vague reference to ACPI is unnecessarily too vague, given the
>quite obvious stack trace (hint: DELAY) and both simplicity and utility of
>acpi_timer_get_timecount (essentially an I/O read operation).
>But there is no indication in the above stack trace that something blew up at
>all (no magic words like "panic", "trap").

Hrm.  What else would cause 'db>' to show up on the console?  Ctrl-Alt-Esc
and hitting a breakpoint are all I can think of at the moment -- and
neither of those are applicable here.


        Trent.


Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Andriy Gapon
on 19/07/2012 15:22 Trent Nelson said the following:
> Hrm.  What else would cause 'db>' to show up on the console?  Ctrl-Alt-Esc
> and hitting a breakpoint are all I can think of at the moment -- and
> neither of those are applicable here.

That's a very good question.  I honestly don't have any idea.

--
Andriy Gapon



Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Gary Palmer-2
In reply to this post by Trent Nelson-2
On Thu, Jul 19, 2012 at 05:22:21AM -0700, Trent Nelson wrote:

> On 7/19/12 1:55 AM, "Andriy Gapon" <[hidden email]> wrote:
>
> >on 18/07/2012 17:00 Matthew Jacob said the following:
> >> --------------
> >>
> >> db> bt
> >> Tracing pid 12 tid 100062 td 0xfffffe001c00f470
> >> acpi_timer_get_timecount() at 0xffffffff803ccfc6 =
> >> acpi_timer_get_timecount+0x16
> >> DELAY() at 0xffffffff80ce68c3 = DELAY+0x83
> >> ns8250_putc() at 0xffffffff807a17ba = ns8250_putc+0x9a
> >> uart_cnputc() at 0xffffffff807a3b85 = uart_cnputc+0x75
> >> cnputc() at 0xffffffff808e9cbc = cnputc+0x4c
> >> cnputs() at 0xffffffff808ea0f5 = cnputs+0x35
> >> putbuf() at 0xffffffff8097400c = putbuf+0xac
> >> kvprintf() at 0xffffffff80972643 = kvprintf+0x83
> >> vprintf() at 0xffffffff80973b15 = vprintf+0x85
> >> printf() at 0xffffffff80973be7 = printf+0x67
> >> isp_prt() at 0xffffffff805c15a0 = isp_prt+0xd0
> >> isp_async() at 0xffffffff805c6736 = isp_async+0x356
> >> isp_intr() at 0xffffffff805b854c = isp_intr+0x13ac
> >> isp_platform_intr() at 0xffffffff805c1fd9 = isp_platform_intr+0x99
> >> intr_event_execute_handlers() at 0xffffffff80907214 =
> >> intr_event_execute_handlers+0x104
> >> ithread_loop() at 0xffffffff809089a6 = ithread_loop+0xa6
> >> fork_exit() at 0xffffffff809038ef = fork_exit+0x11f
> >> fork_trampoline() at 0xffffffff80c5139e = fork_trampoline+0xe
> >> --- trap 0, rip = 0, rsp = 0xffffff9178bd3cf0, rbp = 0 ---
> >> db> x/s panicstr
> >> 0xffffffff8130fd08 = panicstr:
> >>
> >>
> >> -------------
> >>
> >> Hmm. No panic string because it wasn't a panic. isp driver is trying to
> >> (successfully) print something and this blew up in the ACPI code. If there was a
> >> bad string it would have blown up in kvprintf.  At least that's my read of this.
> >
> >I think that the vague reference to ACPI is unnecessarily too vague, given the
> >quite obvious stack trace (hint: DELAY) and both simplicity and utility of
> >acpi_timer_get_timecount (essentially an I/O read operation).
> >But there is no indication in the above stack trace that something blew up at
> >all (no magic words like "panic", "trap").
>
> Hrm.  What else would cause 'db>' to show up on the console?  Ctrl-Alt-Esc
> and hitting a breakpoint are all I can think of at the moment -- and
> neither of those are applicable here.

Is there a serial console attached?  Sending BREAK via serial can also
do it (or used to anyway), and some terminal servers send BREAK when they
reset/reboot.

The fact you were in ns8250_putc() could point at a serial port issue.

Gary

Re: dev/isp panic (was Re: CAM Target Layer and dev/isp)

Trent Nelson-2


On 7/20/12 9:53 PM, "Gary Palmer" <[hidden email]> wrote:

>On Thu, Jul 19, 2012 at 05:22:21AM -0700, Trent Nelson wrote:
>>
>>
>> Hrm.  What else would cause 'db>' to show up on the console?  Ctrl-Alt-Esc
>> and hitting a breakpoint are all I can think of at the moment -- and
>> neither of those are applicable here.
>
>Is there a serial console attached?  Sending BREAK via serial can also
>do it (or used to anyway), and some terminal servers send BREAK when they
>reset/reboot.

Yeah, the serial port was connected to a console/terminal server -- I had
telnet'd into the relevant port when I saw the 'db>' prompt.  I don't have
'options BREAK_TO_DEBUGGER' in my kernel config (which is based off
GENERIC, and it's not in GENERIC), so I doubt that's it.

I'm also convinced my console server (Jetstream 8500) is physically
incapable of actually sending/simulating BREAK/STOP sequences.  I can't
use it for any of my Sun boxes for this reason.

Good suggestion though -- didn't even know 'options BREAK_TO_DEBUGGER'
(and ALT_BREAK_TO_DEBUGGER) even existed before your e-mail.
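
A quick way to double-check a claim like "it's not in my kernel config" is to grep the config file for active options lines. The sketch below assumes the usual "options NAME" syntax of FreeBSD kernel config files; the helper name kernel_has_option and the MYKERNEL path in the usage comment are hypothetical.

```sh
# Hypothetical helper: succeed if a kernel config file has an active
# (not commented-out) "options NAME" line for the given option.
kernel_has_option() {
    conf="$1"
    opt="$2"
    # After the option name allow whitespace, "=value", a trailing comment,
    # or end of line, so KDB does not match KDB_TRACE.
    grep -Eq "^[[:space:]]*options[[:space:]]+${opt}([[:space:]=#]|\$)" "$conf"
}

# Usage (the path is an assumption, adjust to your own config):
#   for o in BREAK_TO_DEBUGGER ALT_BREAK_TO_DEBUGGER; do
#       kernel_has_option /usr/src/sys/amd64/conf/MYKERNEL "$o" \
#           && echo "$o set" || echo "$o not set"
#   done
```

This only inspects the one file it is given; options pulled in through an include line would need the included file checked as well. On systems that expose them, the debug.kdb.break_to_debugger and debug.kdb.alt_break_to_debugger sysctls should show the running kernel's setting, but verify those names on your release.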


        Trent.
