7.1-RELEASE I/O hang

7.1-RELEASE I/O hang

Matt Burke-3
I have a machine with a PERC6/e controller. Attached to that are 3 disk
shelves, each configured as an individual 14-disk RAID10 array (the PERC
annoyingly only lets you use 8 spans per array).

I can run bonnie++ on the arrays individually with no problem.
I can also run it across a gstripe of the arrays with no problem.
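
For reference, the stripe was set up roughly like this (a sketch; the
label name and stripe size here are illustrative, not the exact values
used):

gstripe label -s 131072 data /dev/mfid2 /dev/mfid3 /dev/mfid4
newfs /dev/stripe/data
mount /dev/stripe/data /mnt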

However, running it over the 3 arrays in parallel causes something
I/O-related in the kernel to hang.

To define 'hang' better:

It appears anything which needs disk I/O, even on a different controller
(albeit the same mfi driver), will hang. A command like 'ps' that is
already cached in RAM will work, but bash hangs after it exits,
presumably while trying to write ~/.bash_history.

'sysctl -a' works but trying to run 'sysctl kern.msgbuf' also hangs

I've done some research, and it seems the usual cause of bonnie++
crashing a system is overflowing the TCQ. camcontrol doesn't see any
disks, so I've tried setting hw.mfi.max_cmds=32 in /boot/loader.conf,
but it didn't make any difference.
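
That is, with a line like this in /boot/loader.conf (32 was a guess at a
conservative value, not a documented known-good setting):

hw.mfi.max_cmds="32"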

The bonnie++ invocation is this:

(newfs devices mfid[2-4], mount them on /data/[2-4])
bonnie++ -s 64g -u root -p3
bonnie++ -d /data/2 -s 64g -u root -y s >b2 2>&1 &
bonnie++ -d /data/3 -s 64g -u root -y s >b3 2>&1 &
bonnie++ -d /data/4 -s 64g -u root -y s >b4 2>&1 &

and it always hangs on "Rewriting...". It's a fresh 7.1-RELEASE with
nothing else running beyond the basics (devd, sshd, syslogd, etc.).


Any ideas?



Re: 7.1-RELEASE I/O hang

Konstantin Belousov
On Wed, Feb 04, 2009 at 12:46:53PM +0000, Matt Burke wrote:

> I have a machine with a PERC6/e controller. Attached to that are 3 disk
> shelves, each configured as an individual 14-disk RAID10 array (the PERC
> annoyingly only lets you use 8 spans per array).
>
> I can run bonnie++ on the arrays individually with no problem.
> I can also run it across a gstripe of the arrays with no problem.
>
> However, running it over the 3 arrays in parallel causes something
> I/O-related in the kernel to hang.
>
> To define 'hang' better:
>
> It appears anything which needs disk I/O, even on a different controller
> (albeit the same mfi driver), will hang. A command like 'ps' that is
> already cached in RAM will work, but bash hangs after it exits,
> presumably while trying to write ~/.bash_history.
>
> 'sysctl -a' works but trying to run 'sysctl kern.msgbuf' also hangs
>
> I've done some research, and it seems the usual cause of bonnie++
> crashing a system is overflowing the TCQ. camcontrol doesn't see any
> disks, so I've tried setting hw.mfi.max_cmds=32 in /boot/loader.conf,
> but it didn't make any difference.
>
> The bonnie++ invocation is this:
>
> (newfs devices mfid[2-4], mount them on /data/[2-4])
> bonnie++ -s 64g -u root -p3
> bonnie++ -d /data/2 -s 64g -u root -y s >b2 2>&1 &
> bonnie++ -d /data/3 -s 64g -u root -y s >b3 2>&1 &
> bonnie++ -d /data/4 -s 64g -u root -y s >b4 2>&1 &
>
> and it always hangs on "Rewriting...". It's a fresh 7.1-RELEASE with
> nothing else running beyond the basics (devd, sshd, syslogd, etc.).
>
>
> Any ideas?
Compile ddb into the kernel, and do "ps" from the ddb prompt. If there
are processes hung in the "nbufkv" state, then the patch below might
help.
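
For reference, enabling ddb means having at least the following in the
kernel config (this is for 7.x); you can then enter the debugger with
Ctrl+Alt+Esc on the console and run "ps" at the db> prompt:

options KDB
options DDB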

Index: gnu/fs/xfs/FreeBSD/xfs_buf.c
===================================================================
--- gnu/fs/xfs/FreeBSD/xfs_buf.c (revision 188080)
+++ gnu/fs/xfs/FreeBSD/xfs_buf.c (working copy)
@@ -81,7 +81,7 @@
 {
  struct buf *bp;
 
- bp = geteblk(0);
+ bp = geteblk(0, 0);
  if (bp != NULL) {
  bp->b_bufsize = size;
  bp->b_bcount = size;
@@ -101,7 +101,7 @@
  if (len >= MAXPHYS)
  return (NULL);
 
- bp = geteblk(len);
+ bp = geteblk(len, 0);
  if (bp != NULL) {
  KASSERT(BUF_REFCNT(bp) == 1,
  ("xfs_buf_get_empty: bp %p not locked",bp));
Index: ufs/ffs/ffs_vfsops.c
===================================================================
--- ufs/ffs/ffs_vfsops.c (revision 188080)
+++ ufs/ffs/ffs_vfsops.c (working copy)
@@ -1747,7 +1747,9 @@
     ("bufwrite: needs chained iodone (%p)", bp->b_iodone));
 
  /* get a new block */
- newbp = geteblk(bp->b_bufsize);
+ newbp = geteblk(bp->b_bufsize, GB_NOWAIT_BD);
+ if (newbp == NULL)
+ goto normal_write;
 
  /*
  * set it to be identical to the old block.  We have to
@@ -1787,6 +1789,7 @@
  }
 
  /* Let the normal bufwrite do the rest for us */
+normal_write:
  return (bufwrite(bp));
 }
 
Index: kern/vfs_bio.c
===================================================================
--- kern/vfs_bio.c (revision 188080)
+++ kern/vfs_bio.c (working copy)
@@ -105,7 +105,8 @@
 static void vfs_vmio_release(struct buf *bp);
 static int vfs_bio_clcheck(struct vnode *vp, int size,
  daddr_t lblkno, daddr_t blkno);
-static int flushbufqueues(int, int);
+static int buf_do_flush(struct vnode *vp);
+static int flushbufqueues(struct vnode *, int, int);
 static void buf_daemon(void);
 static void bremfreel(struct buf *bp);
 
@@ -258,6 +259,7 @@
 #define QUEUE_DIRTY_GIANT 3 /* B_DELWRI buffers that need giant */
 #define QUEUE_EMPTYKVA 4 /* empty buffer headers w/KVA assignment */
 #define QUEUE_EMPTY 5 /* empty buffer headers */
+#define QUEUE_SENTINEL 1024 /* not a queue index, but mark for sentinel */
 
 /* Queues for free buffers with various properties */
 static TAILQ_HEAD(bqueues, buf) bufqueues[BUFFER_QUEUES] = { { 0 } };
@@ -1703,21 +1705,23 @@
  */
 
 static struct buf *
-getnewbuf(int slpflag, int slptimeo, int size, int maxsize)
+getnewbuf(struct vnode *vp, int slpflag, int slptimeo, int size, int maxsize,
+    int gbflags)
 {
+ struct thread *td;
  struct buf *bp;
  struct buf *nbp;
  int defrag = 0;
  int nqindex;
  static int flushingbufs;
 
+ td = curthread;
  /*
  * We can't afford to block since we might be holding a vnode lock,
  * which may prevent system daemons from running.  We deal with
  * low-memory situations by proactively returning memory and running
  * async I/O rather then sync I/O.
  */
-
  atomic_add_int(&getnewbufcalls, 1);
  atomic_subtract_int(&getnewbufrestarts, 1);
 restart:
@@ -1949,8 +1953,9 @@
  */
 
  if (bp == NULL) {
- int flags;
+ int flags, norunbuf;
  char *waitmsg;
+ int fl;
 
  if (defrag) {
  flags = VFS_BIO_NEED_BUFSPACE;
@@ -1968,9 +1973,35 @@
  mtx_unlock(&bqlock);
 
  bd_speedup(); /* heeeelp */
+ if (gbflags & GB_NOWAIT_BD)
+ return (NULL);
 
  mtx_lock(&nblock);
  while (needsbuffer & flags) {
+ if (vp != NULL && (td->td_pflags & TDP_BUFNEED) == 0) {
+ mtx_unlock(&nblock);
+ /*
+ * getblk() is called with a vnode
+ * locked, and some majority of the
+ * dirty buffers may as well belong to
+ * the vnode. Flushing the buffers
+ * there would make a progress that
+ * cannot be achieved by the
+ * buf_daemon, that cannot lock the
+ * vnode.
+ */
+ norunbuf = ~(TDP_BUFNEED | TDP_NORUNNINGBUF) |
+    (td->td_pflags & TDP_NORUNNINGBUF);
+ /* play bufdaemon */
+ td->td_pflags |= TDP_BUFNEED | TDP_NORUNNINGBUF;
+ fl = buf_do_flush(vp);
+ td->td_pflags &= norunbuf;
+ mtx_lock(&nblock);
+ if (fl != 0)
+ continue;
+ if ((needsbuffer & flags) == 0)
+ break;
+ }
  if (msleep(&needsbuffer, &nblock,
     (PRIBIO + 4) | slpflag, waitmsg, slptimeo)) {
  mtx_unlock(&nblock);
@@ -2039,6 +2070,35 @@
 };
 SYSINIT(bufdaemon, SI_SUB_KTHREAD_BUF, SI_ORDER_FIRST, kproc_start, &buf_kp);
 
+static int
+buf_do_flush(struct vnode *vp)
+{
+ int flushed;
+
+ flushed = flushbufqueues(vp, QUEUE_DIRTY, 0);
+ /* The list empty check here is slightly racy */
+ if (!TAILQ_EMPTY(&bufqueues[QUEUE_DIRTY_GIANT])) {
+ mtx_lock(&Giant);
+ flushed += flushbufqueues(vp, QUEUE_DIRTY_GIANT, 0);
+ mtx_unlock(&Giant);
+ }
+ if (flushed == 0) {
+ /*
+ * Could not find any buffers without rollback
+ * dependencies, so just write the first one
+ * in the hopes of eventually making progress.
+ */
+ flushbufqueues(vp, QUEUE_DIRTY, 1);
+ if (!TAILQ_EMPTY(
+    &bufqueues[QUEUE_DIRTY_GIANT])) {
+ mtx_lock(&Giant);
+ flushbufqueues(vp, QUEUE_DIRTY_GIANT, 1);
+ mtx_unlock(&Giant);
+ }
+ }
+ return (flushed);
+}
+
 static void
 buf_daemon()
 {
@@ -2052,7 +2112,7 @@
  /*
  * This process is allowed to take the buffer cache to the limit
  */
- curthread->td_pflags |= TDP_NORUNNINGBUF;
+ curthread->td_pflags |= TDP_NORUNNINGBUF | TDP_BUFNEED;
  mtx_lock(&bdlock);
  for (;;) {
  bd_request = 0;
@@ -2067,30 +2127,8 @@
  * normally would so they can run in parallel with our drain.
  */
  while (numdirtybuffers > lodirtybuffers) {
- int flushed;
-
- flushed = flushbufqueues(QUEUE_DIRTY, 0);
- /* The list empty check here is slightly racy */
- if (!TAILQ_EMPTY(&bufqueues[QUEUE_DIRTY_GIANT])) {
- mtx_lock(&Giant);
- flushed += flushbufqueues(QUEUE_DIRTY_GIANT, 0);
- mtx_unlock(&Giant);
- }
- if (flushed == 0) {
- /*
- * Could not find any buffers without rollback
- * dependencies, so just write the first one
- * in the hopes of eventually making progress.
- */
- flushbufqueues(QUEUE_DIRTY, 1);
- if (!TAILQ_EMPTY(
-    &bufqueues[QUEUE_DIRTY_GIANT])) {
- mtx_lock(&Giant);
- flushbufqueues(QUEUE_DIRTY_GIANT, 1);
- mtx_unlock(&Giant);
- }
+ if (buf_do_flush(NULL) == 0)
  break;
- }
  uio_yield();
  }
 
@@ -2136,7 +2174,7 @@
     0, "Number of buffers flushed with dependecies that require rollbacks");
 
 static int
-flushbufqueues(int queue, int flushdeps)
+flushbufqueues(struct vnode *lvp, int queue, int flushdeps)
 {
  struct thread *td = curthread;
  struct buf sentinel;
@@ -2147,20 +2185,37 @@
  int flushed;
  int target;
 
- target = numdirtybuffers - lodirtybuffers;
- if (flushdeps && target > 2)
- target /= 2;
+ if (lvp == NULL) {
+ target = numdirtybuffers - lodirtybuffers;
+ if (flushdeps && target > 2)
+ target /= 2;
+ } else
+ target = 1;
  flushed = 0;
  bp = NULL;
+ sentinel.b_qindex = QUEUE_SENTINEL;
  mtx_lock(&bqlock);
- TAILQ_INSERT_TAIL(&bufqueues[queue], &sentinel, b_freelist);
+ TAILQ_INSERT_HEAD(&bufqueues[queue], &sentinel, b_freelist);
  while (flushed != target) {
- bp = TAILQ_FIRST(&bufqueues[queue]);
- if (bp == &sentinel)
+ bp = TAILQ_NEXT(&sentinel, b_freelist);
+ if (bp != NULL) {
+ TAILQ_REMOVE(&bufqueues[queue], &sentinel, b_freelist);
+ TAILQ_INSERT_AFTER(&bufqueues[queue], bp, &sentinel,
+    b_freelist);
+ } else
  break;
- TAILQ_REMOVE(&bufqueues[queue], bp, b_freelist);
- TAILQ_INSERT_TAIL(&bufqueues[queue], bp, b_freelist);
-
+ /*
+ * Skip sentinels inserted by other invocations of the
+ * flushbufqueues(), taking care to not reorder them.
+ */
+ if (bp->b_qindex == QUEUE_SENTINEL)
+ continue;
+ /*
+ * Only flush the buffers that belong to the
+ * vnode locked by the curthread.
+ */
+ if (lvp != NULL && bp->b_vp != lvp)
+ continue;
  if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT, NULL) != 0)
  continue;
  if (bp->b_pin_count > 0) {
@@ -2208,16 +2263,28 @@
  BUF_UNLOCK(bp);
  continue;
  }
- if (vn_lock(vp, LK_EXCLUSIVE | LK_NOWAIT, td) == 0) {
+ if (vn_lock(vp, LK_EXCLUSIVE | LK_NOWAIT | LK_CANRECURSE, td)
+    == 0) {
  mtx_unlock(&bqlock);
  CTR3(KTR_BUF, "flushbufqueue(%p) vp %p flags %X",
     bp, bp->b_vp, bp->b_flags);
- vfs_bio_awrite(bp);
+ if (curproc == bufdaemonproc)
+ vfs_bio_awrite(bp);
+ else {
+ bremfree(bp);
+ bwrite(bp);
+ }
  vn_finished_write(mp);
  VOP_UNLOCK(vp, 0, td);
  flushwithdeps += hasdeps;
  flushed++;
- waitrunningbufspace();
+
+ /*
+ * Sleeping on runningbufspace while holding
+ * vnode lock leads to deadlock.
+ */
+ if (curproc == bufdaemonproc)
+ waitrunningbufspace();
  numdirtywakeup((lodirtybuffers + hidirtybuffers) / 2);
  mtx_lock(&bqlock);
  continue;
@@ -2599,7 +2666,7 @@
  maxsize = vmio ? size + (offset & PAGE_MASK) : size;
  maxsize = imax(maxsize, bsize);
 
- bp = getnewbuf(slpflag, slptimeo, size, maxsize);
+ bp = getnewbuf(vp, slpflag, slptimeo, size, maxsize, flags);
  if (bp == NULL) {
  if (slpflag || slptimeo)
  return NULL;
@@ -2674,14 +2741,17 @@
  * set to B_INVAL.
  */
 struct buf *
-geteblk(int size)
+geteblk(int size, int flags)
 {
  struct buf *bp;
  int maxsize;
 
  maxsize = (size + BKVAMASK) & ~BKVAMASK;
- while ((bp = getnewbuf(0, 0, size, maxsize)) == 0)
- continue;
+ while ((bp = getnewbuf(NULL, 0, 0, size, maxsize, flags)) == NULL) {
+ if ((flags & GB_NOWAIT_BD) &&
+    (curthread->td_pflags & TDP_BUFNEED) != 0)
+ return (NULL);
+ }
  allocbuf(bp, size);
  bp->b_flags |= B_INVAL; /* b_dep cleared by getnewbuf() */
  KASSERT(BUF_REFCNT(bp) == 1, ("geteblk: bp %p not locked",bp));
Index: sys/proc.h
===================================================================
--- sys/proc.h (revision 188080)
+++ sys/proc.h (working copy)
@@ -378,6 +378,7 @@
 #define TDP_NORUNNINGBUF 0x00040000 /* Ignore runningbufspace check */
 #define TDP_WAKEUP 0x00080000 /* Don't sleep in umtx cond_wait */
 #define TDP_INBDFLUSH 0x00100000 /* Already in BO_BDFLUSH, do not recurse */
+#define TDP_BUFNEED 0x00200000 /* Do not recurse into the buf flush */
 
 /*
  * Reasons that the current thread can not be run yet.
Index: sys/buf.h
===================================================================
--- sys/buf.h (revision 188080)
+++ sys/buf.h (working copy)
@@ -475,6 +475,7 @@
  */
 #define GB_LOCK_NOWAIT 0x0001 /* Fail if we block on a buf lock. */
 #define GB_NOCREAT 0x0002 /* Don't create a buf if not found. */
+#define GB_NOWAIT_BD 0x0004 /* Do not wait for bufdaemon */
 
 #ifdef _KERNEL
 extern int nbuf; /* The number of buffer headers */
@@ -519,7 +520,7 @@
 struct buf *incore(struct bufobj *, daddr_t);
 struct buf *gbincore(struct bufobj *, daddr_t);
 struct buf *getblk(struct vnode *, daddr_t, int, int, int, int);
-struct buf *geteblk(int);
+struct buf *geteblk(int, int);
 int bufwait(struct buf *);
 int bufwrite(struct buf *);
 void bufdone(struct buf *);


Re: 7.1-RELEASE I/O hang

Matt Burke-3
Kostik Belousov wrote:
> Compile ddb into the kernel, and do "ps" from the ddb prompt. If there
> are processes hung in the "nbufkv" state, then the patch below might
> help.

The bonnie++ processes are in state "newbuf" and other hung processes
(bash, newly forked sshds, etc) appear to be in the "ufs" state.

The patch appears to have no effect, although at the last hang I did see
one of the bonnie++ processes in "nbufkv" state. This could be coincidental.


The problem also exhibits itself when running a parallel bonnie++ on a
single array, both with the onboard PERC6/i and the PERC6/e. I have no
access to other controllers.
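
For completeness, the single-array repro is the same semaphore-mode
invocation as before, just with all three workers pointed at one
mountpoint (a sketch; /data/2 stands in for whichever array is under
test):

bonnie++ -s 64g -u root -p3
bonnie++ -d /data/2 -s 64g -u root -y s >b2a 2>&1 &
bonnie++ -d /data/2 -s 64g -u root -y s >b2b 2>&1 &
bonnie++ -d /data/2 -s 64g -u root -y s >b2c 2>&1 &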

Re: 7.1-RELEASE I/O hang

Konstantin Belousov
On Thu, Feb 05, 2009 at 11:26:58AM +0000, Matt Burke wrote:
> Kostik Belousov wrote:
> > Compile ddb into the kernel, and do "ps" from the ddb prompt. If there
> > are processes hung in the "nbufkv" state, then the patch below might
> > help.
>
> The bonnie++ processes are in state "newbuf" and other hung processes
> (bash, newly forked sshds, etc) appear to be in the "ufs" state.
What is the state of the bufdaemon process?

>
> The patch appears to have no effect, although at the last hang I did see
> one of the bonnie++ processes in "nbufkv" state. This could be coincidental.
>
>
> The problem also exhibits itself when running a parallel bonnie++ on a
> single array, both with the onboard PERC6/i and the PERC6/e. I have no
> access to other controllers.


Re: 7.1-RELEASE I/O hang

Matt Burke-3
Kostik Belousov wrote:
>>> Compile ddb into the kernel, and do "ps" from the ddb prompt. If there
>>> are processes hung in the "nbufkv" state, then the patch below might
>>> help.
>> The bonnie++ processes are in state "newbuf" and other hung processes
>> (bash, newly forked sshds, etc) appear to be in the "ufs" state.
> What is the state of the bufdaemon process?

qsleep



Re: 7.1-RELEASE I/O hang

Konstantin Belousov
On Thu, Feb 05, 2009 at 12:46:23PM +0000, Matt Burke wrote:
> Kostik Belousov wrote:
> >>> Compile ddb into the kernel, and do "ps" from the ddb prompt. If there
> >>> are processes hung in the "nbufkv" state, then the patch below might
> >>> help.
> >> The bonnie++ processes are in state "newbuf" and other hung processes
> >> (bash, newly forked sshds, etc) appear to be in the "ufs" state.
> > > What is the state of the bufdaemon process?
>
> qsleep

Please increase the value assigned to the target variable at line 2193
of the patched sys/kern/vfs_bio.c from 1 to, say, 10 or 100.
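
That is, in the lvp != NULL branch the patch adds to flushbufqueues(),
change the assignment along these lines (100 shown; 10 is the other
suggested value):

 } else
- target = 1;
+ target = 100;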
