If you've been following p4 submits to //depot/user/jhb/lock/... recently
you've noticed that I've been hacking on an implementation of reader/writer
locks. I've just finished doing an initial set of tests on a 4-cpu box and
am confident that at least the really big races should all be handled.
First, rwlocks are basically read/write mutexes. They cannot be held while
sleeping similar to mutexes (and thus different from sx and lockmgr), but
this means that they can be used in ithreads, etc. Also, they can do some
form of priority propagation. To achieve this latter part, I've patched the
turnstile code to grow the notion of having multiple queues of waiters on a
given turnstile: a queue of shared (read) waiters, and a queue of exclusive
(write) waiters. The modified turnstile code (with suitable updates to the
mutex code) ran without incident for several days on alpha, amd64, i386, and
sparc64 machines running buildworld -j XX in an infinite loop.
Now, as far as limitations, etc. of this reader/writer lock implementation.
This implementation is not meant to be the end-all be-all holy grail of
reader/writer lock implementations. The goals of this version are to have a
_stable_, _working_ implementation that can be used in the tree as well as to
provide a base implementation that people can hack on to try out other
algorithms and ideas. This means that folks like the networking stack guys
can go ahead and have working rwlocks now (though perhaps not 100% optimal or
perfect, but allowing more parallelism than mutexes) and that independently
other folks can play with other ideas for rwlocks. In other words, I don't
want to bikeshed forever about missing features or theoretical changes to the
wakeup algorithm, etc. I'd rather commit this now as a starting point. :)
That said, here are the known limitations, etc.:
- Currently no attempt is made to do propogate priority to threads that hold
read locks. Not even a Solaris-style "owner of record" type deal. For one,
it would require some more hacking on the turnstile code. Secondly, it would
add some sort of expense to read-lock operations and I'm unsure if the extra
atomic ops (and thus penalty on the more common read lock operations) would
be worth it.
- Currently we allow read locks to recurse. For one thing, w/o some way of
tracking which threads hold read locks (which would be expensive) you can't
verify that code doesn't break this rule unless you use WITNESS. Most of our
other simple lock assertions can be verified w/o needing WITNESS, which is
why I allowed this.
- Because of the previous, readers don't block if the lock is read-locked but
there are writers waiting. Otherwise you end up in a trivial deadlock as
expounded upon further in the comments.
- Because of the previous, the read unlock algorithm is quite simple: it wakes
up all write waiters because that's all the waiters there are to wake up.
- The algorithm for write unlock is simplistic and matches sx for now in that
it prefers readers to writers.
- There is no explicit lock handoffs, but our mutexes don't do that either.
- Read lock operations are not currently inlined. I think this might could be
done now though. At first I was worried that read lock operations would be
too complex to inline and so I've only inlined write lock operations. Now
that the implementation is in a working state though, it seems that there are
some simple easy cases that could possibly be inlined.
- Probably more I haven't thought of, etc.
The changes are available as a patch and the two new files (sys/rwlock.h and
One suggestion I have had and haven't acted on yet is to change the turnstile
code to track an internal state (share locked, exclusive locked, unlocked) to
make some of the assertions possibly saner. Also, there is some debugging
stuff in kern_rwlock.c to map KTR_LOCK to KTR_SUBSYS so I could get ktr
traces of just rwlocks w/o the traces muddied by mutex and sx lock traces.
I would appreciate people looking at the turnstile changes and the
implementation of the rwlocks to see if there are races I've missed, etc.