Recently, I released version 0.3 of my HappyLock crate on crates.io. In this blog post, I wanted to explain what I changed, and why it works.
There are four conditions necessary for a deadlock to occur. You only need to prevent one of them in order to prevent all deadlocks:

1. Mutual exclusion: only one thread can use a resource at any given time.
2. No preemption: a resource can only be released by the thread holding it.
3. Cyclic wait: there is a cycle of threads, each waiting on a resource that the next one holds.
4. Partial allocation: a thread can hold some resources while it waits to acquire more.

Let's go through each one, and see what we can do.
Mutual exclusion doesn't make all that much sense to prevent. We could provide some sort of Readonly<T> type, but Rust already has &T and Arc<T>, so this wouldn't be very useful.
Preventing no preemption would mean that the language can decide to take away your access to a piece of data at any time. Not only would this be incredibly difficult to pull off, it would also be plain annoying for the programmer to deal with. Something like this would need to happen:
```rust
let mutex = Mutex::new(5);
let mut_ref = &mutex;

let th = thread::spawn(|| {
    let number = mut_ref.lock();
    *number = 6;
});

let number = mut_ref.lock();
th.join(); // preempts the lock on mut_ref
println!("{}", number); // oops, we can't use number anymore
```
We could have some sort of LockCell type, which releases the lock immediately after performing some action. But there are often times where you need to make sure nothing else can mutate the value for a section of code, so this won't always work.
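For example, with a hypothetical LockCell-style API where every method locks and immediately unlocks internally, a compound operation is still racy:

```rust
// Hypothetical LockCell-style API (no such type exists yet): each
// method locks and unlocks internally, so no single call can deadlock,
// but a check-then-act sequence is not atomic as a whole.
let cell = LockCell::new(0);

if cell.get() == 0 {
    // another thread may have changed the value between these two calls
    cell.set(1);
}
```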
From what I can tell, most attempts to prevent deadlock go after cyclic wait. The idea is that every lock must be acquired in the same order. So if you lock l1 and then l2 on one thread, then you can't lock them in the opposite order on another thread.
This is an interesting approach, but there's no way to enforce it in the language. Some people
have resorted to using macros for this, such as in the recently published
deadlocker crate (which is a very confusing name).
But an attempt to make this happen at compile-time won't be dynamic enough for the general case.
Keep an eye on this one, because we will make use of it later.
Partial allocation is the hero we've been waiting for. To prevent it, we need to enforce total allocation: if you want to lock a new mutex, you must first release the locks you've already acquired, and then acquire the whole set at once. Preventing this is possible in Rust, and less possible in languages that don't have a borrow checker. Let me explain how we'll use the borrow checker.
Say we were making mutexes for an operating system. How would we enforce total allocation? We could simply return an error if someone acquires a new set of locks without releasing the locks they already hold. But that would mean checking for an error every time we acquire a new set of locks.
```c
mutex_t m1 = new_mutex();
mutex_t m2 = new_mutex();
mutex_t mutices[] = { m1, m2 };

if (lock_mutexes(mutices, 2)) {
    // oh noes!
}

// now we can use the data
```
But we're not C. We have technology! And our technology is the borrow checker.
For those of you who don't know, here's a quick rundown of the borrow checker rules:

1. A value cannot be moved or dropped while it is borrowed.
2. There can be any number of immutable references to a value at once.
3. A mutable reference to a value must be the only reference to it.

As a quick example of these rules:
```rust
let s = String::from("Hello, world!");
let r1 = &s;
let r2 = &s; // this is allowed because these references aren't mutable
let mr = &mut s; // illegal: rule #3
drop(s); // also illegal: rule #1
println!("{r1} {r2}");
```
So here's the plan: we'll make a new type called ThreadKey. This type cannot be cloned, copied, sent to another thread, or mutably referenced from another thread. The Rust type system is able to enforce all of those things.
```rust
// no traits are derived, especially not Clone or Copy
pub struct ThreadKey {
    phantom: PhantomData<*const ()>, // !Send and !Sync
}
```
Then, we can require that when locking a mutex (or a read-write lock), a ThreadKey, or a &mut ThreadKey, must be given.
```rust
pub fn lock<'s, 'k: 's, Key: Keyable>(&'s self, key: Key) -> MutexGuard<'_, 'k, T, Key, R>
```
Wow! That's a lot of lifetimes and generics. We'll worry about those later. For now, let's show how this works.
```rust
use happylock::{ThreadKey, Mutex};

fn main() {
    // each thread can only have one thread key
    // the call to unwrap panics if this thread already got its key
    // ThreadKey is not Send, Sync, Copy, or Clone
    let key = ThreadKey::get().unwrap();

    let mutex = Mutex::new(10);

    // locking a mutex requires either the ThreadKey or a &mut ThreadKey
    let mut guard = mutex.lock(key);

    // we no longer have the thread key, so we can't lock anything else anymore
    println!("{}", *guard);
}
```
I recently learned that I wasn't the only person who thought of this. Over a year ago, Adrian Taylor posted Can the Rust type system prevent deadlocks? on Medium. I was completely unaware of this when I published the crate, despite my search for other implementations of the idea. He also worked on getting inner mutexes to work, which is currently impossible in HappyLock. He didn't attempt something more obvious though: locking more than one mutex at a time. He also never considered the possibility of using a &mut MutexPermissionToken to make the code more ergonomic.
Speaking of which, how do we lock more than one mutex at a time?
It's actually quite simple: we just lock a collection of things. We can have a type called LockCollection, which takes a collection of locks and locks them all at the same time. Then, it'll return a LockGuard for the collection.
There are different collections we could try to lock, including Vec<T>, (A, B), and (Vec<A>, Vec<B>), so this'll have to be generic. To enable this, we'll create a trait called Lockable.
This isn't the current implementation in HappyLock, but here's a proof-of-concept:
```rust
unsafe trait Lockable {
    type Guard<'a>
    where
        Self: 'a;

    unsafe fn lock<'a>(&'a self) -> Self::Guard<'a>;
    unsafe fn try_lock<'a>(&'a self) -> Option<Self::Guard<'a>>;
}
```
Then, we just need to implement it on every collection that could hold a lock.
```rust
use happylock::{ThreadKey, Mutex, LockCollection};

fn main() {
    let key = ThreadKey::get().unwrap();

    let mutex1 = Mutex::new(5);
    let mutex2 = Mutex::new(String::new());
    let collection = LockCollection::new((mutex1, mutex2));

    let mut guard = collection.lock(key);
    *guard.1 = format!("{}{}", *guard.1, guard.0);
    *guard.0 += 1;
}
```
That seems to work pretty well.
Let's try something:
```rust
use happylock::{ThreadKey, Mutex, LockCollection};

fn main() {
    let key = ThreadKey::get().unwrap();
    let mutex1 = Mutex::new(5);

    // oh no. this will deadlock us
    let collection = LockCollection::new((&mutex1, &mutex1));
    let guard = collection.lock(key);
}
```
The problem is that we passed the same lock in twice, so locking the collection would mean locking the same mutex twice. The good news is, this code doesn't compile, because LockCollection::new actually requires a different trait that I haven't mentioned yet.
```rust
// not implemented for &impl Lockable
// ergo: the values within are guaranteed to be unique
unsafe trait OwnedLockable: Lockable {}
```
Let me elaborate on this. If the value is owned, then there's no way to pass it twice into the same collection. The same is true for mutable references. So we can safely lock an owned lock without checking for duplicates.
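For example, an owned version of the earlier deadlocking example fails to compile, because the first move into the tuple consumes the mutex (error message paraphrased):

```rust
use happylock::{Mutex, LockCollection};

let mutex1 = Mutex::new(5);

// error[E0382]: use of moved value: `mutex1`
let collection = LockCollection::new((mutex1, mutex1));
```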
But sometimes, it's useful to lock a referenced mutex. For these, we'll need a runtime check for duplicates. So we'll need to add another method to our Lockable trait.
```rust
unsafe trait Lockable {
    // ... snip ...

    fn get_ptrs(&self) -> Vec<usize>;
}
```
Then we can check for duplicate pointers within a collection. My first implementation of that check was a brute-force O(n²) check. When I posted about this on Reddit, people were quick to point out that this could be made O(n log n) by sorting the pointers, then checking if the same pointer appeared twice in a row.
```rust
fn contains_duplicates<L: Lockable>(data: &L) -> bool {
    let mut pointers = data.get_ptrs();
    pointers.sort_unstable();
    pointers.windows(2).any(|w| w[0] == w[1])
}
```
Of course, you could make this an O(n) check by using a HashSet too.
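That variant could look something like this (a sketch using the get_ptrs method from above, not what the crate actually does):

```rust
use std::collections::HashSet;

// O(n) on average: a pointer is a duplicate exactly when inserting it
// into the set reports that it was already present
fn contains_duplicates<L: Lockable>(data: &L) -> bool {
    let mut seen = HashSet::new();
    data.get_ptrs().into_iter().any(|ptr| !seen.insert(ptr))
}
```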
Now we need a new constructor for LockCollection.
```rust
impl<L: Lockable> LockCollection<L> {
    pub fn try_new(data: L) -> Option<Self> {
        if contains_duplicates(&data) {
            return None;
        }

        Some(Self { data })
    }
}
```
Although we have now successfully prevented deadlocks, livelocking can still be an issue.
The first implementation of LockCollection would try to lock everything in order. If one of the locks blocked, then it would release everything it had and try again, retrying until it successfully locked everything in a row. This resulted in a lot of wasted work, made the LockCollection start to resemble a spinlock, and left the collection prone to livelocking.
Imagine this scenario, with two threads trying to lock the same set of mutexes, but in a different order:

1. Thread 1 locks mutex A.
2. Thread 2 locks mutex B.
3. Thread 1 fails to lock mutex B, so it releases mutex A and retries.
4. Thread 2 fails to lock mutex A, so it releases mutex B and retries.
5. Go to step 1.
In practice, this loop would probably end eventually, but it'd be nice if we could prevent it altogether. You could avoid this by locking everything in the same order. But if we could do that, then we would've prevented cyclic wait from before, which would have avoided this whole problem.
Remember when I said we would make use of cyclic wait later? This is where we'll do it. If you recall from earlier, we already sorted the mutex pointers by their memory address. What if we locked them in that order? This was suggested by many people on Reddit, but it took a while to work out exactly how to do it. I can't sort a tuple, so we'll need to lock the mutexes in a different order than the one they're returned in. This will require a large change to the existing Lockable trait. Let's see what we can do.
```rust
unsafe trait Lock {
    unsafe fn lock(&self);
    unsafe fn try_lock(&self) -> bool;
    unsafe fn unlock(&self);
}

unsafe trait Lockable {
    type Guard<'g>
    where
        Self: 'g;

    fn get_locks<'a>(&'a self, locks: &mut Vec<&'a dyn Lock>);
    unsafe fn guard<'g>(&'g self) -> Self::Guard<'g>;
}
```
Lock is implemented on Mutex and RwLock. Lockable is implemented on the same types as before. These traits are even more unsafe than they were before, now that they can obtain a guard without locking first. So, we'll just need to be very careful.
Now let's try making our new LockCollection:

```rust
struct LockCollection<L> {
    data: L,
    locks: Vec<&'self.data dyn Lock>, // wait, this is a self-referential struct
}
```
Ooh, boy. This is gonna be a problem. We could box the data, but sometimes we don't want to pay the price of memory allocation. We could try using a reference, but then we have an extra lifetime to deal with. It's also sometimes useful to have an owned lock collection. How are we going to do this without creating four lock collection types that do mostly the same thing?
I'll cover each of the four types, but if you're looking for a sensible default, there's a type alias so that LockCollection points to BoxedLockCollection. It's slower and doesn't allow you to mutate the underlying collection (yet), but it works for most use cases.
I'm pretty sure when this gets posted to Reddit, someone will tell me I could've avoided this somehow.
### RefLockCollection

```rust
pub struct RefLockCollection<'a, L> {
    data: &'a L,
    locks: Vec<&'a dyn RawLock>,
}
```
This type needs to sort the locks by address. Yes, that applies even if the collection is owned; otherwise, the locks could be acquired in a different order somewhere else. We're no longer releasing the locks when we can't acquire them all at once, so they need to be acquired in exactly that order. On the plus side, borrowing the data avoids one memory allocation.
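Here's a rough sketch of that construction step, written against the proof-of-concept Lock trait from earlier (the real crate's trait is named RawLock and differs in detail):

```rust
// Sketch: gather the locks, then sort them by memory address, so that
// every thread acquires the same locks in one global order.
fn sorted_locks<'a, L: Lockable>(data: &'a L) -> Vec<&'a dyn Lock> {
    let mut locks = Vec::new();
    data.get_locks(&mut locks);
    // a trait object's data pointer is the lock's address
    locks.sort_unstable_by_key(|l| *l as *const dyn Lock as *const () as usize);
    locks
}
```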
### BoxedLockCollection

```rust
pub struct BoxedLockCollection<L> {
    data: *const UnsafeCell<L>,
    locks: Vec<&'static dyn RawLock>,
}
```
This is similar to RefLockCollection, but it allocates more memory. You may notice that there's no Box in this structure, despite the name. That's because a Box needs to be a unique pointer, and because of the references in locks, the pointer to the data isn't unique. The pointer is const because of those existing references: mutating the underlying collection would cause undefined behavior, especially if it makes the locks move (such as when a Vec reallocates). And it's a raw pointer, because deallocating the data requires a mutable pointer, and creating a mutable pointer from an immutable reference is undefined behavior. The UnsafeCell is used for a similar reason.
This is actually the reason why this release is titled 0.3 and not 0.2. I released 0.2 and found these soundness bugs, so I had to make a fix. This was a breaking change, because you can't mutate the underlying collection anymore.
### OwnedLockCollection

```rust
pub struct OwnedLockCollection<L> {
    data: L,
}
```
Since both RefLockCollection and BoxedLockCollection require sorting the locks, it'd be nice if we could sometimes avoid that for owned lock collections. If a set of locks is in an OwnedLockCollection, then it really cannot be used anywhere else. That means as long as an OwnedLockCollection always locks in the same order, it'll never deadlock, because that's the only order these particular locks can ever be locked in. So, we don't bother sorting, and just lock them in the order they appear in the collection.
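A sketch of that locking step, again using the proof-of-concept Lock trait:

```rust
// Sketch: lock in declaration order. No sorting is needed, because an
// owned collection is the only path to these locks, so declaration
// order is the only order anyone can ever lock them in.
unsafe fn lock_in_order(locks: &[&dyn Lock]) {
    for lock in locks {
        lock.lock();
    }
}
```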
### RetryingLockCollection

This is the collection that was used in HappyLock 0.1: the one that releases all of its locks and retries. The upside of this collection is that it can contain references, and still not sort the pointers. Duplicates are checked for using a HashSet. Because of the lack of sorting, it can be fast in the following case, from the docs:

> [...] when the first lock in the collection is always the first in any collection, and the other locks in the collection are always locked after that first lock is acquired. This means that as soon as it is locked, there will be no need to unlock it later on subsequent lock attempts, because they will always succeed.
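In code, the retry strategy looks roughly like this (a sketch using the proof-of-concept Lock trait, not the crate's exact implementation):

```rust
// Sketch: block on one lock, opportunistically try the rest, and on
// any failure release everything and block on the failed lock instead.
unsafe fn lock_all(locks: &[&dyn Lock]) {
    if locks.is_empty() {
        return;
    }

    let mut first = 0; // the lock we block on
    'retry: loop {
        locks[first].lock();
        for (i, lock) in locks.iter().enumerate() {
            if i == first {
                continue;
            }
            if !lock.try_lock() {
                // contention: release everything we hold,
                // then block on lock `i` in the next attempt
                for (j, held) in locks.iter().enumerate() {
                    if j < i || j == first {
                        held.unlock();
                    }
                }
                first = i;
                continue 'retry;
            }
        }
        return; // every lock has been acquired
    }
}
```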
It doesn't take a genius to realize that my lock collection API is not friendly to read-write locks.
In HappyLock 0.1, I had ReadLock and WriteLock wrappers for RwLock to allow reads in collections.
```rust
struct ReadLock<'a, T>(&'a RwLock<T>);
struct WriteLock<'a, T>(&'a RwLock<T>);
```
This worked fine at the time, but I quickly realized that it would not work for an OwnedLockCollection. To solve this problem, I made more changes to the Lockable API.
```rust
unsafe trait Lock {
    // ...
    unsafe fn read(&self);
    unsafe fn try_read(&self) -> bool;
    unsafe fn unlock_read(&self);
}

unsafe trait Lockable {
    // ...
    type ReadGuard<'g>
    where
        Self: 'g;

    unsafe fn read_guard<'g>(&'g self) -> Self::ReadGuard<'g>;
}

// A NEW TRAIT
unsafe trait Sharable: Lockable {}
```
The Sharable trait is implemented on collections that only contain readable locks. All of the lock collection types then have functions for when their locks are all sharable. That allows us to grant read access in OwnedLockCollections.
After I was done with that, I updated the documentation and published the new version.
ThreadKey is a zero-cost abstraction: it takes zero space at runtime. The only code it runs is the check that the key hasn't already been acquired for the current thread. That's a static, lazily-initialized, thread-local boolean check-and-update, so it's fairly fast.
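That check might look roughly like this (an assumed sketch, not the crate's actual code):

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    // one flag per thread: has this thread's key been handed out?
    static KEY_TAKEN: Cell<bool> = Cell::new(false);
}

impl ThreadKey {
    pub fn get() -> Option<Self> {
        KEY_TAKEN.with(|taken| {
            if taken.get() {
                None // the key was already taken on this thread
            } else {
                taken.set(true);
                Some(ThreadKey { phantom: PhantomData })
            }
        })
    }
}
```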
I used lock_api as the backend for the Mutex and RwLock types. By default, this library acts as a thin wrapper around parking_lot, so it will be roughly the same speed as parking_lot. There's an optional feature that can be enabled to use spin as well, and you can import your own raw locks if you wish. The standard library locks cannot be used, because it's impossible to implement the lock_api traits on them. If someone adds a Mutex::force_unlock function to the standard library, then I can add that support, and I'll even make it the default. For now, we have to deal with the larger binary size that comes with parking_lot.
The only place where performance could be an issue is in the lock collection types. The performance qualities of each type have already been discussed, but to summarize: only the OwnedLockCollection is zero-cost. Both RefLockCollection and BoxedLockCollection allocate memory and sort the lock pointers, and sometimes they'll even need to check for duplicates within the pointers. All of this work is done when the collection is created though, so at least you don't need to worry about it after that. On the other hand, the RetryingLockCollection, when locking, will waste time unlocking and relocking unless the following condition is met:
> [...] when the first lock in the collection is always the first in any collection, and the other locks in the collection are always locked after that first lock is acquired.
RetryingLockCollection also checks for duplicates using a HashSet.
In summary, it's pretty fast, but choose your lock collection type carefully.
This is summarized on the GitHub repo, but I'll try to go into more depth on my plans for the future of this library.
Personally, I'm not a big fan of mutex poisoning, but I can see why someone would want to use it. I plan on adding it to HappyLock soon, by having a Poisonable wrapper around Lock types. And if I implement Lock on the lock collection types, then it'll even be able to wrap those, which might be useful.
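The shape I have in mind is something like this (speculative; none of this is in the crate yet):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Speculative sketch: a wrapper that records whether a panic occurred
// while the inner lock was held, so later lockers can be warned.
pub struct Poisonable<L> {
    inner: L,
    poisoned: AtomicBool,
}

impl<L> Poisonable<L> {
    pub fn is_poisoned(&self) -> bool {
        self.poisoned.load(Ordering::Relaxed)
    }
}
```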
I originally thought of a standard library backend as a must-have for the next version of HappyLock, but after trying it for a bit, I decided to hold off. As I mentioned before, there is a binary size cost to using parking_lot. We could save memory by using the locks that are built into the operating system, which is what the standard library does. It would be very convenient if I could just use the standard library as a backend for HappyLock.

Unfortunately, there's a reason why the standard library's locks do not implement lock_api's traits: there's just no way to unlock one of those mutexes without holding onto the guard. If we were to put the guard inside of the mutex, and then drop it when unlocking, that would make it impossible to share the lock across threads, since the guard isn't Send.
There are raw mutexes used in the underlying implementation of the standard library, but those aren't public. I've tried simply copying the standard library code that implements those raw mutexes, but it relies on too much of the rest of the standard library, and I don't want importing HappyLock to require recompiling the entire standard library.
The next feasible solution, in my opinion, would be to make a new crate that uses the operating
system mutexes, and implements lock_api
traits on them. I've already started doing this,
and it seems very possible. But don't expect it to properly support every platform that the standard
library supports.
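For illustration, implementing lock_api's RawMutex for an OS lock could look like this (OsMutex and its methods are hypothetical stand-ins, not a real API):

```rust
use lock_api::{GuardSend, RawMutex};

// `OsMutex` stands in for a hypothetical wrapper over an OS lock
// (say, a pthread mutex or a futex); it is not a real type.
pub struct OsMutex { /* platform-specific state */ }

impl OsMutex {
    pub const fn new() -> Self { OsMutex {} }
    pub fn lock(&self) { /* call into the OS */ }
    pub fn try_lock(&self) -> bool { /* call into the OS */ true }
    pub fn unlock(&self) { /* call into the OS */ }
}

pub struct RawOsMutex(OsMutex);

unsafe impl RawMutex for RawOsMutex {
    const INIT: Self = RawOsMutex(OsMutex::new());
    type GuardMarker = GuardSend;

    fn lock(&self) { self.0.lock() }
    fn try_lock(&self) -> bool { self.0.try_lock() }
    unsafe fn unlock(&self) { self.0.unlock() }
}
```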
The only way to make duplicate checks free is to run them at compile-time. I doubt I'd be able to implement this using procedural macros (feel free to prove me wrong). But, as pacak keeps reminding me, it might be possible using const functions with a Bloom filter. I don't know if const Rust is robust enough for this to be implemented yet, but I do want to give it a try. It wouldn't work for lock collections which require sorting, but it might work for RetryingLockCollection.
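Even the brute-force check already works in const Rust; the hard part is getting real lock addresses into a const context at all. A toy sketch:

```rust
// O(n²) duplicate check that compiles as a const fn. The hard part,
// which this sketch skips entirely, is obtaining the lock addresses
// at compile time in the first place.
const fn contains_duplicates<const N: usize>(ptrs: [usize; N]) -> bool {
    let mut i = 0;
    while i < N {
        let mut j = i + 1;
        while j < N {
            if ptrs[i] == ptrs[j] {
                return true;
            }
            j += 1;
        }
        i += 1;
    }
    false
}

// evaluated entirely at compile time
const HAS_DUPES: bool = contains_duplicates([0x10, 0x20, 0x10]);
```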
One other thing I'd like to explore is if there's anything more that can be done with cyclic wait.
I could imagine LockCollection<Vec<L>> where L: OwnedLockable having some lock_next method, or maybe an iterator type that would allow you to only lock the first few items as they are needed, rather than locking the entire collection at once. I'm not sure how useful this would be though. I'm sure there are other ideas that I'm not thinking of too.
### LockCell type
If you've seen the Cell type in Rust, then you know that it's a good zero-cost abstraction that provides safe interior mutability. The downsides are that it cannot be shared between threads and that its API is limited. But I think it would be neat to add some of its methods as convenience methods for Mutex and RwLock, e.g. Mutex::lock_swap. This would not remove the need for a ThreadKey, since otherwise you'd be able to call lock_swap on a thread that's already locked the mutex.

However, if we had methods like Mutex::try_swap, then there'd be no need for a key. The mutex wouldn't be able to block the current thread, and because the lock is released so quickly, it wouldn't block any other threads either. No blocking means no deadlock. I think the same would be true for try_clone and try_take, but they're scarier, since the Clone and Default traits could try to lock the same mutex, resulting in a deadlock. I don't know if it would be possible to access the ThreadKey safely in these methods though. Either way, all of these methods would be fallible, so it's not going to be a perfect solution.
But we could introduce a LockCell type, which would be the same as Cell, except that it safely implements Sync by using a mutex or rwlock internally. This would not require a ThreadKey at all, as long as none of the implemented methods can lock the LockCell a second time, which is probably impossible anyway.
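As a rough sketch of what that could look like, built on the standard library's Mutex purely for illustration (HappyLock would use its own raw locks instead):

```rust
use std::sync::Mutex;

// Speculative sketch: a Cell-like API backed by a lock. Each method
// locks and unlocks internally, so no guard ever escapes to user code.
pub struct LockCell<T>(Mutex<T>);

impl<T> LockCell<T> {
    pub fn new(value: T) -> Self {
        LockCell(Mutex::new(value))
    }

    // swap the contents for `value`, returning the old contents
    pub fn replace(&self, value: T) -> T {
        let mut guard = self.0.lock().unwrap();
        std::mem::replace(&mut *guard, value)
    }

    pub fn set(&self, value: T) {
        let _ = self.replace(value);
    }
}

impl<T: Copy> LockCell<T> {
    pub fn get(&self) -> T {
        *self.0.lock().unwrap()
    }
}
```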
It is a little surprising that a type like this doesn't already exist in the standard library. That might be because it would make race conditions very easy to trigger. Imagine checking that two LockCell values are equal for an if-block, but once you get inside the if-block, they're no longer equal. I'll have to think more about this.
```rust
let a: LockCell<i32> = LockCell::new(0);
let b: LockCell<i32> = LockCell::new(0);

if a == b {
    // a and b may no longer be equal in here
}
```
I originally had a different idea of what to do with the Sharable trait. We could avoid checking for duplicates in a lock collection if we knew that the only thing we would do with the locks is read. At the time, I thought that read-locking twice on the same thread could never cause a deadlock. However, that's not true on every platform, because of contention: if a writer starts waiting between the two read acquisitions, the second read can block behind it forever. Luckily, the lock_api developers thought ahead by designing a RawRwLockRecursive trait, which I can extend by creating a Recursive subtrait of Lockable.
That's cool, but I don't really want to add three more lock collection types for this feature. So maybe this could be a wrapper around the existing lock collection types, i.e. Readonly<L>. I'd need to implement a LockCollection trait that would have an associated unsafe constructor for creating the collection without checking for duplicates. Then, creating that retrying lock collection would be free (but locking it can still sometimes be expensive).
This doesn't only apply to RwLock. If there was a ReentrantMutex type, then we could implement Recursive on that, and use it in a readonly lock collection as well. For those of you who don't know, a ReentrantMutex can be locked several times on the same thread, but not while it's locked by a different thread. However, the guard can only access the data immutably, because multiple mutable references to the same data, even on the same thread, are still undefined behavior. It's mostly intended for wrapping Cell or RefCell. Unfortunately, lock_api doesn't have a trait for recursive mutexes. Instead, it provides a struct wrapper for RawMutex, called RawReentrantMutex. This is despite the fact that Unix's pthread API already supports recursive mutexes. I might be able to sway the lock_api developers to support this.
Making HappyLock work without the standard library would be very interesting. But currently, the Lockable interface requires a Vec of pointers, so it might not be feasible. The thread keys are stored in thread-local storage too, but I'll probably come up with some UnsafeKey system to deal with that eventually, in order to support async runtimes.
It would be fun to add OnceLock or LazyLock, but I don't think it's necessary; I don't think anybody has ever accidentally deadlocked using those interfaces, if at all. In the past I've also considered Condvar and Barrier, but those are trickier. They're kind of the opposite of a mutex, so the tricks that work for avoiding mutex deadlocks won't work on them. I've also never used either of those types before, so I'm probably not the best person to figure that out.
This has been a very long nerd snipe. But I think this idea could become incredibly useful, which is why I keep pushing for it. It'd be even better if it were a part of the standard library to begin with, but I'm aware that's not possible anymore.
The fact that this is possible makes me appreciate the borrow checker even more. People often think of it as a limitation (myself included), but it solves so many problems that I'm surprised more languages don't adopt it. Not only does it solve memory corruption, but also resource leaks, and now deadlocks. You spend more time specifying your code, but less time debugging it.
There's still work that needs to be done (I haven't even begun async runtime support yet), but I hope that as this library becomes more developed, there will be more projects which are immune to this error.