How HappyLock Works

Recently, I released version 0.3 of my HappyLock crate on crates.io. In this blog post, I wanted to explain what I changed, and why it works.

Background

There are four conditions necessary for a deadlock to occur. You only need to prevent one of them in order to prevent all deadlocks:

  1. Mutual exclusion
  2. Non-preemptive allocation
  3. Circular wait
  4. Partial allocation

Let's go through each one, and see what we can do.

Mutual exclusion

This doesn't make all that much sense to prevent. We could provide some sort of Readonly<T> type, but Rust already has &T and Arc<T>. So this wouldn't be very useful.

Non-preemptive allocation

Preventing this would mean that the language can decide to take away your access to a piece of data at any time. Not only would this be incredibly difficult to pull off, it would also just annoying for the programmer to deal with. Something like this would need to happen:

let mutex = Mutex::new(5);
let mut_ref = &mutex;
let th = thread::spawn(|| {
	let number = mut_ref.lock();
	*number = 6;
});

let number = mut_ref.lock();
th.join(); // preempts the lock on mut_ref
println!("{}", number); // oops, we can't use number anymore

We could have some sort of LockCell type, which releases the lock immediately after performing some action, but there are often times where you need to make sure nothing else can mutate the value for a section of code. So this won't always work.

Circular wait

From what I can tell, most attempts to prevent deadlock go through this route. The idea is what every lock must be acquired in the same order. So if you lock l1 and then l2 on one thread, then you can't do it in the opposite order on the other thread. This is an interesting approach, but there's no way to enforce it in the language. Some people have resorted to using macros for this, such as in the recently published deadlocker crate (which is a very confusing name). But an attempt to make this happen at compile-time won't be dynamic enough for the general case.

Keep an eye on this one, because we will make use of it later.

Partial allocation

This is the hero we've been waiting for. To prevent this, we need to enforce total allocation. So if you want to lock a new mutex, you must first release the locks you've already acquired, and then acquire a new set of locks. Preventing this is possible in Rust, and less possible in languages that don't have a borrow checker. Let me explain how we'll use the borrow checker.

Exploration

Say we were making mutexes for an operating system. How would we enforce total allocation? We could simply give an error if someone acquires a new set of locks without releasing the already acquired locks. It'd also mean we'd have to check for an error every time we acquire a new set of locks.

mutex_t m1 = new_mutex();
mutex_t m2 = new_mutex();

mutex_t mutices[] = { m1, m2 };
if (lock_mutexes(mutices, 2)) {
	// oh noes!
}

// now we can use the data

But we're not C. We have technology! And our technology is the borrow checker.

Borrow Checker Rules

For those of you who don't know, here's a quick rundown of the borrow checker rules:

  1. A value may not be moved while references to it exist.
  2. Only one mutable reference to a value may exist at a time.
  3. A mutable reference and immutable references, must not exist at the same time.

As a quick example of these rules:

let s = String::new("Hello, world!");
let r1 = &s;
let r2 = &s; // this is allowed because these references aren't mutable
let mr = &mut s; // illegal: rule #3
drop(s); // also illegal: rule #1
println!("{r1} {r2}");

A Naive implementation

So here's the plan: We'll make a new type called ThreadKey. This type cannot be cloned, copied, sent to another thread, or mutably referenced from another thread. The Rust type system is able to enforce all of those things.

// no traits are derived, especially not Clone or Copy
pub struct ThreadKey {
	phantom: PhantomData<*const ()>, // !Send and !Sync
}

Then, we can require that when locking a mutex (or a read-write lock), a ThreadKey, or a &mut ThreadKey, must be given.

pub fn lock<'s, 'k: 's, Key: Keyable>(&'s self, key: Key) -> MutexGuard<'_, 'k, T, Key, R>

Wow! That's a lot of lifetimes and generics. We'll worry about those later. For now, let's show how this works.

use happylock::{ThreadKey, Mutex};

fn main() {
	// each thread can only have one thread key
	// the call to unwrap panics if this thread already got its key
	// ThreadKey is not Send, Sync, Copy, or Clone
	let key = ThreadKey::get().unwrap();

	let mutex = Mutex::new(10);

	// locking a mutex requires either the ThreadKey or a &mut ThreadKey
	let mut guard = mutex.lock(key);

	// we no longer have the thread key, so we can't lock anything else anymore

	println!("{}", *guard);
}

I recently learned that I wasn't the only person who thought of this. Over a year ago, Adrian Taylor posted on Medium, Can the Rust type system prevent deadlocks? . I was completely unaware of this when I published the crate, despite my search for other implementations of the idea. He also worked on getting inner mutexes to work, which is currently impossible in HappyLock. He didn't attempt to do something more obvious though, which is locking more than one mutex at a time. He also never considered the possibility of using a & mut MutexPermissionToken to make the code more ergonomic.

Speaking of which, how do we lock more than one mutex at a time?

Lock Collections

It's actually quite simply, we just lock a collection of things. We can have a type, called LockCollection, which takes a collection of locks, and locks them all at the same time. Then, it'll return a LockGuard for the collection.

There are different collections we could try to lock, including Vec<T>, (A, B), and (Vec<A>, Vec<B>), so this'll have to be generic. To enable this, we'll create a trait called Lockable.

This isn't the current implementation in HappyLock, but here's a proof-of-concept:

unsafe trait Lockable {
	type Guard<'a>;

	unsafe fn lock<'a>(&self) -> Self::Guard<'a>;

	unsafe fn try_lock<'a>(&self) -> Option<Self::Guard<'a>>;
}

Then, we just need to implement on every collection that could hold a lock.

use happylock::{ThreadKey, Mutex, LockCollection};

fn main() {
	let key = ThreadKey::get().unwrap();
	let mutex1 = Mutex::new(5);
	let mutex2 = Mutex::new(String::new());
	let collection = LockCollection::new((mutex1, mutex2));
	let guard = collection.lock(key);
	*guard.1 = format!("{}{}", *guard.1, guard.0);
	*guard.0 += 1;
}

That seems to work pretty well.

Duplicate Locks

Let's try something:

use happylock::{ThreadKey, Mutex, LockCollection};

fn main() {
	let key = ThreadKey::get().unwrap();
	let mutex1 = Mutex::new(5);
	// oh no. this will deadlock us
	let collection = LockCollection::new((&mutex1, &mutex1));
	let guard = collection.lock(key);
}

The problem is that we passed the same lock in twice. That'll be a problem if we require it to be locked twice. The good news is, this code doesn't compile, because LockCollection::new actually requires a different trait that I haven't mentioned.

// not implemented for &impl Lockable
// ergo: the values within are guaranteed to be unique
unsafe trait OwnedLockable: Lockable {}

Let me elaborate on this. If the value is owned, then there's no way to pass it twice into the same collection. The same is true for mutable references. So we can safely lock an owned lock without checking for duplicates.

But sometimes, it's useful to lock a referenced mutex. For these, we'll need a runtime check for duplicates. So we'll need to add another method to our Lockable trait.

unsafe trait Lockable {
	// ... snip ...

	fn get_ptrs(&self) -> Vec<usize>;
}

Then we can check for duplicate pointers within a collection. My first implementation of that check was a brute-force O(n²) check. When I posted about this on Reddit, people were quick to point out that this could be made O(nlogn) by sorting the pointers, then checking if the same pointer appeared twice in a row.

fn contains_duplicates<L: Lockable>(data: L) -> bool {
	let mut pointers = data.get_ptrs();
	pointers.sort_unstable();
	pointers.windows(2).any(|w| w[0] == w[1])
}

Of course, you could make this an O(n) check by using a HashSet too.

Now we need a new constructor for LockCollection.

impl<L: Lockable> LockCollection<L> {
	pub fn try_new(data: L) -> Option {
		let ptrs = data.get_ptrs();
		if contains_duplicates(&ptrs) {
			return None;
		}

		Some(Self { data })
	}
}

The Actual Implementation

Although we have now successfully prevented deadlocks, livelocking can still be an issue.

The first implementation of LockCollection would try to lock everything in order. If one of the locks blocked, then it would release everything it had and try again. It would keep retrying until it successfully locked everything in a row. This resulted in a lot of wasted work. It also made the LockCollection start to resemble a spinlock. This also made the collection prone to livelocking.

Imagine this scenario with two threads trying to lock a set of mutexes, but in a different order:

  1. Thread 1 locks mutex 1
  2. Thread 2 locks mutex 2
  3. Thread 1 tries to lock mutex 2 and fails
  4. Thread 2 tries to lock mutex 1 and fails
  5. Thread 1 releases mutex 1
  6. Thread 2 releases mutex 2
  7. Repeat

In practice, this loop would probably end eventually, but it'd be nice if we could prevent it altogether. You could avoid this by locking everything in the same order. But if we could do that, then we would've prevented cyclic wait from before, which would have avoided this whole problem.

Cyclic wait

Remember when I said we would make use of cyclic wait later? This is where we'll do this. If you recall from earlier, we already sorted the mutex pointers by their memory address. What if we locked them in that order? This was suggested by many people on Reddit, but it took a while to work out exactly how to do this. I can't sort a tuple. So we'll need to lock them in a different order than the one they're returned in. This will require a large change to the existing Lockable trait. Let's see what we can do.

unsafe trait Lock {
	unsafe fn lock(&self);
	unsafe fn try_lock(&self) -> bool;
	unsafe fn unlock(&self);
}

unsafe trait Lockable {
	type Guard<'g>;
	fn get_locks<'a>(&'a self, &mut Vec<&'a dyn Lock>);
	unsafe fn guard<'g>(&'g self) -> Self::Guard<'g>;
}

Lock is implemented on Mutex and RwLock. Lockable is implemented on the same types as before. These traits are even more unsafe than they were before, now that they can obtain a guard without locking it first. So, we'll just need to be very careful.

Now let's try making our new LockCollection

struct LockCollection<L> {
	data: L,
	locks: Vec<&'self.data dyn Lock>
	// wait, this is a self-referential struct
}

Ooh, boy. This is gonna be a problem. We could box the data, but sometimes we don't want to pay the price of memory allocation. We could try using a reference, but then we have an extra lifetime to deal with. It's also sometimes useful to have an owned lock collection. How are we going to do this without creating four lock collection types that do mostly the same thing?

We'll just create four Lock Collection Types that do mostly the same thing

I'll cover each of the four types, but if you're looking for a sensible default, there's a type alias so that LockCollection points to BoxedLockCollection. It's slower and doesn't allow you to mutate the underlying collection (yet), but it works for most use cases.

I'm pretty sure when this gets posted to Reddit, someone will tell me I could've avoided this somehow.

RefLockCollection

pub struct RefLockCollection<'a, L> {
	data: &'a L,
	locks: Vec<&'a dyn RawLock>,
}

It needs to sort the locks by address. Yes, that applies even if the collection is owned. Otherwise the locks could be acquired in a different order. We're no longer releasing the locks when we can't acquire them all at once, so they need to be acquired in exactly that order. But it avoids one memory allocation.

BoxedLockCollection

pub struct BoxedLockCollection<L> {
	data: *const UnsafeCell<L>,
	locks: Vec<&'static dyn RawLock>,
}

This is similar to RefLockCollection, but it allocates more memory. You may notice that there's no Box in this structure, despite the name. That's because Box needs to be a unique pointer. But because of the references in locks, it's not unique. It's const because of the existing references to the collection, meaning mutating the underlying collection would cause undefined behavior, especially if it means the locks move (such as when a Vec reallocates). It's a raw pointer, because deallocating the data requires a mutable pointer, and creating a mutable pointer from an immutable reference is undefined behavior. The UnsafeCell is used for a similar reason.

This is actually the reason why this release is titled 0.3 and not 0.2. I released 0.2 and found these soundness bugs, so I had to make a fix. This was a breaking change, because you can't mutate the underlying collection anymore.

OwnedLockCollection

pub struct OwnedLockCollection<L> {
	data: L,
}

Since both RefLockCollection and BoxedLockCollection require sorting the locks, it'd be nice if we could sometimes avoid that for owned lock collections. If a set of locks is in an OwnedLockCollection, then it really cannot be used anywhere else. That means as long as references to a OwnedLockCollection always lock in the same order, then it'll never deadlock. That's the only order these particular locks can be locked in. So, we don't bother sorting and just lock them in the order they appear in for the collection.

RetryingLockCollection

This is the collection that was used in Happylock 0.1. This does the releasing of all of the locks thing. The upside of this collection is that it can contain references, and still not sort the pointers. Duplicates are checked for by using a HashSet. Because of the lack of sorting, it can be fast in the following case, from the docs:

[...] when the first lock in the collection is always the first in any collection, and the other locks in the collection are always locked after that first lock is acquired. This means that as soon as it is locked, there will be no need to unlock it later on subsequent lock attempts, because they will always succeed.

RwLocks

It doesn't take a genius to realize that my lock collection API is not friendly to read-write locks. In HappyLock 0.1, I had ReadLock and WriteLock wrappers for RwLock to allow reads in collections.

struct ReadLock<'a, T(&'a RwLock<T>);
struct WriteLock<'a, T(&'a RwLock<T>);

This worked fine at the time, but I quickly realized that this would not work for an OwnedLockCollection. To solve this problem, I made more changes to the Lockable API.

unsafe trait Lock {
	// ...
	unsafe fn read(&self);
	unsafe fn try_read(&self) -> bool;
	unsafe fn unlock_read(&self);
}

unsafe trait Lockable {
	// ...
	type ReadGuard<'g>;
	unsafe fn read_guard<'g>(&'g self) -> Self::ReadGuard<'g>
}

// A NEW TRAIT
unsafe trait Sharable: Lockable {}

The Sharable trait is implemented on collections that only contain readable locks. All of the lock collection types then have functions for when their locks are all sharable. That allows us to grant read-access in OwnedLockCollections.

After I was done with that, I updated the documentation and published the new version.

Performance

ThreadKey is a zero-cost abstraction. It takes zero space at runtime. The only code is checking to see if the key has already been acquired for a thread. That is a static, lazily-initialized, thread local, boolean check and update, so it's fairly fast.

I used lock_api as backends for the Mutex and RwLock types. By default, this library acts as a thin wrapper around parking_lot, so it will be roughly the same speed as parking_lot. There's an optional feature that can be enabled to use spin as well, and you can import your own raw locks if you wish. The standard library locks cannot be used, because it's impossible to implement the lock_api traits on them. If someone adds a Mutex::force_unlock function to the standard library, then I can add that support, and I'll even make it the default. For now, we have to deal with the larger binary size that's given by parking_lot.

The only place where performance could be an issue is in the lock collection types. The performance qualities of each type have already been discussed. But to summarize, only the OwnedLockCollection is zero-cost. Both RefLockCollection and BoxedLockCollection allocate memory and sort the lock pointers, and sometimes they'll even need to check for duplicates within the pointers. All of this work is done when the collection is created though, so at least you don't need to worry about it after that. On the other hand, the RetryingLockCollection, when locking, will waste time unlocking and relocking unless the following condition is met:

[...] when the first lock in the collection is always the first in any collection, and the other locks in the collection are always locked after that first lock is acquired.

RetryingLockCollection also checks for duplicates using a HashSet.

In summary, it's pretty fast, but choose the correct lock collection constructor carefully.

Future Work

This is summarized on the GitHub repo, but I'll try to go into more depth on my plans for the future of this library.

Poisoning

Personally, I'm not a big fan of mutex poisoning, but I can see why someone would want to use it. I plan on adding it to Happylock soon, by having a Poisonable wrapper around Lock types. And if I implement Lock on the lock collection types, then it'll even be able to wrap those, which might be useful.

OS Locks

I originally thought of this as being a must-have for the next version of HappyLock, but after trying it for a bit, I decided to hold off on it. As I mentioned before, there is a binary size cost to using parking_lot. We could save memory by using the locks that are built into the operating system, which is what the standard library does. It would be very convenient if I could just use the standard library as a backend for HappyLock.

Unfortunately, there's a reason why the standard library's locks do not implement lock_api's traits. There's just no way to unlock one of those mutexes without holding onto the guard. If we were to put the guard inside of the mutex, and then drop it when unlocking, that would make it impossible to share the lock across threads, since the guard isn't Send.

There are raw mutexes used in the underlying implementation of the standard library, but those aren't public. I've tried simply copying the part of the standard library that implements those traits, but it relies on too much of the rest of the standard library. I don't want importing HappyLock to require recompiling the entire standard library.

The next feasible solution, in my opinion, would be to make a new crate that uses the operating system mutexes, and implements lock_api traits on them. I've already started doing this, and it seems very possible. But don't expect it to properly support every platform that the standard library supports.

Compile-Time Duplicate Checks

The only way to make duplicate checks free is to run them at compile-time. I doubt I'd be able to implement this using procedural macros (feel free to prove me wrong). But, as pacak keeps reminding me, it might be possible using const functions with a Bloom filter. I don't know if const Rust is robust enough for this to be implemented yet, but I do want to give it a try. It wouldn't work for lock collections which require sorting, but it might work for RetryingLockCollection.

Expanding Cyclic Wait

One other thing I'd like to explore is if there's anything more that can be done with cyclic wait. I could imagine LockCollection<Vec<L>> where L: OwnedLockable having some lock_next method, or maybe an iterator type that would allow you to only lock the first few items as they are needed, rather than locking the entire collection at once. I'm not sure how useful this would be though. I'm sure there are other ideas that I'm not thinking of too.

A LockCell type

If you've seen the Cell type in Rust, then you know that it's a good zero-cost abstraction that provides safe interior mutability. The downsides are that it cannot be shared between threads and its limited API. But I think it would be neat to add some of its methods as convenience methods for Mutex and RwLock, i.e: Mutex::lock_swap. This would not remove the need for a ThreadKey, since then you'd be able to call lock_swap on a thread that's already locked the mutex.

However, if we had methods like Mutex::try_swap, then there'd be no need. The mutex wouldn't be able to block the current thread, and because the lock is released so quickly, it wouldn't block any other threads either. No blocking, means no deadlock. I think the same would be true for try_clone and try_take, but they're scarier since the Clone and Default traits could try to lock the same mutex, resulting in a deadlock. I don't know if it would be possible to access the ThreadKey safely in these methods though. Either way, all of these methods would be fallible, so it's not ggoing to be a perfect solution.

But we could introduce a LockCell type, which would be the same as Cell, but it safely implements Sync by using a mutex or rwlock internally. This would not require ThreadKey at all, as long as all of the implemented methods don't lock the LockCell a second time, which is probably impossible.

It is a little surprising that a type like this doesn't already exist in the standard library. That might be because it would make race conditions very easy to trigger. Imagine checking to see if two LockCell values were equal for an if-block, but once you get inside the if-block, they're no longer equal. I'll have to think more about this.

let a: LockCell<i32>;
let b: LockCell<i32>;

if a == b {
	// a and b are no longer equal
}

Readonly Lock Collections

I originally had a different idea of what to do with the Sharable trait. We could avoid checking for duplicates in a lock collection if we knew that the only thing we would do with them is read. At the time, I thought that reading twice on the same thread will never cause a deadlock. However, that's not true on every platform, because of contention. Luckily, the lock_api developers thought ahead by designing a RawRwLockRecursive trait, which I can extend by creating a Recursive subtrait of Lockable.

That's cool, but I don't really want to add three more lock collection types for this feature. So maybe this could be a wrapper around the existing lock collection types, i.e. Readonly<RetryingLockCollection<L>>. I'd need to implement a LockCollection trait that would have an associated unsafe constructor for creating the collection without checking for duplicates. Then, creating that retrying lock collection would be free (but locking it can still sometimes be expensive).

This doesn't only apply to RwLock. If there was a ReentrantMutex type, then we could implement Recursive on that, and use that in a readonly lock collection as well. For those of you who don't know, a ReentrantMutex can be locked several times on the same thread, but not while it's being locked by a different thread. However, the guard can only access the data immutably, because multiple mutable references to data on the same thread is still undefined behavior. It's mostly intended for wrapping Cell or RefCell. Unfortunately, lock_api doesn't have a trait for recursive mutexes. Instead it provides a struct wrapper for RawMutex, called RawReentrantMutex. This is despite the fact that Unix's pthread API already supports recursive mutexes. I might be able to sway the lock_api developers to support this, so I'll open an issue for it before this post is published.

No Standard Library

Doing this without the standard library would be very interesting. But currently, the Lockable interface requires a Vec of pointers, so it might not be feasible. The thread keys are stored in a thread local storage too, but I'll probably come up with some UnsafeKey system to deal with that eventually, in order to support async runtimes.

Other Types from the Standard Library

It would be fun to add OnceLock or LazyLock, but I don't think it's necessary. I don't think anybody has ever accidentally deadlocked using those interfaces, if at all. In the past I've also considered Condvar and Barrier, but that's trickier. They're kind of the opposite of a mutex, so tricks that work for avoiding deadlocks here won't work on them. I've also nevcer used either of those types before, so I'm probably not the best person to figure that out.

Conclusion

This has been a very long nerd snipe. But I think this idea could become incredibly useful, which is why I keep pushing for it. It'd be even better if it was a part of the standard libary to begin with, but I'm aware that's not possible anymore.

The fact that this is possible makes me appreciate the borrow checker even more. People often think of it as a limitation (myself included), but it solves so many problems that I'm surprised more languages don't adopt it. Not only does it solve memory corruption, but also resource leaks, and now deadlocks. You spend more time specifying your code, but less time debugging it.

There's still work that needs to be done (I haven't even begun async runtime support yet), but I hope that as this library becomes more developed, there will be more projects which are immune this error.