created | 2025-01-06T15:02:16Z |
---|---|
begin | 2025-01-04T00:00:00Z |
end | 2025-01-04T02:34:11Z |
path | src/sys |
commits | 1 |
date | 2025-01-04T02:34:11Z | |||
---|---|---|---|---|
author | dlg | |||
files | src/sys/kern/kern_rwlock.c | log | diff | annotate |
src/sys/sys/rwlock.h | log | diff | annotate | |
message |
rework rwlocks to reduce pressure on the scheduler and SCHED_LOCK this is src/sys/kern/kern_rwlock.c r1.51 and src/sys/sys/rwlock.h r1.30 again. it was backed because of how vfs_busy was using rwlocks, but that has been changed to allow this diff to go back in. it's become obvious that heavily contended rwlocks put a lot of pressure on the scheduler, and we have enough contended locks that there's a benefit to changing rwlocks to try and mitigate against that pressure. when a thread is waiting on an rwlock, it sets a bit in the rwlock to indicate that when the current owner of the rwlock leaves the critical section, it should wake up the waiting thread to try and take the lock. if there's no waiting thread, the owner can skip the wakeup. the problem is that rwlocks can't tell the difference between one waiting thread and more than one waiting thread. so when the "there's a thread waiting" bit is cleared, all the waiting threads are woken up. one of these woken threads will take ownership of the lock, but also importantly, the other threads will end up setting the "im waiting" bit again, which is necessary for them to be woken up by the 2nd thread that won the race to become the owner of the lock. this is compounded by pending writers and readers waiting on the same wait channel. an rwlock may have one pending writer trying to take the lock, but many readers waiting for it too. it would make sense to wake up only the writer so it can take the lock next, but we end up waking the readers at the same time. the result of this is that contended rwlocks wake up a lot of threads, which puts a lot of pressure on the scheduler. this is noticeable as a lot of contention on the scheduler lock, which is a spinning lock that increases time used by the system. this is a pretty classic thundering herd problem. this change mitigates against these wakeups by adding counters to rwlocks for the number threads waiting to take write and read locks instead of relying on bits. when a thread needs to wait for a rwlock it increments the relevant counter before sleeping. after it is woken up and takes the lock it decrements that counter. this means rwlocks know how many threads are waiting at all times without having to wake everything up to rebuild state every time a thread releases the lock. pending writers and readers also wait on separate wchans. this allows us to prioritise writers and to wake them up one at a time. once there's no pending writers all pending readers can be woken up in one go so they can share the lock as soon as possible. if you are suffering a contended rwlock, this should reduce the amount of time spent spinning on the sched lock, which in turn may also reduce the wall clock time doing that work. the only downside to this change in my opinion is that it grows struct rwlock by 8 bytes. if we can reduce rwlock contention in the future, i reckon i could shrink the rwlock struct again while still avoiding some of the scheduler interactions. work with claudio@ ok claudio@ mpi@ stsp@ testing by many including claudio@ landry@ stsp@ sthen@ phessler@ tb@ and mark patruck |