Priority: Major - P3
Affects Version/s: None
Fix Version/s: Backlog
It would be good to formalize how data structures are accessed in WiredTiger. In particular, for data structures that are protected by locks, we should have a way to verify that we are holding (and releasing) locks around our accesses. We might consider a static approach, a dynamic approach, or both.
A static approach analyzes our code before it runs, for example with something like Clang's Thread Safety Analysis (part of the LLVM project). That is documented as a C++ language extension, so it is unclear how well it would adapt to our C code; at the very least, it could give us a starting point for building our own LLVM plugin to do this.
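To make the static option concrete, here is a minimal sketch of what Clang-style annotations could look like in C. Everything in it is hypothetical: the `SKETCH_TSA` macro, the `sketch_lock` type, and the `table_*` names are invented for illustration and are not WiredTiger code. The attribute names (`capability`, `guarded_by`, `acquire_capability`, `release_capability`, `requires_capability`, `no_thread_safety_analysis`) are the ones Clang documents. The sketch deliberately sticks to a global lock guarding a global variable; whether a struct field guarded by a lock in the same struct (the dhandle case) can be annotated in C would need to be checked.

```c
#include <stdatomic.h>

/* Expand to a Clang attribute when available, otherwise to nothing. */
#if defined(__clang__) && defined(__has_attribute)
#if __has_attribute(capability)
#define SKETCH_TSA(x) __attribute__((x))
#endif
#endif
#ifndef SKETCH_TSA
#define SKETCH_TSA(x)
#endif

/* Hypothetical lock type, marked as a capability the analysis can track. */
typedef struct SKETCH_TSA(capability("mutex")) sketch_lock {
    atomic_flag held;
} sketch_lock;

/* The annotations on these declarations form the locking interface. */
static void sketch_lock_acquire(sketch_lock *l) SKETCH_TSA(acquire_capability(l));
static void sketch_lock_release(sketch_lock *l) SKETCH_TSA(release_capability(l));

/* The lock implementation itself is excluded from the analysis. */
SKETCH_TSA(no_thread_safety_analysis)
static void
sketch_lock_acquire(sketch_lock *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set(&l->held))
        ;
}

SKETCH_TSA(no_thread_safety_analysis)
static void
sketch_lock_release(sketch_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* A global protected by a global lock. */
static sketch_lock table_lock = {ATOMIC_FLAG_INIT};
static int table_count SKETCH_TSA(guarded_by(table_lock)) = 0;

/* Callers must already hold table_lock. */
static void table_add(int n) SKETCH_TSA(requires_capability(table_lock));

static void
table_add(int n)
{
    table_count += n;
}

/* Takes the lock, updates the counter, and returns the new value. */
static int
table_add_locked(int n)
{
    int v;

    sketch_lock_acquire(&table_lock);
    table_add(n);
    v = table_count;
    sketch_lock_release(&table_lock);
    return (v);
}
```

The checks only run under Clang with `-Wthread-safety`; with other compilers the macro expands to nothing and the code builds unchanged. Under the analysis, calling `table_add` or touching `table_count` without holding `table_lock` should produce a warning at compile time.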
A dynamic approach might be adding a facility in a diagnostic build like this:
- Introduce a session flag for each kind of lock that can be taken. We already have this for many locks (e.g. WT_SESSION_LOCKED_CHECKPOINT), but make it more general for diagnostic runs. For example, every dhandle has a rwlock, and a session could hold locks on many dhandles at once. However, the typical case is that we've locked just one, so WT_SESSION_LOCKED_DHANDLE could mean "this session holds some dhandle lock".
- Then annotate every read/write of a data structure's field with a macro. The macro would assert that the session holds the appropriate lock for that kind of data structure.
- We could prototype the idea as described and, if successful, try to write a tool or compiler plugin that adds these annotations for us.
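The steps above could be prototyped roughly as follows. This is a sketch under stated assumptions, not WiredTiger code: all `sketch_*` and `SKETCH_*` names, the flag values, and the struct layouts are invented, and the actual rwlock calls are left out so that only the bookkeeping and the annotated access are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a diagnostic-build switch (hypothetical); enabled here. */
#ifndef SKETCH_DIAGNOSTIC
#define SKETCH_DIAGNOSTIC 1
#endif

/* Hypothetical flag bits; the values are invented for this sketch. */
#define SKETCH_SESSION_LOCKED_CHECKPOINT 0x1u
#define SKETCH_SESSION_LOCKED_DHANDLE 0x2u

typedef struct {
    uint32_t lock_flags; /* Kinds of lock this session currently holds. */
} sketch_session;

typedef struct {
    int refcnt; /* Example field protected by the dhandle lock. */
} sketch_dhandle;

/* Record that the session took a lock of the given kind. */
static inline void
sketch_lock_note(sketch_session *s, uint32_t flag)
{
    s->lock_flags |= flag;
}

/* Record that the session released a lock of the given kind. */
static inline void
sketch_unlock_note(sketch_session *s, uint32_t flag)
{
    s->lock_flags &= ~flag;
}

/* True if the session holds some lock of the given kind. */
static inline bool
sketch_holds(const sketch_session *s, uint32_t flag)
{
    return ((s->lock_flags & flag) != 0);
}

#if SKETCH_DIAGNOSTIC
#define SKETCH_ASSERT_LOCKED(s, flag) assert(sketch_holds((s), (flag)))
#else
#define SKETCH_ASSERT_LOCKED(s, flag) ((void)0)
#endif

/*
 * Annotated field access: check the session's flag, then yield the field as
 * an lvalue so the macro works for both reads and writes.
 */
#define SKETCH_DHANDLE_FIELD(s, dh, field) \
    (*(SKETCH_ASSERT_LOCKED((s), SKETCH_SESSION_LOCKED_DHANDLE), &(dh)->field))
```

A caller would write `SKETCH_DHANDLE_FIELD(session, dhandle, refcnt)++` instead of `dhandle->refcnt++`. As the ticket notes, the flag only says that the session holds some dhandle lock, so this check would not catch a session that locks one dhandle and then touches a different one.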
I expect this approach might only be successful for some data structures, as our locking protocols are often ad hoc. Ad hoc is not always bad: it is typically performant and very flexible. But in some cases, a more formal approach to locking may help prevent errors.