WiredTiger currently (2.5) has a schema lock to single-thread schema-modifying operations combined with moderately complex locking of data handles to provide shared or exclusive access.
In addition, there is a "table lock" to single-thread schema-modifying operations on the same table that may not exclusively lock the same file (e.g., adding or dropping independent indices). Further, there is a "data handle list list" lock to protect the list of data handles.
This is complicated, fragile and hard to maintain.
Instead, we should move to a situation where:
- all data sources in the system that require locking (including at least tables, indices and LSM trees) become WT_DATA_HANDLEs.
_ all operations start by locking the data handles they need. This involves searching the session for cached data handles, then _protected by the data handle mutex* searching and optionally creating the data handles in the connection cache.
- once a required handle is found, it is cached in the session cache, and only at this stage is there any blocking on the data handle lock (after the handle list mutex is dropped).
- blocking should be made configurable (default blocking=true), rather than the current behavior where read-only operations block but exclusive operations sometimes fail with EBUSY.
- since each operation only operates on a single high-level handle (e.g., a single table, even if that includes multiple btrees or LSM trees underneath), and exclusive locks are never held for longer than a single operation, we can continue to avoid data handle deadlocks.
These changes would mean:
- we can eliminate the schema lock, the table lock and LSM locks: the operations they currently protect would be protected by handle locks on tables, indices or LSM trees.
- no operation ever blocks while holding the data handle list lock: it is purely used to protect the shared handle list
- more concurrency: independent schema-modifying operations can complete independently. This requires some care with regard to metadata changes, particularly with regard to checkpoints and when newly-created data sources become durable.