Note: This ticket does not fully fix the deadlock described. A complete fix was introduced in SERVER-60334.
This is a follow-up to WT-8245.
There's a mutex inside the SizeStorer that serializes access to a global WT session and cursor that we keep open forever. Multiple threads share that session and cursor, which is why the mutex exists. In general, it's not a good idea to hold an exclusive lock while calling into the storage engine.
The larger problem is that the SizeStorer uses a WT_SESSION that is not the one owned by the calling operation, which may also have its own WT_SESSION.
In practice, this has only shown up in importCollection. After the operation has performed a catalog write, it gets stuck inside SizeStorer::load(), holding this mutex and blocking on cache eviction. WiredTiger will roll back transactions that have written data, but it will not roll back read-only transactions. WiredTiger cannot roll back the SizeStorer::load() because the SizeStorer uses an entirely separate WT_SESSION from the one that importCollection uses, and that separate session has only read. So even though importCollection has written data, it cannot be rolled back even if it is causing cache issues.
Using more than one WT_SESSION per thread is a bug that we've seen before.
We should just get rid of this global session + cursor and require that callers pass their own OperationContext. If that's not possible for some reason, we'll need to use "cache_max_wait_ms" to allow the operation to time itself out.
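A minimal sketch of the first option, to show the shape of the change rather than the actual server code. Every name here is hypothetical: FakeSession stands in for a WT_SESSION, FakeOperationContext for an OperationContext that owns exactly one session, and SizeStorerSketch for the SizeStorer. The backing table is an in-memory map, so the sketch only models which session a read is charged to.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a WT_SESSION; it only counts reads issued through it.
struct FakeSession {
    int reads = 0;
};

// Hypothetical stand-in for an OperationContext: one operation, one session.
struct FakeOperationContext {
    FakeSession session;
};

struct SizeInfo {
    std::int64_t numRecords;
    std::int64_t dataSize;
};

// Sketch of a size storer that holds no global session, cursor, or mutex.
// Not thread-safe; the map merely stands in for the real size storer table.
class SizeStorerSketch {
public:
    void store(const std::string& uri, SizeInfo info) {
        _table[uri] = info;
    }

    // The read is issued on the caller's own session, so it is part of the
    // caller's transaction instead of a separate read-only one.
    SizeInfo load(FakeOperationContext* opCtx, const std::string& uri) const {
        assert(opCtx != nullptr);
        ++opCtx->session.reads;
        auto it = _table.find(uri);
        return it == _table.end() ? SizeInfo{} : it->second;
    }

private:
    std::map<std::string, SizeInfo> _table;
};
```

With this shape, a load() issued by importCollection would run on the same session that performed the catalog write, so there is no second session on the thread and no shared mutex to hold across the storage engine call.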
- is related to
  - SERVER-67514 SizeStorer load() can get stuck in page eviction (Closed)
  - SERVER-98595 Test that using multiple sessions per thread is safe (Backlog)
- related to
  - WT-8245 Fix eviction hang during importCollection (Closed)
  - SERVER-61116 Audit and add assertions against using multiple WT_SESSIONs on the same thread (Backlog)
  - SERVER-60334 Avoid caching the cursor and session in WiredTigerSizeStorer (Closed)