- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Fully Compatible
- ALL
- Replication 2021-11-15
- 135
The storage change code takes a global lock, interrupts all opCtxs, and waits for each opCtx to be destroyed. An opCtx that takes the global lock uninterruptibly can deadlock with this: the storage change code won't release the global lock until the opCtx is destroyed, and the opCtx, which ignores the interrupt, will wait forever for the lock. Most uses of uninterruptible locks (e.g. prepared transactions) do not run during initial sync and aren't a problem, but for some reason the ReplBatcher thread runs continuously, even during initial sync, and it requires an uninterruptible lock.
This can be fixed by having the batcher take an interruptible global lock before entering the uninterruptible section.
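A toy model of the interaction, as a rough sketch only. ToyOpCtx, ToyGlobalLock, and batcherExitsWhileLockHeld are hypothetical names made up for illustration and are not the server's real classes or locking API. The sketch assumes a single exclusive lock whose waiters can optionally be woken by an interrupt, and shows that a batcher-like thread only goes away while the lock is still held if its wait is interruptible.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an operation context: it only carries an interrupt flag.
struct ToyOpCtx {
    std::atomic<bool> interrupted{false};
};

// Hypothetical exclusive "global lock" whose waiters can optionally be interrupted.
class ToyGlobalLock {
public:
    // Returns true once the lock is held. With interruptible == true, the wait
    // gives up and returns false as soon as the opCtx is marked interrupted.
    bool lock(ToyOpCtx& opCtx, bool interruptible) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] {
            return !_held || (interruptible && opCtx.interrupted.load());
        });
        if (interruptible && opCtx.interrupted.load()) {
            return false;  // interrupted: do not take the lock
        }
        _held = true;
        return true;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _held = false;
        }
        _cv.notify_all();
    }

    // Marks the opCtx interrupted and wakes all waiters so they can re-check.
    void interrupt(ToyOpCtx& opCtx) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            opCtx.interrupted.store(true);
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _held = false;
};

// Simulates a storage change that holds the lock and interrupts a batcher-like
// thread. Returns true if that thread finished while the lock was still held.
bool batcherExitsWhileLockHeld(bool interruptible) {
    ToyGlobalLock globalLock;
    ToyOpCtx storageChangeOpCtx;
    ToyOpCtx batcherOpCtx;
    globalLock.lock(storageChangeOpCtx, false);  // storage change holds the lock

    std::atomic<bool> batcherDone{false};
    std::thread batcher([&] {
        if (globalLock.lock(batcherOpCtx, interruptible)) {
            // The uninterruptible section would run here.
            globalLock.unlock();
        }
        batcherDone.store(true);
    });

    globalLock.interrupt(batcherOpCtx);

    // Wait a bounded time for the batcher to react to the interrupt.
    auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(500);
    while (!batcherDone.load() && std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    bool exited = batcherDone.load();

    // The toy releases the lock so the thread can be joined. In the scenario
    // described above, the storage change would instead keep waiting.
    globalLock.unlock();
    batcher.join();
    return exited;
}
```

In this model, batcherExitsWhileLockHeld(true) corresponds to the proposed fix: the batcher blocks interruptibly first, sees the interrupt, and exits. With false, the batcher stays blocked for as long as the lock is held, which is the deadlock condition when the lock holder is waiting for the batcher to go away.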