ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() holds a collection lock on the oplog while doing a blocking wait. This can cause the hang described below:
1. First, perform an insert into a replicated collection using insertDocuments(). An optime is generated, but not committed. If another write occurs after this at a later optime, a "hole" is created by the timestamped write that is not yet committed.
2. A reader using readConcern "atClusterTime" or "afterClusterTime" begins a read. This uses ReplicationCoordinatorExternalStateImpl::waitForAllEarlierOplogWritesToBeVisible() to wait for all uncommitted operations to become committed and visible.
- This waits for the uncommitted insert in step 1 to be committed while holding a DBLock("local", MODE_IS).
3. A dropCollection command is received on the "local" database, and enqueues a DBLock("local", MODE_X).
4. The first insert completes the insert in the storage engine and attempts to write the oplog entry at the generated optime. It attempts to acquire a DBLock("local", MODE_IX).
- The previously enqueued dropCollection operation prevents the insert from acquiring the "local" database lock.
- waitForAllEarlierOplogWritesToBeVisible() holds its collection lock while waiting for the insert to become visible, and the insert in turn waits behind the dropCollection operation, which waits for that lock to be released.
This method should be redesigned so that a collection lock is not required to be held while waiting for the last oplog entry to become visible.