Replication two-phase drop (4.0 and eMRC=F) finishes dropping a collection when the drop oplog entry becomes majority committed. Drop commands with w:majority wait until their collections finish dropping before returning to the user. Tests expect that after executing a drop with w:majority they can safely move on to the next operation, including transactions, which have a 5ms lock timeout.

If the majority commit point advances multiple times concurrently, multiple notifications can schedule tasks to complete the same two-phase drop. The first scheduled task succeeds and lets the w:majority drop return to the user. At this point the user may start a transaction. The other drop-pending notification tasks that are still scheduled will eventually run and acquire a database lock to complete the collection drop, which can cause the transaction to hit a lock timeout. This is a TransientTransactionError that could easily be retried, but our tests do not expect a TransientTransactionError everywhere this is possible.
Sequence of events
- Thread A gets a drop command and waits on write concern majority.
- Thread B gets notified to drop the collection.
- Thread B drops the collection.
- Thread C gets notified to drop the collection.
- Thread A returns to the user.
- Thread C acquires an X lock on the 'test' database.
- Thread A starts a transaction and conflicts with thread C's lock.
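
The interleaving above can be reproduced in a small standalone model. This is only a sketch and not server code: it assumes a plain `threading.Lock` as a stand-in for the database X lock, and every name in it (`simulate_duplicate_drop_race`, `drop_pending`, the event objects) is hypothetical. Events are used to force the unlucky ordering so the outcome is deterministic.

```python
import threading

LOCK_TIMEOUT_SECS = 0.005  # models the 5ms transaction lock timeout


def simulate_duplicate_drop_race():
    """Model two tasks scheduled for the same two-phase drop (hypothetical names)."""
    db_lock = threading.Lock()         # stand-in for the X lock on the 'test' database
    drop_pending = {"test.coll"}       # collections waiting for phase two of the drop
    drop_finished = threading.Event()  # set once the collection is really dropped
    a_returned = threading.Event()     # set once the w:majority drop has returned
    c_holds_lock = threading.Event()   # set once the duplicate task holds the lock
    release_c = threading.Event()      # lets the duplicate task finish
    events = []

    def thread_b():
        # First scheduled task: completes the drop under the database lock.
        with db_lock:
            drop_pending.discard("test.coll")
            events.append("B drops the collection")
        drop_finished.set()

    def thread_c():
        # Duplicate task: already scheduled, but only runs after A has returned.
        a_returned.wait()
        with db_lock:
            events.append("C acquires the database lock")
            c_holds_lock.set()
            release_c.wait()  # hold the lock while the transaction tries to start

    b = threading.Thread(target=thread_b)
    c = threading.Thread(target=thread_c)
    b.start()
    c.start()

    # Thread A: the drop with w:majority returns once the drop has finished.
    drop_finished.wait()
    events.append("A returns to the user")
    a_returned.set()

    # Thread A: the next operation is a transaction with a short lock timeout.
    c_holds_lock.wait()
    got_lock = db_lock.acquire(timeout=LOCK_TIMEOUT_SECS)
    if got_lock:
        db_lock.release()
        events.append("A's transaction acquires the lock")
    else:
        events.append("A's transaction hits a lock timeout")

    release_c.set()
    b.join()
    c.join()
    return got_lock, events


if __name__ == "__main__":
    acquired, timeline = simulate_duplicate_drop_race()
    for step in timeline:
        print(step)
    print("transaction acquired lock:", acquired)
```

In this model the duplicate task finds nothing left to drop, yet it still takes the lock, and that alone is enough to make the transaction time out.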