- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Replication
- None
- ALL
- Storage NYC 2019-02-11
- 9
When a transaction is prepared on a secondary, a DBHash transaction on the same collection has to wait for the prepared transaction to either commit or abort, and it holds the global lock in IX mode while it waits. If a DDL operation comes in at this moment, it is blocked, since it needs the global lock in X mode. The commit or abort command for the prepared transaction may come after the DDL operation in the oplog, but it never gets a chance to be applied.
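The IX versus X conflict described above can be illustrated with the textbook multi-granularity lock compatibility matrix. This is only a sketch: the `COMPATIBLE` table and the `can_grant` helper are hypothetical names made up for illustration, not the server's lock manager code.

```python
# Textbook multi-granularity lock compatibility (intent-shared, intent-exclusive,
# shared, exclusive). Illustrative only; not taken from the server source.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)


# DBHash holds the global lock in IX mode while it waits on the prepared transaction.
held_by_dbhash = ["IX"]

# A DDL operation that needs the global lock in X mode cannot be granted.
ddl_granted = can_grant("X", held_by_dbhash)

# Another IX requester (for example, an ordinary write) would not be blocked.
write_granted = can_grant("IX", held_by_dbhash)
```

Here `ddl_granted` is `False` and `write_granted` is `True`, which matches the observation that only the X-mode request is stuck behind DBHash.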
This isn't a problem on the primary, since a concurrent commit or abort command can come in while DBHash is blocked; secondaries, however, serialize all operations. Acquiring weaker locks for DDL operations won't solve this either, because DBHash may hold an IX lock on one collection while waiting for a prepared transaction in another collection. For example, suppose there are two collections, C1 and C2. The sequence of operations on a secondary is:
- (Oplog Application) A transaction with a single write in C1 is prepared.
- A read-only transactional DBHash comes in and takes the collection locks on C1 and C2. DBHash then waits for the prepared transaction to commit or abort.
- (Oplog Application) A DDL operation on C2 is applied and waits behind DBHash for the collection lock.
The prepared transaction and the DDL operation should both have succeeded on the primary, since they don't conflict.
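The wait chain in this example can be sketched as a small wait-for graph. The node names and the `find_cycle` helper are hypothetical, chosen only for illustration; the last edge models the assumption, taken from the description above, that the secondary applies oplog entries serially, so the commit or abort entry cannot be applied until the DDL entry ahead of it finishes.

```python
def find_cycle(waits_for, start):
    """Follow single wait-for edges from `start`; return the cycle if one is reached."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at something that is not waiting
    return path[path.index(node):] + [node]


# Each key waits for its value (illustrative labels, not server identifiers).
waits_for = {
    "apply DDL on C2": "DBHash",                # DDL needs the collection lock on C2
    "DBHash": "prepared txn on C1",             # DBHash hits the prepared transaction
    "prepared txn on C1": "apply commit/abort", # resolved only by a later oplog entry
    "apply commit/abort": "apply DDL on C2",    # serial application: DDL entry is first
}

cycle = find_cycle(waits_for, "DBHash")
```

With these edges, `cycle` starts and ends at `"DBHash"` and passes through all four nodes, so none of the waiters can make progress. On the primary the last edge does not exist, because the commit or abort command can run concurrently, and the chain would end instead of looping.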
Only snapshot reads on secondaries cause this problem, since they cannot yield. There might be a way to change DBHash to use a special snapshot isolation mechanism instead of a read-only transaction. Locking, prepare-waiting, and DDL operations (e.g. dropCollection) have to be considered when designing such an approach. In the short term, we may need to disable read-only transactions on secondaries.
- related to
- SERVER-39372 Make secondary lock acquisition for DDL operations consistent with behavior on primary (Closed)
- SERVER-40594 Range deleter in prepare conflict retry loop blocks step down (Closed)
- SERVER-40723 Deadlock between S lock acquisition on secondary and prepare conflict (Closed)
- SERVER-39139 Remove testing support for secondary transactions (Closed)
- SERVER-39321 Re-enable the CheckReplDBHashInBackground hook (Closed)