Currently, prepareTransaction commands need to acquire write tickets. As a result, the following cross-shard deadlock can form:
On shard0:
- Txn0 is prepared on shard0 but not yet on shard1; shard0, as the coordinator, is waiting for shard1 to prepare Txn0
- Transactional reads exhaust all write tickets and are blocked due to prepare conflicts with Txn0
- The prepareTransaction command for Txn1 is blocked behind write ticket acquisition
On shard1:
- Txn1 is prepared on shard1 but not yet on shard0; shard1, as the coordinator, is waiting for shard0 to prepare Txn1
- Transactional reads exhaust all write tickets and are blocked due to prepare conflicts with Txn1
- The prepareTransaction command for Txn0 is blocked behind write ticket acquisition
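The scenario above can be sketched as a wait-for graph, where each entry waits on the next one. This is only an illustrative model: the node labels and the `find_cycle` helper are hypothetical and do not correspond to server identifiers or code.

```python
# Hypothetical wait-for graph for the scenario above. Each key waits on its
# value. Labels are illustrative only, not server identifiers.
WAITS_FOR = {
    "shard0: prepare(Txn1)": "shard0: write ticket",
    "shard0: write ticket": "shard0: reads blocked on Txn0",
    "shard0: reads blocked on Txn0": "Txn0 commit/abort decision",
    "Txn0 commit/abort decision": "shard1: prepare(Txn0)",
    "shard1: prepare(Txn0)": "shard1: write ticket",
    "shard1: write ticket": "shard1: reads blocked on Txn1",
    "shard1: reads blocked on Txn1": "Txn1 commit/abort decision",
    "Txn1 commit/abort decision": "shard0: prepare(Txn1)",
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    position = {}  # node -> index in path
    node = start
    while node in graph:
        if node in position:
            # We came back to a node already on the path: that is a deadlock.
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = graph[node]
    # Reached a node that waits on nothing, so the chain can make progress.
    return None


cycle = find_cycle(WAITS_FOR, "shard0: prepare(Txn1)")
if cycle:
    print(" -> ".join(cycle + [cycle[0]]))
```

Every participant in the chain is waiting on another participant, so nothing can make progress until something times out.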
The deadlock would eventually resolve on its own once the transaction lifetime limit (default 60s) is reached. But I don't think it's ideal to rely on an operation timeout to resolve a deadlock that we allow the system to form.
One idea to fix this problem is to skip ticket acquisition for prepareTransaction, since prepareTransaction is not going to acquire any more storage resources anyway.
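A minimal sketch of the effect of skipping the ticket, using a toy semaphore-backed pool. `TicketPool`, `run_prepare`, and the `skip_ticket` flag are hypothetical names for illustration and are not the server's ticket holder or its API.

```python
import threading


class TicketPool:
    """Toy stand-in for a write-ticket pool (hypothetical, not server code)."""

    def __init__(self, size):
        self._sem = threading.Semaphore(size)

    def try_acquire(self, timeout):
        # Returns False if no ticket became available within the timeout.
        return self._sem.acquire(timeout=timeout)

    def release(self):
        self._sem.release()


def run_prepare(pool, skip_ticket, timeout=0.05):
    """Return True if prepare could run, False if it stalled on a ticket."""
    if skip_ticket:
        # Proposed behavior: prepare proceeds without touching the pool.
        return True
    if not pool.try_acquire(timeout):
        return False
    try:
        return True
    finally:
        pool.release()


pool = TicketPool(size=2)
# Model reads that hit a prepare conflict: they take every ticket and,
# in this sketch, never release them.
pool.try_acquire(timeout=0.05)
pool.try_acquire(timeout=0.05)

blocked = run_prepare(pool, skip_ticket=False)
bypassed = run_prepare(pool, skip_ticket=True)
print(blocked, bypassed)  # prints "False True"
```

In this model the ticketed prepare stalls for as long as the blocked reads hold the pool, while the ticketless prepare completes and lets the coordinator move on to a commit or abort decision.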
Is related to:
- SERVER-41980 Non-transactional commands can deadlock with prepared transactions when the tickets are exhausted by the non-transactional write commands. (Closed)
- SERVER-42398 abortTransaction and commitTransaction commands should not acquire ticket irrespective of the prepared state. (Closed)