|
The `ShardSplitDonorOpObserver` removes access blockers when the state document is deleted by the TTL index (`ShardSplitDonorOpObserver::onDelete`). However, it does not check whether the access blocker is currently in use by another shard split operation for the same tenant. As a result, there is a race condition in which a previously aborted shard split removes the blocker for `tenant1` while an ongoing shard split is still using it.
Scenario:
- commitShardSplit started for tenant1 for UUID 1
- commitShardSplit fails and the state document becomes "aborted"
- forgetShardSplit is called for UUID 1, which makes the state document eligible for deletion by the TTL index
- commitShardSplit started for tenant1 for UUID 2
- The TTL index removes the state document for commitShardSplit UUID 1. The same operation also removes the access blocker for tenant1.
- commitShardSplit UUID 2 crashes due to an invariant failure (or exhibits other undefined behavior) because it expects an access blocker to exist.
This leads to a crash, but it can also lead to data inconsistency before the crash happens (writes succeed when they shouldn't because the blocker has been removed).
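
The scenario above can be replayed with a small standalone sketch. This is not the server code: `BlockerRegistry`, `removeUnchecked`, `removeIfOwnedBy`, and `blockerSurvivesTtlDelete` are hypothetical names, integer ids stand in for the UUIDs, and the sketch assumes the second split takes ownership of the tenant's blocker when it starts. It only illustrates why an unconditional removal in `onDelete` is unsafe and how an ownership check could avoid it.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the per-tenant access blocker registry.
// Each entry records which split operation (by id) owns the tenant's blocker.
class BlockerRegistry {
public:
    // Installs a blocker for the tenant, or transfers ownership to splitId.
    void add(const std::string& tenantId, int splitId) {
        _owners[tenantId] = splitId;
    }

    bool has(const std::string& tenantId) const {
        return _owners.count(tenantId) != 0;
    }

    // Mirrors the behavior described in this ticket: remove the blocker
    // without checking which split currently owns it.
    void removeUnchecked(const std::string& tenantId) {
        _owners.erase(tenantId);
    }

    // One possible guard: remove the blocker only if it is still owned by
    // the split whose state document is being deleted.
    bool removeIfOwnedBy(const std::string& tenantId, int splitId) {
        auto it = _owners.find(tenantId);
        if (it == _owners.end() || it->second != splitId) {
            return false;  // Blocker is missing or owned by another split.
        }
        _owners.erase(it);
        return true;
    }

private:
    std::map<std::string, int> _owners;
};

// Replays the scenario and reports whether tenant1 still has a blocker
// after the TTL deletion of the state document for split 1.
bool blockerSurvivesTtlDelete(bool guarded) {
    BlockerRegistry registry;
    registry.add("tenant1", 1);  // commitShardSplit starts for UUID 1.
    // Split 1 aborts and is forgotten; its blocker is still registered.
    registry.add("tenant1", 2);  // commitShardSplit starts for UUID 2.

    // The TTL index deletes the state document for split 1.
    if (guarded) {
        registry.removeIfOwnedBy("tenant1", 1);
    } else {
        registry.removeUnchecked("tenant1");
    }
    return registry.has("tenant1");
}
```

With the unchecked removal, `blockerSurvivesTtlDelete(false)` returns `false`, which corresponds to split 2 losing its blocker. With the ownership check, `blockerSurvivesTtlDelete(true)` returns `true`.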
|