  Core Server / SERVER-65478

Fix race condition when removing tenant migration blockers in shard split

    • Type: Task
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Serverless

      The `ShardSplitDonorOpObserver` removes access blockers when the state document is deleted by the TTL index (`ShardSplitDonorOpObserver::onDelete`). However, it does not check whether the access blocker is currently in use by another shard split operation for the same tenant. This creates a race condition in which a previously aborted shard split removes the blocker for `tenant1` while an ongoing shard split is still relying on it.
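One way to close this kind of race is to record which split installed each blocker and let the delete path remove a blocker only when the owner matches. The sketch below illustrates that idea under stated assumptions: `AccessBlocker`, `BlockerRegistry`, `removeIfOwnedBy`, and the string split IDs are hypothetical stand-ins, not the server's real types or API, and this is not a fix the project adopted (the issue was closed as Won't Fix).

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a per-tenant access blocker (not the server's real type).
struct AccessBlocker {
    std::string ownerSplitId;  // UUID of the shard split that installed the blocker
};

// Hypothetical registry holding at most one blocker per tenant ID.
class BlockerRegistry {
public:
    void add(const std::string& tenantId, const std::string& splitId) {
        std::lock_guard<std::mutex> lk(_mutex);
        _blockers[tenantId] = std::make_shared<AccessBlocker>(AccessBlocker{splitId});
    }

    // Mirrors the behavior described in the issue: drop the tenant's blocker
    // regardless of which split it belongs to.
    void removeUnconditionally(const std::string& tenantId) {
        std::lock_guard<std::mutex> lk(_mutex);
        _blockers.erase(tenantId);
    }

    // Guarded variant: only the split that installed the blocker may remove it.
    // Lookup, owner check, and erase run under one lock, so another split cannot
    // swap in its own blocker between the check and the erase.
    bool removeIfOwnedBy(const std::string& tenantId, const std::string& splitId) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _blockers.find(tenantId);
        if (it == _blockers.end() || it->second->ownerSplitId != splitId) {
            return false;  // no blocker, or it belongs to a different split
        }
        _blockers.erase(it);
        return true;
    }

    std::shared_ptr<AccessBlocker> get(const std::string& tenantId) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _blockers.find(tenantId);
        if (it == _blockers.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    mutable std::mutex _mutex;
    std::map<std::string, std::shared_ptr<AccessBlocker>> _blockers;
};
```

For example, after `add("tenant1", "uuid-2")`, calling `removeIfOwnedBy("tenant1", "uuid-1")` returns false and leaves the blocker in place, whereas `removeUnconditionally("tenant1")` drops it.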

      Scenario:

      • commitShardSplit is started for tenant1 with UUID 1
      • commitShardSplit fails and its state document is marked "aborted"
      • forgetShardSplit is called for UUID 1, making the state document eligible for TTL deletion
      • commitShardSplit is started for tenant1 with UUID 2
      • The TTL index deletes the state document for UUID 1. The same operation also removes the access blocker for tenant1.
      • commitShardSplit UUID 2 crashes on an invariant failure (or runs into other undefined behavior) because it expects the access blocker to exist.
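The interleaving above can be replayed deterministically in a single thread. This is a hypothetical model, not server code: blockers are a plain map from tenant ID to the UUID of the owning split, `onStateDocumentDeleted` stands in for the `onDelete` path, and it assumes commitShardSplit UUID 2 registers its own blocker in place of the aborted split's one. With `guarded == false` (the behavior described in the issue) the blocker for UUID 2 disappears; with an owner check it survives.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the blocker registry: tenant ID -> UUID of owning split.
using BlockerMap = std::map<std::string, std::string>;

// Models the TTL-triggered delete of the state document for `splitId`.
void onStateDocumentDeleted(BlockerMap& blockers, const std::string& tenantId,
                            const std::string& splitId, bool guarded) {
    auto it = blockers.find(tenantId);
    if (it == blockers.end()) {
        return;
    }
    if (guarded && it->second != splitId) {
        return;  // blocker belongs to another split: leave it alone
    }
    blockers.erase(it);
}

// Returns true if split "uuid-2" still has its blocker after the
// TTL deletion of the aborted split "uuid-1".
bool replayScenario(bool guarded) {
    BlockerMap blockers;
    blockers["tenant1"] = "uuid-1";  // commitShardSplit UUID 1 installs a blocker
    // UUID 1 aborts; forgetShardSplit makes its document eligible for TTL deletion.
    blockers["tenant1"] = "uuid-2";  // assumption: UUID 2 installs its own blocker
    onStateDocumentDeleted(blockers, "tenant1", "uuid-1", guarded);  // TTL fires
    auto it = blockers.find("tenant1");
    return it != blockers.end() && it->second == "uuid-2";
}
```

`replayScenario(false)` returns false (UUID 2 has lost its blocker), while `replayScenario(true)` returns true.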

      This leads to a crash, but it can also cause data inconsistency before the crash happens: writes succeed when they should be blocked, because the blocker has been removed.
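The data-inconsistency risk follows from how a write check behaves when no blocker is registered for a tenant. In the hypothetical model below (`BlockerState`, `WriteDecision`, and `checkTenantWrite` are illustrative names, not server APIs), a tenant with no blocker simply has its writes accepted, so deleting the blocker during the blocking phase silently lets writes through.

```cpp
#include <map>
#include <string>

// Hypothetical blocker states and write outcomes, for illustration only.
enum class BlockerState { kAllow, kBlockWrites };
enum class WriteDecision { kAccepted, kRejected };

// A split in its blocking phase must reject tenant writes. If the blocker has
// been removed, nothing intercepts the write and it is accepted.
WriteDecision checkTenantWrite(const std::map<std::string, BlockerState>& blockers,
                               const std::string& tenantId) {
    auto it = blockers.find(tenantId);
    if (it == blockers.end()) {
        return WriteDecision::kAccepted;  // no blocker registered: write goes through
    }
    if (it->second == BlockerState::kBlockWrites) {
        return WriteDecision::kRejected;
    }
    return WriteDecision::kAccepted;
}
```

With `{"tenant1", BlockerState::kBlockWrites}` registered, a write for tenant1 is rejected; after the entry is erased, the same write is accepted.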

            Assignee:
            backlog-server-serverless [DO NOT USE] Backlog - Server Serverless (Inactive)
            Reporter:
            didier.nadeau@mongodb.com Didier Nadeau
            Votes:
            0
            Watchers:
            3
