Do not remove blockers for aborted shard split when deleting the state document


    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Fully Compatible
    • Server Serverless 2022-08-08

      The tenant access blockers are removed at three locations for shard split:

      This opens up a race condition: for an aborted split, the blocker for a tenant is removed twice, once when expireAt is set and once when the state document is deleted. The following scenario is therefore possible:

      1. A split with id 1 is started for tenantA.
      2. Blockers are installed for tenantA for split 1.
      3. Split 1 aborts due to an error.
      4. forgetShardSplit is called for split 1. It sets expireAt and removes the blocker for tenantA.
      5. A new split is started with id 2.
      6. Blockers are installed for tenantA for split 2.
      7. The state document for split 1 is removed. onDelete removes the blocker for tenantA, but that blocker is now owned by split 2.
      8. Split 2 triggers an invariant because it expects a blocker to exist for tenantA.
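
The scenario above can be replayed with a small toy model. This is only a sketch under stated assumptions: `StateDoc`, `BlockerRegistry`, `forget_split`, `on_delete`, and the `skip_if_aborted` flag are hypothetical names invented for illustration and do not mirror the server's actual classes or functions. It contrasts removing the blocker unconditionally on document deletion with skipping that removal for an aborted split, as the ticket title proposes.

```python
# Toy model of the scenario above. All names are hypothetical and are not
# taken from the server code base.
from dataclasses import dataclass


@dataclass
class StateDoc:
    split_id: int
    tenant: str
    aborted: bool = False


class BlockerRegistry:
    """Maps each tenant to the id of the split whose blocker is installed."""

    def __init__(self):
        self.blockers = {}

    def install(self, tenant, split_id):
        self.blockers[tenant] = split_id

    def remove(self, tenant):
        # Removes whatever blocker is installed, regardless of which split owns it.
        self.blockers.pop(tenant, None)


def forget_split(registry, doc):
    # Step 4: expireAt is set and the blocker for the tenant is removed.
    registry.remove(doc.tenant)


def on_delete(registry, doc, skip_if_aborted):
    # Step 7: the state document is deleted.
    if skip_if_aborted and doc.aborted:
        return  # the blocker was already removed in forget_split
    registry.remove(doc.tenant)


def run_scenario(skip_if_aborted):
    """Replay steps 1-8; return True if split 2 still has its blocker."""
    registry = BlockerRegistry()
    split1 = StateDoc(split_id=1, tenant="tenantA")
    registry.install(split1.tenant, split1.split_id)  # steps 1-2
    split1.aborted = True                              # step 3
    forget_split(registry, split1)                     # step 4
    split2 = StateDoc(split_id=2, tenant="tenantA")
    registry.install(split2.tenant, split2.split_id)  # steps 5-6
    on_delete(registry, split1, skip_if_aborted)       # step 7
    # Step 8: split 2 expects its own blocker to be installed.
    return registry.blockers.get("tenantA") == 2
```

In this model, `run_scenario(False)` reproduces the failure in step 8 (the blocker owned by split 2 is gone), while `run_scenario(True)` leaves split 2's blocker in place.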

            Assignee: Didier Nadeau
            Reporter: Didier Nadeau
            Votes: 0
            Watchers: 2
