Race Condition in Balancer Force-Jumbo Migrations Causes Duplicate Key Errors on Recipient Shard


    • Cluster Scalability
    • Fully Compatible
    • ALL
    • v8.2, v8.0, v7.0
    • ClusterScalability Jan5-Jan19, ClusterScalability 19Jan-2Feb
    • 2

      During shard draining, balancer-initiated force-jumbo migrations (kForceBalancer) delay entering the critical section until cloning finishes, so writes to the chunk continue throughout the clone. When the jumbo cloning path uses an index scan (forceJumbo = true), a concurrent shard key update can move a document within the scan range. A document that moves ahead of the current scan position can be read again, so the same document is cloned twice. The second copy causes an _id duplicate key error on the recipient shard, which aborts the migration.
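
To illustrate the race described above, here is a minimal Python sketch. It is a toy model, not the server's cloning code: the function names (`clone_with_index_scan`, `insert_on_recipient`), the sorted list standing in for the shard key index, and the `update_at` parameter are all hypothetical and exist only for this illustration. It assumes the scan resumes from the last key it returned.

```python
import bisect


def clone_with_index_scan(index, update_at=None):
    """Return the _ids cloned by a resumable scan over a shard key index.

    `index` is a sorted list of (shard_key, _id) tuples (a stand-in for the
    real index). `update_at` is an optional (step, doc_id, new_key) tuple:
    once `step` documents have been cloned, `doc_id` gets a new shard key,
    modelling a concurrent shard key update during cloning.
    """
    cloned = []
    last = None
    while True:
        # Resume strictly after the last index entry that was returned.
        pos = 0 if last is None else bisect.bisect_right(index, last)
        if pos >= len(index):
            return cloned
        last = index[pos]
        cloned.append(last[1])

        if update_at and len(cloned) == update_at[0]:
            _, doc_id, new_key = update_at
            old_entry = next(e for e in index if e[1] == doc_id)
            index.remove(old_entry)
            bisect.insort(index, (new_key, doc_id))  # document moves in the index


def insert_on_recipient(doc_ids):
    """Model the recipient's unique _id index: a repeated _id is an error."""
    seen = set()
    for doc_id in doc_ids:
        if doc_id in seen:
            raise ValueError(f"duplicate key on _id: {doc_id!r}")
        seen.add(doc_id)
    return seen


# Document "a" is cloned at key 10, then its shard key is updated to 25,
# which is still inside the chunk range and ahead of the scan position.
cloned = clone_with_index_scan(
    [(10, "a"), (20, "b"), (30, "c")], update_at=(1, "a", 25)
)
print(cloned)
```

In this model the scan returns `"a"` twice, and passing the result to `insert_on_recipient` raises the duplicate key error. Without the concurrent update, each document is returned once.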

      The fastest mitigation would be to stop attaching forceJumbo during draining. In this case, the customer would need to move the chunk manually, similar to how chunks marked as jumbo are handled.

      Potential alternative solutions:

      • Move the critical section earlier for kForceBalancer jumbo migrations.
      • Add deduplication logic to the jumbo index scan path.
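
As a rough idea of what the deduplication option could look like, the following Python sketch filters a stream of scanned documents by `_id`. It is only an illustration under stated assumptions: `dedup_by_id` is a hypothetical helper, documents are modelled as dicts, and `_id` values are assumed to be hashable. It does not reflect how the server's jumbo cloning path is structured.

```python
def dedup_by_id(scan):
    """Yield each scanned document once, skipping _ids already yielded.

    `scan` is any iterable of dict-like documents with an "_id" field.
    The set of seen _ids grows with the number of documents in the chunk.
    """
    seen = set()
    for doc in scan:
        if doc["_id"] in seen:
            continue  # same document returned again after a shard key update
        seen.add(doc["_id"])
        yield doc


# The document with _id 1 appears twice because its shard key "k" changed
# from 10 to 25 while the scan was in progress.
scanned = [{"_id": 1, "k": 10}, {"_id": 2, "k": 20}, {"_id": 1, "k": 25}]
print(list(dedup_by_id(scanned)))
```

Two trade-offs are visible even in this sketch: the seen-set uses memory proportional to the number of documents in the chunk, which matters for jumbo chunks, and the copy that is kept is the older one, so the later update would still have to reach the recipient by some other means.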

            Assignee:
            Kruti Shah
            Reporter:
            Kruti Shah
