shardCollection can fail on config server failover if primary shard finishes _shardsvrShardCollection before the stepdown thread kills ops

    • Sharding EMEA
    • Fully Compatible
    • 21
    • 3

      A shardCollection command can fail in the following scenario:
      1. The config server primary sends _shardsvrShardCollection to the primary shard.
      2. The stepdown thread starts running on the config server.
      3. _shardsvrShardCollection writes to config.chunks and config.collections on the new config server primary.
      4. _shardsvrShardCollection finishes and returns to the original config server primary before the stepdown thread begins killing operations, so the original config server primary reads a stale routing table and the command fails.

      In this case, the primary shard has already written the new chunks to config.chunks and marked the collection as sharded in config.collections on the new config server primary, so the user can retry the command and it should succeed immediately.
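
The retry described above can be sketched on the client side as follows. This is a minimal illustration, not the server's behavior or an official workaround: `retry_on_failure` and `FakeCluster` are hypothetical names introduced here, and the fake only mimics "first attempt fails, retry succeeds" so the snippet runs without a cluster.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_failure(
    command: Callable[[], T],
    retries: int = 1,
    delay_secs: float = 0.0,
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run `command`; on a retryable error, try again up to `retries` more times.

    Hypothetical helper. In real code, narrow `retryable` to the driver's
    error type instead of the broad default used here.
    """
    attempt = 0
    while True:
        try:
            return command()
        except retryable:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            time.sleep(delay_secs)


class FakeCluster:
    """Hypothetical stand-in for a cluster hit by the failover race.

    The first call fails, as if the stepping-down config server primary
    read a stale routing table. Later calls succeed because the metadata
    was already written on the new config server primary.
    """

    def __init__(self) -> None:
        self.calls = 0

    def shard_collection(self) -> dict:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("stale routing table after config server stepdown")
        return {"ok": 1}


if __name__ == "__main__":
    cluster = FakeCluster()
    print(retry_on_failure(cluster.shard_collection, retries=1))
```

With PyMongo, the callable could be something like `lambda: client.admin.command("shardCollection", "test.coll", key={"_id": 1})`, where the client, namespace, and shard key are placeholders.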

              Assignee:
              [DO NOT USE] Backlog - Sharding EMEA
              Reporter:
              Janna Golden
              Votes:
              0
              Watchers:
              1
