Core Server / SERVER-12515

Unable to move hashed shard key chunks created by numInitialChunks

    • Fully Compatible
    • ALL

      Issue Status as of March 31, 2014

      ISSUE SUMMARY

      A bug in the sharding logic for hashed shard keys affects collections that are sharded on a hashed shard key with the numInitialChunks option. Immediately after such a collection is created, some of its chunks cannot be moved with the moveChunk command.

      USER IMPACT

      Because the affected chunks cannot be moved, a sharded collection with a hashed shard key can end up with imbalanced data and with errors during balancing.

      SOLUTION

      Chunk splits now set the correct lower bound in the cached metadata within the shard.

      WORKAROUNDS

      Restarting mongod on the primary nodes between the shardCollection and moveChunk commands clears the chunk manager cache and resolves the issue.

      AFFECTED VERSIONS

      Versions 2.4.0 to 2.4.9 are affected by this bug.

      PATCHES

      The fix is included in the 2.4.10 production release and the 2.6.0-rc0 release candidate, which will evolve into the 2.6.0 production release.

      Original Description

      When a collection is sharded on a hashed shard key with numInitialChunks specified, some of the initial chunks cannot be moved immediately afterwards.

      jstests are attached.
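
      The attached tests are not reproduced here. As a rough sketch of the sequence described above, the helpers below send shardCollection with numInitialChunks and then attempt a moveChunk for each chunk. The helper names, the namespace "test.foo", and the shard name are hypothetical; runAdmin stands in for whatever sends a command to the admin database (in the mongo shell, a wrapper around db.adminCommand).

```javascript
// Sketch only: helper names, "test.foo", and "shard0001" are hypothetical.
// runAdmin is any function that sends a command document to the admin
// database and returns the reply, for example in the mongo shell:
//   function (cmd) { return db.adminCommand(cmd); }

// Shard a collection on a hashed key with a requested number of initial chunks.
function shardHashed(runAdmin, ns, field, numInitialChunks) {
    var key = {};
    key[field] = "hashed";
    return runAdmin({
        shardCollection: ns,
        key: key,
        numInitialChunks: numInitialChunks
    });
}

// Try to move each chunk (documents with min/max, as stored in config.chunks)
// and collect the ones whose moveChunk reply is not ok.
function findUnmovableChunks(runAdmin, ns, chunks, toShard) {
    var failures = [];
    chunks.forEach(function (chunk) {
        var res = runAdmin({
            moveChunk: ns,
            bounds: [chunk.min, chunk.max],
            to: toShard
        });
        if (!res.ok) {
            failures.push({ min: chunk.min, max: chunk.max, errmsg: res.errmsg });
        }
    });
    return failures;
}

// Possible use in the mongo shell (assumed names):
//   var run = function (cmd) { return db.adminCommand(cmd); };
//   shardHashed(run, "test.foo", "x", 16);
//   var chunks = db.getSiblingDB("config").chunks.find({ ns: "test.foo" }).toArray();
//   findUnmovableChunks(run, "test.foo", chunks, "shard0001");
```

      Passing the command runner as an argument keeps the sketch independent of any particular cluster setup; on an affected version the returned list would be expected to be non-empty.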

      In 2.4.9, the characterisation is:

      • Only chunks on the last shard are affected.
      • All but the final chunk are affected.
      • Before a successful chunk move, attempting to move problem chunks gives errors such as:
        {
                "cause" : {
                        "errmsg" : "exception: ranges differ, requested: { x: 0 } -> { x: 1152921504606846974 } existing: { x: 0 } -> { x: 8070450532247928818 }",
                        "code" : 13587,
                        "ok" : 0
                },
                "ok" : 0,
                "errmsg" : "move failed"
        }
        {
                "cause" : {
                        "errmsg" : "exception: ranges differ, requested: { x: 1152921504606846974 } -> { x: 2305843009213693948 } existing: { x: 1152921504606846974 } -> { x: MaxKey }",
                        "code" : 13587,
                        "ok" : 0
                },
                "ok" : 0,
                "errmsg" : "move failed"
        }
        {
                "cause" : {
                        "errmsg" : "exception: ranges differ, requested: { x: 2305843009213693948 } -> { x: 3458764513820540922 } existing: { x: 2305843009213693948 } -> { x: MaxKey }",
                        "code" : 13587,
                        "ok" : 0
                },
                "ok" : 0,
                "errmsg" : "move failed"
        }
        ...
        
      • After a successful chunk move, attempting to move a problem chunk gives a different error:
        { "ok" : 0, "errmsg" : "no chunk found with those upper and lower bounds" }
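
      The requested bounds in the messages above are exact multiples of 1152921504606846974. That spacing is what one would get from splitting the signed 64-bit hash range evenly into 16 chunks, although the report does not state the chunk count used. The sketch below reproduces the arithmetic under an assumed formula (step = 2 * floor((2^63 - 1) / numChunks), split points symmetric around 0); it is not taken from the server source, the function name is hypothetical, and it only covers even chunk counts.

```javascript
// Sketch under an assumed formula, not the server's implementation.
// Returns the split points (as BigInt, ascending) that divide the signed
// 64-bit hash range into numChunks evenly sized chunks.
function hashedSplitPoints(numChunks) {
    if (numChunks < 2 || numChunks % 2 !== 0) {
        throw new Error("this sketch only covers even chunk counts >= 2");
    }
    var maxLong = (1n << 63n) - 1n;                 // 9223372036854775807
    var step = (maxLong / BigInt(numChunks)) * 2n;  // BigInt division truncates
    var half = numChunks / 2;
    var points = [];
    // numChunks chunks need numChunks - 1 split points: 0 and +/- k * step.
    for (var k = -(half - 1); k <= half - 1; k++) {
        points.push(BigInt(k) * step);
    }
    return points;
}
```

      With numChunks set to 16, the positive split points include 1152921504606846974, 2305843009213693948, 3458764513820540922, and 8070450532247928818, which are the values that appear in the error messages. BigInt is used because these values exceed the range in which JavaScript numbers are exact.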
        

      In 2.5.1+, the characterisation is:

      • All shards are affected.
      • All chunks are affected.
      • Attempting to move a chunk gives errors such as:
        {
                "cause" : {
                        "errmsg" : "exception: cannot remove chunk [{ x: 0 }, { x: 1152921504606846974 }), this shard does not contain the chunk and it overlaps [{ x: 0 }, { x: 8070450532247928818 })",
                        "code" : 16855,
                        "ok" : 0
                },
                "ok" : 0,
                "errmsg" : "move failed"
        }
        {
                "cause" : {
                        "errmsg" : "exception: cannot remove chunk [{ x: 1152921504606846974 }, { x: 2305843009213693948 }), this shard does not contain the chunk and it overlaps [{ x: 0 }, { x: 8070450532247928818 }), [{ x: 1152921504606846974 }, { x: MaxKey })",
                        "code" : 16855,
                        "ok" : 0
                },
                "ok" : 0,
                "errmsg" : "move failed"
        }
        {
                "cause" : {
                        "errmsg" : "exception: cannot remove chunk [{ x: 2305843009213693948 }, { x: 3458764513820540922 }), this shard does not contain the chunk and it overlaps [{ x: 1152921504606846974 }, { x: MaxKey }), [{ x: 2305843009213693948 }, { x: MaxKey })",
                        "code" : 16855,
                        "ok" : 0
                },
                "ok" : 0,
                "errmsg" : "move failed"
        }
        ...
        

      The chunks look fine in config.chunks. Restarting the affected shard server between shardCollection and moveChunk allows the chunks to be moved successfully, so this is likely a bug in ChunkManager that causes it to get confused about chunk bounds. Specifically, it looks like the upper bound is not being set properly.
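
      To make the observation about bounds concrete, the sketch below parses a 2.4.9-style "ranges differ" message and reports which bound the shard disagrees on. The function name is hypothetical, and the pattern is written only against the message format quoted in this report. Bounds are compared as strings so that 64-bit values are not rounded.

```javascript
// Sketch only: parses the "ranges differ" errmsg format quoted in this report.
// Returns null if the message does not match that format.
function compareRanges(errmsg) {
    var pattern = /requested: (\{[^}]*\}) -> (\{[^}]*\}) existing: (\{[^}]*\}) -> (\{[^}]*\})/;
    var m = pattern.exec(errmsg);
    if (!m) {
        return null;
    }
    return {
        requested: { min: m[1], max: m[2] },
        existing: { min: m[3], max: m[4] },
        // Compare the textual bounds; converting to numbers would lose precision.
        lowerMatches: m[1] === m[3],
        upperMatches: m[2] === m[4]
    };
}
```

      Applied to the 2.4.9 messages quoted earlier, the lower bounds match and only the upper bounds differ, which is the pattern the paragraph above describes.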

        1. hash_shard_num_chunks_move1.js
          1 kB
        2. hash_shard_num_chunks_move2.js
          1 kB
        3. hash_shard_num_chunks_move3.js
          1 kB

            Assignee:
            randolph@mongodb.com Randolph Tan
            Reporter:
            kevin.pulo@mongodb.com Kevin Pulo
