  1. Core Server
  2. SERVER-24892

"Creating first chunks failed: Data inconsistency detected amongst config servers" when using 3.2.3+ without replica set config servers


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.2.3
    • Fix Version/s: 3.2.9
    • Component/s: Sharding
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:

      Use versions 3.2.3 - 3.2.7 (<= 3.2.2 not tested yet)

      The sharded cluster must have the standard <= v3.0 setup of three config servers that are not running as a replica set. The number or type of shards and the number of mongos nodes seem to be irrelevant.

      The collection being sharded should have a large initial number of chunks, so that the shardCollection command runs for a long time. A run of 30 secs or more seems to make the error very probable; 60 secs or more may guarantee it. My repro was done with a 1GB collection in combination with a 1MB chunkSize, i.e. having > 1,000 chunks.
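      The chunk arithmetic behind the repro can be sketched as below. This is only a rough estimate that assumes the data is divided evenly at the configured chunk size; the server picks its own split points, so the real count may differ. `estimateInitialChunks` is a hypothetical helper, not a MongoDB API.

```javascript
// Rough estimate of how many initial chunk documents shardCollection must insert.
// estimateInitialChunks is a hypothetical helper for illustration only.
function estimateInitialChunks(dataSizeBytes, chunkSizeMB) {
  if (!(dataSizeBytes >= 0) || !(chunkSizeMB > 0)) {
    throw new RangeError("dataSizeBytes must be >= 0 and chunkSizeMB must be > 0");
  }
  var chunkSizeBytes = chunkSizeMB * 1024 * 1024;
  // A sharded collection always has at least one chunk, even when empty.
  return Math.max(1, Math.ceil(dataSizeBytes / chunkSizeBytes));
}

var ONE_GB = 1024 * 1024 * 1024;
console.log(estimateInitialChunks(ONE_GB, 1));  // 1MB chunkSize as in the repro: 1024
console.log(estimateInitialChunks(ONE_GB, 64)); // default 64MB chunkSize: 16
```

      In a 3.2 cluster the chunk size can be lowered for such a test by saving `{ _id: "chunksize", value: 1 }` into `config.settings` from a mongos before running shardCollection.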

    • Sprint:
      Sharding 17 (07/15/16)
    • Case:

      Description

      During the initial sharding of a collection the error shown below can occur. While the initial chunk documents are being inserted into config.chunks, the asynchronously running data consistency checking thread can observe an inconsistent view of the config db, and it then raises the "Data inconsistency detected amongst config servers" error.

      mongos> sh.shardCollection("test.foo", { "x": 1 })
      {
          "ok" : 0,
          "errmsg" : "Creating first chunks failed: Data inconsistency detected amongst config servers",
          "code" : 132
      }
      

      This stops the shardCollection command after it has inserted some fraction of the chunk documents into config.chunks, but before it has inserted any document into config.collections. If the same command is attempted again, the following error appears:

      { "ok" : 0, "errmsg" : "collection test.foo already sharded with 834 chunks.", "code" : 23 }
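      The half-finished state described above can be recognised with a check like the following minimal sketch. It assumes the 3.2 config database layout, in which config.chunks documents carry the namespace in an `ns` field and config.collections documents use the namespace as `_id`. `describeShardingState` is a hypothetical helper, not part of the mongo shell, and it only reports the state without modifying anything.

```javascript
// Classify the sharding metadata for one namespace.
// describeShardingState is a hypothetical helper; the field names (ns, _id)
// are assumptions based on the 3.2 config database layout.
function describeShardingState(ns, chunkDocs, collectionDocs) {
  var chunkCount = chunkDocs.filter(function (c) { return c.ns === ns; }).length;
  var hasCollectionEntry = collectionDocs.some(function (c) { return c._id === ns; });

  var state;
  if (chunkCount > 0 && !hasCollectionEntry) {
    state = "partial";              // chunks written, but no config.collections entry
  } else if (chunkCount > 0) {
    state = "sharded";              // both pieces of metadata are present
  } else if (hasCollectionEntry) {
    state = "entry-without-chunks"; // collection entry exists but no chunks found
  } else {
    state = "unsharded";            // no sharding metadata at all
  }
  return { state: state, chunkCount: chunkCount };
}

// Example with in-memory documents standing in for query results.
var chunks = [{ ns: "test.foo" }, { ns: "test.foo" }, { ns: "test.bar" }];
var collections = [{ _id: "test.bar" }];
console.log(JSON.stringify(describeShardingState("test.foo", chunks, collections)));
```

      From a mongos shell the two arrays could be obtained with `db.getSiblingDB("config").chunks.find({ ns: "test.foo" }).toArray()` and `db.getSiblingDB("config").collections.find({ _id: "test.foo" }).toArray()`.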
      

      The likelihood of this timing collision with the consistency check seems to be very low if there are only a few chunks to insert. I could only reproduce it when I had > 1,000 chunks, which in the environment I was using caused the shardCollection command to run for > 30 secs.

      Changing the config servers to a replica set (per these instructions) was the only way I could consistently avoid this error while sharding the large collections in my test.
