Core Server / SERVER-11929

MongoS allows chunk moves/splits when config servers inconsistent

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 2.4.8, 2.5.4
    • Fix Version/s: 2.5.5
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      Description

      MongoS will continue to issue chunk split commands on insert even when config servers are inconsistent.

      These splits appear as failed in the log and no entry is found in the changelog; however, the chunks collection is still updated.
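      One way to look for splits that reached config.chunks without a changelog entry is to cross-check the two collections. The sketch below is a rough illustration rather than part of the original report: chunksWithoutSplitEntry is a hypothetical helper, the changelog entry shape (what: "split" with details.left and details.right) is an assumption about the config.changelog schema, and the sample documents use string _id values in place of NumberLong so they stay plain JavaScript.

```javascript
// Hypothetical helper: returns the _ids of chunks whose lower bound does not
// appear as the left or right half of any "split" changelog entry.
// ASSUMPTION: split entries look like
//   { what: "split", ns, details: { left: { min }, right: { min } } }
function chunksWithoutSplitEntry(chunks, changelog) {
  const key = (ns, min) => ns + "|" + JSON.stringify(min);
  const logged = new Set();
  for (const entry of changelog) {
    if (entry.what !== "split" || !entry.details) continue;
    for (const half of [entry.details.left, entry.details.right]) {
      if (half && half.min) logged.add(key(entry.ns, half.min));
    }
  }
  return chunks
    .filter((c) => !logged.has(key(c.ns, c.min)))
    .map((c) => c._id);
}

// Sample input modelled on the two chunk documents in this report.
const chunks = [
  { _id: "test.test-_id_-7767558900086237716", ns: "test.test",
    min: { _id: "-7767558900086237716" } },
  { _id: "test.test-_id_-7751497167210968041", ns: "test.test",
    min: { _id: "-7751497167210968041" } },
];
const changelog = []; // the report states that no changelog entry is found

console.log(chunksWithoutSplitEntry(chunks, changelog));
```

      With an empty changelog both chunk ids are reported. Chunks that were never split, such as the initial chunks of a collection, would also be flagged, so the result is a list of candidates to inspect and not proof of a failed split.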

      Logs from the MongoS. Note the chunk ranges.

      Tue Dec  3 14:41:23.346 [conn5] about to initiate autosplit: ns:test.testshard: shard0000:Pixl.local:30000lastmod: 2|985||000000000000000000000000min: { _id: -7767558900086237716 }max: { _id: -7177852217467415019 } dataWritten: 209766 splitThreshold: 1048576
      Tue Dec  3 14:41:23.476 [Balancer] distributed lock 'balancer/Pixl.local:30005:1386041035:16807' acquired, ts : 529d52e3a397e550c993feab
      Tue Dec  3 14:41:23.476 [Balancer] warning: Skipping balancing round because data inconsistency was detected amongst the config servers.
      Tue Dec  3 14:41:23.929 [conn5] warning: splitChunk failed - cmd: { splitChunk: "test.test", keyPattern: { _id: "hashed" }, min: { _id: -7767558900086237716 }, max: { _id: -7177852217467415019 }, from: "shard0000", splitKeys: [ { _id: -7751497167210968041 } ], shardId: "test.test-_id_-7767558900086237716", configdb: "Pixl.local:30002,Pixl.local:30003,Pixl.local:30004" } result: { errmsg: "exception: write $cmd failed on a node: { "got" : { "_id" : "test.test-_id_0", "lastmod" : { "$timestamp" : { "t" : 2, "i" : 4 } }, "lastmodEpoch" : {...", code: 13105, ok: 0.0 }
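      To see which config collections differ between the config servers, the per-collection hashes returned by the dbhash command on each server can be compared. The following is a minimal sketch under stated assumptions: differingCollections is a hypothetical helper, and the hash strings in the sample are placeholders, not values from this cluster. In the mongo shell the input could be gathered with something like new Mongo("Pixl.local:30002").getDB("config").runCommand({ dbhash: 1 }).collections for each host.

```javascript
// Hypothetical helper: given { host: { collectionName: hash, ... }, ... },
// returns the sorted names of collections whose hash is not identical on
// every host (a collection missing on one host also counts as differing).
function differingCollections(hashesByHost) {
  const hosts = Object.keys(hashesByHost);
  const names = new Set();
  for (const h of hosts) {
    Object.keys(hashesByHost[h]).forEach((n) => names.add(n));
  }
  const differing = [];
  for (const name of names) {
    const distinct = new Set(hosts.map((h) => hashesByHost[h][name]));
    if (distinct.size > 1) differing.push(name);
  }
  return differing.sort();
}

// Placeholder hashes for illustration only.
const sample = {
  "Pixl.local:30002": { chunks: "hash-A", shards: "hash-S" },
  "Pixl.local:30003": { chunks: "hash-A", shards: "hash-S" },
  "Pixl.local:30004": { chunks: "hash-B", shards: "hash-S" },
};

console.log(differingCollections(sample)); // [ 'chunks' ]
```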

      Find of the left half of the split chunk:

      > db.getSiblingDB("config").chunks.find({"min._id":-7767558900086237716});
      { "_id" : "test.test-_id_-7767558900086237716", "lastmod" : Timestamp(2, 1006), "lastmodEpoch" : ObjectId("529d4efea397e550c993f69e"), "ns" : "test.test", "min" : { "_id" : NumberLong("-7767558900086237716") }, "max" : { "_id" : NumberLong("-7751497167210968041") }, "shard" : "shard0000" }

      Find of the right half of the split chunk:

      > db.getSiblingDB("config").chunks.find({"min._id":-7751497167210968041});
      { "_id" : "test.test-_id_-7751497167210968041", "lastmod" : Timestamp(2, 1007), "lastmodEpoch" : ObjectId("529d4efea397e550c993f69e"), "ns" : "test.test", "min" : { "_id" : NumberLong("-7751497167210968041") }, "max" : { "_id" : NumberLong("-7177852217467415019") }, "shard" : "shard0000" }
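      The two documents above can be checked against the range in the autosplit log line: the left half should start at the original min, the right half should end at the original max, and the two should meet at the split key. The snippet below is an illustrative check only. halvesCoverParent is a hypothetical helper, and BigInt literals are used because these hashed _id values are too large to be represented exactly as JavaScript numbers.

```javascript
// Bounds copied from the autosplit log line and the two config.chunks documents.
const original = { min: -7767558900086237716n, max: -7177852217467415019n };
const left = { min: -7767558900086237716n, max: -7751497167210968041n };
const right = { min: -7751497167210968041n, max: -7177852217467415019n };

// Hypothetical helper: true when l and r are non-empty, contiguous,
// and together cover exactly the parent range.
function halvesCoverParent(parent, l, r) {
  return (
    l.min === parent.min &&
    l.max === r.min &&
    r.max === parent.max &&
    l.min < l.max &&
    r.min < r.max
  );
}

console.log(halvesCoverParent(original, left, right)); // true
```

      The result is true, which matches the report: config.chunks reflects the split even though the splitChunk command was logged as failed.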

      MongoD log for the splitChunk command:

      Tue Dec  3 14:41:23.346 [conn5] request split points lookup for chunk test.test { : -7767558900086237716 } -->> { : -7177852217467415019 }
      Tue Dec  3 14:41:23.348 [conn5] max number of requested split points reached (2) before the end of chunk test.test { : -7767558900086237716 } -->> { : -7177852217467415019 }
      Tue Dec  3 14:41:23.348 [conn5] received splitChunk request: { splitChunk: "test.test", keyPattern: { _id: "hashed" }, min: { _id: -7767558900086237716 }, max: { _id: -7177852217467415019 }, from: "shard0000", splitKeys: [ { _id: -7751497167210968041 } ], shardId: "test.test-_id_-7767558900086237716", configdb: "Pixl.local:30002,Pixl.local:30003,Pixl.local:30004" }
      Tue Dec  3 14:41:23.649 [conn5] distributed lock 'test.test/Pixl.local:30000:1386041087:1349921075' acquired, ts : 529d52e302b120a57cca2e2b
      Tue Dec  3 14:41:23.649 [conn5] SyncClusterConnection connecting to [Pixl.local:30002]
      Tue Dec  3 14:41:23.650 [conn5] SyncClusterConnection connecting to [Pixl.local:30003]
      Tue Dec  3 14:41:23.650 [conn5] SyncClusterConnection connecting to [Pixl.local:30004]
      Tue Dec  3 14:41:23.652 [conn5] splitChunk accepted at version 2|1005||529d4efea397e550c993f69e
      Tue Dec  3 14:41:23.789 [conn5] scoped connection to Pixl.local:30002,Pixl.local:30003,Pixl.local:30004 not being returned to the pool
      Tue Dec  3 14:41:23.928 [conn5] distributed lock 'test.test/Pixl.local:30000:1386041087:1349921075' unlocked.
      Tue Dec  3 14:41:23.928 [conn5] command admin.$cmd command: { splitChunk: "test.test", keyPattern: { _id: "hashed" }, min: { _id: -7767558900086237716 }, max: { _id: -7177852217467415019 }, from: "shard0000", splitKeys: [ { _id: -7751497167210968041 } ], shardId: "test.test-_id_-7767558900086237716", configdb: "Pixl.local:30002,Pixl.local:30003,Pixl.local:30004" } ntoreturn:1 keyUpdates:0 locks(micros) r:2 reslen:1459 579ms

        Attachments

        1. inconsistent-shard.js
          0.7 kB
        2. inconsistent-split.js
          0.7 kB
