[SERVER-11929] MongoS allows chunk moves/splits when config servers inconsistent
Created: 03/Dec/13  Updated: 11/Jul/16  Resolved: 13/Dec/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.8, 2.5.4
Fix Version/s: 2.5.5

Type: Bug
Priority: Critical - P2
Reporter: David Hows
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: corrupt, crash, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File inconsistent-shard.js     File inconsistent-split.js    
Issue Links:
Related
is related to SERVER-10015 balancer should stop when ConfigServe... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

MongoS will continue to issue chunk split commands on insert even when config servers are inconsistent.

These splits appear as failed in the log and no entry is found in the changelog, yet the chunks collection is updated.
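One way to see this mismatch from a mongo shell connected to the MongoS is to pull the chunk documents and the recent split events for the namespace side by side. The following is only a sketch: the helper name splitEvidence is hypothetical, and it assumes that split events are recorded in config.changelog with a "what" value of "split" or "multi-split" and the namespace in "ns".

```javascript
// Hypothetical helper: gathers chunk documents and changelog split events
// for one namespace so they can be compared side by side.
// `db` is the shell's database handle (or any object with the same methods).
function splitEvidence(db, ns) {
    var config = db.getSiblingDB("config");
    // Chunk metadata as currently stored in config.chunks.
    var chunks = config.chunks.find({ ns: ns }).toArray();
    // Assumption: split events are logged with what: "split" or "multi-split".
    var events = config.changelog
        .find({ ns: ns, what: { $in: ["split", "multi-split"] } })
        .sort({ time: -1 })
        .limit(10)
        .toArray();
    return { chunkCount: chunks.length, recentSplits: events };
}
```

In the shell this would be called as splitEvidence(db, "test.test"). In the situation described here, the chunk count keeps growing while no split event for the affected range shows up in recentSplits.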

Logs from the MongoS. Note the chunk ranges.

Tue Dec  3 14:41:23.346 [conn5] about to initiate autosplit: ns:test.testshard: shard0000:Pixl.local:30000lastmod: 2|985||000000000000000000000000min: { _id: -7767558900086237716 }max: { _id: -7177852217467415019 } dataWritten: 209766 splitThreshold: 1048576
Tue Dec  3 14:41:23.476 [Balancer] distributed lock 'balancer/Pixl.local:30005:1386041035:16807' acquired, ts : 529d52e3a397e550c993feab
Tue Dec  3 14:41:23.476 [Balancer] warning: Skipping balancing round because data inconsistency was detected amongst the config servers.
Tue Dec  3 14:41:23.929 [conn5] warning: splitChunk failed - cmd: { splitChunk: "test.test", keyPattern: { _id: "hashed" }, min: { _id: -7767558900086237716 }, max: { _id: -7177852217467415019 }, from: "shard0000", splitKeys: [ { _id: -7751497167210968041 } ], shardId: "test.test-_id_-7767558900086237716", configdb: "Pixl.local:30002,Pixl.local:30003,Pixl.local:30004" } result: { errmsg: "exception: write $cmd failed on a node: { "got" : { "_id" : "test.test-_id_0", "lastmod" : { "$timestamp" : { "t" : 2, "i" : 4 } }, "lastmodEpoch" : {...", code: 13105, ok: 0.0 }

Find of the left half of the split chunk:

>db.getSiblingDB("config").chunks.find({"min._id":-7767558900086237716});
{ "_id" : "test.test-_id_-7767558900086237716", "lastmod" : Timestamp(2, 1006), "lastmodEpoch" : ObjectId("529d4efea397e550c993f69e"), "ns" : "test.test", "min" : { "_id" : NumberLong("-7767558900086237716") }, "max" : { "_id" : NumberLong("-7751497167210968041") }, "shard" : "shard0000" }

Find of the right half of the split chunk:

>db.getSiblingDB("config").chunks.find({"min._id":-7751497167210968041});
{ "_id" : "test.test-_id_-7751497167210968041", "lastmod" : Timestamp(2, 1007), "lastmodEpoch" : ObjectId("529d4efea397e550c993f69e"), "ns" : "test.test", "min" : { "_id" : NumberLong("-7751497167210968041") }, "max" : { "_id" : NumberLong("-7177852217467415019") }, "shard" : "shard0000" }
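The two documents above, taken together, cover exactly the range that the MongoS log reports for the "failed" split. A minimal sketch of that check in plain JavaScript (runnable under Node.js) follows; the helper name tilesRange is hypothetical, and the bounds are copied from the output above as strings because these 64-bit values are too large for JavaScript numbers to hold exactly.

```javascript
// Hypothetical helper: returns true when `chunks` (objects with string
// `min` and `max` bounds) chain together to cover exactly [rangeMin, rangeMax).
function tilesRange(chunks, rangeMin, rangeMax) {
    var byMin = {};
    chunks.forEach(function (c) { byMin[c.min] = c; });
    var cursor = rangeMin;
    var used = 0;
    while (cursor !== rangeMax) {
        var next = byMin[cursor];
        if (!next) return false; // gap: no chunk starts at this bound
        cursor = next.max;
        used += 1;
        if (used > chunks.length) return false; // guard against cycles
    }
    return used === chunks.length;
}

// Bounds of the two chunk documents found in config.chunks.
var found = [
    { min: "-7767558900086237716", max: "-7751497167210968041" },
    { min: "-7751497167210968041", max: "-7177852217467415019" }
];
// Range from the "about to initiate autosplit" log line.
var covers = tilesRange(found, "-7767558900086237716", "-7177852217467415019");
console.log(covers); // prints true
```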

MongoD log for the splitChunk command:

Tue Dec  3 14:41:23.346 [conn5] request split points lookup for chunk test.test { : -7767558900086237716 } -->> { : -7177852217467415019 }
Tue Dec  3 14:41:23.348 [conn5] max number of requested split points reached (2) before the end of chunk test.test { : -7767558900086237716 } -->> { : -7177852217467415019 }
Tue Dec  3 14:41:23.348 [conn5] received splitChunk request: { splitChunk: "test.test", keyPattern: { _id: "hashed" }, min: { _id: -7767558900086237716 }, max: { _id: -7177852217467415019 }, from: "shard0000", splitKeys: [ { _id: -7751497167210968041 } ], shardId: "test.test-_id_-7767558900086237716", configdb: "Pixl.local:30002,Pixl.local:30003,Pixl.local:30004" }
Tue Dec  3 14:41:23.649 [conn5] distributed lock 'test.test/Pixl.local:30000:1386041087:1349921075' acquired, ts : 529d52e302b120a57cca2e2b
Tue Dec  3 14:41:23.649 [conn5] SyncClusterConnection connecting to [Pixl.local:30002]
Tue Dec  3 14:41:23.650 [conn5] SyncClusterConnection connecting to [Pixl.local:30003]
Tue Dec  3 14:41:23.650 [conn5] SyncClusterConnection connecting to [Pixl.local:30004]
Tue Dec  3 14:41:23.652 [conn5] splitChunk accepted at version 2|1005||529d4efea397e550c993f69e
Tue Dec  3 14:41:23.789 [conn5] scoped connection to Pixl.local:30002,Pixl.local:30003,Pixl.local:30004 not being returned to the pool
Tue Dec  3 14:41:23.928 [conn5] distributed lock 'test.test/Pixl.local:30000:1386041087:1349921075' unlocked.
Tue Dec  3 14:41:23.928 [conn5] command admin.$cmd command: { splitChunk: "test.test", keyPattern: { _id: "hashed" }, min: { _id: -7767558900086237716 }, max: { _id: -7177852217467415019 }, from: "shard0000", splitKeys: [ { _id: -7751497167210968041 } ], shardId: "test.test-_id_-7767558900086237716", configdb: "Pixl.local:30002,Pixl.local:30003,Pixl.local:30004" } ntoreturn:1 keyUpdates:0 locks(micros) r:2 reslen:1459 579ms



 Comments   
Comment by Githook User [ 13/Dec/13 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-11929 MongoS allows chunk moves/splits when config servers inconsistent
Branch: master
https://github.com/mongodb/mongo/commit/323a7a919278c261b6ebfc485ad9cdb71f4dcf44

Comment by David Hows [ 03/Dec/13 ]

Added inconsistent-shard.js

This test confirms that a new collection can still be sharded while the config servers are inconsistent.

The test ran with a hashed shard key, and chunks were automatically pre-split and moved.

Comment by David Hows [ 03/Dec/13 ]

Attached a JS test.
