We have seen numerous occurrences of the balancer not removing the old copy of a chunk once it has been migrated to a new shard. This manifests as strange inconsistencies in queries that are broadcast to all shards: count() includes the orphaned documents, while iterating the same cursor returns nothing.
mongos> db.coll.find({"some_field":"x"}).count()
14
mongos> db.coll.find({"some_field":"x"}).next()
Thu Mar 22 13:41:24 uncaught exception: error hasNext: false
mongos> db.coll.find({"some_field":"x"}).explain()
{
    "clusteredType" : "ParallelSort",
    "shards" : {
        "shard2/10.176.163.134:27022,10.176.164.146:27021" : [
            {
                "cursor" : "BtreeCursor some_field_1",
                "nscanned" : 14,
                "nscannedObjects" : 14,
                "n" : 0,
                "millis" : 0,
                "nYields" : 0,
                "nChunkSkips" : 14,
                "isMultiKey" : false,
                "indexOnly" : false,
                "indexBounds" : {
                    "some_field" : [
                        [
                            "x",
                            "x"
                        ]
                    ]
                }
            }
        ],
        "shard3/10.177.210.46:27017,10.177.210.47:27017" : [
            {
                "cursor" : "BtreeCursor some_field_1",
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "millis" : 0,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "isMultiKey" : false,
                "indexOnly" : false,
                "indexBounds" : {
                    "some_field" : [
                        [
                            "x",
                            "x"
                        ]
                    ]
                }
            }
        ],
        "shard4/10.176.64.155:27017,10.177.205.133:27017" : [
            {
                "cursor" : "BtreeCursor some_field_1",
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "millis" : 0,
                "nYields" : 0,
                "nChunkSkips" : 0,
                "isMultiKey" : false,
                "indexOnly" : false,
                "indexBounds" : {
                    "some_field" : [
                        [
                            "x",
                            "x"
                        ]
                    ]
                }
            }
        ]
    },
    "n" : 0,
    "nChunkSkips" : 14,
    "nYields" : 0,
    "nscanned" : 14,
    "nscannedObjects" : 14,
    "millisTotal" : 0,
    "millisAvg" : 0,
    "numQueries" : 3,
    "numShards" : 3
}
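The nChunkSkips values are the tell here: shard2 scans 14 matching documents but returns n: 0, skipping all of them as belonging to a chunk range it no longer owns, while count() still reports 14, presumably because it does not apply the same chunk filtering (cf. SERVER-3645 below). A quick way to confirm which chunks the cluster metadata actually assigns to shard2 for this collection is to query config.chunks directly; this is only a sketch, and "mydb.coll" is a placeholder for the real namespace:

mongos> // "mydb.coll" is a placeholder; substitute the real db.collection
mongos> db.getSiblingDB("config").chunks.find(
...     { "ns" : "mydb.coll", "shard" : "shard2" },
...     { "min" : 1, "max" : 1 }
... )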
Going directly to shard2:
PRIMARY> db.coll.find({"some_field":"x"}).count()
14
PRIMARY> db.coll.find({"some_field":"x"},{my_shard_id:true})
{ "_id" : 31576707, "my_shard_id" : 13181 }
{ "_id" : 31489421, "my_shard_id" : 13187 }
{ "_id" : 31596862, "my_shard_id" : 13179 }
{ "_id" : 31616772, "my_shard_id" : 13186 }
{ "_id" : 31565191, "my_shard_id" : 13193 }
{ "_id" : 31574087, "my_shard_id" : 13184 }
{ "_id" : 31468296, "my_shard_id" : 13179 }
{ "_id" : 31434373, "my_shard_id" : 13192 }
{ "_id" : 31629660, "my_shard_id" : 13192 }
{ "_id" : 31777042, "my_shard_id" : 13184 }
{ "_id" : 31626661, "my_shard_id" : 13179 }
{ "_id" : 31344196, "my_shard_id" : 13184 }
{ "_id" : 31786861, "my_shard_id" : 13192 }
{ "_id" : 31808323, "my_shard_id" : 13188 }
And the chunk metadata from mongos shows that the range covering these documents is owned by shard4:
{ "my_shard_id" : 13165 } -->> { "my_shard_id" : 13200 } on : shard4 { "t" : 68000, "i" : 0 }
My only hypothesis for how this could have occurred is that the balancer migrated this chunk from shard2 to shard4 but never deleted the source copy from shard2.
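A possible cleanup sketch, assuming the cluster is on a version that ships the cleanupOrphaned command (2.6+): run it against the shard2 primary, not through mongos. On older versions the only option was a manual range delete issued directly against the shard, which should only be done with the balancer stopped and after verifying the range against config.chunks:

shard2:PRIMARY> db.adminCommand({
...     cleanupOrphaned : "mydb.coll",              // placeholder namespace
...     startingFromKey : { my_shard_id : MinKey }  // sweep from the start of the shard key space
... })
shard2:PRIMARY> // pre-2.6 alternative (balancer stopped, range verified first):
shard2:PRIMARY> // db.coll.remove({ my_shard_id : { $gte : 13165, $lt : 13200 } })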
Depends on: SERVER-3645 "Sharded collection counts (on primary) can report too many results" (Closed)