[SERVER-8362] mongos does not update chunk info before trying to balance
Created: 28/Jan/13  Updated: 06/Dec/22  Resolved: 19/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.3.2
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Kristina Chodorow (Inactive)
Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links: Related
Assigned Teams: Sharding
Participants:

 Description   

I'm not sure if this is by design or not, but it leads to kind of weird behavior.

If I have two mongos processes and move a chunk through one, then try to move the same chunk again through the other, the second mongos doesn't refresh its chunk info before the move attempt, so the move fails with unintuitive messages.

For example, on mongos #1, I do:

test> sh.moveChunk("test.not.hashed", {user_id:"user438"}, "test-rs2")
{ "millis" : 3441, "ok" : 1 }

On mongos #2, I then do:

config> // first, checking it was actually moved:
config> db.chunks.find({min:{user_id:"user438"}})
{ "_id" : "test.not.hashed-user_id_\"user438\"", "lastmod" : { "t" : 52000, "i" : 0 }, "lastmodEpoch" : ObjectId("510681b502471f5418db3e35"), "ns" : "test.not.hashed", "min" : { "user_id" : "user438" }, "max" : { "user_id" : "user441" }, "shard" : "test-rs2" }
config>
config> // yup, it's in the right place, but this mongos doesn't think so:
config> sh.moveChunk('test.not.hashed', {user_id:'user438'}, 'test-rs1')
{ "ok" : 0, "errmsg" : "that chunk is already on that shard" }
config>
config> // let's try another shard
config> sh.moveChunk('test.not.hashed', {user_id:'user438'}, 'test-rs4')
{
        "cause" : {
                "from" : "test-rs1",
                "official" : "test-rs2",
                "ok" : 0,
                "errmsg" : "location is outdated (likely balance or migrate occurred)"
        },
        "ok" : 0,
        "errmsg" : "move failed"
}
config>
config> // and now it finally works
config> sh.moveChunk('test.not.hashed', {user_id:'user438'}, 'test-rs1')
{ "millis" : 3859, "ok" : 1 }



 Comments   
Comment by Scott Hernandez (Inactive) [ 02/Mar/13 ]

The workaround is to restart the stale mongos or run flushRouterConfig on it.
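
A minimal sketch of the flushRouterConfig route, in mongo shell JavaScript. The helper name moveChunkAfterRefresh is hypothetical (it is not part of the shell), and adminDb is assumed to be the admin database handle on the mongos the move is issued from. It discards that mongos's cached routing info first, then sends the moveChunk command that sh.moveChunk() wraps:

```javascript
// Hypothetical helper, not a shell built-in: refresh the routing info cached
// by this mongos, then attempt the chunk move.
// adminDb is assumed to be the admin database handle of the mongos you are
// connected to, e.g. db.getSiblingDB("admin") in the mongo shell.
function moveChunkAfterRefresh(adminDb, ns, find, toShard) {
    // flushRouterConfig drops the cached chunk metadata so that mongos
    // reloads it from the config servers.
    var flush = adminDb.runCommand({ flushRouterConfig: 1 });
    if (!flush.ok) {
        // Surface the failure rather than moving with possibly stale info.
        return flush;
    }
    // Same command document that sh.moveChunk(ns, find, toShard) sends.
    return adminDb.runCommand({ moveChunk: ns, find: find, to: toShard });
}
```

On mongos #2 from the description, this would be called as moveChunkAfterRefresh(db.getSiblingDB("admin"), "test.not.hashed", {user_id: "user438"}, "test-rs1"). It only narrows the window: another mongos could still move the chunk between the flush and the move.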

Generated at Thu Feb 08 03:17:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.