[SERVER-11268] Allow configuring write concern in moveChunk Created: 18/Oct/13  Updated: 10/Dec/14  Resolved: 07/Mar/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Kihyun Kim Assignee: Randolph Tan
Resolution: Done Votes: 1
Labels: sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-16357 Chunk migration pre-commit write conc... Closed
Operating System: ALL
Steps To Reproduce:

Deploy a sharded cluster backed by replica sets. I used 2 replica sets, each with 2 data-bearing nodes and 1 arbiter.

$ mkdir -p /Users/k2hyun/Database/mongodb/26101; mongod --port 26101 --dbpath /Users/k2hyun/Database/mongodb/26101 --logpath /Users/k2hyun/Database/mongodb/26101/log --fork --logappend --replSet rs1 --oplogSize 1000
$ mkdir -p /Users/k2hyun/Database/mongodb/26102; mongod --port 26102 --dbpath /Users/k2hyun/Database/mongodb/26102 --logpath /Users/k2hyun/Database/mongodb/26102/log --fork --logappend --replSet rs1 --oplogSize 1000
$ mkdir -p /Users/k2hyun/Database/mongodb/26103; mongod --port 26103 --dbpath /Users/k2hyun/Database/mongodb/26103 --logpath /Users/k2hyun/Database/mongodb/26103/log --fork --logappend --replSet rs1 --oplogSize 1000
$ mongo localhost:26101
> rs.initiate({"_id":"rs1", members:[{"_id":1, "host":"localhost:26101"}, {"_id":2, "host":"localhost:26102"}, {"_id":3, "host":"localhost:26103", "arbiterOnly":true}]})
rs1:PRIMARY>
 
$ mkdir -p /Users/k2hyun/Database/mongodb/26201; mongod --port 26201 --dbpath /Users/k2hyun/Database/mongodb/26201 --logpath /Users/k2hyun/Database/mongodb/26201/log --fork --logappend --replSet rs2 --oplogSize 1000
$ mkdir -p /Users/k2hyun/Database/mongodb/26202; mongod --port 26202 --dbpath /Users/k2hyun/Database/mongodb/26202 --logpath /Users/k2hyun/Database/mongodb/26202/log --fork --logappend --replSet rs2 --oplogSize 1000
$ mkdir -p /Users/k2hyun/Database/mongodb/26203; mongod --port 26203 --dbpath /Users/k2hyun/Database/mongodb/26203 --logpath /Users/k2hyun/Database/mongodb/26203/log --fork --logappend --replSet rs2 --oplogSize 1000
$ mongo localhost:26201
> rs.initiate({"_id":"rs2", members:[{"_id":1, "host":"localhost:26201"}, {"_id":2, "host":"localhost:26202"}, {"_id":3, "host":"localhost:26203", "arbiterOnly":true}]})
rs2:PRIMARY>
 
$ mkdir -p /Users/k2hyun/Database/mongodb/26001; mongod --port 26001 --dbpath /Users/k2hyun/Database/mongodb/26001 --logpath /Users/k2hyun/Database/mongodb/26001/log --fork --logappend --configsvr
$ mkdir -p /Users/k2hyun/Database/mongodb/26002; mongod --port 26002 --dbpath /Users/k2hyun/Database/mongodb/26002 --logpath /Users/k2hyun/Database/mongodb/26002/log --fork --logappend --configsvr
$ mkdir -p /Users/k2hyun/Database/mongodb/26003; mongod --port 26003 --dbpath /Users/k2hyun/Database/mongodb/26003 --logpath /Users/k2hyun/Database/mongodb/26003/log --fork --logappend --configsvr
 
$ mkdir -p /Users/k2hyun/Database/mongodb/26400; mongos --port 26400 --logpath /Users/k2hyun/Database/mongodb/26400/log --fork --logappend --configdb "localhost:26001,localhost:26002,localhost:26003"
$ mongo localhost:26400
mongos> sh.addShard("rs1/localhost:26101,localhost:26102")
{ "shardAdded" : "rs1", "ok" : 1 }
mongos> sh.addShard("rs2/localhost:26201,localhost:26202")
{ "shardAdded" : "rs2", "ok" : 1 }

Then kill one of the data-bearing nodes. (I killed the mongod listening on port 26202.)
The final step is to create a hashed sharded collection. It NEVER finishes. Running the moveChunk command directly behaves the same way (see the example after the shardCollection step below).

mongos> db.runCommand({"enableSharding":"foo"})
{ "ok" : 1 }
mongos> db.runCommand({"shardCollection":"foo.bar", "key":{"name":"hashed"}})


 Description   

In a sharded cluster backed by replica sets, if any data-bearing node goes down, running the "moveChunk" command or creating a sharded collection with a "hashed" key never completes.

The primary of the degraded replica set logs the following:

Sat Oct 19 01:34:28.611 [rsHealthPoll] couldn't connect to localhost:26202: couldn't connect to server localhost:26202
Sat Oct 19 01:34:29.292 [migrateThread] Waiting for replication to catch up before entering critical section
Sat Oct 19 01:34:29.293 [migrateThread] warning: migrate commit waiting for 2 slaves for 'foo.bar' { name: 0 } -> { name: MaxKey } waiting for: 526162d9:2
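
A quick way to see why this wait can never finish: after killing the node on port 26202, the degraded set is left with only its primary and the arbiter, so no secondary exists to acknowledge the migrated writes. A sketch of the check, using the ports from the reproduction steps above:

$ mongo localhost:26201
rs2:PRIMARY> rs.status().members.forEach(function (m) {
...     print(m.name + " : " + m.stateStr);   // 26202 shows up as unreachable, 26203 as ARBITER
... })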

I tried this on 2.4.2 and it works fine. This happens only on 2.4.6.



 Comments   
Comment by Greg Studer [ 07/Mar/14 ]

There is an option to change the behavior of migrations for non-critical writes (secondaryThrottle), but changing the write concern in this way is dangerous - rollbacks can affect migrated data if it isn't replicated to a majority of members at the critical section.
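
For completeness, on 2.4 the _secondaryThrottle behaviour mentioned above is controlled through the config database via a mongos; a sketch of how it is toggled, noting that it only affects the per-document copy phase and not the pre-commit replication wait shown in this ticket:

mongos> use config
mongos> db.settings.update(
...     { _id: "balancer" },
...     { $set: { _secondaryThrottle: true } },   // true = wait for a secondary to ack each copied document
...     { upsert: true }
... )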

> I think the sharded cluster should keep working despite a degraded replica set, because there is an oplog.
This is why replication constraints are needed - to ensure oplogs are up-to-date.

Comment by Kihyun Kim [ 22/Oct/13 ]

Thank you for your comment.
I understand what that message means, but I cannot understand why the moveChunk command alone must wait for a majority with no option to override it. As you commented, an insert operation can be run with w = majority or with w = 1.
I think the sharded cluster should keep working despite a degraded replica set, because there is an oplog.

I just found the "chunk migration write concern" documentation: http://docs.mongodb.org/manual/core/sharding-chunk-migration/#chunk-migration-write-concern
But I still want to know whether there is any option to skip the majority wait for the moveChunk command.

Comment by Randolph Tan [ 21/Oct/13 ]

Hi,

The moveChunk command is waiting for the migrated documents to be replicated to a majority of the data-bearing members of the set. The setup you gave above cannot satisfy this condition. This is basically the same as doing an insert and then calling getLastError with w = majority.
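
To illustrate the equivalence, a minimal sketch run against the degraded set's primary (ports taken from the reproduction steps; the wtimeout is only there so the shell returns instead of blocking forever):

$ mongo localhost:26201
rs2:PRIMARY> db.getSiblingDB("test").bar.insert({ name: "example" })
rs2:PRIMARY> db.getSiblingDB("test").runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })
// With only one of the two data-bearing members available, this reports a write
// concern timeout - the same wait the migrate commit performs, but without a timeout.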

Hope that helps.
