Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Affects Version/s: 2.6.6
Component/s: Sharding
Sharding EMEA
If a shard is currently unable to perform w:majority writes, it should not be considered a target for balancer moves. Such migrations are highly likely to stall at pre-commit for an extended period (up to 10 hours), creating orphaned documents and wasting time during which no other migrations can occur.
While the shard could become healthy and able to support w:majority during the migration, it would usually be better to wait until that is known before starting the migration. If the problematic shard is avoided, the balancer can instead balance chunks among the remaining shards, where migrations have a much better chance of completing successfully.
This applies only to chunk moves initiated by the balancer. If a user knows that the target shard will be able to catch up to a majority within 10 hours, they should still be able to run a manual moveChunk. The balancer, by contrast, has no way of knowing this, so in the absence of that knowledge it should assume the state of the system won't change, i.e. that the migration won't succeed.
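As a rough sketch of the check this would require (the helper below is hypothetical, not part of MongoDB): before selecting a shard as a migration target, the balancer would need to know whether the shard's replica set can currently acknowledge a w:majority write, i.e. whether a majority of its members are healthy, data-bearing, and PRIMARY or SECONDARY.

// Hypothetical mongo shell helper, for illustration only.
// Simplification: assumes every member votes; arbiters count towards the
// majority size but can never acknowledge writes.
function canAckMajority(shardConn) {
    var status = shardConn.getDB("admin").runCommand({ replSetGetStatus: 1 });
    if (!status.ok) return false;
    // Healthy data-bearing members currently PRIMARY (1) or SECONDARY (2).
    var ackable = status.members.filter(function (m) {
        return m.health === 1 && (m.state === 1 || m.state === 2);
    }).length;
    var majority = Math.floor(status.members.length / 2) + 1;
    return ackable >= majority;
}
// e.g. canAckMajority(new Mongo("genique:33337")) returns false in the
// repro below while shard02's only data-bearing secondary is down.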
To repro this (2.6.6):
$ mlaunch init --replicaset --nodes 2 --arbiter --smallfiles --oplogSize 100 --sharded 2 --config 1 --mongos 1 --port 33333
launching: mongod on port 33334
launching: mongod on port 33335
launching: mongod on port 33336
launching: mongod on port 33337
launching: mongod on port 33338
launching: mongod on port 33339
launching: config server on port 33340
replica set 'shard01' initialized.
replica set 'shard02' initialized.
launching: mongos on port 33333
adding shards. can take up to 30 seconds...
$ mlaunch stop shard02 secondary
1 node stopped.
$ mlaunch list
PROCESS PORT STATUS PID
mongos 33333 running 5919
config server 33340 running 5826
shard01
primary 33334 running 5463
secondary 33335 running 5483
arbiter 33336 running 5530
shard02
primary 33337 running 5582
mongod 33338 down -
arbiter 33339 running 5757
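As an extra sanity check (assumed, not part of the original repro), a w:majority write against the shard02 primary should now time out, since its only data-bearing secondary is down; the collection name is arbitrary:
$ mongo --port 33337
> db.getSiblingDB("test").majorityCheck.insert(
      { _id: 1 },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
  )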
$ mongo --port 33333
MongoDB shell version: 2.6.6
connecting to: 127.0.0.1:33333/test
> sh.enableSharding("test")
{ "ok" : 1 }
> sh.shardCollection("test.test", {_id:1})
{ "collectionsharded" : "test.test", "ok" : 1 }
> db.test.insert({_id:-1})
WriteResult({ "nInserted" : 1 })
> db.test.insert({_id:1})
WriteResult({ "nInserted" : 1 })
> sh.splitAt("test.test", {_id:0})
{ "ok" : 1 }
> sleep(60*1000)
null
> db.getSiblingDB("config").locks.find()
{ "_id" : "configUpgrade", "state" : 0, "who" : "genique:33333:1421043589:1804289383:mongosMain:846930886", "ts" : ObjectId("54b3678501891d25c2a0de29"), "process" : "genique:33333:1421043589:1804289383", "when" : ISODate("2015-01-12T06:19:49.271Z"), "why" : "upgrading config database to new format v5" }
{ "_id" : "balancer", "state" : 2, "who" : "genique:33333:1421043589:1804289383:Balancer:1681692777", "ts" : ObjectId("54b3743d01891d25c2a0e051"), "process" : "genique:33333:1421043589:1804289383", "when" : ISODate("2015-01-12T07:14:05.817Z"), "why" : "doing balance round" }
{ "_id" : "test.test", "state" : 2, "who" : "genique:33334:1421046843:292657564:conn8:1901975678", "ts" : ObjectId("54b3743d26b588f243d1e4a7"), "process" : "genique:33334:1421046843:292657564", "when" : ISODate("2015-01-12T07:14:05.830Z"), "why" : "migrate-{ _id: MinKey }" }
The logfile on the shard02 primary confirms that the receiving end of the migration is stuck waiting for replication:
$ grep migrateThread data/shard02/rs1/mongod.log | head -10
2015-01-12T18:14:05.843+1100 [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard01/genique:33334,genique:33335 at epoch 54b370c601891d25c2a0dfbc
2015-01-12T18:14:05.843+1100 [migrateThread] starting new replica set monitor for replica set shard01 with seeds genique:33334,genique:33335
2015-01-12T18:14:05.845+1100 [migrateThread] allocating new ns file /home/kev/testing/balancer-majority/data/shard02/rs1/db/test.ns, filling with zeroes...
2015-01-12T18:14:05.966+1100 [migrateThread] build index on: test.test properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.test" }
2015-01-12T18:14:05.967+1100 [migrateThread] added index to empty collection
2015-01-12T18:14:05.978+1100 [migrateThread] Helpers::removeRangeUnlocked time spent waiting for replication: 0ms
2015-01-12T18:15:05.979+1100 [migrateThread] warning: secondaryThrottle on, but doc insert timed out after 60 seconds, continuing
2015-01-12T18:15:05.980+1100 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-12T18:15:06.980+1100 [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-12T18:15:07.980+1100 [migrateThread] Waiting for replication to catch up before entering critical section
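A replica set status check on the shard02 primary (assumed follow-up; port taken from the repro above) shows why the catch-up can never complete while the only data-bearing secondary is down:
$ mongo --port 33337 --eval 'rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr + " (health: " + m.health + ")"); })'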
In 2.8.0-rc4 the only difference is a 1-hour timeout on the initial rangeDeleter, after which the receiving end of the migration fails and is retried (I also disabled secondaryThrottle in this test):
$ grep migrateThread data/shard02/rs1/mongod.log | head -20
2015-01-12T21:31:51.871+1100 I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard01/genique:23334,genique:23335 at epoch 54b3a02aaea4b148f49e0882
2015-01-12T21:31:51.871+1100 I NETWORK [migrateThread] starting new replica set monitor for replica set shard01 with seeds genique:23334,genique:23335
2015-01-12T21:31:51.874+1100 I INDEX [migrateThread] allocating new ns file /home/kev/testing/balancer-majority-2.8/data/shard02/rs1/db/test.ns, filling with zeroes...
2015-01-12T21:31:51.993+1100 I INDEX [migrateThread] build index on: test.test properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.test" }
2015-01-12T21:31:51.993+1100 I INDEX [migrateThread] building index using bulk method
2015-01-12T21:31:51.993+1100 I INDEX [migrateThread] build index done. scanned 0 total records. 0 secs
2015-01-12T21:31:51.993+1100 I SHARDING [migrateThread] Deleter starting delete for: test.test from { _id: MinKey } -> { _id: 0.0 }, with opId: 749
2015-01-12T21:31:51.993+1100 I SHARDING [migrateThread] rangeDeleter deleted 0 documents for test.test from { _id: MinKey } -> { _id: 0.0 }
2015-01-12T22:31:51.995+1100 I SHARDING [migrateThread] rangeDeleter timed out after 0 seconds while waiting for deletions to be replicated to majority nodes
2015-01-12T22:31:51.995+1100 W SHARDING [migrateThread] Failed to queue delete for migrate abort: rangeDeleter timed out after 0 seconds while waiting for deletions to be replicated to majority nodes
2015-01-12T22:31:51.996+1100 I NETWORK [migrateThread] scoped connection to shard01/genique:23334,genique:23335 not being returned to the pool
2015-01-12T22:31:51.996+1100 I SHARDING [migrateThread] about to log metadata event: { _id: "genique-2015-01-12T11:31:51-54b3b0a75e62b2a409dceac3", server: "genique", clientAddr: ":27017", time: new Date(1421062311996), what: "moveChunk.to", ns: "test.test", details: { min: { _id: MinKey }, max: { _id: 0.0 }, step 1 of 5: 122, note: "aborted" } }
2015-01-12T22:31:52.001+1100 W SHARDING [migrateThread] cannot remove pending chunk [{ _id: MinKey }, { _id: 0.0 }), this shard does not contain the chunk
2015-01-12T22:31:52.001+1100 W SHARDING [migrateThread] cannot remove pending chunk [{ _id: MinKey }, { _id: 0.0 }), this shard does not contain the chunk
2015-01-12T22:32:03.032+1100 I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: MinKey } -> { _id: 0.0 } for collection test.test from shard01/genique:23334,genique:23335 at epoch 54b3a02aaea4b148f49e0882
2015-01-12T22:32:03.033+1100 I SHARDING [migrateThread] Deleter starting delete for: test.test from { _id: MinKey } -> { _id: 0.0 }, with opId: 6925
2015-01-12T22:32:03.033+1100 I SHARDING [migrateThread] rangeDeleter deleted 0 documents for test.test from { _id: MinKey } -> { _id: 0.0 }
2015-01-12T22:32:03.034+1100 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-12T22:32:04.034+1100 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2015-01-12T22:32:05.034+1100 I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
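For completeness: one way to disable secondaryThrottle for balancer migrations, as was done for the 2.8.0-rc4 test above (the exact method used there isn't shown, so this is an assumption), is the balancer document in config.settings, run against the mongos:
> db.getSiblingDB("config").settings.update(
      { _id: "balancer" },
      { $set: { _secondaryThrottle: false } },
      { upsert: true }
  )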