Details
Type: Bug
Resolution: Done
Priority: Major - P3
Operating System: ALL
Description
I've been having a problem all day with my MongoDB cluster getting stuck while migrating chunks.
This is an 8-shard cluster running MongoDB 2.4.6, with each shard a 5-member replica set. I have 2 shards both trying to migrate chunks:
first primary:
Feb 25 20:46:55 terranova mongod.10001[19778]: Tue Feb 25 20:46:55.986 [conn4194] command admin.$cmd command: { moveChunk: "current.reviews", from: "rs1/mongors1-2.redacted.com:10001,mongors1-3.redacted.com:10001,terranova.redacted.com:10001", to: "rs3/bastoni.redacted.com:10003,mongors3-2.redacted.com:10003,mongors3-3.redacted.com:10003", fromShard: "rs1", toShard: "rs3", min: { location_id: ObjectId('52a75b4738fc9d23e88ab516') }, max: { location_id: ObjectId('52a75b4814b0f61b64acffc8') }, maxChunkSizeBytes: 67108864, shardId: "current.reviews-location_id_ObjectId('52a75b4738fc9d23e88ab516')", configdb: "terranova.redacted.com:30000,giordano.redacted.com:30000,bastoni.redacted.com:30000", secondaryThrottle: true, waitForDelete: false } ntoreturn:1 keyUpdates:0 locks(micros) W:3854 r:85284 reslen:343 559ms
8th primary:
Feb 25 20:47:40 MongoRS8-1 mongod.10008[31876]: Tue Feb 25 20:47:40.430 [conn1993] received moveChunk request: { moveChunk: "current.citations", from: "rs8/mongors8-1.redacted.com:10008,mongors8-2.redacted.com:10008,mongors8-3.redacted.com:10008", to: "rs4/mongors4-1.redacted.com:10004,mongors4-2.redacted.com:10004,mongors4-3.redacted.com:10004", fromShard: "rs8", toShard: "rs4", min: { location_id: ObjectId('4f2703f0bc0f367032000000') }, max: { location_id: ObjectId('4f3533ffbc0f36d419000002') }, maxChunkSizeBytes: 67108864, shardId: "current.citations-location_id_ObjectId('4f2703f0bc0f367032000000')", configdb: "terranova.redacted.com:30000,giordano.redacted.com:30000,bastoni.redacted.com:30000", secondaryThrottle: true, waitForDelete: false }
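For reference, the same migration can be requested by hand through a mongos using the min key from the first log line; a sketch (this just re-issues the moveChunk shown above):

// Through a mongos: ask to move the chunk containing this key
// from rs1 to rs3 (same chunk as in the first log line above).
sh.moveChunk(
  "current.reviews",
  { location_id: ObjectId('52a75b4738fc9d23e88ab516') },
  "rs3"
);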
On both primaries I am seeing:
warning: moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 39
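Both requests run with waitForDelete: false, so post-migration range deletes happen asynchronously in the background; my read of the warning is that the TO-shard won't accept a new chunk while those 39 cleanup threads are still pending. One way to look for them on the receiving shard's primary (assuming the cleanup threads show up in currentOp under a name like cleanupOldData, which I haven't verified):

// Run against the receiving shard's PRIMARY, not through mongos.
// Lists in-progress ops whose description looks like the async
// post-migration range deleter ("cleanupOldData" is an assumption
// about the thread name in this version).
db.currentOp(true).inprog.filter(function (op) {
  return op.desc && /cleanupOldData/.test(op.desc);
}).forEach(function (op) {
  printjson({ opid: op.opid, desc: op.desc, secs_running: op.secs_running });
});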
In the last 6 hours, 3 chunks have managed to migrate.
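I'm counting those from the config changelog through a mongos; a sketch, assuming each successful migration writes a moveChunk.commit entry there:

// Through a mongos: count migrations that committed in the last 6 hours.
var cutoff = new Date(Date.now() - 6 * 3600 * 1000);
db.getSiblingDB("config").changelog
  .find({ what: "moveChunk.commit", time: { $gt: cutoff } })
  .count();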
I have bounced the mongos processes and bounced the mongod processes. I also stopped the balancer, removed the { _id: "balancer" } lock from the config database, bounced the mongos processes again, and re-enabled the balancer.
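Roughly the shell equivalents of those balancer steps (a sketch, run through a mongos; the mongos bounces happened outside the shell):

// Stop the balancer and wait for any in-flight balancing round to finish.
sh.stopBalancer();

// Remove the stale balancer lock from the config database.
db.getSiblingDB("config").locks.remove({ _id: "balancer" });

// (bounce the mongos processes here)

// Re-enable the balancer.
sh.startBalancer();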
Not seeing anything out of the ordinary on the receivers.
The strange thing is that 3 chunks have been able to migrate amidst this issue: the errors disappear, some chunk migration log messages appear, the migration finishes, and then the errors start up again.