[SERVER-12900] moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks Created: 25/Feb/14  Updated: 26/Sep/14  Resolved: 26/Sep/14

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eric Coutu Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Operating System: ALL
Participants:

 Description   

I've been having a problem all day with my MongoDB cluster getting stuck while migrating chunks.

This is an 8-shard cluster running MongoDB 2.4.6, with each shard a 5-member replica set. Two shards are trying to migrate chunks:

first primary:

Feb 25 20:46:55 terranova mongod.10001[19778]: Tue Feb 25 20:46:55.986 [conn4194] command admin.$cmd command: { moveChunk: "current.reviews", from: "rs1/mongors1-2.redacted.com:10001,mongors1-3.redacted.com:10001,terranova.redacted.com:10001", to: "rs3/bastoni.redacted.com:10003,mongors3-2.redacted.com:10003,mongors3-3.redacted.com:10003", fromShard: "rs1", toShard: "rs3", min: { location_id: ObjectId('52a75b4738fc9d23e88ab516') }, max: { location_id: ObjectId('52a75b4814b0f61b64acffc8') }, maxChunkSizeBytes: 67108864, shardId: "current.reviews-location_id_ObjectId('52a75b4738fc9d23e88ab516')", configdb: "terranova.redacted.com:30000,giordano.redacted.com:30000,bastoni.redacted.com:30000", secondaryThrottle: true, waitForDelete: false } ntoreturn:1 keyUpdates:0 locks(micros) W:3854 r:85284 reslen:343 559ms

8th primary:

Feb 25 20:47:40 MongoRS8-1 mongod.10008[31876]: Tue Feb 25 20:47:40.430 [conn1993] received moveChunk request: { moveChunk: "current.citations", from: "rs8/mongors8-1.redacted.com:10008,mongors8-2.redacted.com:10008,mongors8-3.redacted.com:10008", to: "rs4/mongors4-1.redacted.com:10004,mongors4-2.redacted.com:10004,mongors4-3.redacted.com:10004", fromShard: "rs8", toShard: "rs4", min: { location_id: ObjectId('4f2703f0bc0f367032000000') }, max: { location_id: ObjectId('4f3533ffbc0f36d419000002') }, maxChunkSizeBytes: 67108864, shardId: "current.citations-location_id_ObjectId('4f2703f0bc0f367032000000')", configdb: "terranova.redacted.com:30000,giordano.redacted.com:30000,bastoni.redacted.com:30000", secondaryThrottle: true, waitForDelete: false }

On both primaries I am seeing:

warning: moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 39
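
To see how often each donor shard hits this warning, the mongod logs can be scanned per host. The following is a minimal sketch, assuming the warning lines carry the same syslog prefix as the excerpts above (timestamp, host, process[pid]); the function name `summarize_warnings` is hypothetical and not part of any MongoDB tooling.

```python
import re
from collections import Counter

# The warning text quoted above; the trailing thread count is captured.
WARNING_RE = re.compile(
    r"moveChunk failed to engage TO-shard in the data transfer: "
    r"still waiting for a previous migrates data to get cleaned, "
    r"can't accept new chunks, num threads: (\d+)"
)

# Assumed syslog prefix, e.g. "Feb 25 20:46:55 terranova mongod.10001[19778]: "
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) (\S+?)\[\d+\]: ")


def summarize_warnings(lines):
    """Return ({host: warning count}, {host: highest num threads seen})."""
    counts = Counter()
    max_threads = {}
    for line in lines:
        warning = WARNING_RE.search(line)
        prefix = SYSLOG_RE.match(line)
        if not warning or not prefix:
            continue  # not a warning line, or not in the assumed syslog format
        host = prefix.group(2)
        counts[host] += 1
        max_threads[host] = max(max_threads.get(host, 0), int(warning.group(1)))
    return dict(counts), max_threads
```

Passing an open log file (or any iterable of lines) to `summarize_warnings` gives a per-host count that can be compared before and after a restart or balancer change.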

In the last 6 hours, 3 chunks have managed to migrate.

I have bounced the mongos and mongod processes. I also stopped the balancer, removed the {_id: "balancer"} lock from the config db, bounced the mongos processes again, and re-enabled the balancer.

Not seeing anything out of the ordinary on the receivers.

The strange thing is that 3 chunks have been able to migrate in the middle of this issue. The errors disappear, some chunk migration log messages appear, the migration finishes, and then the errors start up again.



 Comments   
Comment by Eric Coutu [ 26/Feb/14 ]

#4 is hidden and #5 is an arbiter. Not sure if this is the proper link for you, but https://mms.mongodb.com/host/list/4fe1eaeb87d1d86fa8bb6d7b

Seems like the problem has resolved itself overnight. The 2 collections that were way out of balance have transferred their chunks to other shards.

Comment by Asya Kamsky [ 26/Feb/14 ]

Eric,

You mention 5-member replica sets, but the shards are listed with three members. Are the other two hidden members, arbiters, or something else? Is there any replication lag in the "TO" shards?

(If this cluster is in MMS and you can provide a link I could look this up myself)

Asya
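
For reference, replication lag on a "TO" shard can be estimated from the output of the replSetGetStatus command (rs.status() in the shell) by comparing each secondary's optimeDate with the primary's. The following is a minimal sketch operating on an already-fetched status document; the helper name `secondary_lag` is hypothetical, and it assumes the "members" entries carry "name", "stateStr", and "optimeDate" as datetimes.

```python
from datetime import timedelta
from typing import Dict


def secondary_lag(status: dict) -> Dict[str, timedelta]:
    """Map each SECONDARY member's name to its lag behind the PRIMARY.

    `status` is assumed to be shaped like replSetGetStatus output.
    Hidden members report as SECONDARY and are included; arbiters
    (stateStr "ARBITER") hold no data and are skipped.
    """
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise ValueError("no PRIMARY in replica set status")
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }
```

A consistently large lag on the receiving shard would be worth ruling out here, since the moveChunk requests in the description set secondaryThrottle: true.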
