[SERVER-12900] moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks Created: 25/Feb/14 Updated: 26/Sep/14 Resolved: 26/Sep/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Coutu | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL |
| Participants: | |
| Description |
|
I've been having a problem all day with my MongoDB cluster getting stuck while migrating chunks. This is an 8-shard cluster on MongoDB 2.4.6, with each shard a 5-member replica set. I have 2 shards both trying to migrate chunks.

First primary:

Feb 25 20:46:55 terranova mongod.10001[19778]: Tue Feb 25 20:46:55.986 [conn4194] command admin.$cmd command: { moveChunk: "current.reviews", from: "rs1/mongors1-2.redacted.com:10001,mongors1-3.redacted.com:10001,terranova.redacted.com:10001", to: "rs3/bastoni.redacted.com:10003,mongors3-2.redacted.com:10003,mongors3-3.redacted.com:10003", fromShard: "rs1", toShard: "rs3", min: { location_id: ObjectId('52a75b4738fc9d23e88ab516') }, max: { location_id: ObjectId('52a75b4814b0f61b64acffc8') }, maxChunkSizeBytes: 67108864, shardId: "current.reviews-location_id_ObjectId('52a75b4738fc9d23e88ab516')", configdb: "terranova.redacted.com:30000,giordano.redacted.com:30000,bastoni.redacted.com:30000", secondaryThrottle: true, waitForDelete: false } ntoreturn:1 keyUpdates:0 locks(micros) W:3854 r:85284 reslen:343 559ms

Eighth primary:

Feb 25 20:47:40 MongoRS8-1 mongod.10008[31876]: Tue Feb 25 20:47:40.430 [conn1993] received moveChunk request: { moveChunk: "current.citations", from: "rs8/mongors8-1.redacted.com:10008,mongors8-2.redacted.com:10008,mongors8-3.redacted.com:10008", to: "rs4/mongors4-1.redacted.com:10004,mongors4-2.redacted.com:10004,mongors4-3.redacted.com:10004", fromShard: "rs8", toShard: "rs4", min: { location_id: ObjectId('4f2703f0bc0f367032000000') }, max: { location_id: ObjectId('4f3533ffbc0f36d419000002') }, maxChunkSizeBytes: 67108864, shardId: "current.citations-location_id_ObjectId('4f2703f0bc0f367032000000')", configdb: "terranova.redacted.com:30000,giordano.redacted.com:30000,bastoni.redacted.com:30000", secondaryThrottle: true, waitForDelete: false }

On both primaries I am seeing:

warning: moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 39

In the last 6 hours, 3 chunks have managed to migrate. I have bounced the mongos processes, bounced the mongod processes, stopped the balancer, removed the {_id: "balancer"} lock from the config db, bounced the mongos processes again, and re-enabled the balancer. I am not seeing anything out of the ordinary on the receivers. The strange thing is that those 3 chunks were able to migrate in the middle of this issue: the errors disappear, there are some chunk migration log messages, the migration finishes, and then the errors start up again. |
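For readers hitting the same symptom, here is a minimal diagnostic sketch in the mongo shell (not part of the original report; the regular-expression filter on the operation fields is an assumption about how the migration cleanup threads describe themselves in 2.4):

    // On a mongos: confirm whether the balancer is enabled and whether a
    // balancing round is currently in progress.
    sh.getBalancerState();
    sh.isBalancerRunning();

    // On the TO-shard primary: list in-progress operations that look like
    // migration or cleanup work. currentOp(true) includes idle/system ops.
    db.currentOp(true).inprog.filter(function (op) {
        return /migrate|clean/i.test(op.msg || "") || /migrate|clean/i.test(op.desc || "");
    }).forEach(printjson);

    // On the config database: inspect the balancer lock that was removed by hand.
    // use config
    db.locks.find({ _id: "balancer" }).pretty();

As I understand the 2.4 behaviour, with waitForDelete: false (visible in the logged commands) each migration leaves the donor to delete the moved range asynchronously, and a shard that still has such cleanup threads running refuses to accept new chunks, which would match the "num threads: 39" in the warning. Issuing a manual moveChunk with the _waitForDelete option set to true should force that cleanup to finish before the command returns, at the cost of slower migrations.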
| Comments |
| Comment by Eric Coutu [ 26/Feb/14 ] |
|
#4 is hidden and #5 is an arbiter. Not sure if this is the proper link for you, but: https://mms.mongodb.com/host/list/4fe1eaeb87d1d86fa8bb6d7b

It seems like the problem has resolved itself overnight. The 2 collections that were way out of balance have transferred their chunks to other shards. |
| Comment by Asya Kamsky [ 26/Feb/14 ] |
|
Eric,

You mention 5-member replica sets, but the shards are listed with three members each - are the other two hidden members? Arbiters? Or something else? Is there replication lag in the "TO" shards at all? (If this cluster is in MMS and you can provide a link, I could look this up myself.)

Asya |
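(For reference, a quick way to answer the replication-lag question from the shell, assuming direct access to the TO shard's replica set, e.g. rs3 or rs4 here; these are standard shell helpers, nothing specific to this cluster:)

    // Connected to the TO-shard replica set, compare member optimes.
    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  optime: " + m.optimeDate);
    });

    // Or print how far each secondary is behind the primary.
    rs.printSlaveReplicationInfo();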