[SERVER-7765] Draining a shard stalled due to writebacksQueued stalled Created: 26/Nov/12 Updated: 08/Mar/13 Resolved: 19/Feb/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.0.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Justin Patrin | Assignee: | Barrie Segal |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Linux |
| Participants: |
| Description |
|
I started draining the last of four shards in a live sharded mongo cluster (v2.0.7), in which each shard is a 3-node replica set. Draining went fine until 16 chunks remained; it has now been stuck there for more than four hours.

mongos> db.runCommand({removeShard: "mongo-live-d"})

The mongos log shows this:

Wed Nov 21 22:10:26 [Balancer] distributed lock 'balancer/mongo-live-a-1:27017:1350073653:1804289383' acquired, ts : 50adc1d2538fcedc6aa3cf93

When I check writeBacksQueued, the total number of queued ops never goes down; it keeps increasing over time:

PRIMARY> db.adminCommand("writeBacksQueued")

The "totalOpsQueued" and various "n" values keep going up. I don't see anything interesting in the troublesome shard's mongod log. I'd try restarting everything, but I'm worried that the queued data would be lost. |
| Comments |
| Comment by Barrie Segal [ 08/Feb/13 ] | |
|
Justin, just checking in: what was the outcome of draining the shard? Barrie | |
| Comment by Justin Patrin [ 30/Nov/12 ] | |
|
Well, we tried restarting the mongod processes on the secondaries of the misbehaving shard's replica set. No change. We stepped down the primary. No change. Stopping the former primary once it was a secondary, however, took quite a while (~5 minutes). Once it was down, we started it back up, and the cluster is now draining the shard again as I originally intended. I hope we haven't lost any data... | |
| Comment by Justin Patrin [ 27/Nov/12 ] | |
|
All of the mongos logs have the same Balancer entries as listed above; nothing different. In the mongod logs, I'm seeing a fair number of entries like this:
| |
| Comment by Eliot Horowitz (Inactive) [ 27/Nov/12 ] | |
|
Can you check the mongos logs? |