[SERVER-20150] Chunk migration locks are constantly blocking map/reduce Created: 26/Aug/15 Updated: 11/Sep/15 Resolved: 11/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alex | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi. We're seeing another problem with the shard balancer. After the upgrade, our app failed to perform map/reduce on sharded collections with the following error:
I've checked the status of the shards and found that a chunk migration is pending from Shard3 to Shard1, since there is an imbalance in the chunk distribution:
I've checked db.opStatus on the shard1 primary node and found that the migration process is blocked by a secondary node, because it is doing its initial sync. We decided to stop the initial sync to give the primary node time to accept chunks from Shard3. But after ~2 hours only 2 chunks had actually been migrated and our collection was still locked, so we decided to stop the balancer to allow our app to run again. Is this by design, or did something really go wrong during the upgrade process? We haven't seen this issue before on our 2.4 installation. |
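For illustration, a minimal mongo shell sketch of the checks and the call described above (the database, collection, and output names are hypothetical, and the db.opStatus check is assumed here to correspond to the shell's db.currentOp()):

```js
// On a mongos: chunk distribution per shard and any pending/active migrations.
sh.status();

// On the shard1 primary: list current operations and look for an in-progress
// moveChunk (assumed equivalent of the db.opStatus check mentioned above).
db.currentOp(true);

// The kind of map/reduce call that blocks while the migration holds the
// collection's distributed lock. mapFn/reduceFn are trivial placeholders.
var mapFn = function () { emit(this.userId, 1); };
var reduceFn = function (key, values) { return Array.sum(values); };

db.getSiblingDB("mydb").events.mapReduce(mapFn, reduceFn, {
  out: { reduce: "events_summary", sharded: true }
});
```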
| Comments |
| Comment by Alex [ 11/Sep/15 ] |
|
Hi Sam, 1) Are you sure it is expected behaviour for the shard balancer to lock a collection for hours (2 hours in our case)? If so, this means that, regardless of what is written in the docs, we simply can't use a sharded collection as map/reduce output, since it will likely be locked by the balancer. |
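One way to see what is holding the collection for that long is to look at the distributed lock documents in the config database; a sketch, assuming a hypothetical mydb.events namespace:

```js
// Run against a mongos. state: 2 means the lock is currently held; the "who"
// and "why" fields show whether a migration or a map/reduce is holding it.
var configDB = db.getSiblingDB("config");
configDB.locks.find({ _id: "mydb.events" }).pretty();

// The balancer's own lock, to check whether a balancing round is in progress.
configDB.locks.find({ _id: "balancer" }).pretty();
```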
| Comment by Sam Kleinman (Inactive) [ 11/Sep/15 ] |
|
Sorry for the delay in getting back to you. After discussing this with the teams that work on sharding, it looks like this is in fact the expected behavior: when running with sharded output, mapReduce requires the distributed lock to prevent chunk migrations from interfering with the output of the map/reduce operation. I hope this makes sense, and sorry for any confusion. Regards, |
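Given that this is expected behaviour, a short sketch of the workaround the reporter already fell back to, pausing the balancer for the duration of the job and re-enabling it afterwards (the namespace, mapFn, and reduceFn are the same hypothetical placeholders as in the earlier sketch):

```js
// Run against a mongos.
sh.stopBalancer();                 // disables balancing and waits for the current round to finish
assert(!sh.isBalancerRunning());   // confirm no balancing round is still active

var mapFn = function () { emit(this.userId, 1); };
var reduceFn = function (key, values) { return Array.sum(values); };

db.getSiblingDB("mydb").events.mapReduce(mapFn, reduceFn, {
  out: { reduce: "events_summary", sharded: true }
});

sh.startBalancer();                // restore normal balancing afterwards
```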
| Comment by Alex [ 26/Aug/15 ] |
|
Oopsie, task should be |