[SERVER-21446] Mergecursors performance issue when handling targeted query results Created: 13/Nov/15 Updated: 03/Dec/15 Resolved: 03/Dec/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.6.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Stephen Dalby | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| Steps To Reproduce: | 1. Launch a sharded cluster with 2 shards:
2. Insert the data in the attached file:
3. Shard the colleciton and set a split point to force all of the inserted data to the secondary shard.
4. sh.status should show there are 2 chunks and all chunks with '_id.id: 1' are not on the primary shard.
5. Log into each of the mongoDB shards, and enable the profiler.
6. Run the following aggregation query.
7. On each shard look at the generated profile data. It should look similar to the following. Secondary shard:
Primary shard:
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: |
| Description |
|
An issue has been found where queries which were targeted to a single shard suffered significant performance degradation relative to running the same query against an unsharded collection. The issue was tracked down to an unnecessary mergecursors operation being performed on the primary shard ( In order to assist engineering with debugging this issue, I have identified a series of steps than can be followed in order to reliable reproduce this issue on MongoDB 2.6.10 (see 'Steps to reproduce'). The |
| Comments |
| Comment by Charlie Swanson [ 13/Nov/15 ] |
|
I don't think the extra latency is coming just from the extra merge cursors stage. The extra latency comes from what is implied by having a merge cursors stage, which is that the mongos has to coordinate each shard getting its data back to the merging shard. This process incurs a lot of overhead compared to an aggregation running on one shard. For your example, the data of interest is all on shard2. Before
After
As you can see, there are a couple additional trips over the network in the process before Is that a satisfactory explanation? |