[SERVER-24978] Second batches in aggregation framework are asked synchronously Created: 11/Jul/16 Updated: 20/Apr/18 Resolved: 06/Mar/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 3.7.3 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Antoine Hom | Assignee: | Charlie Swanson |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Query 2018-01-29, Query 2018-02-12, Query 2018-02-26, Query 2018-03-12 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
|
In an aggregation framework query, the first batch of results is requested by the merging shard from the other shards asynchronously. However, when it needs a second batch to fulfill the query, it is effectively single-threaded, because the call to getNext is synchronous. I think it would be good to overcome this limitation and have all subsequent fetches happen in the background. You can see a representation of this behavior here (https://github.com/ahom/jupyter-notebooks/blob/master/mongo_cs32044/notebook.ipynb): the Y axis shows the shards and the X axis shows time. Starting from the second batch, everything appears to be completely synchronous. Cheers, |
| Comments |
| Comment by Charlie Swanson [ 06/Mar/18 ] |
|
The recently committed fix should mitigate the described problem. Before, the $mergeCursors stage would getMore each cursor, one by one, waiting for a response from one cursor before iterating the next. Now each getMore will be scheduled independently through an asynchronous interface, which should improve the performance here. |
| Comment by Githook User [ 06/Mar/18 ] |
|
Author: {'email': 'charlie.swanson@mongodb.com', 'name': 'Charlie Swanson', 'username': 'cswanson310'} Message: |
| Comment by Charlie Swanson [ 05/Mar/18 ] |
|
A patch to fix this has made it through code review, but exposed an issue tracked in |
| Comment by David Storch [ 13/Dec/17 ] |
|
charlie.swanson pointed out that it would be easier to replace DocumentSourceMergeCursors with the AsyncResultsMerger once we have implemented |
| Comment by Charlie Swanson [ 12/Dec/17 ] |
|
I think one good way to do this would be to replace the merging machinery within DocumentSourceMergeCursors with the AsyncResultsMerger utility which is used on mongos. This should resolve the issue, since the AsyncResultsMerger schedules getMores independently of waiting for their responses, and so can have multiple getMores in flight at a time. |
| Comment by Charlie Swanson [ 11/Jul/16 ] |
|
david.storch, we had some discussion earlier about the strategy for issuing the getMore requests. It looks like the cursor merging initializes with the 'better' strategy of issuing all requests, then waiting for all responses. But it's not smart at all about issuing getMores: it looks like it will just block whenever you happen to exhaust a batch, which probably happens at exactly the same time for each cursor. I'm not sure how easy this will be to fix, though, since I'm not comfortable relying on all cursors having the same batch size. I think it should be fine if a batchSize was specified on the aggregation, but I'm worried about doing such an optimization for the default batch size: if we ever change the default, things could go awry in a mixed-version cluster. |