[SERVER-24978] Second batches in the aggregation framework are requested synchronously Created: 11/Jul/16  Updated: 20/Apr/18  Resolved: 06/Mar/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 3.7.3

Type: Improvement Priority: Major - P3
Reporter: Antoine Hom Assignee: Charlie Swanson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File getmore.png    
Issue Links:
Depends
depends on SERVER-33660 Once getMores include lsid, sharded a... Closed
depends on SERVER-32307 Make AsyncResultsMerger kill sequence... Closed
is depended on by SERVER-33280 Add a test to ensure that an error on... Closed
is depended on by SERVER-33323 Refactor $mergeCursors stage to allow... Closed
Duplicate
is duplicated by SERVER-32014 scanAndOrder serverStatus metric shou... Closed
Related
related to SERVER-24981 $project-$limit optimization has bad ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2018-01-29, Query 2018-02-12, Query 2018-02-26, Query 2018-03-12
Participants:
Case:
Linked BF Score: 0

 Description   

In an aggregation framework query, the first batch of results requested by the merger from the shards is fetched asynchronously. However, when the merger needs a second batch to fulfill the query, it becomes virtually single-threaded, because the call to getNext is synchronous.

I think it would be good to overcome this limitation and make all subsequent fetches happen in the background.

You can see a representation of this behavior here: https://github.com/ahom/jupyter-notebooks/blob/master/mongo_cs32044/notebook.ipynb

On the Y axis are the shards; on the X axis is time. Starting from the second batch, everything appears to be completely synchronous.

Cheers,
Antoine
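
To make the reported pattern concrete, here is a minimal, self-contained sketch (hypothetical code, not the server's actual merge path) of what the notebook shows: every shard cursor is refilled through a blocking call, so when all cursors exhaust their batches at the same time, the refills serialize.

    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    struct ShardCursor {
        int shardId;

        // Stands in for a synchronous getMore round trip to one shard.
        void getNextBatch() const {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    };

    int main() {
        std::vector<ShardCursor> cursors{{0}, {1}, {2}, {3}};
        auto start = std::chrono::steady_clock::now();

        // All four cursors exhaust their batches at once, and each refill
        // blocks the merger before the next one is even issued.
        for (const auto& cursor : cursors)
            cursor.getNextBatch();

        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start);
        // ~200ms for 4 shards: latency grows linearly with the shard count.
        std::cout << "sequential refills took " << elapsed.count() << "ms\n";
    }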



 Comments   
Comment by Charlie Swanson [ 06/Mar/18 ]

The recently committed fix should mitigate the described problem. Previously, the $mergeCursors stage would issue a getMore to each cursor one by one, waiting for the response from one cursor before iterating the next. Now each getMore is scheduled independently through an asynchronous interface, which should improve performance here.
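
To illustrate the scheduling change conceptually, here is a minimal sketch (hypothetical code, not the actual AsyncResultsMerger implementation) of dispatching every getMore before waiting on any response:

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Stands in for one getMore round trip to a shard.
    void simulatedGetMore() {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }

    int main() {
        std::vector<std::future<void>> inflight;
        auto start = std::chrono::steady_clock::now();

        // Dispatch every getMore before waiting on any response.
        for (int shard = 0; shard < 4; ++shard)
            inflight.push_back(std::async(std::launch::async, simulatedGetMore));

        for (auto& response : inflight)
            response.get();

        auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start);
        // ~50ms: the four round trips overlap instead of serializing.
        std::cout << "overlapped refills took " << elapsed.count() << "ms\n";
    }

With four simulated 50ms round trips, the refill cost drops from roughly 200ms (serialized, as in the sketch under the description) to roughly 50ms (overlapped).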

Comment by Githook User [ 06/Mar/18 ]

Author:

{'email': 'charlie.swanson@mongodb.com', 'name': 'Charlie Swanson', 'username': 'cswanson310'}

Message: SERVER-24978 Use AsyncResultsMerger in $mergeCursors
Branch: master
https://github.com/mongodb/mongo/commit/a7106b407cecdcfa8ba6c8765c9874bce65a6d5a

Comment by Charlie Swanson [ 05/Mar/18 ]

A patch to fix this has made it through code review, but exposed an issue tracked in SERVER-33660.

Comment by David Storch [ 13/Dec/17 ]

charlie.swanson pointed out that it would be easier to replace DocumentSourceMergeCursors with the AsyncResultsMerger once we have implemented SERVER-32307. Therefore, I'm adding a "depends on" link and unscheduling this ticket. We should start on it as soon as SERVER-32307 is completed.

Comment by Charlie Swanson [ 12/Dec/17 ]

I think one good way to do this would be to replace the merging machinery within DocumentSourceMergeCursors with the AsyncResultsMerger utility used on mongos. This should resolve the issue, since the AsyncResultsMerger schedules getMores without blocking on their responses, and so can have multiple getMores in flight at a time.

Comment by Charlie Swanson [ 11/Jul/16 ]

david.storch, we had some discussion earlier about the strategy of issuing the getMore requests. It looks like the cursor merging initializes with the 'better' strategy of issuing all requests, then waiting for all responses:
https://github.com/mongodb/mongo/blob/r3.3.9/src/mongo/db/pipeline/document_source_merge_cursors.cpp#L112-L134

But it's not smart at all about issuing getMores:
https://github.com/mongodb/mongo/blob/r3.3.9/src/mongo/db/pipeline/document_source_merge_cursors.cpp#L136-L169

It looks like this will just block whenever a batch happens to be exhausted, which will probably happen at exactly the same time for every cursor.

I'm not sure how easy this will be to fix, though, since I'm not comfortable with relying on all cursors having the same batch size. It should be fine when a batchSize is explicitly specified on the aggregation, but I'm worried about making such an optimization based on the default batch size: if we ever change the default, things could go awry in a mixed-version cluster.
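
For concreteness, the kind of batch-size-keyed prefetch being discussed might look like the following sketch (all names hypothetical, and 101 is merely an assumed default batch size). The mixed-version hazard is that the threshold is only meaningful if the assumed batch size matches what the shard actually uses.

    #include <cstddef>
    #include <deque>
    #include <iostream>

    struct BufferedCursor {
        std::deque<int> buffer;       // documents already received from the shard
        std::size_t batchSize = 101;  // only trustworthy if explicitly requested
        bool getMoreInFlight = false;

        // Hypothetical heuristic: refill proactively once fewer than a quarter
        // of a batch remains, so the getMore overlaps with merging.
        void maybePrefetch() {
            if (!getMoreInFlight && buffer.size() < batchSize / 4) {
                getMoreInFlight = true;  // a real impl would dispatch the getMore here
            }
        }
    };

    int main() {
        BufferedCursor cursor;
        cursor.buffer.assign(20, 0);  // 20 docs left of an assumed 101-doc batch
        cursor.maybePrefetch();
        // If the shard's actual batch size differs (e.g. in a mixed-version
        // cluster), this threshold fires too early or too late.
        std::cout << std::boolalpha << cursor.getMoreInFlight << "\n";  // true
    }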
