[SERVER-21784] Track 'nReturned' and 'executionTimeMillis' execution stats for each aggregation pipeline stage and expose via explain

Created: 07/Dec/15 | Updated: 07/Sep/20 | Resolved: 02/Dec/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.3 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matt Kalan | Assignee: | Mihai Andrei |
| Resolution: | Done | Votes: | 25 |
| Labels: | former-quick-wins, qopt-team, storch, usability |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Sprint: | Query 2019-12-02, Query 2019-12-16 |
| Participants: | |
| Case: | (copied to CRM) |
Description

For the same reason explain exists for .find(), it would be helpful to know how long an aggregation pipeline takes to run overall and within each stage. In many ways this is even more important for .aggregate(), because multiple different pipelines can produce the same result and need to be compared.
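As a hedged illustration of the ask (collection and field names here are hypothetical; `explain("executionStats")` is an existing verbosity mode, and this ticket asks for per-stage stats in its aggregation output):

```javascript
// Hypothetical usage: the ticket asks for nReturned and executionTimeMillis
// to be reported for each stage when running explain like this.
db.orders.explain("executionStats").aggregate([
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } }
])
```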
Comments

Comment by Githook User [ 02/Dec/19 ]

Author: Mihai Andrei <mihai.andrei@mongodb.com>
Message:
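Per the ticket title, the change surfaces per-stage counters in explain output. A hedged sketch of the resulting shape (field names taken from the title; exact nesting varies by server version, and the per-stage timer is reported as an estimate in practice; the numbers are illustrative):

```javascript
// Illustrative excerpt only -- not verbatim server output.
{
  "stages": [
    { "$cursor": { /* collection scan or index plan */ },
      "nReturned": 10000, "executionTimeMillisEstimate": 5 },
    { "$group": { "_id": "$cust_id" },
      "nReturned": 120, "executionTimeMillisEstimate": 23 }
  ]
}
```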
Comment by Asya Kamsky [ 18/Dec/18 ]

mattmessmer sorry about the delay in responding - explain(true) on an aggregation in the current version should already show you how many documents are being sent through the pipeline. Without knowing the exact aggregation you are trying to test, I can't be 100% certain, though.
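A sketch of the shell form being described (collection and field names are placeholders; passing `true` to the explain helper requests the most verbose mode, "allPlansExecution"):

```javascript
// explain(true) == explain("allPlansExecution"); even before per-stage
// stats existed, this showed execution counts for the initial cursor stage.
db.test.explain(true).aggregate([
  { $match: { qty: { $gt: 10 } } },
  { $group: { _id: "$type", n: { $sum: 1 } } }
])
```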
Comment by Matt Messmer [ 17/Sep/18 ]

I'm also hoping for this to be implemented. My current use case is comparing a distinct query to an aggregate query with a `$limit`. I'm hoping the aggregate has to iterate over fewer keys/documents, but I can't prove this without this information. Thanks!
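A sketch of that comparison (collection and field names are hypothetical; the shell's explain helper supports both distinct and aggregate):

```javascript
// Compare totalKeysExamined / totalDocsExamined between the two outputs
// to see which plan touches fewer keys and documents.
db.items.explain("executionStats").distinct("category")

db.items.explain("executionStats").aggregate([
  { $group: { _id: "$category" } },
  { $limit: 5 }
])
```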
Comment by Asya Kamsky [ 30/Jul/18 ]

I gave a talk about this last year - probably the most detailed versions of it were in Europe in Nov '17 and Sydney in March '18. I couldn't find a recording, but the slides and all the code I used are in a GitHub repo; here's one of the decks: https://github.com/asya999/mdbw17/blob/master/Sydney%20MDB.local.pdf
Comment by Chris Lusby [ 30/Jul/18 ]

That's really useful information, thanks Asya - is this documented in the online docs somewhere, along with other optimisations that we should be aware of?
Comment by Asya Kamsky [ 30/Jul/18 ]

chris.lusby@macquarie.com We definitely would like to expose more details about the full pipeline in explain. However, I want to point out some misconceptions in your "for instance" bullet points.

Adding $project to reduce the size of documents in the pipeline is not necessary, as the pipeline itself will optimize the documents flowing through it down to only the fields it will need in later stages.

$unwind also does not contribute to the 100MB limit. The only stages that can hit the 100MB limit are "blocking" stages (a $sort that is not supported by an index, and $group, as well as $graphLookup, which is special); all other stages process documents in a streaming fashion and cannot exceed that limit.

Asya
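A small illustration of the dependency analysis described above (collection and field names are hypothetical; the exact explain output shape varies by version):

```javascript
// No manual $project is needed: the server analyzes which fields the later
// stages use and narrows the initial cursor's projection accordingly.
db.sales.explain().aggregate([
  { $match: { year: 2018 } },
  { $group: { _id: "$region", total: { $sum: "$amount" } } }
])
// In the explain output, the $cursor stage should show a projection limited
// to roughly { region: 1, amount: 1 }, even though none was written.
```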
Comment by Chris Lusby [ 20/Jul/18 ]

This would be extremely useful information. Given constraints within the aggregation pipeline (e.g. the 100MB limit), certain operations might really impact the performance of the aggregation. In order to tune this, we need information such as:

- how many documents pass through each stage
- how long each stage takes to execute

For instance:

- does adding a $project stage to reduce the size of documents flowing through the pipeline help?
- do stages such as $unwind contribute to the 100MB limit?

Do we have an ETA for when this can begin development? (A hedged example of the 100MB escape hatch follows this comment.)
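For completeness, when a blocking stage does approach the 100MB limit, the documented escape hatch is the allowDiskUse option (collection and field names below are placeholders):

```javascript
// An unindexed $sort and a $group are blocking stages; allowDiskUse lets
// them spill to temporary files instead of failing at the memory limit.
db.events.aggregate(
  [
    { $sort: { ts: -1 } },
    { $group: { _id: "$type", count: { $sum: 1 } } }
  ],
  { allowDiskUse: true }
)
```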
Comment by David Storch [ 11/May/17 ]

Note that this has been partially implemented in