[DOCS-15221] Indicate that ONLY specific aggregation stages can utilize indexes Created: 08/Apr/22 Updated: 30/Oct/23 Due: 02/Sep/22 Resolved: 20/Oct/22 |
|
| Status: | Closed |
| Project: | Documentation |
| Component/s: | manual, Server |
| Affects Version/s: | None |
| Fix Version/s: | Server_Docs_20231030 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alex Bevilacqua | Assignee: | Nick Villahermosa |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: | |
| Days since reply: | 1 year, 8 weeks, 2 days ago |
| Epic Link: | DOCSP-11701 |
| Description |
|
In https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/#std-label-aggregation-pipeline-optimization-indexes-and-filters the guidance is not explicit about which stages can use indexes, and the resulting ambiguity can confuse users who are unsure why their pipeline didn't use an index.
This should be reworded as:
|
| Comments |
| Comment by Githook User [ 12/Dec/22 ] |
|
Author: Nick Villahermosa (nick.villahermosa@mongodb.com, username nvillahermosa-mdb) Message: |
| Comment by Githook User [ 18/Oct/22 ] |
|
Author: Nick Villahermosa (nick.villahermosa@mongodb.com, username nvillahermosa-mdb) Message: |
| Comment by Githook User [ 18/Oct/22 ] |
|
Author: Nick Villahermosa (nick.villahermosa@mongodb.com, username nvillahermosa-mdb) Message: |
| Comment by Githook User [ 18/Oct/22 ] |
|
Author: Nick Villahermosa (nick.villahermosa@mongodb.com, username nvillahermosa-mdb) Message: |
| Comment by Githook User [ 11/Oct/22 ] |
|
Author: Nick Villahermosa (nick.villahermosa@mongodb.com, username nvillahermosa-mdb) Message: |
| Comment by Nick Villahermosa [ 08/Sep/22 ] |
|
Thanks christopher.harris@mongodb.com and alex.bevilacqua@mongodb.com for diving into this. I'll incorporate the suggested changes and submit another PR once that's done. |
| Comment by Chris Harris [ 08/Sep/22 ] |
|
To clear up the "not only the first stage can benefit" piece, perhaps wording along these lines?
Then, to address the fact that stages later in the pipeline can also benefit from indexes, perhaps append some text afterwards along the lines of:
|
| Comment by Alex Bevilacqua [ 08/Sep/22 ] |
|
christopher.harris@mongodb.com, how would you recommend adjusting the documentation? The information you shared is extremely useful; however, given the nuances you've described, it would help to have something actionable that we could apply to the changes currently proposed in https://github.com/10gen/docs-mongodb-internal/pull/1774/files |
| Comment by Chris Harris [ 07/Sep/22 ] |
|
Hmm, this section of the documentation might be particularly tricky to get right. To help clear up some of the initial confusion: the clarification that index eligibility is predicated on the first stage of the pipeline is not a claim that only the first stage of the pipeline can benefit from an index. So the text suggested in the description, "Only the first pipeline stage can benefit from indexes", is not correct. A simple example is a pipeline consisting of [ $match, $sort, $group ]. If an appropriate index is present, it could be used to do all of the following:
So, generally speaking, index eligibility is determined by the first stage of the pipeline, but using the index can benefit subsequent stages. What we're probably trying to clarify here is that a user can't, for example, transform their data in the first stages of the pipeline and then expect the database to use an index to service a $match or $sort at the end of the pipeline. There is a lot of nuance, though, in what counts as "first" and in explicitly listing the stages. Two things come to mind:
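To make the contrast concrete, the two pipeline shapes Chris describes can be written as pymongo-style lists of stage documents. This is a sketch only: the "orders" collection, the field names, the assumed index {"status": 1, "orderDate": 1}, and the helper `explain_command` are hypothetical, and which stages actually benefit from an index should be confirmed with `explain` rather than assumed.

```python
# Assumes a hypothetical "orders" collection with an index {"status": 1, "orderDate": 1}.

# First stage is a $match on an indexed field, so the pipeline is index-eligible.
# With an equality match on "status", the same index may also supply the order
# that the following $sort needs, so a later stage benefits too.
index_eligible = [
    {"$match": {"status": "shipped"}},
    {"$sort": {"orderDate": 1}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
]

# Data is transformed first; the $match filters on a field computed by $addFields,
# so it cannot be moved ahead of that stage and no index on stored fields can serve it.
transform_first = [
    {"$addFields": {"amountWithTax": {"$multiply": ["$amount", 1.2]}}},
    {"$match": {"amountWithTax": {"$gt": 100}}},
]


def explain_command(collection: str, pipeline: list, verbosity: str = "queryPlanner") -> dict:
    """Hypothetical helper: build an explain command document for an aggregate.

    "explain" must be the first key because it names the command; Python dicts
    preserve insertion order, so a plain dict is sufficient here.
    """
    return {
        "explain": {"aggregate": collection, "pipeline": pipeline, "cursor": {}},
        "verbosity": verbosity,
    }
```

With pymongo, the document could be passed to `db.command(...)` to inspect the winning plan; for `transform_first` one would likely expect a collection scan, though the exact explain output shape varies by server version.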
|
| Comment by Nick Villahermosa [ 07/Sep/22 ] |
|
The second statement is about $group using an index for optimization. If the $sort comes first, doesn't that invalidate "$group can potentially use an index..."? Regarding reordering, I don't see $group mentioned in the linked article, and it looks like $sort is only moved in relation to $match, not $group. You'd still have:
Just trying to get this right. |
| Comment by Alex Bevilacqua [ 07/Sep/22 ] |
I think the current guidance is misleading. The initial $sort CAN utilize an index if it is the FIRST stage of the pipeline, which this ticket is trying to clarify. I believe part of the current documentation might stem from the pipeline optimizations that can reorder stages under certain circumstances.
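One documented reordering of this kind is the $sort + $match sequence optimization, where a $match that directly follows a $sort is moved ahead of it so that fewer documents are sorted. The hypothetical helper `move_match_before_sort` below is a toy model of that single rule applied to pymongo-style stage lists; it is not the server's optimizer, which applies many more rules and conditions.

```python
def stage_name(stage: dict) -> str:
    # Each aggregation stage document has exactly one key, e.g. "$match".
    (name,) = stage.keys()
    return name


def move_match_before_sort(pipeline: list) -> list:
    """Toy model: swap each adjacent ($sort, $match) pair into ($match, $sort).

    Returns a new list and leaves the input pipeline unchanged.
    """
    result = list(pipeline)
    i = 0
    while i < len(result) - 1:
        if stage_name(result[i]) == "$sort" and stage_name(result[i + 1]) == "$match":
            result[i], result[i + 1] = result[i + 1], result[i]
            # Step back so the moved $match can also pass an earlier $sort.
            i = max(i - 1, 0)
        else:
            i += 1
    return result
```

For example, `move_match_before_sort([{"$sort": {"orderDate": 1}}, {"$match": {"status": "shipped"}}])` returns the two stages with the $match first. After such a reorder the $match is the first stage, which is one way a pipeline written as $sort-then-$match may still end up index-eligible; `explain` output shows what the server actually did.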
The sentence reads "if $group is preceded by $sort", so in this case the $sort would still come first. christopher.harris@mongodb.com, can you give https://github.com/10gen/docs-mongodb-internal/pull/1774/files a quick look to make sure this ticket provides the appropriate messaging? |
| Comment by Nick Villahermosa [ 07/Sep/22 ] |
|
alex.bevilacqua@mongodb.com, I need a quick tech review: "only the first pipeline stage" is at odds with the documented behavior for the $group and $sort stages. Is that documentation incorrect? I haven't yet gone through enough of the tech training to rely on my own testing for this.
> $sort can use an index if $sort is not preceded by a $project, $unwind, or $group stage.
(This indicates $sort can be preceded by another type of stage, so it wouldn't be first.)
> $group can potentially use an index to find the first document in each group if: $group is preceded by $sort that sorts the field to group by, and...
(This specifically requires that $group isn't the first stage.) |
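The $group condition quoted above can be illustrated with a pipeline that meets its visible part: the first stage is a $sort on the grouping field, and $group then keeps the first document of each group. The field names, the assumed index {"customerId": 1, "orderDate": -1}, and the hypothetical helper `uses_only_first` are illustrative assumptions. The quoted condition is truncated; our understanding that the full version also requires a matching index and $first as the only accumulator should be checked against the $group reference.

```python
# Hypothetical "orders" collection with an assumed index {"customerId": 1, "orderDate": -1}.
# $sort is the first stage; $group then keeps each customer's most recent order date.
latest_order_per_customer = [
    {"$sort": {"customerId": 1, "orderDate": -1}},
    {"$group": {"_id": "$customerId", "lastOrderDate": {"$first": "$orderDate"}}},
]


def uses_only_first(group_stage: dict) -> bool:
    """Hypothetical helper: True if every accumulator in a $group stage is $first."""
    spec = group_stage["$group"]
    return all(
        isinstance(expr, dict) and set(expr) == {"$first"}
        for field, expr in spec.items()
        if field != "_id"  # "_id" is the group key, not an accumulator
    )
```

Because $sort is the first stage here, this shape is consistent with the reading that the $sort still comes first while the $group later in the pipeline is what benefits.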