[DOCS-11102] Docs for SERVER-24981: $project-$limit optimization has bad repercussion on pipeline splitting Created: 08/Dec/17  Updated: 29/Oct/23  Resolved: 24/Jul/18

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: 3.7.1

Type: Task Priority: Major - P3
Reporter: Kay Kim (Inactive) Assignee: Allison Reinheimer Moore
Resolution: Fixed Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File IMG_0531.JPG    
Issue Links:
Documented
documents SERVER-24981 $project-$limit optimization has bad ... Closed
Epic Link: DOCS: 4.0 Server

 Description   

Documentation Request Summary:

Description

$sort now searches the rest of the pipeline after it for a $limit and, if it finds one, coalesces the $limit into itself. If a stage between the $sort and the $limit changes the number of documents in the pipeline (e.g. $group, $unwind), the $sort stops searching for a $limit. The exception is one or more $skip stages between the $sort and the $limit: in that case, $sort still coalesces the $limit, but the coalesced limit is increased by the sum of the amounts of all of the intervening $skip stages. This means that neither $project nor $skip is swapped with $limit anymore when no $sort is present.
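The coalescence rule above can be modeled as a small pipeline rewrite. This is a minimal Python sketch, not the server's implementation. It assumes pipelines are lists of single-key stage dicts. The helper names stage_name and coalesce_sort_limit are hypothetical. The coalesced {"$sort": {"sortKey": ..., "limit": ...}} shape is an illustrative assumption. It also assumes that only $project and $addFields preserve the document count, so any other unlisted stage conservatively ends the search.

```python
from typing import Any, Dict, List

Stage = Dict[str, Any]

# Hypothetical, deliberately small set of stages assumed to preserve the
# document count. Any other stage (except $skip/$limit) ends the search.
COUNT_PRESERVING = {"$project", "$addFields"}


def stage_name(stage: Stage) -> str:
    # Each stage is modeled as a single-key dict such as {"$limit": 5}.
    return next(iter(stage))


def coalesce_sort_limit(pipeline: List[Stage]) -> List[Stage]:
    """Return a new pipeline where each $sort absorbs a reachable later $limit."""
    result = [dict(s) for s in pipeline]
    i = 0
    while i < len(result):
        if stage_name(result[i]) == "$sort":
            skipped = 0
            j = i + 1
            while j < len(result):
                name = stage_name(result[j])
                if name == "$skip":
                    # $skip does not stop the search; its amount is added to the limit.
                    skipped += result[j]["$skip"]
                elif name == "$limit":
                    # Fold the limit (plus all intervening skips) into the sort and
                    # drop the standalone $limit; the $skip stages stay in place.
                    result[i] = {"$sort": {"sortKey": result[i]["$sort"],
                                           "limit": result[j]["$limit"] + skipped}}
                    del result[j]
                    break
                elif name not in COUNT_PRESERVING:
                    # A stage such as $group or $unwind may change the number
                    # of documents, so the $sort stops searching here.
                    break
                j += 1
        i += 1
    return result
```

With this model, [$sort, {$skip: 10}, {$limit: 5}] becomes a $sort capped at 15 documents followed by {$skip: 10}. A $unwind between the $sort and the $limit blocks coalescence. A $project + $skip + $limit pipeline with no $sort keeps its original order.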

Scope of changes (files that need work and how much)

  • /core/aggregation-pipeline-optimization
    • Pipeline Optimization section: $skip + $limit and $project + $skip + $limit sequence optimizations NO LONGER REORDER
    • Pipeline Coalescence section: rewrite $sort + $limit coalescence
    • Update $sort + $skip + $limit example to NO LONGER REORDER
    • Update $limit + $skip + $limit + $skip example to NO LONGER REORDER
    • (meh) Add $sort + $unwind + $limit example
  • /reference/operator/aggregation/sort
    • Clarify behaviour change, specifically in the $sort optimization + memory section
  • /reference/operator/aggregation/limit
    • Update note at the bottom of the page

Resources (e.g. Scope Docs, Invision)

My flowchart (attached)

Engineering Ticket Description:

The new $project-$limit optimization in 3.2 might cause the pipeline to be split much earlier than before, because the pipeline is split at the $limit stage.

I'm attaching two explain plans of queries: one that uses the optimization and one that doesn't, because I added a $redact: $$KEEP stage just before the $limit.
For this query, many more fields are sent to the mergerPart because of the earlier split, which triggers very bad behavior with second batches of aggregation queries; that behavior will be described in another ticket.

I think it would be good to take pipeline splitting into consideration when doing these optimizations (in addition, there is no $sort stage here that would benefit from having the $limit moved up).

Cheers,
Antoine
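The splitting problem in the engineering description can be illustrated with a deliberately simplified model. This Python sketch uses hypothetical helpers (stage_name, move_limit_before_project, split_pipeline) and is not MongoDB code. It models the 3.2 optimization as moving a $limit ahead of an immediately preceding $project, and it treats only $limit as a split point. The shards run every stage up to and including the first $limit, and the merger applies the $limit again and runs everything after it. Real split logic considers many more stages.

```python
from typing import Any, Dict, List, Tuple

Stage = Dict[str, Any]


def stage_name(stage: Stage) -> str:
    # Each stage is modeled as a single-key dict such as {"$limit": 10}.
    return next(iter(stage))


def move_limit_before_project(pipeline: List[Stage]) -> List[Stage]:
    """Model of the 3.2 rewrite: swap a $limit with an immediately preceding $project."""
    result = list(pipeline)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(result)):
            if stage_name(result[i]) == "$limit" and stage_name(result[i - 1]) == "$project":
                result[i - 1], result[i] = result[i], result[i - 1]
                changed = True
    return result


def split_pipeline(pipeline: List[Stage]) -> Tuple[List[Stage], List[Stage]]:
    """Simplified split: shards run up to and including the first $limit;
    the merger re-applies that $limit and runs every later stage."""
    for i, stage in enumerate(pipeline):
        if stage_name(stage) == "$limit":
            return pipeline[: i + 1], pipeline[i:]
    # No $limit: this model keeps every stage on the shards.
    return list(pipeline), []


base = [{"$match": {"x": 1}}, {"$project": {"a": 1}}, {"$limit": 10}]
# With the rewrite, $project lands after the split and runs on the merger,
# so full documents are shipped from the shards.
shards, merger = split_pipeline(move_limit_before_project(base))

# The $redact: $$KEEP workaround blocks the swap, so $project stays on the shards.
workaround = [{"$match": {"x": 1}}, {"$project": {"a": 1}},
              {"$redact": "$$KEEP"}, {"$limit": 10}]
shards_wa, merger_wa = split_pipeline(move_limit_before_project(workaround))
```

In this model, the optimized pipeline leaves only [$match, $limit] on the shards and pushes $project to the merger. The workaround keeps $project on the shards and leaves only the $limit for the merger, which matches the difference between the two attached explain plans.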



 Comments   
Comment by Githook User [ 24/Jul/18 ]

Author: Allison Reinheimer Moore (schmalliso) <allison.moore@10gen.com>

Message: DOCS-11102: clarify sort-limit coalescence behavior
Branch: master
https://github.com/mongodb/docs/commit/48b2aa352ff4cf4af3bcbaafd24a491606fbe15b

Generated at Thu Feb 08 08:02:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.