DOCS-11102

Docs for SERVER-24981: $project-$limit optimization has bad repercussion on pipeline splitting

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.7.1
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Documentation Request Summary:

      Description

      $sort now searches the entire pipeline for a $limit and, if it finds one, coalesces that $limit into itself. If a stage between the $sort and the $limit changes the number of documents in the pipeline (e.g. $group or $unwind), the $sort aborts its search for a $limit. The exception is when one or more $skip stages sit between the $sort and the $limit: $sort still coalesces the $limit, but the $limit value increases by the sum of the amounts of all the intervening $skip stages. As a result, neither $project nor $skip swaps with $limit anymore when no $sort is present.
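
The rule above can be sketched as a small Python model. This is an illustration of the behavior described in this ticket, not the server's implementation: the function names are hypothetical, and the `COUNT_CHANGING` set is an incomplete assumption (the ticket names only $group and $unwind; $match is added here because it also filters documents). The skip amounts are added to the limit because the $sort must retain enough documents for the later $skip stages to discard.

```python
from typing import Optional

# Stages that change how many documents flow through the pipeline.
# Illustrative and incomplete: the ticket names $group and $unwind;
# $match is included here as an assumption.
COUNT_CHANGING = {"$group", "$unwind", "$match"}


def stage_name(stage: dict) -> str:
    """Return the single operator key of a pipeline stage, e.g. "$sort"."""
    (name,) = stage.keys()
    return name


def coalesced_sort_limit(pipeline: list) -> Optional[int]:
    """Return the limit the first $sort would absorb, or None if it absorbs none."""
    names = [stage_name(s) for s in pipeline]
    if "$sort" not in names:
        return None  # no $sort, so there is nothing to coalesce into
    skipped = 0
    for stage in pipeline[names.index("$sort") + 1:]:
        name = stage_name(stage)
        if name == "$limit":
            # The $sort keeps enough documents to cover the skipped ones too.
            return stage["$limit"] + skipped
        if name == "$skip":
            skipped += stage["$skip"]
        elif name in COUNT_CHANGING:
            return None  # the search is aborted
    return None


print(coalesced_sort_limit([{"$sort": {"age": -1}}, {"$skip": 10}, {"$limit": 5}]))  # 15
print(coalesced_sort_limit([{"$sort": {"age": -1}}, {"$unwind": "$tags"}, {"$limit": 5}]))  # None
```

In this model a stage such as $project between the $sort and the $limit does not stop the search, because it leaves the document count unchanged.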

      Scope of changes (files that need work and how much)

      • /core/aggregation-pipeline-optimization
        • Pipeline Optimization section: the $skip + $limit and $project + $skip + $limit sequence optimizations NO LONGER REORDER stages
        • Pipeline Coalescence section: rewrite $sort + $limit coalescence
        • Update $sort + $skip + $limit example to NO LONGER REORDER
        • Update $limit + $skip + $limit + $skip example to NO LONGER REORDER
        • (meh) Add $sort + $unwind + $limit example
      • /reference/operator/aggregation/sort
        • Clarify behaviour change, specifically in the $sort optimization + memory section
      • /reference/operator/aggregation/limit
        • Update note at the bottom of the page
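
For the proposed $sort + $unwind + $limit example, a small in-memory simulation may help show why the $limit cannot be folded into the $sort across an $unwind. This is a sketch in plain Python, not a MongoDB call: the documents, the `tags` field, and the `unwind` helper are hypothetical, and the helper only mimics the default $unwind behavior of dropping documents whose array is empty.

```python
def unwind(docs, field):
    """Emit one document per array element; documents with an empty array are dropped."""
    return [{**d, field: item} for d in docs for item in d[field]]


# Hypothetical sample collection.
docs = [
    {"_id": 1, "tags": []},
    {"_id": 2, "tags": ["x"]},
    {"_id": 3, "tags": ["y"]},
]

by_id = sorted(docs, key=lambda d: d["_id"])

# Pipeline as written: $sort, then $unwind, then $limit: 2.
as_written = unwind(by_id, "tags")[:2]

# Hypothetical incorrect rewrite: the limit is applied at the $sort, before $unwind.
limited_early = unwind(by_id[:2], "tags")[:2]

print(as_written)     # [{'_id': 2, 'tags': 'x'}, {'_id': 3, 'tags': 'y'}]
print(limited_early)  # [{'_id': 2, 'tags': 'x'}]
```

Because the $unwind changes the document count, limiting before it returns a different result set, which is consistent with the $sort aborting its search at such a stage.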

      Resources (e.g. Scope Docs, Invision)

      My flowchart (attached)

      Engineering Ticket Description:

      The new $project-$limit optimization in 3.2 might cause the pipeline to be split much earlier than before (because it will split the pipeline at the $limit stage).

      I'm attaching explain plans for two queries: one that uses the optimization and one that doesn't, because I added a $redact: $$KEEP just before the $limit.
      In this query, many more fields are sent to the mergerPart because of the split, which triggers very bad behavior with second batches of aggregation queries; that behavior will be described in another ticket.

      I think it would be good to take pipeline splitting into consideration when doing these optimizations (in addition, there is no $sort stage here that would benefit from having the $limit moved up).

      Cheers,
      Antoine

            Assignee:
            allison.moore@mongodb.com Allison Reinheimer Moore
            Reporter:
            kay.kim@mongodb.com Kay Kim (Inactive)
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved:
              5 years, 40 weeks, 1 day ago