Consider pushing limit into AsyncResultsMerger and trimming sort buffers

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution

      Consider a pipeline:

      [..., {$sort: {_id: 1}}, {$limit: 1000}]

      Currently it produces the following split pipeline:

        mergeType: 'router',
        splitPipeline: {
          shardsPart: [
            ...,
            { '$sort': { sortKey: { _id: 1 }, limit: Long('1000') } }
          ],
          mergerPart: [
            {
              '$mergeCursors': {
                sort: { _id: 1 },
                ...
              }
            },
            { '$limit': Long('1000') }
          ]
        }, 

      As you can see, the limit is not leveraged inside $mergeCursors; it is only applied as a separate $limit stage after the merge.
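
      What the ticket title proposes would look roughly like the following; note that the limit field inside $mergeCursors is hypothetical syntax, shown only to illustrate where the limit would be pushed:

        mergerPart: [
          {
            '$mergeCursors': {
              sort: { _id: 1 },
              limit: Long('1000'),   // hypothetical: limit pushed into the merger
              ...
            }
          }
        ]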

      $mergeCursors (AsyncResultsMerger under the hood) will buffer up to 16 MB of results from each shard and perform a merge sort.

      If the number of shards is large enough, this can consume a significant amount of RAM on the router (for example, 100 shards could buffer up to roughly 1.6 GB).

      However, if we know the limit, we can detect documents that are already guaranteed not to fit within it and drop them (and even close the corresponding remotes), saving RAM; a sketch of the idea follows.
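
      Below is a minimal standalone sketch of the trimming idea in C++ (RemoteBuffer and LimitAwareMerger are hypothetical names; this is not the actual AsyncResultsMerger code). The key observation: each remote's stream is already sorted by the shard-side $sort, so at most limit - returned documents can ever be consumed from any single remote, and anything buffered past that point can be dropped.

        #include <cstddef>
        #include <deque>
        #include <optional>
        #include <utility>
        #include <vector>

        // Each remote (shard cursor) yields documents in ascending sort-key
        // order, because shardsPart already contains the $sort with the
        // pushed-down limit. Sort keys are modeled as plain integers here.
        struct RemoteBuffer {
            std::deque<long> docs;  // buffered sort keys, ascending
        };

        class LimitAwareMerger {
        public:
            LimitAwareMerger(std::vector<RemoteBuffer> remotes, std::size_t limit)
                : _remotes(std::move(remotes)), _limit(limit) {}

            // Merge step: pop and return the globally smallest buffered doc.
            std::optional<long> next() {
                if (_returned == _limit)
                    return std::nullopt;  // limit reached: remotes could be closed
                RemoteBuffer* best = nullptr;
                for (auto& r : _remotes)
                    if (!r.docs.empty() &&
                        (!best || r.docs.front() < best->docs.front()))
                        best = &r;
                if (!best)
                    return std::nullopt;  // every buffer is drained
                long doc = best->docs.front();
                best->docs.pop_front();
                ++_returned;
                trim();
                return doc;
            }

            // The proposed optimization: only limit - returned more documents
            // will ever be emitted, and each remote's stream is sorted, so at
            // most that many can come from any single remote. Buffered
            // documents past that point are guaranteed not to fit into the
            // limit and can be freed immediately.
            void trim() {
                std::size_t remaining = _limit - _returned;
                for (auto& r : _remotes)
                    if (r.docs.size() > remaining)
                        r.docs.resize(remaining);  // drop the unreachable tail
            }

        private:
            std::vector<RemoteBuffer> _remotes;
            std::size_t _limit;
            std::size_t _returned = 0;
        };

      The real AsyncResultsMerger buffers by bytes rather than by document count, but the same bound applies, and once the limit is reached every remaining remote cursor can be killed.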

            Assignee: Unassigned
            Reporter: Ivan Fefer
            Votes: 0
            Watchers: 6
