Core Server / SERVER-99562

Don't spill the last batch in sorter-based spilling

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • 3

      When the sort stage or the classic engine's group stage spills to disk, it also spills the last batch to disk when it finishes:

      https://github.com/mongodb/mongo/blob/a3b93f233e6d6742b783a92c64697dd08e7a80ba/src/mongo/db/pipeline/group_processor.cpp#L167

      https://github.com/mongodb/mongo/blob/a3b93f233e6d6742b783a92c64697dd08e7a80ba/src/mongo/db/sorter/sorter.cpp#L850

      The same pattern appears in the PR for the TEXT_OR stage.

      This makes reading the results straightforward, but we pay the price of spilling an extra chunk, which can be quite large; even when it is small, we still pay significant conversion overhead.

      However, the sorter's merge iterator accepts a span of abstract iterators:

      https://github.com/mongodb/mongo/blob/a3b93f233e6d6742b783a92c64697dd08e7a80ba/src/mongo/db/sorter/sorter.cpp#L450

      So it should work just as well if we pass it the file iterators for the spilled runs together with an in-memory iterator over the last batch, which is never written to disk.

      This may complicate the code slightly, but it should improve performance, especially when the stage spilled only once or twice, since the avoided spill is then a large share of all the data written to disk.
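      A rough estimate of the saving, assuming k >= 1 runs of about M bytes each are spilled before the input ends and the final batch holds b bytes with 0 < b <= M. Today the stage writes and later reads kM + b bytes; with the proposal it writes and reads kM, so the fraction of spill I/O avoided is:

```latex
\frac{b}{kM + b} \;\le\; \frac{M}{kM + M} \;=\; \frac{1}{k+1}
```

      The bound holds because b/(kM + b) grows with b. With k = 1 up to half of the spill writes and reads are avoided, and with k = 2 up to a third, which is why the gain is largest when the stage spilled only once or twice.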

            Assignee:
            Unassigned
            Reporter:
            ivan.fefer@mongodb.com Ivan Fefer
            Votes:
            0
            Watchers:
            6
