Core Server / SERVER-74293

Improve spilling algorithm for HashLookupStage

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Query Execution

      It looks to me like HashLookupStage suffers from a performance problem similar to the one corrected for HashAggStage in the related ticket SERVER-70395. Specifically, data spilled to either HashLookupStage::_recordStoreHt or HashLookupStage::_recordStoreBuf may be deserialized repeatedly while the stage consumes input from the outer child.

      My thinking, after discussing with anna.wawrzyniak@mongodb.com, is that we should probably do nothing about this problem right now and instead close this ticket as "Won't Do". I'm filing the ticket so that we have this conversation and make an explicit decision about it as a team. The primary reason not to schedule this improvement is that HashLookupStage is currently used only to support the pushdown of $lookup to SBE. Heuristics ensure that we choose a hash-based plan for $lookup only when the data size is sufficiently small, which makes spilling possible but extremely unlikely. For this reason, we probably don't want to invest significant engineering effort in spilling performance for hash-based $lookup plans in SBE right now. Furthermore, the regular HashJoinStage in SBE does not support spilling yet. It would be wise to design and implement a good spilling algorithm for regular hash join before implementing something similar for the special case of HashLookupStage.

    • Assignee: backlog-query-execution ([DO NOT USE] Backlog - Query Execution)
    • Reporter: David Storch (david.storch@mongodb.com)