Reorder HJ and DILJ lookup strategies in SBE

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Execution
    • Fully Compatible
    • QE 2025-07-21, QE 2025-08-04
    • None
    • 3
    • TBD
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      In SERVER-81555, we introduced a new lookup strategy called Dynamic Index Loop Join (DILJ). This strategy is used when there is an index with incompatible collation and decides whether the index can be used at runtime after checking the data type of the field. The index can be used when the data type is collation independent (i.e. numeric). If the index cannot be used, the algorithm falls back to Nested Loop Join (NLJ) and performs a scan on the foreign collection.

      DILJ was designed to improve performance when there is an index with incompatible collation on a join key that is collation independent. Previously, the classic engine could leverage such an index while SBE could not, causing a performance drop when migrating to SBE. However, SBE's Hash Join (HJ) strategy, which dynamically creates a Hash Index, is significantly faster than NLJ. Prioritizing DILJ over HJ leads to performance regressions for SBE users when the join field is not collation-independent because the join strategy changes from HJ to NLJ.

      I suggest reordering the lookup strategies to place HJ before DILJ. This change would prevent regressions for existing SBE users. For users transitioning from the classic engine to SBE, if an index with an incompatible collation exists on a collation-independent join field, HJ would be the preferred strategy over DILJ. Even though the index will not be used, results show that the performance of HJ in SBE is better than the performance of INLJ in classic so users switching from classic to SBE will not be affected and the performance gap between HJ and DILJ (when the index is unusable) is substantial so users of SBE will be affected if the order is not updated.

      For testing, I used a local collection with 100K docs and I varied the number of docs in the foreign collection. The duration is measured in milliseconds. The runtimes were measured locally using a standalone mongod.

      Join key is int . The index can be used.

      For example, in a foreign collection with 7500 docs, in classic the index would be used and the runtime will be 2416 msecs. In SBE, if HJ can be used the runtime will be 429.3 msecs which is better than the runtime in classic. If HJ cannot be used the strategy will be DILJ and the runtime will be 345.4 msecs.

      #docs DILJ HJ CLASSIC NLJ
      1500 266.6 158.9 2297 32050
      3500 293 247.1 2328 74420
      5500 319.8 337 2370 117200
      7500 345.4 429.3 2416 159800
      9500 377.7 496.3 2453 202600

      Join key is str. The index cannot be used.

      #docs DILJ HJ CLASSIC NLJ
      1500 50720 223.5 48500 49210
      3500 118100 356.6 110100 114600
      5500 185400 488.3 171700 180300
      7500 253000 625 233200 245300
      9500 300100 734.2 294800 300100

              Assignee:
              Foteini Alvanaki
              Reporter:
              Foteini Alvanaki
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: