Use sparse indexes in SBE lookup

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      In MongoDB, sparse indexes omit documents where the indexed field is missing. This characteristic creates an issue when using them in SBE $lookup queries, because $lookup is designed to match documents where the field is missing with those where the field's value is null.

      Consider two collections, a and b, with documents:

      • Collection a: [ { _id: 1, a: 'apple' }, { _id: 2 }, { _id: 3, a: null } ]
      • Collection b: [ { _id: 10, a: 'apple', details: 'red fruit' }, { _id: 11, a: null, details: 'unknown value' }, { _id: 12, details: \"missing 'a' field entirely\" } ]

      The $lookup query is:

      db.a.aggregate([{
          $lookup:{
              from: "b",
              localField: "a",
              foreignField: "a",
              as: "matchedDocs"
          }
      }])

      Expected Result:

      [
         { _id: 1, a: 'apple', 
           matchedDocs: [ { _id: 10, a: 'apple', details: 'red fruit' }] },
         { _id: 2, 
           matchedDocs: [ 
              { _id: 11, a: null, details: 'unknown value' }, 
              { _id: 12, details: \"missing 'a' field entirely\" }
         ]},
         { _id: 3, a: null, 
           matchedDocs: [
              { _id: 11, a: null, details: 'unknown value' },
              { _id: 12, details: \"missing 'a' field entirely\" }
         ]}
      ]
      

       

      Note that the documents from a with a missing field (_id: 2) or a null field (_id: 3) correctly match both the null document (_id: 11) and the missing field document (_id: 12) in b.

      If collection b has a sparse index on field a and the SBE engine uses it, the result is incorrect:

      [
         { _id: 1, a: 'apple', 
           matchedDocs: [
              { _id: 10, a: 'apple', details: 'red fruit' }
         ]},
         { _id: 2,
           matchedDocs: [ 
              { _id: 11, a: null, details: 'unknown value' }
         ]},
         { _id: 3, a: null, 
           matchedDocs: [ 
              { _id: 11, a: null, details: 'unknown value' }
         ]}
      ]
      

       

      The document with _id: 12 from collection b is missing from the results. This is because the sparse index does not contain this document, which is problematic since the expected behaviour of $lookup is to match missing fields.

       

      Proposed Solution for SBE:

      The Classic engine avoids this issue because it re-plans the lookup for each local field value. The inability of SBE to correctly handle this scenario with sparse indexes represents a regression for MongoDB users (SERVER-64082).

      To resolve this and allow SBE to use sparse indexes, we should adopt a strategy similar to how SBE handles index collation incompatibility (SERVER-81555):

      1. If the local field exists and its value is NOT null: Use the sparse index.
      2. If the local field value IS null or null-ish (i.e., the field is missing): Perform a collection scan, as the sparse index cannot guarantee completeness for these values.

            Assignee:
            Unassigned
            Reporter:
            Foteini Alvanaki
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated: