Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-102490

SBE: support filter in multikey ixscan

    • Type: Icon: Improvement Improvement
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Execution
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      An INEXACT_COVERED filter can theoretically be pushed down into a multikey index scan, but the planner does not allow this right now because it would result in incorrect results if the SBE engine is used. The reason is that when an IXSCAN stage is translated to an SBE plan, the `unique_roaring` stage runs before the `filter stage`.

      ```

            queryPlan: {
              stage: 'FETCH',
              planNodeId: 2,
              inputStage: {
                stage: 'IXSCAN',
                planNodeId: 1,
                filter: { amenities:

      { '$regex': 'friendly' }

      },
                keyPattern: { amenities: 1 },
                indexName: 'amenities_1',
                isMultiKey: true,
                multiKeyPaths: { amenities: [ 'amenities' ] },
                isUnique: false,
                isSparse: false,
                isPartial: false,
                indexVersion: 2,
                direction: 'forward',
                indexBounds: { amenities: [ '["", {})', '[/friendly/, /friendly/]' ] }
              }
            },

            slotBasedPlan: {
              slots: '$$RESULT=s7 env: { s6 =

      {"amenities" : 1}

      }',
              stages: '[2] nlj inner [] [s1, s5, s3, s4, s6] \n' +
                '    left \n' +
                '        [1] filter {(((s2 == /friendly/) ?: false) || (regexMatch(PcreRegex(/friendly/), s2) ?: false))} \n' +

                '        [1] unique_roaring s1 \n' +
                `        [1] ixscan_generic IndexBounds("field #0['amenities']: [CollationKey(0x), {}), [/friendly/, /friendly/]") s4 s1 s5 s3 [s2 = 0] @"12af590b-fe5c-4603-b656-7c965ec732df" @"amenities_1" true \n` +
                '    right \n' +
                '        [2] limit 1ll \n' +
                '        [2] seek s1 s7 s8 s5 s3 s4 s6 none none [] @"12af590b-fe5c-4603-b656-7c965ec732df" true false \n'
            }

      ```

      (The plan below requires changing the planner to reproduce. See https://github.com/10gen/mongo/pull/31342)

      See https://jira.mongodb.org/browse/SERVER-29967?focusedCommentId=1614462&focusedId=1614462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1614462 for more details.

      While we can get correct results by swapping the order of the `filter` and `unique_roaring`, the performance is not great because the filter will run on every key even if the corresponding record is already being returned. 

      Locust performance results here: https://docs.google.com/document/d/1ddEDe6g_u2dk4w6LhuD07wCXVFrfBmlKGUtGzsZuWb4/edit?tab=t.0

      Alternatively, this ticket can be closed if there was a way for plans involving multikey index scans with filters to always use the classic engine.

            Assignee:
            Unassigned Unassigned
            Reporter:
            andi.wang@mongodb.com Andi Wang
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              None
              None
              None
              None