Modify SplitPipeline code to accommodate the special case of [$pluginVectorSearch, $_internalSearchIdLookup]


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • 3
    • TBD

      The requirements we have for $vectorSearch are:

      1. We must be able to defer $vectorSearch’s split point beyond its subsequent $_internalSearchIdLookup stage.
      2. Results must be sorted by $vectorSearchScore. mongot already guarantees this ordering, but in a sharded cluster we must also perform a streaming merge sort over the results from each shard. This is achieved by specifying a sort pattern on $vectorSearch’s DistributedPlanLogic (DPL).
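To illustrate the second requirement, here is a minimal sketch of a streaming merge sort over per-shard result streams that are each already sorted by descending score. It is not the server's merge implementation: `ScoredDoc` and `mergeByScore` are hypothetical names introduced for this example, and results are modeled as in-memory vectors rather than cursors.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical result row: a document id plus its $vectorSearchScore.
struct ScoredDoc {
    std::string id;
    double score;
};

// Merges per-shard streams, each already sorted by descending score,
// into a single stream sorted by descending score (k-way merge).
std::vector<ScoredDoc> mergeByScore(const std::vector<std::vector<ScoredDoc>>& shards) {
    // A cursor points at the next unread row of one shard's stream.
    struct Cursor {
        std::size_t shard;
        std::size_t pos;
    };
    // Max-heap on score: the cursor with the highest current score is on top.
    auto lowerScore = [&](const Cursor& a, const Cursor& b) {
        return shards[a.shard][a.pos].score < shards[b.shard][b.pos].score;
    };
    std::priority_queue<Cursor, std::vector<Cursor>, decltype(lowerScore)> heap(lowerScore);

    for (std::size_t s = 0; s < shards.size(); ++s) {
        if (!shards[s].empty()) {
            heap.push({s, 0});
        }
    }

    std::vector<ScoredDoc> merged;
    while (!heap.empty()) {
        Cursor top = heap.top();
        heap.pop();
        const std::vector<ScoredDoc>& stream = shards[top.shard];
        merged.push_back(stream[top.pos]);
        if (top.pos + 1 < stream.size()) {
            // Precondition: every input stream is sorted by descending score.
            assert(stream[top.pos + 1].score <= stream[top.pos].score);
            heap.push({top.shard, top.pos + 1});
        }
    }
    return merged;
}
```

Because only the head of each stream is compared at any time, the merge never needs to re-sort the combined results, which is why a sort pattern on the shard side is enough.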

      We also know the following:

      • $vectorSearch is guaranteed to be a Source stage (i.e., a stage that always occurs at the beginning of a pipeline and produces a document stream).
      • $vectorSearch desugars into [$pluginVectorSearch, $_internalSearchIdLookup, $limit].
      • $pluginVectorSearch is guaranteed to be a Source stage.
      • $_internalSearchIdLookup occurs immediately after $pluginVectorSearch.

       

      When determining the split point, we propose the following:

      1. Inspect the pipeline’s first stage and determine whether it is a Source stage.
      2. If the first stage is a Source stage, check whether the next stage is an $_internalSearchIdLookup stage.
      3. If the Source stage has DistributedPlanLogic with a sort pattern, keep the sort pattern for performing the merge at the split point.
      4. If the next stage is $_internalSearchIdLookup, place the split point at $_internalSearchIdLookup.
      5. If the next stage is not $_internalSearchIdLookup, bail out of the special-case handling and split using the existing logic.
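The steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual SplitPipeline code: `Stage`, `SplitDecision`, `findSearchIdLookupSplit`, and `shardsPart` are hypothetical stand-ins, and the sort pattern is modeled as an opaque string. The sketch also assumes that splitting "at" $_internalSearchIdLookup means the stage stays in the shards part, in line with the requirement to defer the split point beyond it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a pipeline stage.
struct Stage {
    std::string name;
    bool isSource = false;
    // Sort pattern from the stage's DistributedPlanLogic, if it has one.
    std::optional<std::string> mergeSortPattern;
};

// Result of the special case: stages [0, shardsPartEnd) run on the shards,
// and the merge uses mergeSortPattern when present.
struct SplitDecision {
    std::size_t shardsPartEnd;
    std::optional<std::string> mergeSortPattern;
};

// Returns a decision when the pipeline starts with a Source stage that is
// immediately followed by $_internalSearchIdLookup. Returns std::nullopt
// otherwise, so the caller can fall back to the existing split logic.
std::optional<SplitDecision> findSearchIdLookupSplit(const std::vector<Stage>& pipeline) {
    // Steps 1 and 2: the first stage must be a Source stage with a successor.
    if (pipeline.size() < 2 || !pipeline[0].isSource) {
        return std::nullopt;
    }
    // Step 5: bail out when the next stage is not $_internalSearchIdLookup.
    if (pipeline[1].name != "$_internalSearchIdLookup") {
        return std::nullopt;
    }
    // Steps 3 and 4: keep the Source stage's sort pattern for the merge and
    // split after $_internalSearchIdLookup (assumption: it stays on the shards).
    return SplitDecision{2, pipeline[0].mergeSortPattern};
}

// Returns the stages that would be sent to the shards for a given decision.
std::vector<Stage> shardsPart(const std::vector<Stage>& pipeline, const SplitDecision& decision) {
    assert(decision.shardsPartEnd <= pipeline.size());
    return std::vector<Stage>(
        pipeline.begin(),
        pipeline.begin() + static_cast<std::ptrdiff_t>(decision.shardsPartEnd));
}
```

Returning an empty optional for the non-matching case keeps the special case isolated: the existing split logic runs unchanged for every pipeline that does not have this exact two-stage prefix.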

       

      TLDR:
      Add a special case to SplitPipeline::findSplitPoint() that detects a Source stage followed by an $_internalSearchIdLookup stage and places the split point at the $_internalSearchIdLookup stage.

      As part of this ticket, add unit tests (if applicable) and integration tests to verify that the pipeline splits at the expected point.

            Assignee:
            Unassigned
            Reporter:
            Santiago Roche
            Votes:
            0
            Watchers:
            2
