Implement/validate well-defined behavior for multiple stages that provide scoreDetails in a $scoreFusion input pipeline


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • 3
    • TBD
    • 0

      There are two categories of cases where a $scoreFusion input pipeline could have multiple stages that all provide their own scoreDetails:

      1. The pipeline starts with a stage that provides scoreDetails ($search or $vectorSearch), followed by one or more $score stages.
      2. The pipeline has two or more $score stages and does not start with a stage that provides scoreDetails.
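
      To make the two cases concrete, here is a small Python sketch that classifies an input pipeline from its stage names alone. It assumes that a leading $search or $vectorSearch stage and every $score stage are the only stages that produce scoreDetails; classify_input_pipeline is a hypothetical helper, not server code.

```python
from typing import Sequence

# Assumption: a leading $search/$vectorSearch stage and every $score stage are
# the only stages that produce scoreDetails.
SEARCH_STAGES = {"$search", "$vectorSearch"}
SCORE_STAGE = "$score"


def classify_input_pipeline(stages: Sequence[str]) -> str:
    """Hypothetical helper: classify an input pipeline given its stage names.

    Returns "case1", "case2", or "single" (at most one stage provides
    scoreDetails, so the behavior is already well defined).
    """
    starts_with_search = len(stages) > 0 and stages[0] in SEARCH_STAGES
    num_score_stages = sum(1 for name in stages if name == SCORE_STAGE)
    if starts_with_search and num_score_stages >= 1:
        # Case 1: $search/$vectorSearch followed by one or more $score stages.
        return "case1"
    if not starts_with_search and num_score_stages >= 2:
        # Case 2: two or more $score stages and no leading search stage.
        return "case2"
    return "single"


# Usage, mirroring the shapes in the testing section:
print(classify_input_pipeline(["$vectorSearch", "$limit", "$score"]))  # case1
print(classify_input_pipeline(["$match", "$score", "$score"]))         # case2
print(classify_input_pipeline(["$match", "$score"]))                   # single
```

      Note that a pipeline such as $search, $score, $score still counts as case 1, since the leading search stage is what distinguishes the two cases.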

       

      We could consider banning case 2 (arguably, an input pipeline should never need more than one $score stage), but case 1 seems valid. For example, a user may want to run $search or $vectorSearch in a sub-pipeline and then change the order of that input pipeline's results with their own custom scoring expression.

       

      Regardless, we need to define, implement, and validate the behavior when more than one stage in the same input pipeline provides scoreDetails and $scoreFusion is requested to produce scoreDetails.

       

      There are two possible approaches:

      1. Only the last stage that provides scoreDetails has its details propagated to $scoreFusion.
        1. This is likely the current behavior: $scoreFusion reads the input pipeline's scoreDetails metadata, which each such stage writes, so it should hold whatever the last stage wrote. This may also be our only option without further changes outside of $scoreFusion.
      2. Incorporate the scoreDetails of every such stage in the input pipeline (likely as an array).
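
      The difference between the two approaches can be modeled in a few lines. This is a hypothetical Python sketch, not the server's metadata code: each list entry stands for the scoreDetails one stage would emit, in pipeline order, and the field names are illustrative only.

```python
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical model: each entry is the scoreDetails document one stage would
# emit, in pipeline order. Field names are illustrative, not the server's format.
Details = Dict[str, Any]


def approach1_last_stage_wins(per_stage: Sequence[Details]) -> Optional[Details]:
    """Approach 1: every stage writes the same scoreDetails metadata slot, so
    each write overwrites the previous one and only the last survives."""
    metadata: Optional[Details] = None
    for details in per_stage:
        metadata = details  # a later stage overwrites the earlier details
    return metadata


def approach2_all_stages(per_stage: Sequence[Details]) -> List[Details]:
    """Approach 2: keep every stage's details, in pipeline order, as an array."""
    return list(per_stage)


# Case 1 example: $search followed by a single $score stage.
case1 = [
    {"stage": "$search", "value": 2.5},
    {"stage": "$score", "value": 0.8},
]
print(approach1_last_stage_wins(case1))  # {'stage': '$score', 'value': 0.8}
print(len(approach2_all_stages(case1)))  # 2
```

      In this model, approach 1 reuses a single slot, which fits the note that it may need no changes outside $scoreFusion; approach 2 needs somewhere to keep a growing list of details for the input pipeline.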

       

      Unless (2) is easy to implement, we decided that we're content with (1), at least for the initial release. There may be no code changes needed to get the behavior of approach (1); we just want to make sure we understand exactly what the behavior is and have test cases committed.

       

      The same cases should be considered for $rankFusion.

       

      Testing:

      Regardless of the behavior and approach we choose, we should have tests (asserting either on the query results or that the query fails with a uassert) for both case (1) and case (2) defined at the top of the ticket, including queries like:

      $rank/scoreFusion: {
        pipelines: {
          p1: [$search/$vectorSearch, ..., $score, ...],
          p2: ...
        }
      }
      $rank/scoreFusion: {
        pipelines: {
          p1: [$search, ..., $score, ..., $score, ...],
          p2: ...
        }
      }
      $rank/scoreFusion: {
        pipelines: {
          p1: [..., $score, ..., $score, ...],
          p2: ...
        },
        normalization: "none"
      }
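
      One way to keep the test matrix complete is to generate it: the three p1 shapes above, each wrapped in both $rankFusion and $scoreFusion with scoreDetails requested. This Python sketch only builds the stage documents and contacts no server. The placeholder stages, the hypothetical field "x", the "input" nesting of pipelines and normalization, and applying normalization "none" to every $scoreFusion case are all assumptions to check against the current parser.

```python
from typing import Any, Dict, List

# Hypothetical placeholders: real tests need valid arguments for $search and a
# real score expression for $score (here over a hypothetical numeric field "x").
SEARCH = {"$search": {}}
SCORE = {"$score": {"score": "$x"}}
MATCH = {"$match": {}}
P2 = [{"$sort": {"_id": 1}}]  # placeholder for the other input pipeline

# The three p1 shapes from the test list above.
P1_SHAPES: Dict[str, List[Dict[str, Any]]] = {
    "search_then_score": [SEARCH, SCORE],
    "search_then_two_scores": [SEARCH, SCORE, SCORE],
    "two_scores_no_search": [MATCH, SCORE, SCORE],
}


def build_fusion_stage(fusion: str, p1: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one $rankFusion/$scoreFusion stage that requests scoreDetails.

    Assumption: pipelines (and, for $scoreFusion, normalization) live under an
    "input" document.
    """
    input_doc: Dict[str, Any] = {"pipelines": {"p1": p1, "p2": P2}}
    if fusion == "$scoreFusion":
        input_doc["normalization"] = "none"
    return {fusion: {"input": input_doc, "scoreDetails": True}}


def build_test_matrix() -> Dict[str, Dict[str, Any]]:
    """Every shape for both fusion stages: 3 shapes x 2 stages = 6 cases."""
    return {
        f"{fusion} {shape}": build_fusion_stage(fusion, p1)
        for fusion in ("$rankFusion", "$scoreFusion")
        for shape, p1 in P1_SHAPES.items()
    }


for name in build_test_matrix():
    print(name)
```

      Each generated stage would then be run in a test, asserting either on the returned scoreDetails or on the expected uassert.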

            Assignee:
            Mariano Shaar
            Reporter:
            Joe Shalabi
            Votes:
            0
            Watchers:
            2