Make $scoreFusion support $vectorSearch-like extension stages

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      Context

      SERVER-128285 added featureFlagExtensionsInsideHybridSearch and shipped an initial batch of tests in jstests/extensions/extension_in_hybrid_search.js, jstests/noPassthrough/query/hybrid_search_on_sharded_view.js, and jstests/noPassthrough/query/hybrid_search_in_unionwith_on_sharded_view.js. Those tests cover the happy path and basic rejection cases for extensions inside $rankFusion/$scoreFusion on an unsharded plain collection and on sharded views.

      This ticket is the $scoreFusion parallel to SERVER-115791 and covers the remaining $scoreFusion-specific gaps: the multi-stage-non-selection-tail rejection that was tested only for $rankFusion, view combinations, $lookup, multi-level views, and cross-cutting concerns.


      What already exists (do not duplicate)

      File | What it covers
      jstests/extensions/extension_in_hybrid_search.js | Selection ext ($matchTopN) in $scoreFusion (allowed); transforming ext ($addFieldsMatch) rejected. $nativeVectorSearch rejection is tested only for $rankFusion. All unsharded, plain collection.
      jstests/noPassthrough/query/hybrid_search_on_sharded_view.js | $scoreFusion top-level on a sharded view. No unsharded equivalent.
      jstests/noPassthrough/query/hybrid_search_in_unionwith_on_sharded_view.js | $scoreFusion and $rankFusion inside $unionWith on a sharded view, with plain stages in the input pipelines (no extensions).
      jstests/with_mongot/e2e/hybridSearch/score_fusion_on_view.js | $scoreFusion on a view with mongot $search/$vectorSearch pipelines. Requires a real mongot.
      jstests/with_mongot/e2e/hybridSearch/score_fusion_in_union_with_lookup_view.js | $scoreFusion inside $unionWith/$lookup with views, using mongot.

      Tests to implement

      All new tests should use mocha-lite style (describe/it/before from jstests/libs/mochalite.js) matching the style of extension_in_hybrid_search.js.

      Required tags for all new tests:

      /**
       * @tags: [
       *   featureFlagExtensionsAPI,
       *   featureFlagExtensionsInsideHybridSearch,
       *   featureFlagSearchHybridScoringFull,
       *   requires_fcv_82,
       * ]
       */
      

      Note: featureFlagRankFusionFull is not required for $scoreFusion-only tests.


      Test 1 — $nativeVectorSearch rejection in $scoreFusion

      File: Add a new it block to jstests/extensions/extension_in_hybrid_search.js inside the existing describe.

      What to test: $nativeVectorSearch expands to a pipeline whose tail contains a non-selection stage ($setMetadata). The existing test covers this rejection for $rankFusion (error code 12108704). The $scoreFusion equivalent uses error code 12108713 and must be validated symmetrically.

      Setup: Reuse the existing nativeVectorSearchStage constant already defined in the file.

      Test case:

      assertRejectedAsNonSelection(
          [{
              $scoreFusion: {
                  input: {
                      pipelines: {
                          a: [nativeVectorSearchStage, { $score: { score: "$x", normalization: "minMaxScaler" } }],
                          b: [{ $score: { score: "$y", normalization: "minMaxScaler" } }],
                      },
                      normalization: "none",
                  },
                  combination: { method: "avg" },
              },
          }],
          12108713,
          "$nativeVectorSearch",
      );
      

      Assert the command fails with code 12108713 and the error message names $nativeVectorSearch.


      Test 2 — $scoreFusion top-level on an unsharded view

      File: jstests/extensions/score_fusion_on_unsharded_view.js

      What to test: Run $scoreFusion with $score-based input pipelines directly against a view namespace on a standalone/unsharded mongod. This is the unsharded counterpart to hybrid_search_on_sharded_view.js.

      Setup:

      • Create a collection coll with documents {_id, x, y}.
      • Create a view collView over coll with [{$match: {x: {$gte: 0}}}].

      Test case: Run $scoreFusion with pipelines a: [{$score: {score: "$x", normalization: "minMaxScaler"}}, {$sort: {x: -1}}] and b: [{$score: {score: "$y", ...}}, {$sort: {y: -1}}] against collView. Assert the command succeeds and returns all documents satisfying the view filter.
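
      A minimal sketch of how the Test 2 stage could be assembled. The helper name buildScoreFusionStage is hypothetical. The normalization for pipeline b is elided in this ticket, so it is left as a caller-supplied parameter, and the outer normalization: "none" and combination: {method: "avg"} values are borrowed from the Test 1 example rather than specified for this test.

```javascript
// Hypothetical helper: builds one $scoreFusion stage whose input pipelines
// each score on a single field and then sort descending on that field.
function buildScoreFusionStage(fieldsByPipelineName, normalization) {
    const pipelines = {};
    for (const [name, field] of Object.entries(fieldsByPipelineName)) {
        pipelines[name] = [
            {$score: {score: "$" + field, normalization: normalization}},
            {$sort: {[field]: -1}},
        ];
    }
    return {
        $scoreFusion: {
            // Assumption: outer settings copied from the Test 1 example.
            input: {pipelines: pipelines, normalization: "none"},
            combination: {method: "avg"},
        },
    };
}
```

      The resulting stage would be passed as the only element of the pipeline run against collView, with whichever normalization the test settles on.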


      Test 3 — $scoreFusion inside a $lookup subpipeline targeting a view

      File: jstests/extensions/score_fusion_in_lookup_on_view.js

      What to test: $scoreFusion placed inside a $lookup subpipeline where from: names a view. No mongot required; use $score-only input pipelines.

      Setup:

      • Collection outer with documents {_id}.
      • Collection base with documents {_id, x, y}.
      • View baseView over base with a simple $match filter.

      Test cases:

      1. Unsharded: db.outer.aggregate([{$lookup: {from: "baseView", as: "scored", pipeline: [{$scoreFusion: ...}]}}]). Assert the command succeeds and each outer document has a scored array whose length equals the number of documents in baseView.
      2. Sharded: Same pipeline using ShardingTest with 2 shards. Shard base by {_id: 1}. Place this variant in jstests/noPassthrough/query/ and add requires_sharding to the tags.
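
      A minimal sketch of the $lookup wrapper and the per-document length check described above. Both helper names are hypothetical, and the $scoreFusion stage itself is taken as an argument because its contents are elided in this ticket.

```javascript
// Hypothetical helper: wraps an already-built $scoreFusion stage in the
// $lookup shape given in the test case (from: "baseView", as: "scored").
function buildLookupStage(scoreFusionStage) {
    return {$lookup: {from: "baseView", as: "scored", pipeline: [scoreFusionStage]}};
}

// Hypothetical helper: returns the _id of every outer document whose
// "scored" field is missing, not an array, or has the wrong length.
function findBadScoredLengths(outerDocs, expectedLength) {
    return outerDocs
        .filter((doc) => !Array.isArray(doc.scored) || doc.scored.length !== expectedLength)
        .map((doc) => doc._id);
}
```

      In the test body, expectedLength would be the document count of baseView, and the test would assert that findBadScoredLengths returns an empty array.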

      Test 4 — Selection extension in $scoreFusion input pipeline when running against a view

      File: jstests/extensions/score_fusion_extension_on_view.js

      What to test: The combination of (a) a selection extension ($matchTopN) in a $scoreFusion input pipeline and (b) the overall query running against a view namespace. This exercises the view-resolution + LP-desugaring code path introduced in SERVER-128285 end-to-end for $scoreFusion.

      Setup: Collection coll with documents {_id, x, y}. View testView over coll with [{$match: {x: {$gte: 0}}}].

      Test cases:

      1. $matchTopN in $scoreFusion against a view (allowed): Run $scoreFusion with pipeline a: [{$matchTopN: {filter: {x: {$gt: 2}}, sort: {x: -1}, limit: 3}}, {$score: ...}] against testView. Assert it succeeds and returns a non-empty result. Optionally assert the result matches replacing $matchTopN with its manual expansion [{$match}, {$sort}, {$limit}].
      2. $addFieldsMatch (transforming extension) in $scoreFusion against a view (rejected): Assert the command fails with code 12108713 and the error message names $addFieldsMatch.
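
      A minimal sketch of the optional manual-expansion comparison from test case 1. The helper names are hypothetical, and the mapping of filter, sort, and limit onto $match, $sort, and $limit is an assumption inferred from the argument names and the expansion listed above, not taken from the extension's source.

```javascript
// Hypothetical helper: rewrites a $matchTopN spec into the manual
// [$match, $sort, $limit] expansion named in this ticket.
function expandMatchTopN(spec) {
    return [{$match: spec.filter}, {$sort: spec.sort}, {$limit: spec.limit}];
}

// Hypothetical helper: returns a copy of an input pipeline in which every
// $matchTopN stage is replaced by its manual expansion; other stages are
// passed through unchanged.
function replaceMatchTopN(pipeline) {
    return pipeline.flatMap((stage) =>
        Object.prototype.hasOwnProperty.call(stage, "$matchTopN")
            ? expandMatchTopN(stage.$matchTopN)
            : [stage]);
}
```

      The test could run $scoreFusion once with the original pipeline and once with the output of replaceMatchTopN, then compare the two result sets.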

      Test 5 — Selection extension inside $scoreFusion that is itself inside a $unionWith

      File: jstests/extensions/score_fusion_with_extension_in_unionwith.js

      What to test: The full composition: $matchTopN in a $scoreFusion input pipeline, where that $scoreFusion lives inside a $unionWith targeting a view. For $scoreFusion, hybrid_search_in_unionwith_on_sharded_view.js (no extensions) and extension_in_hybrid_search.js (no $unionWith) each cover only half of this combination.

      Setup:

      • Collection outer with one document {_id: 0}.
      • Collection base with {_id, x, y} documents.
      • View baseView over base with a $match filter.

      Test cases:

      1. Unsharded: db.outer.aggregate([{$unionWith: {coll: "baseView", pipeline: [{$scoreFusion: {input: {pipelines: {a: [{$matchTopN...}, {$score...}], b: [{$score...}]}}}}]}}]). Assert that the result count is 1 (the outer doc) plus the number of docs the $scoreFusion returns from the view. Assert that the $unionWith results match those from running the same $scoreFusion directly against baseView.
      2. Sharded: Same pipeline with ShardingTest (2 shards, base sharded by {_id: 1}). Place in jstests/noPassthrough/query/score_fusion_with_extension_in_unionwith_sharded.js.
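
      A minimal sketch of the comparison between the $unionWith results and the direct $scoreFusion results. The helper name is hypothetical. It compares only the multiset of _id values and does not rely on the order in which $unionWith emits documents, so a stricter test would also compare the remaining fields.

```javascript
// Hypothetical helper: checks that the $unionWith output contains exactly
// the outer documents plus the documents returned by running the same
// $scoreFusion directly against the view, comparing by _id only.
function unionMatchesDirect(unionResults, outerDocs, directResults) {
    const sortedIds = (docs) => docs.map((doc) => JSON.stringify(doc._id)).sort();
    const expected = sortedIds(outerDocs.concat(directResults));
    const actual = sortedIds(unionResults);
    return actual.length === expected.length &&
        actual.every((id, i) => id === expected[i]);
}
```

      Because the length check is part of the comparison, a passing result also covers the "1 + N" count assertion when outerDocs holds the single outer document.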

      Test 6 — Both $scoreFusion input pipelines contain selection extensions simultaneously

      File: Add a new it block to jstests/extensions/extension_in_hybrid_search.js inside the existing describe.

      What to test: All input pipelines, not just pipeline "a", contain $matchTopN. Validates that the all_of check across all input pipelines does not short-circuit after the first pipeline.

      Test case:

      coll.aggregate([{
          $scoreFusion: {
              input: {
                  pipelines: {
                      a: [
                          { $matchTopN: { filter: {x: {$gt: 2}}, sort: {x: -1}, limit: 3 } },
                          { $score: { score: "$x", normalization: "minMaxScaler" } },
                      ],
                      b: [
                          { $matchTopN: { filter: {y: {$gt: 20}}, sort: {y: -1}, limit: 3 } },
                          { $score: { score: "$y", normalization: "minMaxScaler" } },
                      ],
                  },
                  normalization: "none",
              },
              combination: { method: "avg" },
          },
      }]);
      

      Assert it succeeds and returns a non-empty result.


      Test 7 — $scoreFusion on a multi-level view chain (view-on-view)

      File: jstests/extensions/score_fusion_on_nested_view.js

      What to test: $scoreFusion run against a view whose viewOn is itself another view (2-level chain). Exercises the recursive view-resolution code path introduced in SERVER-128285.

      Setup:

      db.createView("level1View", collName, [{ $match: { x: { $gte: 1 } } }]);
      db.createView("level2View", "level1View", [{ $addFields: { fromLevel2: true } }]);
      

      Test case: Run $scoreFusion (with $score-based pipelines) against level2View and assert:

      • The command succeeds.
      • All returned documents have fromLevel2: true.
      • Result count matches the number of documents in coll satisfying the level1View filter (x >= 1); level2View only adds a field and filters nothing.
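
      A minimal sketch of the two result checks in the list above. The helper name is hypothetical. It assumes x is numeric in every base document and that the $score-based input pipelines do not themselves drop documents, so the expected count is driven only by the level1View $match on x >= 1.

```javascript
// Hypothetical helper: evaluates the Test 7 result assertions against the
// raw documents that were inserted into the base collection.
function checkNestedViewResults(results, baseDocs) {
    // Mirrors the level1View filter {x: {$gte: 1}} from the setup.
    const expectedCount = baseDocs.filter((doc) => doc.x >= 1).length;
    return {
        allFlagged: results.every((doc) => doc.fromLevel2 === true),
        countMatches: results.length === expectedCount,
    };
}
```

      The test would assert that both fields of the returned object are true after the aggregate against level2View succeeds.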

      Test 8 — Transforming extension rejection in $scoreFusion on a sharded cluster

      File: jstests/noPassthrough/query/score_fusion_extension_rejection_sharded.js

      What to test: The rejection tests in extension_in_hybrid_search.js are unsharded. Verify that LP-time validation fires correctly on a sharded cluster (at the router, not per-shard).

      Setup: ShardingTest with 2 shards. Shard coll by {_id: 1}.

      Test cases:

      1. $addFieldsMatch in a $scoreFusion input pipeline → assert fails with code 12108713 and error message names $addFieldsMatch.
      2. $nativeVectorSearch in a $scoreFusion input pipeline → assert fails with code 12108713 and error message names $nativeVectorSearch.

      Assert both rejections occur at parse/LP time (error should not reference a shard name).
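
      One way to approximate the "error should not reference a shard name" check is sketched below. The helper name is hypothetical, and it assumes the test already has the error message string and the list of shard names from its ShardingTest fixture.

```javascript
// Hypothetical helper: returns the shard names that appear anywhere in the
// error message. An empty array suggests the rejection was not attributed
// to a specific shard.
function mentionedShards(errmsg, shardNames) {
    return shardNames.filter((name) => errmsg.includes(name));
}
```

      The test would assert that the returned array is empty for both rejection cases. This is a plain substring check, so it would give a false positive if a shard name happened to occur inside an unrelated word in the message.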

      Additional tags: requires_sharding

            Assignee: Finley Lau
            Reporter: Mariano Shaar