Currently, both hybrid search stages ($rankFusion and $scoreFusion) analyze the desugared version of their input pipelines (as Pipeline objects).

This is not ideal because both stages constrain their input pipelines: each input pipeline must be a selection pipeline, and must also be a ranked pipeline (for $rankFusion) or a scored pipeline (for $scoreFusion). Analyzing the desugared version makes these rules difficult to check, because some valid stages (like $score) desugar into other stages.

Instead, we should analyze the pre-desugared version of the input pipelines (that is, each input pipeline parsed as a LiteParsedPipeline), and only later parse it into a full Pipeline.
So for both $rankFusion and $scoreFusion we should:
- Parse each input pipeline into a LiteParsedPipeline (here / here)
- Modify the input pipeline validator functions to analyze the LiteParsedPipeline
- After validation passes, parse into a full Pipeline; all downstream logic for both stages should remain unaffected (ideally we would "upgrade" a LiteParsedPipeline to a full Pipeline, but that functionality does not currently exist).

Furthermore, we should look for all opportunities to consolidate logic between $rankFusion and $scoreFusion in the document_source_hybrid_scoring_util files. Here is my suggestion:
- Have 3 util functions (isSelectionPipeline, isRankedPipeline, and isScoredPipeline) that each analyze a LiteParsedPipeline

Then the input pipeline validation functions can look like:
- $rankFusion: isSelectionPipeline && isRankedPipeline
- $scoreFusion: isSelectionPipeline && isScoredPipeline

Also, remember to retain the generatesMetadataType(DocumentMetadataFields::kScore) check once the full Pipeline is created (that is, after the LiteParsedPipeline analysis of the input pipeline).
This change should also set us up to support nested hybrid searches if we later choose to; supporting them might even be trivial if this refactor goes well.

Testing:
All existing tests in our hybrid search suites should continue to pass.

Add tests that place a $score with $minMaxScaler normalization inside a $scoreFusion input pipeline. This case should already work but currently does not, because $score with $minMaxScaler desugars into stages that fail input pipeline validation; this change should therefore fix it. Make sure the tests cover both $scoreFusion and $score, each with and without scoreDetails requested.
- is depended on by
  - SERVER-104730 Explicitly ban nested $rankFusions and $scoreFusions (Closed)
- is related to
  - SERVER-104636 Do not serialize _internalOutputSortKeyMetadata in query shape hash (Closed)
  - SERVER-97189 $rankFusion and $scoreFusion are missing parsing assertions (Closed)
  - SERVER-107409 View definitions with stages that include $rankFusion with subpipelines are not getting disallowed (Closed)
  - SERVER-108052 DocumentSourceInternalSetWindowFields doesn't have sort key metadata in bounded sorts (Closed)
  - SERVER-101702 Add tests for $rankFusion in $lookup/$unionWith subpipelines (Closed)
  - SERVER-101781 Reject $rankFusion in a view definiton (Closed)
  - SERVER-102728 Audit rankFusion's scoreDetails testing (Closed)
- related to
  - SERVER-104636 Do not serialize _internalOutputSortKeyMetadata in query shape hash (Closed)
  - SERVER-96764 Create geoNear index in rankFusion auth test (Closed)
  - SERVER-97189 $rankFusion and $scoreFusion are missing parsing assertions (Closed)
  - SERVER-98343 $rankFusion seg faults if given empty pipeline (Closed)
  - SERVER-99887 $setWindowFields can fail when spilling to disk (Closed)
  - SERVER-100394 Validation of score $meta field is skipped for mongot queries (Closed)
  - SERVER-101653 Do not allow rankFusion to run on views (Closed)
  - SERVER-105677 $skip is incorrectly disallowed inside $rankFusion (Closed)
  - SERVER-107409 View definitions with stages that include $rankFusion with subpipelines are not getting disallowed (Closed)
  - SERVER-108052 DocumentSourceInternalSetWindowFields doesn't have sort key metadata in bounded sorts (Closed)
  - SERVER-94669 Implement 'scoreDetails' for $rankFusion (Closed)
  - SERVER-94841 Implement 'score' for $score (Closed)
  - SERVER-91200 Add end-to-end ranked fusion test using existing syntax (Closed)
  - SERVER-91201 Add end-to-end score fusion test using existing syntax (Closed)
  - SERVER-82019 Create feature flag (Closed)
  - SERVER-91278 Allow sorting by more kinds of metadata (Closed)
  - SERVER-91279 Add example rank fusion tests which use $setWindowFields (Closed)
  - SERVER-91281 Allow $rank and $denseRank window functions to operate without a SortKeyPattern (Closed)
  - SERVER-91907 Create skeleton of DocumentSourceRankFusion (Closed)
  - SERVER-91909 Implement basic parsing of $rankFusion (Closed)
  - SERVER-91911 Add validation that $rankFusion subpipelines are valid (Closed)
  - SERVER-91912 Test auth for $rankFusion stage (Closed)
  - SERVER-92213 Implement desugaring of $rankFusion (Closed)
  - SERVER-92244 Create Feature Flag for Milestones 1 and 2 (Closed)
  - SERVER-92357 Create js testing strategy for search scoring differences in sharded vs non-sharded configurations (Closed)
  - SERVER-94603 Add 'weights' argument to $rankFusion (Closed)
  - SERVER-94660 Test $rankFusion query shape and stable API restrictions (Closed)
  - SERVER-94668 POC 'scoreDetails' for $score (Closed)
  - SERVER-96064 Optimize away $sort directly after $vectorSearch for single node environments (Closed)
  - SERVER-96736 Switch $rankFusion to use 'featureFlagSearchHybridScoringPrerequisites' (Closed)
  - SERVER-97339 Implement serialization and query shape testing for $score (Closed)
  - SERVER-97645 Refactor serializeTransformation to remove redundant explain parameter (Closed)
  - SERVER-97915 Create feature flag for earlier $rankFusion milestone (Closed)
  - SERVER-97917 Move $rankFusion to be guarded under featureFlagRankFusionBasic (Closed)
  - SERVER-97919 Enable featureFlagRankFusionBasic by default (Closed)
  - SERVER-98453 Refactor $sort to make it easier to add a new option (Closed)
  - SERVER-98994 Make featureFlagRankFusionBasic FCV-gated (Closed)
  - SERVER-99153 $rankFusion scoreDetails should error if not requested by top-level pipeline (but specified in inner) (Closed)
  - SERVER-99169 score cannot be used when not defined (Closed)
  - SERVER-99210 $rankFusion should reject duplicate input.pipelines names (Closed)
  - SERVER-99335 Support projecting textScore with $meta: score (Closed)
  - SERVER-99589 Consolidate metadata dependency tracking of search and non-search metadata (Closed)
  - SERVER-99596 Refactor how metadata dependencies are validated (Closed)
  - SERVER-100045 Add property-based test for metadata field dependency validation (Closed)
  - SERVER-100107 Add "description" and "weight" to $rankFusion scoreDetails (Closed)
  - SERVER-100203 Change scoreDetails to use array rather than object (Closed)
  - SERVER-100546 Handle deps tracking generically in DocumentSourceFacet::getDependencies() (Closed)
  - SERVER-100678 scoreDetails cannot be used when not defined (Closed)
  - SERVER-100799 Guard meta_dependency_validation.js on featureFlagRankFusionFull (Closed)
  - SERVER-100948 Re-enable scoreDetails "value" field (Closed)
  - SERVER-101155 Run search e2e metadata tests in the search passthroughs (Closed)
  - SERVER-101701 $rankFusion must be the first stage of the pipeline (Closed)
  - SERVER-101702 Add tests for $rankFusion in $lookup/$unionWith subpipelines (Closed)
  - SERVER-101781 Reject $rankFusion in a view definiton (Closed)
  - SERVER-107693 [v8.0] Backport Hybrid Search Rank Fusion (Closed)
  - SERVER-99505 Refactor string building in document_source_rank_fusion.cpp (Closed)
  - SERVER-101568 $rankFusion should reject scoreDetails: true unless FF is on (Closed)
  - SERVER-88046 Support $vectorSearch execution within unionWith subpipeline (Closed)
  - SERVER-93391 Remove rankConstant field from $rankFusion (Closed)
  - SERVER-93576 Remove vector embeddings from $vectorSearch explain (Closed)
  - SERVER-95162 Refactor multiversion query test to make it more re-usable (Closed)
  - SERVER-95164 Allow more than 2 input pipelines for $rankFusion (Closed)
  - SERVER-95168 Implement $setWindowFields version of desugaring for $rankFusion (Closed)
  - SERVER-95169 Add multiversion query test which stresses sharded scenarios (Closed)
  - SERVER-96127 Adjust $rankFusion syntax to adopt 'input.pipelines' revision (Closed)
  - SERVER-96154 Validate field names for 'inputs.pipelines' and 'weights' (Closed)
  - SERVER-96792 Allow {$meta: "score"} to return any kind of score metadata (Closed)
  - SERVER-96793 Allow {$meta: "scoreDetails"} to return any kind of scoreDetails metadata (Closed)
  - SERVER-97103 Allow sorting by {$meta: "score"} (Closed)
  - SERVER-97104 Restrict {$meta: "scoreDetails"} to featureFlagRankFusionFull and apiStrict=false (Closed)
  - SERVER-98322 Improve field path validation error messages (Closed)
  - SERVER-99674 Change $rankFusion weights object to accept a subset of pipelines specified (Closed)
  - SERVER-99675 Improve feedback for misspelled $rankFusion pipelines in weights object (Closed)
  - SERVER-99732 Switch sort key metadata assertion to tassert (from invariant) (Closed)
  - SERVER-100752 $rankFusion should output score metadata (Closed)
  - SERVER-102728 Audit rankFusion's scoreDetails testing (Closed)
  - SERVER-96835 Update commands_lib.js rankFusion pipeline to new syntax. (Closed)
  - SERVER-97102 Stress test dependency tracking for {$meta: "scoreDetails"} for pipelines spanning across sharded network split (Closed)