Type: Task
Resolution: Unresolved
Priority: Minor - P4
Affects Version/s: None
Component/s: None
Query Integration

Hybrid search stages ($rankFusion and $scoreFusion) have multiple restrictions on which stages can be present (or must be present) in their input pipelines. See here in the Technical Design Document (TDD).
Originally, input pipeline analysis / validation was done on a parsed Pipeline. It was later switched to operate on BSON, because some input pipeline stages desugar into stages that would not pass validation individually, even though in combination they do not violate the restrictions (specifically, a $score with $minMaxScaler normalization was a problem). Validation was therefore easier to perform on BSON, analyzing each stage independently before desugaring.
The validation strategy we settled on is a set of hardcoded rules specifying which stages can / must be present; no other stages are allowed.
While this works as the TDD dictates, this strategy is not as extensible as it could be. For example, if we add a new stage that should be allowed in a hybrid search input pipeline, we have to manually update the validation rules each time to reflect this. Ideally, we could analyze the properties of any stage abstractly (e.g. it does not modify documents, it may reorder a document set, it orders a document set, it produces score metadata) and ensure that the input pipeline meets the criteria hybrid search needs (e.g. the pipeline is ranked, is scored, is a selection pipeline), without hardcoding which specific stages are allowed or disallowed. With this strategy, we could write a single piece of input validation logic for hybrid search that does not need to change as the set of stages supported in MQL changes.
However, we still want to analyze the pre-desugared version of the input pipeline, to avoid the problem described above that originally moved input pipeline analysis away from a parsed Pipeline.
These factors lead us to a fairly specific solution: introduce a StageConstraints-like concept on the LiteParsedDocumentSource, then analyze those constraints in the hybrid search input pipeline validation logic.
Note a couple of things:
- A ticket has already been written to introduce StageConstraints on the LiteParsedDocumentSource (SERVER-101722)
  - That ticket should probably be addressed before this one
- finley.lau@mongodb.com attempted something like this here, introducing "properties" on the LPDS/LPP, when cutting over the hybrid search input pipeline validation to the pre-desugared stages
  - This could act as a guide / template when implementing this change
The types of properties/constraints we would likely need to know on the LiteParsedDocumentSource are (may not be exhaustive):
- Does the stage modify documents (for selection criteria)
- Does the stage produce an ordering (for ranked criteria)
- Does the stage potentially re/un-order documents (for ranked criteria)
  - For example, a $sort -> $group is potentially not ranked after the $group
- Does the stage produce score metadata (for scoring criteria)
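
As a rough illustration of how the properties above could feed the pipeline-level criteria, here is a minimal standalone C++ sketch. It is not the server's API: `StageProperties`, `PipelineSummary`, and `summarize` are hypothetical names, and the folding rules (for example, that a later ordering stage re-establishes rank, or that score metadata persists once produced) are assumptions that the real design would need to confirm.

```cpp
#include <string>
#include <vector>

// Hypothetical per-stage properties, mirroring the list above.
// Illustrative only; not the actual StageConstraints or LiteParsedDocumentSource API.
struct StageProperties {
    std::string name;
    bool modifiesDocuments;  // violates the selection criterion
    bool producesOrdering;   // establishes a ranking (e.g. a sort)
    bool mayReorder;         // may destroy an earlier ranking
    bool producesScore;      // emits score metadata
};

// Pipeline-level criteria that hybrid search validation would check.
struct PipelineSummary {
    bool isSelection = true;  // no stage has modified documents
    bool isRanked = false;    // output order is meaningful
    bool isScored = false;    // score metadata is available
};

// Fold the stage properties, in pipeline order, into pipeline-level criteria.
PipelineSummary summarize(const std::vector<StageProperties>& stages) {
    PipelineSummary summary;
    for (const auto& stage : stages) {
        if (stage.modifiesDocuments) {
            summary.isSelection = false;
        }
        if (stage.producesOrdering) {
            summary.isRanked = true;
        } else if (stage.mayReorder) {
            // Assumption: a reordering stage invalidates any earlier ranking.
            summary.isRanked = false;
        }
        if (stage.producesScore) {
            // Assumption: score metadata persists through later stages.
            summary.isScored = true;
        }
    }
    return summary;
}
```

With property values assumed purely for illustration ($match: none set; $sort: produces an ordering; $group: modifies documents and may reorder), a [$match, $sort] pipeline summarizes as a ranked selection pipeline, while [$sort, $group] is neither ranked nor a selection pipeline. The validation logic would then only compare the summary against what $rankFusion or $scoreFusion requires, with no stage names hardcoded.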
is related to: SERVER-101722 Add StageConstraints to LiteParsedDocumentSource (Backlog)