
Type: Task
Resolution: Unresolved
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- qi-search
- qi-vector-search

Assigned Teams: Query Integration
Confidence Status: None
Work Order: 3
Size Category: TBD
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Hybrid search stages ($rankFusion and $scoreFusion) impose multiple restrictions on which stages can (or must) be present in their input pipelines. See here in the Technical Design Document (TDD).

Originally, the input pipeline analysis/validation was done on a parsed Pipeline. It was later switched to operate on BSON, because some input pipeline stages desugar into stages that would not pass validation on their own, even though in combination they do not violate the restrictions (specifically, $score with $minMaxScaler normalization was a problem). Validating the BSON was easier because it allows each stage to be analyzed independently, before desugaring.

The validation strategy we settled on is a set of hardcoded rules specifying which stages can or must be present; no other stages are allowed.
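To make "hardcoded rules" concrete, the current approach amounts to something like the sketch below: a fixed allowlist of stage names checked against each top-level stage of an input pipeline. The namespace `sketch`, the function `validateByAllowlist`, and the contents of both stage sets are illustrative assumptions, not the actual rules from the TDD or the server's code.

```cpp
#include <set>
#include <string>
#include <vector>

namespace sketch {

// ILLUSTRATIVE ONLY: these sets are assumptions, not the rules from the TDD.
// Stages assumed to be permitted in a hybrid search input pipeline.
const std::set<std::string> kAllowedStages = {
    "$search", "$vectorSearch", "$match", "$sort", "$limit", "$skip"};

// Stages assumed to make the input pipeline ranked.
const std::set<std::string> kRankingStages = {"$search", "$vectorSearch", "$sort"};

// Hypothetical allowlist check over the top-level stage names of one input
// pipeline (as read from the pre-desugared BSON). Every stage must be listed
// explicitly, and at least one ranking stage must be present.
bool validateByAllowlist(const std::vector<std::string>& stageNames) {
    bool sawRankingStage = false;
    for (const std::string& name : stageNames) {
        if (kAllowedStages.count(name) == 0) {
            return false;  // anything not on the list is rejected
        }
        if (kRankingStages.count(name) != 0) {
            sawRankingStage = true;
        }
    }
    return sawRankingStage;
}

}  // namespace sketch
```

With this shape, allowing a new stage means editing `kAllowedStages` (and possibly `kRankingStages`) by hand every time.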

While this works as the TDD dictates, this strategy is not as extensible as it could be. For example, if we add a new stage that should be allowed in a hybrid search input pipeline, we would have to manually update the validation rules each time to reflect it. Ideally, we could analyze the properties of any stage abstractly (e.g. it does not modify documents, it may reorder a document set, it orders a document set, it produces score metadata), and ensure that the input pipeline meets the criteria hybrid search needs (e.g. the pipeline is ranked, is scored, is a selection pipeline), without hardcoding which specific stages are allowed or disallowed. With this strategy, we could write a single input validation routine for hybrid search that does not need to change as the set of stages supported in MQL changes.

However, we still want to analyze the pre-desugared version of the input pipeline, to avoid the problem described above that led us to move input pipeline analysis away from a parsed Pipeline.

These factors lead us to a pretty specific solution to address this problem: introduce a StageConstraints-like concept on the LiteParsedDocumentSource, then analyze the constraints in the hybrid search input pipeline validation logic.
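A minimal sketch of what a StageConstraints-like concept on a lite-parsed stage could look like, assuming each lite-parsed stage can describe itself before desugaring. `LiteParsedStage`, `LiteParsedScore`, `LiteParsedSort`, `StageProperties`, and `collectProperties` are hypothetical stand-ins, not the real LiteParsedDocumentSource API, and the property values assigned to each stage are assumptions.

```cpp
#include <memory>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical StageConstraints-like properties (names are assumptions).
struct StageProperties {
    bool modifiesDocuments = false;
    bool producesOrdering = false;
    bool mayReorderDocuments = false;
    bool producesScoreMetadata = false;
};

// Hypothetical stand-in for LiteParsedDocumentSource: each lite-parsed stage
// reports its properties as written by the user, i.e. before desugaring.
class LiteParsedStage {
public:
    virtual ~LiteParsedStage() = default;
    virtual std::string name() const = 0;
    virtual StageProperties properties() const = 0;
};

// A $score-like stage describes itself as a whole, even if its desugared
// form would contain stages that individually fail validation.
// Assumed values: it attaches score metadata and does not modify documents.
class LiteParsedScore : public LiteParsedStage {
public:
    std::string name() const override { return "$score"; }
    StageProperties properties() const override {
        StageProperties p;
        p.producesScoreMetadata = true;
        return p;
    }
};

// A $sort-like stage, assumed to produce an ordering.
class LiteParsedSort : public LiteParsedStage {
public:
    std::string name() const override { return "$sort"; }
    StageProperties properties() const override {
        StageProperties p;
        p.producesOrdering = true;
        return p;
    }
};

// Gathers the properties of the pre-desugared stages; hybrid search
// validation would then evaluate its criteria over this list.
std::vector<StageProperties> collectProperties(
    const std::vector<std::unique_ptr<LiteParsedStage>>& stages) {
    std::vector<StageProperties> out;
    out.reserve(stages.size());
    for (const auto& stage : stages) {
        out.push_back(stage->properties());
    }
    return out;
}

}  // namespace sketch
```

The design choice this illustrates is that the validation code never inspects stage names; each stage owns its own description, so a new stage only needs to declare its properties.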

Note a couple of things:

- A ticket has already been written to introduce StageConstraints on the LiteParsedDocumentSource (SERVER-101722).
  - That ticket should probably be addressed before this one.
- finley.lau@mongodb.com attempted something like this here, introducing "properties" on the LPDS/LPP (LiteParsedDocumentSource/LiteParsedPipeline) when cutting over the hybrid search input pipeline validation to the pre-desugared stages.
  - This could act as a guide or template when implementing this change.

The properties/constraints we would likely need to expose on the LiteParsedDocumentSource include (this list may not be exhaustive):

- Does the stage modify documents? (selection criteria)
- Does the stage produce an ordering? (ranked criteria)
- Does the stage potentially reorder or un-order documents? (ranked criteria)
  - For example, in $sort -> $group, the output of the $group is not necessarily ranked, even though its input was.
- Does the stage produce score metadata? (scored criteria)
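A minimal sketch of how a single, stage-agnostic validation routine could evaluate the selection, ranked, and scored criteria from the per-stage properties listed above. The struct `StageProperties` and the functions `isSelectionPipeline`, `isRankedPipeline`, and `isScoredPipeline` are hypothetical names, and the rules are deliberately simplified (for instance, the scored check does not consider whether a later stage could discard the score).

```cpp
#include <vector>

namespace sketch {

// Hypothetical per-stage properties mirroring the questions listed above.
struct StageProperties {
    bool modifiesDocuments = false;      // selection criteria
    bool producesOrdering = false;       // ranked criteria
    bool mayReorderDocuments = false;    // ranked criteria
    bool producesScoreMetadata = false;  // scored criteria
};

// Selection pipeline: no stage modifies the documents passing through it.
bool isSelectionPipeline(const std::vector<StageProperties>& stages) {
    for (const StageProperties& s : stages) {
        if (s.modifiesDocuments) {
            return false;
        }
    }
    return true;
}

// Ranked pipeline: walking the stages in order, an ordering-producing stage
// marks the stream as ranked, and any later stage that may reorder
// documents clears that mark.
bool isRankedPipeline(const std::vector<StageProperties>& stages) {
    bool ranked = false;
    for (const StageProperties& s : stages) {
        if (s.producesOrdering) {
            ranked = true;
        } else if (s.mayReorderDocuments) {
            ranked = false;
        }
    }
    return ranked;
}

// Scored pipeline: at least one stage produces score metadata.
bool isScoredPipeline(const std::vector<StageProperties>& stages) {
    for (const StageProperties& s : stages) {
        if (s.producesScoreMetadata) {
            return true;
        }
    }
    return false;
}

}  // namespace sketch
```

For example, if $sort is described as producing an ordering and $group as modifying documents and possibly reordering them, then [$sort, $limit] is ranked while [$sort, $group] is not, which matches the $sort -> $group case above. The property values assigned to real stages in this example are assumptions.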

is related to

SERVER-101722 Add StageConstraints to LiteParsedDocumentSource

Backlog

Assignee:: Unassigned
Reporter:: Joe Shalabi
Participants:: Joe Shalabi
Votes:: 0
Watchers:: 2

Created:: Jun 10 2025 10:20:11 PM UTC
Updated:: Jun 12 2025 06:47:29 PM UTC
