- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Integration
The requirements we have for $vectorSearch are:
- We must be able to defer $vectorSearch’s split point beyond its subsequent $_internalSearchIdLookup stage.
- Results must be sorted by $vectorSearchScore. This is already guaranteed by mongot, but in a sharded cluster we must perform a streaming merge sort on the per-shard results. This is achieved by specifying a sort pattern on $vectorSearch’s DistributedPlanLogic (DPL).
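To illustrate the second requirement, here is a minimal sketch of a streaming merge sort over per-shard result streams. It is not server code: the document shape, the `score` field name, and the `merge_shard_results` helper are assumptions made for the example. It relies only on each shard's stream already being sorted by descending score.

```python
import heapq


def merge_shard_results(shard_streams):
    """Lazily merge per-shard streams, each already sorted by descending score.

    Hypothetical helper for illustration only; "score" stands in for
    $vectorSearchScore.
    """
    # reverse=True tells heapq.merge that inputs are sorted largest-first.
    return heapq.merge(*shard_streams, key=lambda doc: doc["score"], reverse=True)


# Two shards, each returning results already ordered by score.
shard_a = [{"_id": 1, "score": 0.97}, {"_id": 4, "score": 0.61}]
shard_b = [{"_id": 7, "score": 0.88}, {"_id": 2, "score": 0.35}]

merged_ids = [doc["_id"] for doc in merge_shard_results([shard_a, shard_b])]
print(merged_ids)  # [1, 7, 4, 2]
```

The merge never buffers a full shard result: it only compares the current head of each stream, which is why a sort pattern at the split point is enough and no blocking sort is needed.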
We also know the following:
- $vectorSearch is guaranteed to be a Source stage (i.e., a stage that always occurs at the beginning of a pipeline and produces a document stream).
- $vectorSearch desugars into [$pluginVectorSearch, $_internalSearchIdLookup, $limit].
- $pluginVectorSearch is guaranteed to be a Source stage.
- $_internalSearchIdLookup occurs immediately after $pluginVectorSearch.
When determining the split point, we propose the following:
- Inspect the pipeline’s first stage and determine whether it is a Source stage.
- If the first stage in the pipeline is a Source stage, check whether the next stage is an $_internalSearchIdLookup stage.
- If the Source stage has DistributedPlanLogic with a sort pattern, keep the sort pattern for performing the merge at the split point.
- If the next stage is $_internalSearchIdLookup, split at $_internalSearchIdLookup.
- If the next stage is not $_internalSearchIdLookup, bail out of the special-case handling and split using the existing logic.
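The proposed steps can be sketched as follows. This is a simplified model, not the actual SplitPipeline::findSplitPoint() implementation: `Stage`, `SplitPoint`, `find_special_case_split`, and the example sort pattern are hypothetical names chosen for illustration, and the sketch does not model which half of the pipeline the $_internalSearchIdLookup stage itself ends up in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage (not the server's class)."""
    name: str
    is_source: bool = False
    # Sort pattern from the stage's DistributedPlanLogic, if it declares one.
    merge_sort_pattern: Optional[dict] = None


@dataclass
class SplitPoint:
    index: int                          # position of the stage to split at
    merge_sort_pattern: Optional[dict]  # pattern used to merge shard results


def find_special_case_split(pipeline: list[Stage]) -> Optional[SplitPoint]:
    """Return the deferred split point, or None to fall back to existing logic."""
    if len(pipeline) < 2 or not pipeline[0].is_source:
        return None
    source, next_stage = pipeline[0], pipeline[1]
    if next_stage.name != "$_internalSearchIdLookup":
        return None  # bail out: not the special case
    # Keep the Source stage's sort pattern (if any) for the merge.
    return SplitPoint(index=1, merge_sort_pattern=source.merge_sort_pattern)


# Desugared $vectorSearch; the sort pattern value is illustrative only.
pipeline = [
    Stage("$pluginVectorSearch", is_source=True,
          merge_sort_pattern={"$vectorSearchScore": -1}),
    Stage("$_internalSearchIdLookup"),
    Stage("$limit"),
]
split = find_special_case_split(pipeline)
print(split)
```

Returning `None` for every non-matching pipeline keeps the special case isolated, so pipelines that do not start with a Source stage followed by $_internalSearchIdLookup go through the existing split logic unchanged.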
TLDR:
Add a special case to SplitPipeline::findSplitPoint() that detects a Source stage followed by an $_internalSearchIdLookup stage and, when found, places the split point at the $_internalSearchIdLookup stage.
As part of this ticket, add unit tests (if applicable) and integration tests to ensure the pipeline is split at the expected point.
- is depended on by SERVER-110278 Modify Split Pipeline logic to allow pushdown of sharding stages to shards pipeline after a split point
- Backlog