- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Query Integration
- ALL
The scenario is quite specific and hard to reproduce. We have not yet reproduced it, but concluded by code inspection that the following bug must exist.
Setup:
- Collections 'c' and 'target' exist and at least 'target' is sharded.
- The sharded cluster has a single shard.
- A user issues a query similar to the following:
c.aggregate([{$unionWith: {coll: 'target', pipeline: [{$search: {}}]}}])
- While the query is running, the following happen in parallel. For these steps to overlap with the query, it probably has to be a particularly slow query, though nothing guarantees that any query like this is safe from observing the events below:
- A second shard is created and added to the cluster.
- One or more chunks from 'target' are migrated to the second shard.
- The migrated chunks' data is deleted from the original shard.
If that query runs long enough, it will start to miss results from the migrated chunk(s) of 'target'.
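The race can be illustrated with a toy model (plain JavaScript, not MongoDB internals; all names here are hypothetical): a slow scan iterates the original shard's live data while a concurrent migration moves a chunk to a second shard and the range deleter removes it from the donor, so the scan never sees those documents.

```javascript
// Toy model of the race. Shard A initially owns documents _id 0..9.
// A "slow query" scans shard A only (it was targeted when the cluster
// had a single shard) and holds no ownership-filter snapshot.
function* slowScan(shardData) {
  for (let i = 0; i < shardData.length; i++) {
    yield shardData[i];
  }
}

const shardA = Array.from({ length: 10 }, (_, i) => ({ _id: i }));
const shardB = [];

const scan = slowScan(shardA);
const seen = [];

// The query reads the first three documents...
for (let i = 0; i < 3; i++) seen.push(scan.next().value._id);

// ...then the migration commits: the chunk covering _id >= 5 moves to
// shard B, and the range deleter removes it from shard A.
for (let i = shardA.length - 1; i >= 0; i--) {
  if (shardA[i]._id >= 5) shardB.push(...shardA.splice(i, 1));
}

// The query resumes on shard A only and never visits shard B.
for (const doc of scan) seen.push(doc._id);

console.log(seen); // documents with _id 5..9 are missing from the result
```

This is only a sketch of the timing, not of the server's cursor machinery; the point is that nothing pins the migrated range's data on the donor shard for the lifetime of the query.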
A quick note: this seems like a query correctness issue. However, this kind of problem is possible with any normal $search in the face of concurrent chunk migrations. (TODO separate ticket - couldn't find one, but documented here a bit: https://github.com/10gen/mongot/blob/master/docs/consistency/read-isolation-consistency-recency.md#sharded-cluster)
This would be preventable if the logic that constructs the sub-pipeline instantiated a ScopedCollectionFilter and kept it alive in the pipeline for the duration of execution. At the time of this writing, that would be something like adding this line to the body of this helper.
If that theory is correct, the fix is easy but testing this is going to be quite tricky.
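To illustrate why holding the filter would help (again a toy model, not the actual ScopedCollectionFilter API; ScopedFilter, registry, and rangeDelete are invented names): acquiring a scoped reference at sub-pipeline construction forces the range deleter to defer removal of the migrated chunk until the query releases it.

```javascript
// Toy model of the proposed fix: a scoped filter acquired when the
// sub-pipeline is built holds a reference that the range deleter must
// wait on before deleting migrated documents from the donor shard.
class ScopedFilter {
  constructor(registry) { this.registry = registry; registry.refs++; }
  release() { this.registry.refs--; }
}

const registry = { refs: 0, pendingDeletions: [] };

function rangeDelete(shardData, predicate) {
  if (registry.refs > 0) {
    // A query still holds the filter: defer this deletion.
    registry.pendingDeletions.push(() => rangeDelete(shardData, predicate));
    return;
  }
  for (let i = shardData.length - 1; i >= 0; i--) {
    if (predicate(shardData[i])) shardData.splice(i, 1);
  }
}

const shardA = Array.from({ length: 10 }, (_, i) => ({ _id: i }));

// Sub-pipeline construction acquires the filter for the query's lifetime.
const filter = new ScopedFilter(registry);

// Migration commits and tries to delete the moved chunk; it is deferred.
rangeDelete(shardA, (d) => d._id >= 5);
console.log(shardA.length); // still 10: data pinned for the running query

// The query finishes; the filter is released and deferred deletions run.
filter.release();
registry.pendingDeletions.splice(0).forEach((fn) => fn());
console.log(shardA.length); // now 5: the migrated range is finally removed
```

In the real server the analogous effect comes from the sharding metadata the filter keeps alive, which the range deleter waits on before cleaning up orphaned documents.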
Is related to:
- SERVER-96412: tassert tripped on 1-shard sharded $unionWith + $search (In Code Review)