- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Aggregation Framework
- Query Optimization
- ALL
- Query 2019-07-29, Query 2019-08-12, Query 2019-08-26, Query 2019-09-09, Query 2019-09-23, Query 2019-10-07, Query 2019-10-21, Query 2019-12-30, Query 2020-03-23, Query 2020-04-06, QE 2021-10-18, QE 2021-11-01, QE 2021-11-15, QO 2022-02-21, QO 2022-03-07
$group stages that leverage a DISTINCT_SCAN in their execution do not produce a SHARDING_FILTER stage in their query plans. This is a problem when orphan documents are still present at entry to such a stage, because they can affect the result of the $group.
Interestingly, because $groups using a DISTINCT_SCAN need to examine only one document for each value of the group key, the problem is generally hidden; such $groups naturally "deduplicate" the orphan documents by selecting only one document for each value of the group key. However, there are (at least) three cases where this still presents problems:
1) If documents are updated before a $group using a DISTINCT_SCAN runs, and before orphan documents are otherwise purged, there is a potential for the $group to use and pass forward stale values that it takes from the orphan documents rather than the "live", updated ones.
2) If https://jira.mongodb.org/browse/SERVER-5477 is merged, then $groups using DISTINCT_SCANs will run exclusively on the shards when the $group keys are a superset of the shard keys. This will exacerbate the issue because orphans will no longer be deduplicated "accidentally" by the $group: in this case the $group runs under the assumption that all documents with the same value for the group keys are on the same shard.
3) If orphan documents persist after their "parent" documents have been deleted, the distinct scan can return stale values from these documents.
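The three cases above can be sketched with a small simulation. This is a simplified model, not the server's implementation: the function names, the two-shard layout, the field names `k` and `v`, and the ownership predicates are all hypothetical, and the real merge order between shards is not guaranteed (shard A is listed first here so that the stale value wins).

```python
# Toy model: collection sharded on "k". The chunk holding k=1 has moved from
# shard A to shard B, but shard A still holds an orphan copy with a stale value.

def shard_group_first(docs, owns=None):
    """Model of a $group on k taking $first of v, answered by a distinct scan.

    Keeps one document per value of k. If `owns` is given, it plays the role
    of a sharding filter and skips documents this shard does not own.
    """
    out = {}
    for doc in sorted(docs, key=lambda d: d["k"]):  # index order on k
        if owns is not None and not owns(doc):
            continue
        out.setdefault(doc["k"], doc["v"])
    return out

def merge_on_router(partials):
    """Model of the merging $group: one result per key across all shards."""
    merged = {}
    for partial in partials:
        for k, v in partial.items():
            merged.setdefault(k, v)
    return merged

def concat_without_merge(partials):
    """Model of case 2: shard results are returned as-is, with no merge."""
    return [(k, v) for partial in partials for k, v in partial.items()]

shard_a = [{"k": 1, "v": "old"}, {"k": 2, "v": "a"}]  # k=1 here is an orphan
shard_b = [{"k": 1, "v": "new"}]                      # live, updated document

def owns_a(doc):
    return doc["k"] == 2

def owns_b(doc):
    return doc["k"] == 1

# Case 1: the orphan's stale value can win the merge.
unfiltered = merge_on_router([shard_group_first(shard_a), shard_group_first(shard_b)])
filtered = merge_on_router([shard_group_first(shard_a, owns_a),
                            shard_group_first(shard_b, owns_b)])

# Case 2: without a merging stage, the orphan is no longer deduplicated.
unmerged = concat_without_merge([shard_group_first(shard_a), shard_group_first(shard_b)])

# Case 3: the live document is deleted, but the orphan still produces a group.
after_delete = merge_on_router([shard_group_first(shard_a), shard_group_first([])])
after_delete_filtered = merge_on_router([shard_group_first(shard_a, owns_a),
                                         shard_group_first([], owns_b)])

print(unfiltered, filtered, unmerged, after_delete, after_delete_filtered)
```

In this model, filtering each shard's scan by ownership removes all three symptoms, which is the behavior a SHARDING_FILTER stage would be expected to provide.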
- is depended on by
- SERVER-5477 when sharded, no need to merge groups if $group _id is the shard key or original document _id
- Backlog
- is related to
- SERVER-55200 DISTINCT_SCAN not used for $sort+$match+$group+$first on sharded collection
- Backlog
- related to
- SERVER-13116 distinct isn't sharding aware
- Backlog