
Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.2.0-rc0
Affects Version/s: None
Component/s: Aggregation Framework
Labels:
- qexec-team
- query-44-grooming

Assigned Teams:
Query Optimization
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:

Assuming db.data is a sharded collection with orphan documents:

db.data.explain().aggregate([{$group: {_id: '$_id', i: {$sum: '$i'}}}])

The query plan returned here includes a SHARDING_FILTER stage, because this $group cannot use a DISTINCT_SCAN ($sum has to see every document in each group).

db.data.explain().aggregate([{$group: {_id: '$_id', i: {$first: '$i'}}}])

The query plan returned here does not include a SHARDING_FILTER stage, because this $group can use a DISTINCT_SCAN ($first needs only one document per group key).

This was determined mostly by inspecting the code and the query plans, since reproducing an incorrect result caused by this issue is timing-dependent.
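Checking the two plans by eye can be automated with a small helper that searches an explain result for a stage name. This is a sketch only: `collectStages` and `hasStage` are hypothetical names, it assumes plan stages appear as objects carrying a string `stage` field nested at arbitrary depth, and the two sample plans are hand-written, trimmed fragments rather than real explain() output.

```javascript
// Hypothetical helper: collect every plan-stage name in an explain() result.
// Assumption: plan stages are objects with a string `stage` field, nested at
// arbitrary depth (winningPlan, inputStage, inputStages, per-shard sections).
// Walking every value avoids depending on the exact layout, which differs
// between versions and between sharded and unsharded explain output.
function collectStages(node, found = new Set()) {
  if (Array.isArray(node)) {
    for (const child of node) collectStages(child, found);
  } else if (node !== null && typeof node === "object") {
    if (typeof node.stage === "string") found.add(node.stage);
    for (const value of Object.values(node)) collectStages(value, found);
  }
  return found;
}

// Hypothetical helper: true if any stage in the explain output has this name.
function hasStage(explainOutput, stageName) {
  return collectStages(explainOutput).has(stageName);
}

// Hand-written, heavily trimmed fragments for illustration only; they are
// not real explain() output.
const sumPlan = {
  queryPlanner: {
    winningPlan: { stage: "SHARDING_FILTER", inputStage: { stage: "COLLSCAN" } },
  },
};
const firstPlan = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "DISTINCT_SCAN" } },
  },
};

console.log(hasStage(sumPlan, "SHARDING_FILTER"));   // true
console.log(hasStage(firstPlan, "SHARDING_FILTER")); // false
```

Pasted into mongosh, the same helper could be applied to the documents returned by the two db.data.explain().aggregate(...) calls; walking every field keeps it independent of where the per-shard plans sit in the output.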
Sprint:
Query 2019-07-29, Query 2019-08-12, Query 2019-08-26, Query 2019-09-09, Query 2019-09-23, Query 2019-10-07, Query 2019-10-21, Query 2019-12-30, Query 2020-03-23, Query 2020-04-06, QE 2021-10-18, QE 2021-11-01, QE 2021-11-15, QO 2022-02-21, QO 2022-03-07
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

$group stages that use a DISTINCT_SCAN do not include a SHARDING_FILTER stage in their query plans. This is a problem when orphan documents are still present on a shard when such a stage runs, because they can then affect the result of the $group.
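Conceptually, a SHARDING_FILTER drops documents that the shard does not own (orphans) before later stages see them; the plans described above skip that step. The sketch below is a toy model of the ownership check, not MongoDB's implementation: `ownsKey` and `shardingFilter` are hypothetical names, and it assumes a single numeric shard key field and owned chunks expressed as half-open [min, max) ranges.

```javascript
// Toy model of the check a SHARDING_FILTER performs (not MongoDB's code).
// Assumptions: the shard key is the single numeric field named by
// shardKeyField, and a shard's owned chunks are half-open [min, max) ranges.
function ownsKey(ownedRanges, keyValue) {
  return ownedRanges.some(([min, max]) => keyValue >= min && keyValue < max);
}

// Keep only documents whose shard key value falls in a range this shard owns.
function shardingFilter(docs, ownedRanges, shardKeyField) {
  return docs.filter((doc) => ownsKey(ownedRanges, doc[shardKeyField]));
}

// This shard once owned [0, 100), but [50, 100) has been migrated away, so
// the copy of _id 70 still stored here is an orphan.
const docsOnShard = [{ _id: 10, i: 1 }, { _id: 70, i: 5 }];
const ownedRanges = [[0, 50]];
const visible = shardingFilter(docsOnShard, ownedRanges, "_id");
console.log(visible.map((doc) => doc._id)); // [ 10 ]
```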

Interestingly, because a $group using a DISTINCT_SCAN needs to examine only one document for each value of the group key, the problem is usually hidden: such a $group naturally "deduplicates" the orphan documents by selecting only one document per group key value. However, there are (at least) three cases where this still causes problems:
1) If documents are updated before a $group using a DISTINCT_SCAN runs, and before the orphan documents are otherwise purged, the $group may read and pass along stale values from the orphan documents rather than from the "live", updated ones.
2) If https://jira.mongodb.org/browse/SERVER-5477 is merged, $group stages using DISTINCT_SCANs will run exclusively on the shards whenever the $group key is a superset of the shard key. This exacerbates the issue: orphans will no longer be deduplicated "accidentally" by the $group, because in that case the $group assumes that all documents with the same group key value are on the same shard.
3) If orphan documents persist after their "parent" documents have been deleted, the DISTINCT_SCAN can return stale values from these documents.
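The three cases can be reproduced in a toy model of a two-shard {$group: {_id: '$_id', i: {$first: '$i'}}}. This is a plain JavaScript sketch, not MongoDB internals: the function names are hypothetical, and it assumes _id is the shard key, chunks are half-open numeric ranges, each shard keeps the first document it sees per key (as a DISTINCT_SCAN-based plan would), and the merge step re-applies $first to shard results in arrival order.

```javascript
// Toy model (plain JavaScript, not MongoDB internals). Assumptions: _id is the
// shard key, chunks are half-open [min, max) ranges, and the merge step
// re-applies $first to the shard results in the order they arrive.

function ownsKey(ranges, key) {
  return ranges.some(([min, max]) => key >= min && key < max);
}

// Per-shard $group with $first. When applyFilter is true, documents outside
// the shard's owned ranges (orphans) are dropped first, like SHARDING_FILTER.
function shardGroupFirst(shard, applyFilter) {
  const out = new Map();
  for (const doc of shard.docs) {
    if (applyFilter && !ownsKey(shard.owns, doc._id)) continue;
    if (!out.has(doc._id)) out.set(doc._id, { _id: doc._id, i: doc.i });
  }
  return [...out.values()];
}

// Concatenate shard results in arrival order. If mergeGroups is true, re-group
// them with $first; if false, pass them through, modeling a $group that runs
// only on the shards (the SERVER-5477 scenario).
function runGroup(shards, { applyFilter, mergeGroups }) {
  const partials = shards.flatMap((shard) => shardGroupFirst(shard, applyFilter));
  if (!mergeGroups) return partials;
  const merged = new Map();
  for (const row of partials) {
    if (!merged.has(row._id)) merged.set(row._id, row);
  }
  return [...merged.values()];
}

// Chunk [50, 100) was migrated from shard A to shard B. A still holds orphan
// copies: _id 70 with the old value i: 5 (B's live copy was updated to i: 9)
// and _id 80 (deleted on B after the migration).
const shardA = {
  owns: [[0, 50]],
  docs: [{ _id: 10, i: 1 }, { _id: 70, i: 5 }, { _id: 80, i: 3 }],
};
const shardB = { owns: [[50, 100]], docs: [{ _id: 70, i: 9 }] };
const shards = [shardA, shardB]; // A's results arrive first

// No filter, merged: _id 70 gets the stale i: 5 (case 1), _id 80 reappears (case 3).
const unfiltered = runGroup(shards, { applyFilter: false, mergeGroups: true });
// No filter, no merge: _id 70 appears twice, once per shard (case 2).
const unmerged = runGroup(shards, { applyFilter: false, mergeGroups: false });
// With the filter: only live documents contribute, giving _id 10 and _id 70 (i: 9).
const filtered = runGroup(shards, { applyFilter: true, mergeGroups: true });
console.log(unfiltered, unmerged, filtered);
```

In this model, swapping the order of the shards makes the unfiltered merge pick i: 9 for _id 70 instead of i: 5, which illustrates how the visible outcome can depend on timing even though the orphans are always present.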

depends on

SERVER-72748 Enable feature flag

Closed

is depended on by

SERVER-5477 when sharded, no need to merge groups if $group _id is the shard key or original document _id

Backlog

is related to

SERVER-55200 DISTINCT_SCAN not used for $sort+$match+$group+$first on sharded collection

Closed

related to

SERVER-13116 distinct isn't sharding aware

Closed

Assignee:: Alya Berciu
Reporter:: George Wangensteen (Inactive)
Participants:: Alya Berciu, Asya Kamsky, Charlie Swanson, George Wangensteen, Justin Seyster
Votes:: 0 Vote for this issue
Watchers:: 23 Start watching this issue

Created:: Jul 11 2019 02:21:03 PM UTC
Updated:: May 22 2025 05:34:42 PM UTC
Resolved:: Apr 22 2025 08:16:45 AM UTC
Confidence Status Last Update:: 10/Feb/22 8:57 PM
