[SERVER-42160] $group stages that use a DISTINCT_SCAN do not use a SHARDING_FILTER on sharded collections Created: 11/Jul/19 Updated: 24/Jul/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | George Wangensteen | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qexec-team, query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Query Optimization
|
| Operating System: | ALL | ||||||||||||||||||||
| Steps To Reproduce: | Assuming d.data is a sharded collection with orphan documents
The query plan returned here uses a sharding filter stage as this group cannot leverage a distinct scan
The query plan returned here does not use a sharding filter stage as this group can leverage a distinct scan. This was determined mostly by code/query plan inspection, as reproducing an error resulting from this is timing-dependent. |
||||||||||||||||||||
| Sprint: | Query 2019-07-29, Query 2019-08-12, Query 2019-08-26, Query 2019-09-09, Query 2019-09-23, Query 2019-10-07, Query 2019-10-21, Query 2019-12-30, Query 2020-03-23, Query 2020-04-06, QE 2021-10-18, QE 2021-11-01, QE 2021-11-15, QO 2022-02-21, QO 2022-03-07 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
$group stages that leverage a DISTINCT_SCAN in their execution do not produce a SHARDING_FILTER stage in their query plans. This presents a problem when orphan documents persist at entry to such a stage and potentially affect the result of the $group. Interestingly, because $groups using a DISTINCT_SCAN need to examine only one document for each value of the group key to execute, the problem is generally hidden; such $groups naturally "deduplicate" the orphan documents by selecting only one document for each value of the group key. However, there are (at least) three cases where this still presents problems: 3) If orphan documents persist after their "parent" documents have been deleted, the distinct scan can return stale values from these documents. |
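As a rough illustration of the masking effect and of case 3, the Python sketch below models the index entries a $group would see as a sorted list of (group key, is_orphan) pairs. The function names and the data are hypothetical and only model the behavior described above; this is not server code.

```python
# Hypothetical model: each entry is (group_key, is_orphan), sorted by group_key.

def unfiltered_distinct_scan(index):
    """Return each distinct key once, with no shard filtering applied."""
    seen = []
    for key, _is_orphan in index:
        if not seen or seen[-1] != key:
            seen.append(key)
    return seen

def owned_keys(index):
    """Return keys that have at least one non-orphan entry (the correct answer)."""
    result = []
    for key, is_orphan in index:
        if not is_orphan and (not result or result[-1] != key):
            result.append(key)
    return result

# The orphan duplicates a live document, so deduplication hides it.
masked = [("x", False), ("x", True), ("y", False)]

# "z" survives only as an orphan after its parent document was deleted.
stale = [("x", False), ("z", True)]

print(unfiltered_distinct_scan(masked), owned_keys(masked))
print(unfiltered_distinct_scan(stale), owned_keys(stale))
```

In the `masked` case both functions agree, which is why the problem usually goes unnoticed; in the `stale` case the unfiltered scan also reports `"z"`.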
| Comments |
| Comment by Charlie Swanson [ 07/Mar/22 ] |
|
After discussion, we think that some of the solutions outlined above are worth pursuing, but they are expensive enough that we won't schedule this ticket on its own. With a fix on the horizon, we're not interested in risking a performance change to fix this bug at the moment. After confirming that some of the correct use cases (e.g. using shard key in the index) are still working, I'm moving this ticket back to the backlog and linking it to a future project to improve grouping performance on sharded clusters where this work will fit in well. |
| Comment by Charlie Swanson [ 14/Feb/22 ] |
|
An update to this ticket after picking it up and dusting it off:
I'm going to explore option 2 and option #4.1 since they should be relatively simple to test. cc justin.seyster since you provided the most recent patch for this which just banned DISTINCT_SCAN in cases where SHARD_FILTER would make it incorrect. cc asya since you were part of the decision crew for "we need to do better than just preventing DISTINCT_SCAN optimization when a SHARD_FILTER stage would also need to be present" |
| Comment by Justin Seyster [ 03/Sep/19 ] |
|
Update on this: The fix is triggering some test failures, which I plan to diagnose before sending to code review. |
| Comment by Justin Seyster [ 29/Aug/19 ] |
|
The most likely fix here is to ban DISTINCT_SCAN on shards when executing a sharded query, and I have some code written up that will do that. This change feels a little fraught, so I want to consider all the cases. Note that this will also affect the distinct command, which uses the same code path to generate its query plan. 1) Distinct command on a sharded collection: I have an integration test written that gets an incorrect result from a distinct command, because it returns a result from an orphaned document. (I didn't look into it, but this bug might have been around since before
2) Pipelines with $sort followed by $group: The pipeline gets split at the $sort stage before the I'm going to spend a little more time on testing and then put my proposed fix up for code review. |
| Comment by Justin Seyster [ 11/Jul/19 ] |
|
asya This bug is possible in almost all cases when there are orphan documents living in shards targeted by the $group. If a shard finds an index that it can use with the distinct scan optimization, it generates a plan with no shard filter, and there is a possibility that it will return an orphan document to the merging node. To fix this bug, we will make the distinct scan optimization aware of the shard filter, but that will mean that shard filtering makes this optimization impossible in almost all cases (meaning it will only benefit non-sharded collections). The distinct scan is only compatible with the shard filter if the shard key is a prefix of the field we are grouping by. |
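One way to picture the compatibility condition: a shard filter decides ownership from the shard key alone, so examining a single document per group key is only safe when every document with that group key gets the same ownership answer. The sketch below checks that property on toy data. The field names `sk` and `g`, the half-open range model of ownership, and the helper names are all assumptions made for illustration, and the simplest case (grouping directly on the shard key) stands in for the prefix condition.

```python
from itertools import groupby

def owns(shard_key_value, owned_range):
    """Hypothetical ownership test: this shard owns the half-open range [lo, hi)."""
    lo, hi = owned_range
    return lo <= shard_key_value < hi

def one_doc_per_group_is_enough(docs, group_field, shard_field, owned_range):
    """True if all documents sharing a group key agree on ownership."""
    ordered = sorted(docs, key=lambda d: d[group_field])
    for _, group in groupby(ordered, key=lambda d: d[group_field]):
        flags = {owns(d[shard_field], owned_range) for d in group}
        if len(flags) > 1:
            return False  # the group mixes owned documents and orphans
    return True

docs = [
    {"sk": 1, "g": "a"},  # owned by this shard
    {"sk": 9, "g": "a"},  # orphan with the same value of g
    {"sk": 2, "g": "b"},  # owned by this shard
]
owned_range = (0, 5)

print(one_doc_per_group_is_enough(docs, "sk", "sk", owned_range))  # grouping on the shard key
print(one_doc_per_group_is_enough(docs, "g", "sk", owned_range))   # grouping on another field
```

Grouping on the shard key itself passes the check, while grouping on `g` fails because the group `"a"` contains both an owned document and an orphan.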
| Comment by Asya Kamsky [ 11/Jul/19 ] |
|
This is only the case when the distinct value is not on the shard key, right? |
| Comment by George Wangensteen [ 11/Jul/19 ] |
|
After a good conversation about this with ian.boros, we've come to the unfortunate conclusion that (as currently implemented) distinct scans are never guaranteed to be correct on sharded collections. Consider the case where we attempted to have a distinct scan perform a shard filter. If the distinct scan found an orphan document and filtered it, it would have no means to reach other documents with the same index key value as the orphan, so it would jump to the next index key value. This would result in incorrect results, and the only way to fix it would be to change the distinct scan interface so that it has the means to access more than one document with the same group key, and scan documents with the same group key value until it finds one that isn't an orphan. This, though, would make the distinct scan as hypothetically slow as an index scan, although in theory there shouldn't be too many orphan documents to make it so. In any case this will require a not-insignificant change to the implementation of distinct scans to fix. |
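The argument above can be sketched in a few lines of Python. This is a toy model rather than the server's DISTINCT_SCAN implementation: the index is a hypothetical sorted list of (group key, is_orphan) entries, and both function names are made up for the example. The first function filters the single entry the scan sees before jumping to the next key; the second keeps scanning within a key until it finds a non-orphan, as the proposed interface change would.

```python
from itertools import groupby

# Hypothetical index entries on one shard, sorted by group key: (key, is_orphan).
INDEX = [
    ("a", True),   # an orphan happens to sort first for key "a"
    ("a", False),  # an owned document with the same key
    ("b", False),
    ("c", True),   # key "c" exists only as an orphan
]

def distinct_scan_then_filter(index):
    """Examine only the first entry per key, then apply the shard filter."""
    result = []
    for key, entries in groupby(index, key=lambda e: e[0]):
        _, is_orphan = next(entries)  # the scan sees one entry, then jumps ahead
        if not is_orphan:
            result.append(key)
    return result

def distinct_scan_with_retry(index):
    """Keep scanning entries for a key until a non-orphan is found."""
    result = []
    for key, entries in groupby(index, key=lambda e: e[0]):
        if any(not is_orphan for _, is_orphan in entries):
            result.append(key)
    return result

print(distinct_scan_then_filter(INDEX))  # drops "a" even though an owned doc exists
print(distinct_scan_with_retry(INDEX))   # keeps "a", still drops orphan-only "c"
```

The retry variant may visit every entry for a key in the worst case, which matches the concern that it could be as slow as an ordinary index scan when many orphans are present.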
| Comment by George Wangensteen [ 11/Jul/19 ] |
|
CC justin.seyster I believe you're most familiar with $group using distinct scans from https://jira.mongodb.org/browse/SERVER-9507 so let me know if you think the description here can be improved. |