[SERVER-17806] AND_HASH plan where first child is a multikey index scan de-duplicates twice Created: 30/Mar/15 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
The IndexScan stage will automatically turn on its de-duplication logic if it finds that the index is multikey. The AND_HASH index intersection stage will also de-duplicate by default. It is wasteful to de-duplicate twice, as this requires keeping unnecessary in-memory state describing which query results have been seen so far. We should consider adding de-duplication analysis to the query planner's analysis phase. In particular, whether or not duplication is possible could be added as a property of a QuerySolutionNode, and could be computed via QuerySolutionNode::computeProperties(). The analysis phase would ensure that a plan de-duplicates at most once. No need to duplicate the de-duplication! |
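A minimal sketch of the proposed analysis pass, to make the idea concrete. It is not the server's code: `PlanNode`, its fields, and `assignDedup` are hypothetical stand-ins for QuerySolutionNode and a planner analysis step. The sketch assumes that AND_HASH and update stages always de-duplicate on their own, so only the index scan's de-duplication is toggled.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified plan node; NOT MongoDB's actual QuerySolutionNode.
struct PlanNode {
    enum class Kind { IndexScan, AndHash, Update };

    explicit PlanNode(Kind k, bool isMultikey = false)
        : kind(k), multikey(isMultikey) {}

    Kind kind;
    bool multikey;       // only meaningful for IndexScan
    bool dedup = false;  // set by assignDedup(): does this stage track seen results?
    std::vector<std::unique_ptr<PlanNode>> children;
};

// Top-down pass. An index scan de-duplicates only if it is multikey AND no
// ancestor stage already de-duplicates. AndHash and Update are assumed to
// always de-duplicate (an assumption of this sketch).
void assignDedup(PlanNode& node, bool ancestorDedups = false) {
    if (node.kind == PlanNode::Kind::IndexScan) {
        assert(node.children.empty());  // index scans are leaves in this sketch
        node.dedup = node.multikey && !ancestorDedups;
    } else {
        node.dedup = true;
    }
    for (auto& child : node.children) {
        assignDedup(*child, ancestorDedups || node.dedup);
    }
}
```

With this pass, a multikey index scan under an AND_HASH node ends up with `dedup == false`, while the same scan as the root of a plan keeps `dedup == true`.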
| Comments |
| Comment by Mathias Stearn [ 30/Mar/15 ] |
|
An additional example is the UpdateStage, which has its own de-duplication logic and so doesn't need its child to de-duplicate. |