[SERVER-17806] AND_HASH plan where first child is a multikey index scan de-duplicates twice Created: 30/Mar/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Participants:

 Description   

The IndexScan stage will automatically turn on its de-duplication logic if it finds that the index is multikey. The AND_HASH index intersection stage will also de-duplicate by default. It is wasteful to de-duplicate twice, as this requires keeping unnecessary in-memory state describing which query results have been seen so far.
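To make the cost concrete, here is a minimal toy model of two stacked de-duplicating stages. It is a sketch only, not the server's implementation: `RecordIdSketch`, `DedupFilter`, and `runStackedDedup` are hypothetical names, and each stage is assumed to track seen record ids in a plain hash set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a record identifier (not the server's own type).
using RecordIdSketch = std::int64_t;

// Toy de-duplicating stage: remembers every id it has let through.
class DedupFilter {
public:
    // Returns true the first time an id is offered, false on repeats.
    bool admit(RecordIdSketch id) {
        return _seen.insert(id).second;
    }

    std::size_t trackedIds() const {
        return _seen.size();
    }

private:
    std::unordered_set<RecordIdSketch> _seen;
};

struct StackedDedupResult {
    std::vector<RecordIdSketch> results;  // ids that reach the top of the plan
    std::size_t trackedIds = 0;           // ids held in memory by both stages
    std::size_t parentRejections = 0;     // duplicates caught by the parent
};

// Feeds the scanned ids through a child filter and then a parent filter,
// mimicking a plan in which both the scan and its parent de-duplicate.
StackedDedupResult runStackedDedup(const std::vector<RecordIdSketch>& scanned) {
    DedupFilter child;
    DedupFilter parent;
    StackedDedupResult out;
    for (RecordIdSketch id : scanned) {
        if (!child.admit(id)) {
            continue;  // the child already dropped this duplicate
        }
        if (parent.admit(id)) {
            out.results.push_back(id);
        } else {
            ++out.parentRejections;
        }
    }
    out.trackedIds = child.trackedIds() + parent.trackedIds();
    return out;
}
```

In this model the parent never rejects anything, because the child has already removed every duplicate, yet both filters hold a full copy of the seen-id set.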

We should consider adding de-duplication analysis to the query planner's analysis phase. In particular, whether or not duplication is possible could be added as a property of a QuerySolutionNode, and could be computed via QuerySolutionNode::computeProperties(). The analysis phase would ensure that a plan de-duplicates at most once. No need to duplicate the de-duplication!
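One way such an analysis pass might look is sketched below. Everything here is hypothetical and much simpler than the real planner: `PlanNodeSketch`, `NodeKind`, `assignDedup`, and `maxDedupsOnAnyPath` are invented for illustration and are not QuerySolutionNode APIs. The sketch assumes that a de-duplicating intersection node covers the output of all of its children, and it only adjusts index scans.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node kinds for a toy plan tree.
enum class NodeKind { IndexScan, Fetch, AndHash };

// Hypothetical, simplified plan node.
struct PlanNodeSketch {
    NodeKind kind = NodeKind::Fetch;
    bool multikey = false;  // meaningful only for IndexScan
    bool dedup = false;     // set by the analysis pass below
    std::vector<std::unique_ptr<PlanNodeSketch>> children;
};

std::unique_ptr<PlanNodeSketch> makeNode(NodeKind kind, bool multikey = false) {
    auto node = std::make_unique<PlanNodeSketch>();
    node->kind = kind;
    node->multikey = multikey;
    return node;
}

// Top-down pass: a scan de-duplicates only if it can produce duplicates
// (multikey) and no ancestor is already going to de-duplicate for it.
void assignDedup(PlanNodeSketch& node, bool ancestorDedups = false) {
    switch (node.kind) {
        case NodeKind::AndHash:
            node.dedup = true;  // assumed to de-duplicate by default
            break;
        case NodeKind::IndexScan:
            node.dedup = node.multikey && !ancestorDedups;
            break;
        default:
            node.dedup = false;  // pass-through stages do no de-duplication
            break;
    }
    for (auto& child : node.children) {
        assignDedup(*child, ancestorDedups || node.dedup);
    }
}

// Largest number of de-duplicating nodes on any root-to-leaf path.
int maxDedupsOnAnyPath(const PlanNodeSketch& node) {
    int below = 0;
    for (const auto& child : node.children) {
        below = std::max(below, maxDedupsOnAnyPath(*child));
    }
    return below + (node.dedup ? 1 : 0);
}
```

For the plan shape in this ticket (an intersection node over a multikey scan and a second scan), calling `assignDedup` on the root leaves de-duplication on only at the intersection node, while a standalone multikey scan still de-duplicates on its own.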



 Comments   
Comment by Mathias Stearn [ 30/Mar/15 ]

An additional example is the UpdateStage, which has its own deduping logic and so doesn't need its child to dedup.

Generated at Thu Feb 08 03:45:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.