[SERVER-37417] Plans using $** wildcard indices can return duplicate results Created: 01/Oct/18 Updated: 29/Oct/23 Resolved: 15/Oct/18 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Yuta Arai |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | |
| Sprint: | Query 2018-10-08, Query 2018-10-22 |
| Participants: | |
| Description |
The query planner instructs an IXSCAN (or other index access stage) to deduplicate based on whether the index is multikey. $** indices, however, may contain multiple keys for a single document even when the document contains no arrays. Consider a collection containing the document {a: {b: 1, c: 1}} with the index {"$**": 1}. The index will contain two keys for this document, one for the path "a.b" and one for the path "a.c", both referring to the same document.
The planner, however, will generate an IndexEntry that is not marked as multikey, since there are no array paths. As a result, a $** IXSCAN used to answer a query for which both index keys are in bounds will fail to deduplicate. The only known predicate for which this can happen is $exists. See the repro steps for an example query that returns the same document twice.
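The failure mode can be illustrated with a toy Python model. This is a sketch under stated assumptions, not MongoDB's implementation: a key is assumed to be a simplified (path, value, record_id) tuple, arrays are not modeled, and the $exists bounds on "a" are approximated as "the path is a or starts with a.". The names wildcard_keys and exists_scan are hypothetical. With dedup tied to a multikey flag that is false, the scan returns the same record twice.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Toy model of a $** index key: (path, value, record_id). Hypothetical, not the
# server's real key format.
IndexKey = Tuple[str, Any, int]


def wildcard_keys(doc: Dict[str, Any], record_id: int, prefix: str = "") -> List[IndexKey]:
    """Emit one key per leaf path, recursing into subdocuments (arrays not modeled)."""
    keys: List[IndexKey] = []
    for field, value in doc.items():
        path = prefix + field
        if isinstance(value, dict):
            keys.extend(wildcard_keys(value, record_id, prefix=path + "."))
        else:
            keys.append((path, value, record_id))
    return keys


def exists_scan(index: List[IndexKey], field: str, dedup: bool) -> Iterator[int]:
    """Yield record ids for keys whose path is `field` or a subpath of it.

    This approximates the bounds an $exists predicate on `field` would scan.
    """
    seen = set()
    for path, _value, record_id in sorted(index, key=lambda k: (k[0], k[2])):
        if path != field and not path.startswith(field + "."):
            continue  # key is out of bounds
        if dedup:
            if record_id in seen:
                continue  # already returned this document
            seen.add(record_id)
        yield record_id


index = wildcard_keys({"a": {"b": 1, "c": 1}}, record_id=1)
print(index)  # [('a.b', 1, 1), ('a.c', 1, 1)]

# No array paths, so the index entry is not multikey and dedup stays off.
is_multikey = False
print(list(exists_scan(index, "a", dedup=is_multikey)))  # [1, 1]
print(list(exists_scan(index, "a", dedup=True)))  # [1]
```

Both keys fall inside the $exists bounds, so only an explicit dedup step collapses them back to one result.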
| Comments |
| Comment by Githook User [ 15/Oct/18 ] |
Author: yarai (yuta.arai@10gen.com)
Message:
| Comment by Yuta Arai [ 02/Oct/18 ] |
david.storch Our approach is pretty much what you said. When initializing the index scan stage at the execution level, we'll set the dedup flag to true if the bounds for a wildcard index contain more than one unique path.
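The decision described above can be sketched as a small predicate evaluated when building the index scan stage. This is a hypothetical Python model, not the actual server change: PathInterval and should_dedup are invented names, the bounds on the path component are assumed to be a list of intervals, and the $exists bounds used in the example are an assumed modeling of "the path a plus all of its subpaths".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PathInterval:
    """Hypothetical interval over the path component of a wildcard index key.

    Only whether the interval is a single point matters here, so endpoint
    inclusivity is not modeled.
    """
    start: str
    end: str

    def is_point(self) -> bool:
        return self.start == self.end


def should_dedup(is_multikey: bool, is_wildcard: bool, path_bounds: List[PathInterval]) -> bool:
    """Decide whether an index scan must deduplicate record ids."""
    if is_multikey:
        return True  # existing rule: multikey indexes may repeat a document
    if not is_wildcard:
        return False
    # Without arrays a document has one key per path, so duplicates are only
    # possible when the bounds can match more than one distinct path: either
    # any non-point range or several distinct point intervals.
    if any(not iv.is_point() for iv in path_bounds):
        return True
    return len({iv.start for iv in path_bounds}) > 1


# Equality on "a.b": a single path, so no dedup is needed.
print(should_dedup(False, True, [PathInterval("a.b", "a.b")]))  # False

# $exists on "a", modeled as the path "a" plus the range of its subpaths
# ("a." up to "a/", since "/" sorts right after "."), so dedup is required.
print(should_dedup(False, True, [PathInterval("a", "a"), PathInterval("a.", "a/")]))  # True
```

Keeping the multikey check first preserves the existing behavior for ordinary indexes; the wildcard branch only adds dedup where multiple keys per document are actually possible.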
| Comment by David Storch [ 02/Oct/18 ] |
yuta.arai james.wahlin What's your plan for solving this? I suppose we may need some special logic to instruct an index access stage to dedup if the bounds may contain keys with multiple $_path values.