[SERVER-37417] Plans using $** wildcard indices can return duplicate results

Created: 01/Oct/18
Updated: 29/Oct/23
Resolved: 15/Oct/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.1.4

Type: Bug
Priority: Major - P3
Reporter: David Storch
Assignee: Yuta Arai
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-36198 Enable $** index builds by default in... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

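(function() {
    "use strict";

    // Start from an empty collection. With no wildcard index, the query
    // returns the single matching document once.
    db.c.drop();
    assert.commandWorked(db.c.insert({a: {b: 1, c: 1}}));
    assert.eq(1, db.c.find({a: {$exists: true}}).itcount());

    // After building a {"$**": 1} index, the same query should still return
    // one document. On affected builds this assertion fails because the
    // document is returned twice.
    assert.commandWorked(db.c.createIndex({"$**": 1}));
    assert.eq(1, db.c.find({a: {$exists: true}}).itcount());
}());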

Sprint: Query 2018-10-08, Query 2018-10-22
Participants:

 Description   

The query planner instructs an IXSCAN (or other index access stage) to deduplicate based on whether the index is multikey. $** indices, however, may contain multiple keys for a document in the absence of arrays. Consider the example of a collection which contains the document {a: {b: 1, c: 1}} with the index {"$**": 1}. The index will contain the following keys, both referring to the same document:

  • {$_path: "a.b", "a.b": 1}
  • {$_path: "a.c", "a.c": 1}
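The key layout above can be approximated with a small sketch in plain JavaScript. The helper name wildcardKeys is hypothetical and this is not the server's key generator: it ignores arrays, the _id field, and other special cases, and only illustrates why one array-free document can produce several index keys.

```javascript
// Hypothetical helper: approximates how a {"$**": 1} index could derive keys
// from a document that contains no arrays. Simplified on purpose; arrays and
// _id handling are out of scope and are not modeled here.
function wildcardKeys(doc, prefix = "") {
    const keys = [];
    for (const [field, value] of Object.entries(doc)) {
        const path = prefix === "" ? field : `${prefix}.${field}`;
        const isSubdocument =
            value !== null && typeof value === "object" && !Array.isArray(value);
        if (isSubdocument) {
            // Recurse into subdocuments; each leaf below produces its own key.
            keys.push(...wildcardKeys(value, path));
        } else {
            // One key per leaf: the path it came from plus the value at that path.
            keys.push({$_path: path, [path]: value});
        }
    }
    return keys;
}

const keys = wildcardKeys({a: {b: 1, c: 1}});
console.log(keys.length);                 // 2
console.log(keys.map((k) => k.$_path));   // [ 'a.b', 'a.c' ]
```

Both keys point back at the same document even though no array is involved, which is the situation a multikey flag alone does not capture.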

The planner, however, will generate an IndexEntry which is not marked as multikey, since there are no array paths:

https://github.com/mongodb/mongo/blob/175f5e3c25ddba439b7d28254a4af5504aded0d8/src/mongo/db/query/planner_ixselect.cpp#L103-L110

As a result, a $** IXSCAN used to answer a query for which both of the index keys are in bounds will fail to deduplicate. The only known predicate for which this can happen is $exists. See Steps To Reproduce for an example query that returns the same document twice.
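The effect of the missing deduplication can be illustrated with a toy model in plain JavaScript. Everything here is an assumption made for illustration: indexEntries, scanExists, and the numeric recordId are hypothetical names, and the bounds check is a simplified stand-in for the real index bounds of {a: {$exists: true}}.

```javascript
// Hypothetical in-memory model of the wildcard index from the example: each
// entry records the key's $_path, its value, and the document it points at.
const indexEntries = [
    {path: "a.b", value: 1, recordId: 1},
    {path: "a.c", value: 1, recordId: 1},
];

// Simplified stand-in for an index scan answering {<field>: {$exists: true}}.
// A key is treated as in bounds if its path is the field or lies beneath it.
function scanExists(entries, field, dedup) {
    const seen = new Set();
    const results = [];
    for (const entry of entries) {
        const inBounds =
            entry.path === field || entry.path.startsWith(field + ".");
        if (!inBounds) {
            continue;
        }
        if (dedup) {
            // Skip documents that an earlier key already returned.
            if (seen.has(entry.recordId)) {
                continue;
            }
            seen.add(entry.recordId);
        }
        results.push(entry.recordId);
    }
    return results;
}

console.log(scanExists(indexEntries, "a", false).length); // 2: same document twice
console.log(scanExists(indexEntries, "a", true).length);  // 1: duplicates removed
```

In this model the dedup argument plays the role of the flag the planner derives from the multikey bit, so passing false mirrors the behavior described above.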



 Comments   
Comment by Githook User [ 15/Oct/18 ]

Author: yarai (yuta.arai@10gen.com)

Message: SERVER-37417 Plans using $** wildcard indices can return duplicate results
Branch: master
https://github.com/mongodb/mongo/commit/e62512d50329877c84a7b2404c8c6158479efede

Comment by Yuta Arai [ 02/Oct/18 ]

david.storch Our approach is pretty much what you said. When initializing the index scan stage at the execution level, we'll set the dedup flag to true if there is more than one unique index bound for a wildcard index.

Comment by David Storch [ 02/Oct/18 ]

yuta.arai james.wahlin what's your plan for solving this? I suppose we may need some special logic to instruct an index access stage to dedup if the bounds may contain keys with multiple $_path values.
