[SERVER-40134] Distinct command against a view can return incorrect results when the distinct path is multikey Created: 14/Mar/19  Updated: 29/Oct/23  Resolved: 14/May/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: 3.4.20, 3.6.11, 4.0.6, 4.1.9
Fix Version/s: 3.6.14, 4.1.12, 4.0.11

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: Ian Boros
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-6436 Add method for unwinding nested arrays Backlog
is related to SERVER-55112 Behaviour of distinct differs between... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6
Sprint: Query 2019-04-22, Query 2019-05-06, Query 2019-05-20
Participants:

 Description   

Consider the following distinct command against a collection "c":

MongoDB Enterprise > db.c.drop()
true
MongoDB Enterprise > db.c.insert({a: [{b: 1}, {b: 2}]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.distinct("a.b")
[ 1, 2 ]

This is the expected response: two distinct values, 1 and 2. If we create an identity view (a view with an empty pipeline) on top of "c" and run the same distinct against the view, the results are incorrect:

MongoDB Enterprise > db.createView("v", "c", [])
{ "ok" : 1 }
MongoDB Enterprise > db.v.distinct("a.b")
[ [ 1, 2 ] ]

Instead of getting two distinct values, 1 and 2, we get a single distinct value, the array [1, 2]. This bug is due to how the implementation of read-only, non-materialized views internally expands the distinct command into an aggregation operation. In particular, the command expands to a $unwind followed by a $group with $addToSet, such as this:

MongoDB Enterprise > db.c.aggregate([{$unwind: {path: "$a.b", preserveNullAndEmptyArrays: true}}, {$group: {_id: null, distinct: {$addToSet: "$a.b"}}}])
{ "_id" : null, "distinct" : [ [ 1, 2 ] ] }
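
As a rough illustration of the expansion described above, the following sketch builds the same pipeline shape from a distinct key. The helper name distinctToPipeline is hypothetical and is not the server's code; the sketch also assumes an identity view, so no view stages are prepended.

```javascript
// Hypothetical helper (not the server's implementation): builds the
// $unwind + $group shape shown in this ticket for a given distinct key.
function distinctToPipeline(key) {
  const path = "$" + key; // e.g. "a.b" becomes the field path "$a.b"
  return [
    // Unwind the distinct path, keeping documents where it is missing or empty.
    { $unwind: { path: path, preserveNullAndEmptyArrays: true } },
    // Collect every remaining value at the path into a single set.
    { $group: { _id: null, distinct: { $addToSet: path } } },
  ];
}

const pipeline = distinctToPipeline("a.b");
console.log(JSON.stringify(pipeline, null, 2));
```

In the mongo shell, the returned array could be passed to db.c.aggregate(...) to reproduce the result shown above.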

The problem lies in the behavior of the $unwind stage, which was added to the distinct-to-agg transformation in SERVER-27644. Note what happens when we run this same pipeline without the $group stage:

MongoDB Enterprise > db.c.aggregate([{$unwind: {path: "$a.b", preserveNullAndEmptyArrays: true}}])
{ "_id" : ObjectId("5c8a9ec1c2ed87542687a3b8"), "a" : [ { "b" : 1 }, { "b" : 2 } ] }

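When the $unwind path traverses through an array, but does not terminate at an array, no unwinding actually occurs. Here "a" is an array of subdocuments, so $unwind finds nothing to unwind at "a.b" and passes the document through unchanged. The $group stage then evaluates "$a.b" to the array [1, 2] and adds it to the set as a single value. This is at odds with the distinct command's behavior, which expects all arrays along the path to be unwound. Fixing this will likely involve extending the expressivity of $unwind to meet the needs of the distinct command.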
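
The mismatch between the two path semantics can be sketched in plain JavaScript. This is a simplified model written for illustration only, not the server's implementation; every function name below is hypothetical, and edge cases such as nested arrays and numeric path components are ignored.

```javascript
// Simplified model of the two path semantics; NOT the server's implementation.
const isObj = v => v !== null && typeof v === "object" && !Array.isArray(v);

// distinct-style traversal: descend through arrays along the path and
// expand one level of array at the end of the path.
function collectDistinct(value, parts, out) {
  if (parts.length === 0) {
    if (Array.isArray(value)) out.push(...value);
    else if (value !== undefined) out.push(value);
  } else if (Array.isArray(value)) {
    value.filter(isObj).forEach(v => collectDistinct(v[parts[0]], parts.slice(1), out));
  } else if (isObj(value)) {
    collectDistinct(value[parts[0]], parts.slice(1), out);
  }
}

// $unwind-style lookup: walk embedded documents only, never descend into arrays.
function strictGet(doc, parts) {
  return parts.reduce((cur, p) => (isObj(cur) ? cur[p] : undefined), doc);
}

// Return a copy of doc with the value at the given path replaced.
function setPath(doc, parts, value) {
  if (parts.length === 0) return value;
  return { ...doc, [parts[0]]: setPath(doc[parts[0]], parts.slice(1), value) };
}

// Model of {$unwind: {path, preserveNullAndEmptyArrays: true}}.
function modelUnwind(docs, path) {
  const parts = path.split(".");
  return docs.flatMap(doc => {
    const v = strictGet(doc, parts);
    // Nothing to unwind at this path: the document passes through unchanged.
    if (!Array.isArray(v) || v.length === 0) return [doc];
    return v.map(elem => setPath(doc, parts, elem));
  });
}

// Expression-style lookup for "$a.b" in $group: arrays along the path are
// mapped over, so the result is itself an array.
function exprGet(value, parts) {
  if (parts.length === 0) return value;
  if (Array.isArray(value)) {
    return value.filter(isObj).map(v => exprGet(v, parts)).filter(v => v !== undefined);
  }
  return isObj(value) ? exprGet(value[parts[0]], parts.slice(1)) : undefined;
}

// Remove duplicates by comparing JSON representations, keeping first-seen order.
const dedupe = vals => [...new Map(vals.map(v => [JSON.stringify(v), v])).values()];

function modelDistinct(docs, path) {
  const out = [];
  docs.forEach(d => collectDistinct(d, path.split("."), out));
  return dedupe(out);
}

function modelUnwindThenGroup(docs, path) {
  const parts = path.split(".");
  const vals = modelUnwind(docs, path).map(d => exprGet(d, parts));
  return dedupe(vals.filter(v => v !== undefined));
}

const docs = [{ a: [{ b: 1 }, { b: 2 }] }];
console.log(JSON.stringify(modelDistinct(docs, "a.b")));        // [1,2]
console.log(JSON.stringify(modelUnwindThenGroup(docs, "a.b"))); // [[1,2]]
```

In this model, the distinct-style traversal yields 1 and 2, while the $unwind followed by $group yields the single value [1, 2], matching the two shell results shown above.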



 Comments   
Comment by Githook User [ 08/Jul/19 ]

Author: Ian Boros (puppyofkosh) <puppyofkosh@gmail.com>

Message: SERVER-40134 fix bug in distinct() against views
Branch: v3.6
https://github.com/mongodb/mongo/commit/5425869f3612ea31c276221e920a60b8933f1a77

Comment by Githook User [ 13/Jun/19 ]

Author: Ian Boros (puppyofkosh) <puppyofkosh@gmail.com>

Message: SERVER-40134 fix bug in distinct() against views
Branch: v4.0
https://github.com/mongodb/mongo/commit/013537d55cbde2952f1287e819990235f7282806

Comment by Githook User [ 14/May/19 ]

Author: Ian Boros (puppyofkosh) <puppyofkosh@gmail.com>

Message: SERVER-40134 fix bug in distinct() against views
Branch: master
https://github.com/mongodb/mongo/commit/e96d68d0c46b43c8fada1224436638a135731a38
