-
Type:
Bug
-
Resolution: Fixed
-
Priority:
Major - P3
-
Affects Version/s: 4.4.3, 4.2.12, 4.9.0-alpha4
-
Component/s: None
-
Minor Change
-
ALL
-
v6.0
-
QO 2022-04-04, QO 2022-04-18, QO 2022-05-02, QO 2022-05-16
-
None
-
None
-
None
-
None
-
None
-
None
-
None
SERVER-27644 and SERVER-40134 already exposed differences in behaviour between running a distinct command on a collection vs. running it on a view. The aggregation pipeline created internally to run distinct on a view does not cover all the use-cases.
While investigating the behaviour as part of DRIVERS-1486 I found some more differences.
To test, I've created a collection with data and created a view based on an empty pipeline:
getDocument = function (id, value) {
return {
_id: id,
x: value,
nested: {
x: value,
nested: {
x: value
},
array: [value]
},
array: [value],
documentArray: [
{x: value},
{x: value << 2}
]
};
}
db.distinctTest.drop();
db.distinctTest.insertMany([
getDocument(1, 1),
getDocument(2, 2),
getDocument(3, 3),
getDocument(4, 1)
]);
db.createView('distinctViewTest', 'distinctTest', []);
I've come up with the following calls to distinct:
db.distinctTest.distinct('x'); db.distinctTest.distinct('nested.x'); db.distinctTest.distinct('nested.nested.x'); db.distinctTest.distinct('array'); db.distinctTest.distinct('nested.array'); db.distinctTest.distinct('documentArray'); db.distinctTest.distinct('documentArray.x'); db.distinctTest.distinct('documentArray[1].x'); db.distinctTest.distinct('documentArray.1.x');
Most of the cases look the same (except for different result ordering which we can ignore), but the last case differs:
MongoDB Enterprise > db.distinctTest.distinct('documentArray.1.x'); [ 4, 8, 12 ] MongoDB Enterprise > db.distinctViewTest.distinct('documentArray.1.x'); [ ]
Looking at the pipeline generated, we can see that documentArray.1.x produces three $unwind stages:
[
{
"$unwind" : {
"path" : "$documentArray",
"preserveNullAndEmptyArrays" : true
}
},
{
"$unwind" : {
"path" : "$documentArray.1",
"preserveNullAndEmptyArrays" : true
}
},
{
"$unwind" : {
"path" : "$documentArray.1.x",
"preserveNullAndEmptyArrays" : true
}
},
{
"$match" : {
"documentArray" : {
"$_internalSchemaType" : "object"
},
"documentArray.1" : {
"$_internalSchemaType" : "object"
}
}
},
{
"$group" : {
"_id" : null,
"distinct" : {
"$addToSet" : "$documentArray.1.x"
}
}
}
]
This is incorrect, as documentArray.1 should not unwind documentArray first, but rather use $arrayElemAt to. This modified aggregation pipeline produces the same result as the corresponding distinct command:
[
{
"$set": {
"documentArray": { $arrayElemAt: [ "$documentArray", 1 ] }
}
},
{
"$unwind" : {
"path" : "$documentArray",
"preserveNullAndEmptyArrays" : true
}
},
{
"$unwind" : {
"path" : "$documentArray.x",
"preserveNullAndEmptyArrays" : true
}
},
{
"$match" : {
"documentArray" : {
"$_internalSchemaType" : "object"
}
}
},
{
"$group" : {
"_id" : null,
"distinct" : {
"$addToSet" : "$documentArray.x"
}
}
}
]
I was able to reproduce this in 4.2.12, 4.4.3, and 4.9.0-alpha4. It is likely that this also affects previous versions which I didn't have on hand to test. SERVER-27644 introduced the $unwind logic and was backported to 3.4, so I expect all versions starting with that being affected.
- is related to
-
DRIVERS-1486 VersionedAPI: Reimplement 'distinct' in 4.9+
-
- Development Complete
-
-
SERVER-37715 Use DISTINCT_SCAN for $unwind-$group pipelines
-
- Backlog
-
- related to
-
SERVER-27644 distinct on a view doesn't "flatten" array members
-
- Closed
-
-
SERVER-40134 Distinct command against a view can return incorrect results when the distinct path is multikey
-
- Closed
-