[SERVER-55112] Behaviour of distinct differs between collections and views Created: 10/Mar/21  Updated: 29/Oct/23  Resolved: 02/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.3, 4.2.12, 4.9.0-alpha4
Fix Version/s: 6.0.0-rc5, 6.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Andreas Braun Assignee: Ted Tuckman
Resolution: Fixed Votes: 0
Labels: query-director-triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Documented
is documented by DOCS-15303 Investigate changes in SERVER-55112: ... Closed
Related
related to SERVER-27644 distinct on a view doesn't "flatten" ... Closed
related to SERVER-40134 Distinct command against a view can r... Closed
is related to DRIVERS-1486 VersionedAPI: Reimplement 'distinct' ... Closed
is related to SERVER-37715 Use DISTINCT_SCAN for $unwind-$group ... Backlog
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v6.0
Sprint: QO 2022-04-04, QO 2022-04-18, QO 2022-05-02, QO 2022-05-16
Participants:

 Description   

SERVER-27644 and SERVER-40134 already exposed differences in behaviour between running a distinct command on a collection vs. running it on a view. The aggregation pipeline created internally to run distinct on a view does not cover all the use-cases.

While investigating the behaviour as part of DRIVERS-1486 I found some more differences.

To test, I've created a collection with data and created a view based on an empty pipeline:

getDocument = function (id, value) {
    return {
        _id: id,
        x: value,
        nested: {
            x: value,
            nested: {
                x: value
            },
            array: [value]
        },
        array: [value],
        documentArray: [
            {x: value},
            {x: value << 2}
        ]
    };
}
 
db.distinctTest.drop();
db.distinctTest.insertMany([
    getDocument(1, 1),
    getDocument(2, 2),
    getDocument(3, 3),
    getDocument(4, 1)
]);
db.createView('distinctViewTest', 'distinctTest', []);

I've come up with the following calls to distinct:

db.distinctTest.distinct('x');
db.distinctTest.distinct('nested.x');
db.distinctTest.distinct('nested.nested.x');
db.distinctTest.distinct('array');
db.distinctTest.distinct('nested.array');
db.distinctTest.distinct('documentArray');
db.distinctTest.distinct('documentArray.x');
db.distinctTest.distinct('documentArray[1].x');
db.distinctTest.distinct('documentArray.1.x');

Most of the cases look the same (except for different result ordering which we can ignore), but the last case differs:

MongoDB Enterprise > db.distinctTest.distinct('documentArray.1.x');
[ 4, 8, 12 ]
MongoDB Enterprise > db.distinctViewTest.distinct('documentArray.1.x');
[ ]

Looking at the pipeline generated, we can see that documentArray.1.x produces three $unwind stages:

[
    {
        "$unwind" : {
            "path" : "$documentArray",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray.1",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray.1.x",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$match" : {
            "documentArray" : {
                "$_internalSchemaType" : "object"
            },
            "documentArray.1" : {
                "$_internalSchemaType" : "object"
            }
        }
    },
    {
        "$group" : {
            "_id" : null,
            "distinct" : {
                "$addToSet" : "$documentArray.1.x"
            }
        }
    }
]

This is incorrect, as documentArray.1 should not unwind documentArray first, but rather use $arrayElemAt to. This modified aggregation pipeline produces the same result as the corresponding distinct command:

[
    {
        "$set": {
            "documentArray": { $arrayElemAt: [ "$documentArray", 1 ] }
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray.x",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$match" : {
            "documentArray" : {
                "$_internalSchemaType" : "object"
            }
        }
    },
    {
        "$group" : {
            "_id" : null,
            "distinct" : {
                "$addToSet" : "$documentArray.x"
            }
        }
    }
]

I was able to reproduce this in 4.2.12, 4.4.3, and 4.9.0-alpha4. It is likely that this also affects previous versions which I didn't have on hand to test. SERVER-27644 introduced the $unwind logic and was backported to 3.4, so I expect all versions starting with that being affected.



 Comments   
Comment by Githook User [ 04/May/22 ]

Author:

{'name': 'Ted Tuckman', 'email': 'ted.tuckman@mongodb.com', 'username': 'TedTuckman'}

Message: SERVER-55112 Make distinct on a view match distinct on a collection for certain array cases
Branch: v6.0
https://github.com/mongodb/mongo/commit/b8cde770ab6e244bf22238a8eb822326dfbac6ce

Comment by Githook User [ 04/May/22 ]

Author:

{'name': 'Ted Tuckman', 'email': 'ted.tuckman@mongodb.com', 'username': 'TedTuckman'}

Message: SERVER-55112 Make distinct on a view match distinct on a collection for certain array cases
Branch: v6.0
https://github.com/10gen/mongo-enterprise-modules/commit/3c09cae8abd25175f23d566e15ce1fd10c01e1f8

Comment by Githook User [ 28/Apr/22 ]

Author:

{'name': 'Ted Tuckman', 'email': 'ted.tuckman@mongodb.com', 'username': 'TedTuckman'}

Message: SERVER-55112 Make distinct on a view match distinct on a collection for certain array cases
Branch: master
https://github.com/mongodb/mongo/commit/5063c61f2e2ea9eee9501b0f83ed8674fc7f6787

Comment by Githook User [ 28/Apr/22 ]

Author:

{'name': 'Ted Tuckman', 'email': 'ted.tuckman@mongodb.com', 'username': 'TedTuckman'}

Message: SERVER-55112 Make distinct on a view match distinct on a collection for certain array cases
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/9cf74e19884e1fad5e4e31db7729d1068ed5fa3f

Comment by Kyle Suarez [ 16/Mar/21 ]

After re-reading the description more closely I am sending this ticket back to QE Triage to see if we would consider an approach similar to the one in SERVER-40134.

Generated at Thu Feb 08 05:35:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.