[DOCS-15303] Investigate changes in SERVER-55112: Behaviour of distinct differs between collections and views Created: 02/May/22  Updated: 13/Nov/23  Resolved: 17/Jun/22

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 6.1.0-rc0, 6.0.0-rc5, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Jason Price
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by DOCS-15317 [BACKPORT] [v6.0] Behaviour of distin... Closed
Documented
documents SERVER-55112 Behaviour of distinct differs between... Closed
Participants:
Days since reply: 1 year, 33 weeks, 1 day ago
Epic Link: DOCSP-21764
Story Points: 3

 Description   
Original Downstream Change Summary

This is a minor change to the results of distinct when it is run on a view whose underlying collection contains arrays at the field of interest. We are planning to backport this to 6.0.

Description of Linked Ticket

SERVER-27644 and SERVER-40134 already exposed differences in behaviour between running a distinct command on a collection and running it on a view. The aggregation pipeline created internally to run distinct on a view does not cover all use cases.

While investigating the behaviour as part of DRIVERS-1486 I found some more differences.

To test, I've created a collection with data and created a view based on an empty pipeline:

getDocument = function (id, value) {
    return {
        _id: id,
        x: value,
        nested: {
            x: value,
            nested: {
                x: value
            },
            array: [value]
        },
        array: [value],
        documentArray: [
            {x: value},
            {x: value << 2}
        ]
    };
}
 
db.distinctTest.drop();
db.distinctTest.insertMany([
    getDocument(1, 1),
    getDocument(2, 2),
    getDocument(3, 3),
    getDocument(4, 1)
]);
db.createView('distinctViewTest', 'distinctTest', []);

I've come up with the following calls to distinct:

db.distinctTest.distinct('x');
db.distinctTest.distinct('nested.x');
db.distinctTest.distinct('nested.nested.x');
db.distinctTest.distinct('array');
db.distinctTest.distinct('nested.array');
db.distinctTest.distinct('documentArray');
db.distinctTest.distinct('documentArray.x');
db.distinctTest.distinct('documentArray[1].x');
db.distinctTest.distinct('documentArray.1.x');

Most of the cases return the same values on the collection and the view (apart from result ordering, which we can ignore), but the last case differs:

MongoDB Enterprise > db.distinctTest.distinct('documentArray.1.x');
[ 4, 8, 12 ]
MongoDB Enterprise > db.distinctViewTest.distinct('documentArray.1.x');
[ ]
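
The collection result follows directly from the test data: the element at index 1 of documentArray holds value << 2. A minimal sketch in plain JavaScript (no server required) that reproduces the expected values; distinctAtIndex is a hypothetical helper written for this illustration, not a MongoDB API, and it models only this one path:

```javascript
// Rebuild only the part of the test documents that matters for this path.
function getDocumentArray(value) {
    return [{x: value}, {x: value << 2}];
}

// Hypothetical helper: collect the distinct `x` values found at a fixed
// array index of `documentArray`, skipping documents that lack the element.
function distinctAtIndex(docs, index) {
    const seen = new Set();
    for (const doc of docs) {
        const element = doc.documentArray[index];
        if (element !== undefined && element.x !== undefined) {
            seen.add(element.x);
        }
    }
    return [...seen].sort((a, b) => a - b);
}

// Same values as getDocument(1, 1), (2, 2), (3, 3), (4, 1) in the ticket.
const docs = [1, 2, 3, 1].map((value, i) => ({
    _id: i + 1,
    documentArray: getDocumentArray(value),
}));

console.log(distinctAtIndex(docs, 1)); // [ 4, 8, 12 ]
```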

Looking at the pipeline generated, we can see that documentArray.1.x produces three $unwind stages:

[
    {
        "$unwind" : {
            "path" : "$documentArray",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray.1",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray.1.x",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$match" : {
            "documentArray" : {
                "$_internalSchemaType" : "object"
            },
            "documentArray.1" : {
                "$_internalSchemaType" : "object"
            }
        }
    },
    {
        "$group" : {
            "_id" : null,
            "distinct" : {
                "$addToSet" : "$documentArray.1.x"
            }
        }
    }
]
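
One plausible reading of the empty result can be sketched in plain JavaScript. This is a simplified model, not the server's implementation: unwindTopLevel is a hypothetical stand-in for the first $unwind, and the filter stands in for the $match on object types. The second and third $unwind stages are omitted because, in this model, their paths no longer resolve to anything after the first stage.

```javascript
// Simplified stand-in for $unwind with preserveNullAndEmptyArrays: true
// on a top-level field. Non-array and empty values pass through unchanged
// (a simplification of the real stage).
function unwindTopLevel(docs, field) {
    return docs.flatMap((doc) => {
        const value = doc[field];
        if (!Array.isArray(value) || value.length === 0) {
            return [doc];
        }
        return value.map((element) => ({...doc, [field]: element}));
    });
}

const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

const docs = [1, 2, 3, 1].map((value, i) => ({
    _id: i + 1,
    documentArray: [{x: value}, {x: value << 2}],
}));

// First $unwind: one output document per array element.
const unwound = unwindTopLevel(docs, "documentArray");

// After unwinding, documentArray is a single object such as {x: 1}, so it
// has no field named "1" and the type check on documentArray.1 fails.
const matched = unwound.filter(
    (doc) => isObject(doc.documentArray) && isObject(doc.documentArray["1"])
);

console.log(unwound.length, matched.length); // 8 0
```

With no documents left after the match, the final $group has nothing to collect, which is consistent with the empty array returned for the view.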

This is incorrect: for documentArray.1, the pipeline should not unwind documentArray first, but rather use $arrayElemAt to select the element at index 1. This modified aggregation pipeline produces the same result as the corresponding distinct command:

[
    {
        "$set": {
            "documentArray": { "$arrayElemAt": [ "$documentArray", 1 ] }
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$unwind" : {
            "path" : "$documentArray.x",
            "preserveNullAndEmptyArrays" : true
        }
    },
    {
        "$match" : {
            "documentArray" : {
                "$_internalSchemaType" : "object"
            }
        }
    },
    {
        "$group" : {
            "_id" : null,
            "distinct" : {
                "$addToSet" : "$documentArray.x"
            }
        }
    }
]
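
The stages of the modified pipeline can be walked through in plain JavaScript to see why it yields the same values as the collection. This is a rough model under simplifying assumptions, not the server's behaviour: array indexing stands in for $arrayElemAt, a Set stands in for $addToSet, and the two $unwind stages are treated as no-ops because no arrays remain on the path.

```javascript
const docs = [1, 2, 3, 1].map((value, i) => ({
    _id: i + 1,
    documentArray: [{x: value}, {x: value << 2}],
}));

// $set + $arrayElemAt: replace the array with its element at index 1.
const picked = docs.map((doc) => ({...doc, documentArray: doc.documentArray[1]}));

// $match: keep only documents where documentArray is now an object.
const kept = picked.filter(
    (doc) =>
        doc.documentArray !== null &&
        typeof doc.documentArray === "object" &&
        !Array.isArray(doc.documentArray)
);

// $group with $addToSet: collect the distinct x values.
const distinctValues = [...new Set(kept.map((doc) => doc.documentArray.x))]
    .sort((a, b) => a - b);

console.log(distinctValues); // [ 4, 8, 12 ]
```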

I was able to reproduce this in 4.2.12, 4.4.3, and 4.9.0-alpha4. It likely also affects earlier versions, which I didn't have on hand to test. SERVER-27644 introduced the $unwind logic and was backported to 3.4, so I expect all versions from 3.4 onward to be affected.



 Comments   
Comment by Githook User [ 21/Jun/22 ]

Author:

{'name': 'jason-price-mongodb', 'email': '69260375+jason-price-mongodb@users.noreply.github.com', 'username': 'jason-price-mongodb'}

Message: Docs-15303 distinct collections and views (#1263)

Co-authored-by: jason-price-mongodb <jshfjghsdfgjsdjh@aolsdjfhkjsdhfkjsdf.com>
Branch: v6.1
https://github.com/10gen/docs-mongodb-internal/commit/c4e8132a093e8bf1073a576c042d2e0c236afcac

Comment by Githook User [ 17/Jun/22 ]

Author:

{'name': 'jason-price-mongodb', 'email': '69260375+jason-price-mongodb@users.noreply.github.com', 'username': 'jason-price-mongodb'}

Message: Docs-15303 distinct collections and views (#1263)

Co-authored-by: jason-price-mongodb <jshfjghsdfgjsdjh@aolsdjfhkjsdhfkjsdf.com>
Branch: master
https://github.com/10gen/docs-mongodb-internal/commit/c4e8132a093e8bf1073a576c042d2e0c236afcac

Comment by Education Bot [ 04/May/22 ]

Fix Version updated for upstream SERVER-55112:
6.1.0-rc0, 6.0.0-rc5

Comment by Education Bot [ 02/May/22 ]

Fix Version updated for upstream SERVER-55112:
6.1.0-rc0

Generated at Thu Feb 08 08:12:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.