Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-55112

Behaviour of distinct differs between collections and views

    • Minor Change
    • ALL
    • v6.0
    • QO 2022-04-04, QO 2022-04-18, QO 2022-05-02, QO 2022-05-16

      SERVER-27644 and SERVER-40134 already exposed differences in behaviour between running a distinct command on a collection vs. running it on a view. The aggregation pipeline created internally to run distinct on a view does not cover all the use-cases.

      While investigating the behaviour as part of DRIVERS-1486 I found some more differences.

      To test, I've created a collection with data and created a view based on an empty pipeline:

      getDocument = function (id, value) {
          return {
              _id: id,
              x: value,
              nested: {
                  x: value,
                  nested: {
                      x: value
                  },
                  array: [value]
              },
              array: [value],
              documentArray: [
                  {x: value},
                  {x: value << 2}
              ]
          };
      }
      
      db.distinctTest.drop();
      db.distinctTest.insertMany([
          getDocument(1, 1),
          getDocument(2, 2),
          getDocument(3, 3),
          getDocument(4, 1)
      ]);
      db.createView('distinctViewTest', 'distinctTest', []);
      

      I've come up with the following calls to distinct:

      db.distinctTest.distinct('x');
      db.distinctTest.distinct('nested.x');
      db.distinctTest.distinct('nested.nested.x');
      db.distinctTest.distinct('array');
      db.distinctTest.distinct('nested.array');
      db.distinctTest.distinct('documentArray');
      db.distinctTest.distinct('documentArray.x');
      db.distinctTest.distinct('documentArray[1].x');
      db.distinctTest.distinct('documentArray.1.x');
      

      Most of the cases look the same (except for different result ordering which we can ignore), but the last case differs:

      MongoDB Enterprise > db.distinctTest.distinct('documentArray.1.x');
      [ 4, 8, 12 ]
      MongoDB Enterprise > db.distinctViewTest.distinct('documentArray.1.x');
      [ ]
      

      Looking at the pipeline generated, we can see that documentArray.1.x produces three $unwind stages:

      [
          {
              "$unwind" : {
                  "path" : "$documentArray",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray.1",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray.1.x",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$match" : {
                  "documentArray" : {
                      "$_internalSchemaType" : "object"
                  },
                  "documentArray.1" : {
                      "$_internalSchemaType" : "object"
                  }
              }
          },
          {
              "$group" : {
                  "_id" : null,
                  "distinct" : {
                      "$addToSet" : "$documentArray.1.x"
                  }
              }
          }
      ]
      

      This is incorrect, as documentArray.1 should not unwind documentArray first, but rather use $arrayElemAt to. This modified aggregation pipeline produces the same result as the corresponding distinct command:

      [
          {
              "$set": {
                  "documentArray": { $arrayElemAt: [ "$documentArray", 1 ] }
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray.x",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$match" : {
                  "documentArray" : {
                      "$_internalSchemaType" : "object"
                  }
              }
          },
          {
              "$group" : {
                  "_id" : null,
                  "distinct" : {
                      "$addToSet" : "$documentArray.x"
                  }
              }
          }
      ]
      

      I was able to reproduce this in 4.2.12, 4.4.3, and 4.9.0-alpha4. It is likely that this also affects previous versions which I didn't have on hand to test. SERVER-27644 introduced the $unwind logic and was backported to 3.4, so I expect all versions starting with that being affected.

            Assignee:
            ted.tuckman@mongodb.com Ted Tuckman
            Reporter:
            andreas.braun@mongodb.com Andreas Braun
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: