Uploaded image for project: 'Documentation'
  1. Documentation
  2. DOCS-15303

Investigate changes in SERVER-55112: Behaviour of distinct differs between collections and views

      Original Downstream Change Summary

      This is a minor change to the results of a distinct when run on a collection with a view, and the collection contains arrays at the field of interest. We are planning to backport this to 6.0.

      Description of Linked Ticket

      SERVER-27644 and SERVER-40134 already exposed differences in behaviour between running a distinct command on a collection vs. running it on a view. The aggregation pipeline created internally to run distinct on a view does not cover all the use-cases.

      While investigating the behaviour as part of DRIVERS-1486 I found some more differences.

      To test, I've created a collection with data and created a view based on an empty pipeline:

      getDocument = function (id, value) {
          return {
              _id: id,
              x: value,
              nested: {
                  x: value,
                  nested: {
                      x: value
                  },
                  array: [value]
              },
              array: [value],
              documentArray: [
                  {x: value},
                  {x: value << 2}
              ]
          };
      }
      
      db.distinctTest.drop();
      db.distinctTest.insertMany([
          getDocument(1, 1),
          getDocument(2, 2),
          getDocument(3, 3),
          getDocument(4, 1)
      ]);
      db.createView('distinctViewTest', 'distinctTest', []);
      

      I've come up with the following calls to distinct:

      db.distinctTest.distinct('x');
      db.distinctTest.distinct('nested.x');
      db.distinctTest.distinct('nested.nested.x');
      db.distinctTest.distinct('array');
      db.distinctTest.distinct('nested.array');
      db.distinctTest.distinct('documentArray');
      db.distinctTest.distinct('documentArray.x');
      db.distinctTest.distinct('documentArray[1].x');
      db.distinctTest.distinct('documentArray.1.x');
      

      Most of the cases look the same (except for different result ordering which we can ignore), but the last case differs:

      MongoDB Enterprise > db.distinctTest.distinct('documentArray.1.x');
      [ 4, 8, 12 ]
      MongoDB Enterprise > db.distinctViewTest.distinct('documentArray.1.x');
      [ ]
      

      Looking at the pipeline generated, we can see that documentArray.1.x produces three $unwind stages:

      [
          {
              "$unwind" : {
                  "path" : "$documentArray",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray.1",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray.1.x",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$match" : {
                  "documentArray" : {
                      "$_internalSchemaType" : "object"
                  },
                  "documentArray.1" : {
                      "$_internalSchemaType" : "object"
                  }
              }
          },
          {
              "$group" : {
                  "_id" : null,
                  "distinct" : {
                      "$addToSet" : "$documentArray.1.x"
                  }
              }
          }
      ]
      

      This is incorrect, as documentArray.1 should not unwind documentArray first, but rather use $arrayElemAt to. This modified aggregation pipeline produces the same result as the corresponding distinct command:

      [
          {
              "$set": {
                  "documentArray": { $arrayElemAt: [ "$documentArray", 1 ] }
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$unwind" : {
                  "path" : "$documentArray.x",
                  "preserveNullAndEmptyArrays" : true
              }
          },
          {
              "$match" : {
                  "documentArray" : {
                      "$_internalSchemaType" : "object"
                  }
              }
          },
          {
              "$group" : {
                  "_id" : null,
                  "distinct" : {
                      "$addToSet" : "$documentArray.x"
                  }
              }
          }
      ]
      

      I was able to reproduce this in 4.2.12, 4.4.3, and 4.9.0-alpha4. It is likely that this also affects previous versions which I didn't have on hand to test. SERVER-27644 introduced the $unwind logic and was backported to 3.4, so I expect all versions starting with that being affected.

            Assignee:
            jason.price@mongodb.com Jason Price
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved:
              1 year, 43 weeks, 1 day ago