Core Server / SERVER-40134

Distinct command against a view can return incorrect results when the distinct path is multikey

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0, v3.6
    • Sprint:
      Query 2019-04-22, Query 2019-05-06, Query 2019-05-20

      Description

      Consider the following distinct command against a collection "c":

      MongoDB Enterprise > db.c.drop()
      true
      MongoDB Enterprise > db.c.insert({a: [{b: 1}, {b: 2}]})
      WriteResult({ "nInserted" : 1 })
      MongoDB Enterprise > db.c.distinct("a.b")
      [ 1, 2 ]
      

      As expected, the command returns two distinct values, 1 and 2. If we create an identity view on top of "c" and run the same distinct against the view, the results are incorrect:

      MongoDB Enterprise > db.createView("v", "c", [])
      { "ok" : 1 }
      MongoDB Enterprise > db.v.distinct("a.b")
      [ [ 1, 2 ] ]
      

      Instead of getting two distinct values, 1 and 2, we get a single distinct value [1, 2]. This bug is due to how the distinct command is internally expanded into an aggregation operation by the read-only non-materialized views implementation. In particular, it expands to an $unwind followed by a $group with $addToSet such as this:

      MongoDB Enterprise > db.c.aggregate([{$unwind: {path: "$a.b", preserveNullAndEmptyArrays: true}}, {$group: {_id: null, distinct: {$addToSet: "$a.b"}}}])
      { "_id" : null, "distinct" : [ [ 1, 2 ] ] }
      

      The problem lies in the behavior of the $unwind stage, added to the distinct-to-agg transformation in SERVER-27644. Note what happens when we run this same pipeline without the $group stage:

      MongoDB Enterprise > db.c.aggregate([{$unwind: {path: "$a.b", preserveNullAndEmptyArrays: true}}])
      { "_id" : ObjectId("5c8a9ec1c2ed87542687a3b8"), "a" : [ { "b" : 1 }, { "b" : 2 } ] }
      

      When the $unwind path traverses through an array, but does not terminate at an array, no unwinding actually occurs: the document passes through unchanged, and the $group stage then adds the array [1, 2] that "$a.b" evaluates to as a single value of the set. This is at odds with the distinct command's behavior, which expects all arrays along the path to be unwound. Fixing this will likely involve extending the expressivity of $unwind to meet the needs of the distinct command.
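
      Until $unwind is extended, one possible workaround is to unwind every prefix of the distinct path in turn, so that each array along the path is flattened before the $group stage runs. The sketch below is plain JavaScript that can be pasted into the shell. The helper name buildDistinctPipeline is hypothetical and not part of MongoDB, and this is not necessarily how the server will fix the bug; it may also differ from the distinct command on edge cases such as directly nested arrays.

```javascript
// Hypothetical helper (an assumption for illustration, not a MongoDB API):
// build a pipeline that unwinds every prefix of a dotted path, then
// collects the values at the full path into a set.
function buildDistinctPipeline(path) {
  const parts = path.split(".");
  const stages = [];
  // For "a.b" this emits $unwind on "$a" and then on "$a.b".
  for (let i = 1; i <= parts.length; i++) {
    const prefix = parts.slice(0, i).join(".");
    stages.push({
      $unwind: { path: "$" + prefix, preserveNullAndEmptyArrays: true },
    });
  }
  // Same $group stage as in the pipeline shown in the description.
  stages.push({ $group: { _id: null, distinct: { $addToSet: "$" + path } } });
  return stages;
}

// Usage in the shell, against the example collection "c":
//   db.c.aggregate(buildDistinctPipeline("a.b"))
```

      For the example document, the first $unwind splits "a" into two documents, one per subdocument, so the $group stage should see 1 and 2 as separate values rather than the single array [1, 2]. The order of elements produced by $addToSet is unspecified.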
