Uploaded image for project: 'Documentation'
  1. Documentation
  2. DOCS-9023

Avoid treating arrays as literals in the aggregation pipeline

    • Type: Icon: Task Task
    • Resolution: Won't Do
    • Priority: Icon: Major - P3 Major - P3
    • Server_Docs_20231030
    • Affects Version/s: None
    • Component/s: Server
    • Labels:
      None

      In the aggregation framework, arrays are interpreted as literals, which is not the intuitive behavior. Take the following code as an example:

      > db.foo.drop()
      true
      > db.foo.insert({_id: 0, a: ['foo', 'bar', 'baz'], b: 'bar', c: 'Baz'})
      WriteResult({ "nInserted" : 1 })
      > db.foo.aggregate([{$project: {intersection: {$setIntersection: ['$a', ['$b', {$toLower: '$c'}]]}}}])
      { "_id" : 0, "intersection" : [ ] }  // Intuitively would expect [ 'bar', 'baz' ]
      > db.foo.insert({_id: 1, a: ['foo', '$b'], b: 'bar', c: 'Baz'})
      WriteResult({ "nInserted" : 1 })
      > db.foo.aggregate([{$project: {intersection: {$setIntersection: ['$a', ['$b', {$toLower: '$c'}]]}}}])
      { "_id" : 0, "intersection" : [ ] }
      { "_id" : 1, "intersection" : [ "$b" ] }  // Instead, it is matching against literals '$b' and {$toLower: '$c'}
      

      Instead of evaluating '$b' to be the value of the field 'b' in the current document, as is done with '$a', '$b' is treated as a literal when it is parsed inside of an array. Because arrays are treated as literals, it also prevents a projection stage from creating arrays of fields:

      > db.bar.drop()
      true
      > db.bar.insert({_id: 0, point: {x: 10, y: 20}})
      WriteResult({ "nInserted" : 1 })
      > db.bar.aggregate([{$project: {coords: ['$point.x', '$point.y']}}])
      assert: command failed: {
      	"errmsg" : "exception: disallowed field type Array in object expression (at 'coords')",
      	"code" : 15992,
      	"ok" : 0
      } : aggregate failed
      

      The aggregation framework should parse arrays in the same ways it parses expressions elsewhere in the aggregation framework. This would yield the following results from above:

      // With documents from collection 'foo' from above.
      > db.foo.aggregate([{$project: {intersection: {$setIntersection: ['$a', ['$b', {$toLower: '$c'}]]}}}])
      { "_id" : 0, "intersection" : [ "baz", "bar" ] }
      { "_id" : 1, "intersection" : [ ] }
      
      // With document from collection 'bar' from above.
      > db.bar.aggregate([{$project: {coords: ['$point.x', '$point.y']}}])
      { "_id" : 0, "coords" : [ 10, 20 ] }
      

      Note this is a backwards breaking change.
      =======================================
      Original description:
      Builds an array.

      input:
       {a:1, b:2, c:3}
      
      operation:
       {$project: {array: {$array: ['$a', '$c', {$add:['$b', '$c'] }] } } }
      
      output:
       {array: [1, 3, 5]}
      

      An issue is how to handle missing fields (eg {$push:['$d']}). When building an object we would omit that field from the output. This would be expected if the array is a list or set, but could be a problem if indexed access is important. Possible solutions:

      1. Omit the field, shrinking the array.
      2. Replace the field with null, keeping the array the same size, but creating a value and differing from object behavior.
      3. Error out.
      4. Separate operators like $array when indexes are important and $list for when they aren't.

            Assignee:
            kay.kim@mongodb.com Kay Kim (Inactive)
            Reporter:
            emily.hall Emily Hall
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              1 year, 25 weeks, 4 days ago