  1. Core Server
  2. SERVER-57914

Make $getField return missing if "input" is missing or not an object

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1.0-rc0
    • Component/s: None
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v5.0
    • Sprint:
      Query Optimization 2021-07-12

      Description

      We recently introduced the $getField expression to provide an alternative way to extract fields from objects by their key name. See SERVER-30417.

      Imagine you have a document where some field "foo" is missing. In this case, both the "$foo" field path expression and the corresponding $getField expression will return "missing":

      MongoDB Enterprise > db.c.drop()
      true
      MongoDB Enterprise > db.c.insert({})
      WriteResult({ "nInserted" : 1 })
      MongoDB Enterprise > db.c.aggregate([{$project: {result: "$foo"}}])
      { "_id" : ObjectId("60d10a22b614644d4c0f1a31") }
      MongoDB Enterprise > db.c.aggregate([{$project: {result: {$getField: "foo"}}}])
      { "_id" : ObjectId("60d10a22b614644d4c0f1a31") }
      

      You can see that because both expressions return "missing", no field named "result" appears in the resulting document.

      Let's now consider a similar example where the user is attempting to extract a field "bar" from object "foo". They can do so either with the dotted field path expression "$foo.bar", or with a chain of nested $getField expressions. However, the field path expression returns missing whereas the nested $getField expressions return null:

      MongoDB Enterprise > db.c.aggregate([{$project: {result: "$foo.bar"}}])
      { "_id" : ObjectId("60d10a22b614644d4c0f1a31") }
      MongoDB Enterprise > db.c.aggregate([{$project: {result: {$getField: {field: "bar", input: {$getField: "foo"}}}}}])
      { "_id" : ObjectId("60d10a22b614644d4c0f1a31"), "result" : null }
      

      The reason for this behavior is that MQL expressions generally return null when any of their inputs is null, missing, or undefined. $getField follows this rule for its "input" argument: it returns null when "input" evaluates to null, missing, or undefined. (The "field" argument, on the other hand, must always be a string literal, which is validated at parse time.) Furthermore, MQL expressions other than field path expressions generally do not return missing. However, $getField has a special case: it returns missing when the named field does not exist in the input object, so that it is analogous to a field path expression.
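
      The rules in the previous paragraph can be modeled in a few lines of plain JavaScript. This is only an illustrative sketch of the behavior described in this ticket, not the server implementation: MISSING and getFieldCurrent are hypothetical names, and arrays are treated as non-objects here as a simplifying assumption.

```javascript
// Illustrative model only. MISSING and getFieldCurrent are hypothetical names,
// not part of the server or the shell API.
const MISSING = Symbol("missing");

function getFieldCurrent(input, field) {
  // Null, missing, or undefined input: return null, like most MQL expressions.
  if (input === null || input === undefined || input === MISSING) {
    return null;
  }
  // Any other non-object input is rejected (the scalar example later in this
  // ticket shows the real error). Treating arrays this way is an assumption.
  if (typeof input !== "object" || Array.isArray(input)) {
    throw new Error("$getField requires 'input' to evaluate to type Object");
  }
  // Special case: an absent field yields missing rather than null.
  return Object.prototype.hasOwnProperty.call(input, field)
    ? input[field]
    : MISSING;
}

// Mirrors {$getField: {field: "bar", input: {$getField: "foo"}}} on the document {}.
const doc = {};
const foo = getFieldCurrent(doc, "foo"); // inner call: the field is absent
const bar = getFieldCurrent(foo, "bar"); // outer call: the input is missing

console.log(foo === MISSING); // true
console.log(bar);             // null
```

      In this model the inner call returns missing, but the outer call converts that missing input into null, which matches the "result" : null output shown above.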

      The problem here is that this analogous behavior breaks down for dotted field paths. That is, where a dotted field path returns missing, the equivalent chain of nested $getField expressions returns null. For this reason, we should consider changing the behavior of $getField so that it returns missing rather than null if the "input" expression evaluates to missing, null, or undefined.

      There is a similar problem if a scalar exists along a dotted path:

      MongoDB Enterprise > db.c.find()
      { "_id" : ObjectId("60d10cbcb614644d4c0f1a32"), "foo" : 1 }
      MongoDB Enterprise > db.c.aggregate([{$project: {result: "$foo.bar"}}])
      { "_id" : ObjectId("60d10cbcb614644d4c0f1a32") }
      MongoDB Enterprise > db.c.aggregate([{$project: {result: {$getField: {field: "bar", input: {$getField: "foo"}}}}}])
      uncaught exception: Error: command failed: {
      	"ok" : 0,
      	"errmsg" : "PlanExecutor error during aggregation :: caused by :: $getField requires 'input' to evaluate to type Object, but got double",
      	"code" : 3041705,
      	"codeName" : "Location3041705"
      } with original command request: {
      	"aggregate" : "c",
      	"pipeline" : [
      		{
      			"$project" : {
      				"result" : {
      					"$getField" : {
      						"field" : "bar",
      						"input" : {
      							"$getField" : "foo"
      						}
      					}
      				}
      			}
      		}
      	],
      	"cursor" : {
       
      	},
      	"lsid" : {
      		"id" : UUID("5de90c09-a31d-45f7-a1bf-53d307de419f")
      	}
      } on connection: connection to 127.0.0.1:27017 : aggregate failed :
      _getErrorWithCode@src/mongo/shell/utils.js:25:13
      doassert@src/mongo/shell/assert.js:18:14
      _assertCommandWorked@src/mongo/shell/assert.js:731:17
      assert.commandWorked@src/mongo/shell/assert.js:823:16
      DB.prototype._runAggregate@src/mongo/shell/db.js:276:5
      DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
      @(shell):1:1
      

      The field path expression returns missing, whereas the $getField version throws an exception. To match the field path behavior, $getField would also have to return missing if "input" evaluates to any non-object type.
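
      The suggested semantics can be sketched in plain JavaScript as follows. This is a hypothetical model for discussion, not the server implementation: MISSING, isPlainObject, getFieldProposed, and project are made-up names, and array inputs are simply treated as non-objects, which glosses over how field paths traverse arrays.

```javascript
// Illustrative model only. All names below are hypothetical.
const MISSING = Symbol("missing");

function isPlainObject(v) {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Proposed rule: return missing whenever "input" is not an object
// (null, missing, undefined, or a scalar) or the field is absent.
function getFieldProposed(input, field) {
  if (!isPlainObject(input)) {
    return MISSING;
  }
  return Object.prototype.hasOwnProperty.call(input, field)
    ? input[field]
    : MISSING;
}

// Rough model of $project: a field whose value is missing is omitted.
function project(doc, name, value) {
  const out = { _id: doc._id };
  if (value !== MISSING) {
    out[name] = value;
  }
  return out;
}

const docs = [{ _id: 1 }, { _id: 2, foo: 1 }, { _id: 3, foo: { bar: "x" } }];
const results = docs.map((d) =>
  project(d, "result", getFieldProposed(getFieldProposed(d, "foo"), "bar"))
);

console.log(JSON.stringify(results));
// [{"_id":1},{"_id":2},{"_id":3,"result":"x"}]
```

      Under this model, the nested chain omits "result" both when "foo" is absent and when "foo" is a scalar, which is what the "$foo.bar" field path does in the examples above.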

      It's not obvious whether this suggested change is a good idea or not. It depends on whether we want $getField to act like all other MQL expressions, or if it should inherit the special behaviors of field path expressions.

      Shout out to Matthew Chiaravalloti for bringing this to our attention!

              People

              Assignee:
              Ruslan Abdulkhalikov (ruslan.abdulkhalikov)
              Reporter:
              David Storch (david.storch)