Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-21992

Inconsistent results when grouping by possibly missing values

    • Query Optimization
    • ALL
    • Query 11 (03/14/16)

      My database contains the following documents:

      use test;
      db.test.insert(
      	{
      	  	'name' : 'first',
      	  	'color' : 'blue',
      	  	'type' : 'break'
      	}
      )
      
      db.test.insert(
      	{
      	  	'name' : 'second',
      	  	'color' : 'red',
      	  	//'type' : 'break'
      	}
      )
      

      Note that the second document does not have a type field, it's missing. The aggregation pipeline produces inconsistent results sometimes showing 'type' :null in the id field, and sometimes not.

      For example, the following aggregation does NOT return the missing type in the key:

      db.test.aggregate([
      	{
      	  	$group : {
      	  	  	'_id' : {
      	  	  	 	'name' : '$name',
      	  	  	 	'type' : '$type',
      	  	  	 	'color' : '$color' 	
      	  	  	},
      	  	  	'count' : {$sum : 1}
      	  	}
      	  	
      	},
      	{
      	  	$group : {
      	  	  	'_id' : {
      	  	  	 	'type' : '$_id.type',
      	  	  	 	'color' : '$_id.color' 	
      	  	  	},
      	  	  	'count' : {$sum : 1}
      	  	}
      	  	
      	}
      ])
      

      Results in:

      { "_id" : { "type" : "break", "color" : "blue" }, "count" : 1 }
      { "_id" : { "color" : "red" }, "count" : 1 }
      

      However, this aggregation, returns back a 'type' : null,

      db.test.aggregate([
      	{
      	  	$group : {
      	  	  	'_id' : {
      	  	  	 	'name' : '$name',
      	  	  	 	'type' : '$type',
      	  	  	 	'color' : '$color' 	
      	  	  	},
      	  	  	'count' : {$sum : 1}
      	  	}
      	  	
      	},
      	{
      	  	$group : {
      	  	  	'_id' : {
      	  	  	 	'type' : '$_id.type',
      	  	  	 	//'color' : '$_id.color' 	
      	  	  	},
      	  	  	'count' : {$sum : 1}
      	  	}
      	  	
      	}
      ])
      

      Results in:

      { "_id" : { "type" : "break" }, "count" : 1 }
      { "_id" : { "type" : null }, "count" : 1 }
      

      Note that the only difference was the comment out of the 'color' field in the second group.

      The results should be consistent between these two queries. Either the first results should look like:

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            apara Alex Paransky
            Votes:
            1 Vote for this issue
            Watchers:
            24 Start watching this issue

              Created:
              Updated: