  1. Core Server
  2. SERVER-87987

Timeseries optimization does not exclude the timeField even though it is replaced by $addFields and excluded by a subsequent $project

    • Query Integration
    • Fully Compatible
    • ALL
    • v7.3, v7.0, v6.0, v5.0
    • Repro script:

      > db.createCollection("ts", {timeseries: {timeField: "time", metaField: "tag"}});
      > db.ts.insert({_id: 0, time: ISODate("2024-01-01T00:00:00.000Z"), tag: "A"});
      > db.ts.aggregate([{$addFields: {time: {$dateFromParts: {year: "$tag.none"}}}}, {$project: {tag: 1}}]);

      { "tag" : "A", "_id" : 0, "time" : null }
    • 13

      As the repro script shows, the $addFields stage replaces the special timeseries field time with a value computed from a subfield of the other special field, tag. The next $project stage then excludes time, but the timeseries optimization does not exclude it, as the explain output shows:

      > db.ts.explain().aggregate([{$addFields: {time: {$dateFromParts: {year: "$tag.none"}}}}, {$project: {tag: 1}}]);
      {
      	"explainVersion" : "1",
      	"stages" : [
      		{
      			"$cursor" : {
      				"queryPlanner" : {
      					"namespace" : "test.system.buckets.ts",
      					"indexFilterSet" : false,
      					"parsedQuery" : {
      
      					},
      					"queryHash" : "FCBE9F38",
      					"planCacheKey" : "64E90EFC",
      					"optimizationTimeMillis" : 1,
      					"maxIndexedOrSolutionsReached" : false,
      					"maxIndexedAndSolutionsReached" : false,
      					"maxScansToExplodeReached" : false,
      					"prunedSimilarIndexes" : false,
      					"winningPlan" : {
      						"isCached" : false,
      						"stage" : "COLLSCAN",
      						"direction" : "forward"
      					},
      					"rejectedPlans" : [ ]
      				}
      			}
      		},
      		{
      			"$addFields" : {
      				"time" : {
      					"$dateFromParts" : {
      						"year" : "$meta.none"
      					}
      				}
      			}
      		},
      		{
      			"$_internalUnpackBucket" : {
      				"include" : [
      					"_id",
      					"tag"
      				],
      				"timeField" : "time",
      				"metaField" : "tag",
      				"bucketMaxSpanSeconds" : 3600,
      				"assumeNoMixedSchemaData" : true,
      				"computedMetaProjFields" : [
      					"time"
      				]
      			}
      		}
      	],
      

      It seems that dependency tracking, or inclusion/exclusion tracking, for the special timeseries fields does not work correctly. The problem appears to have been introduced around the 7.3 timeframe, since the repro also fails on the v7.3 branch.
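
      To make the expected result concrete, here is a minimal plain-JavaScript model of the two user-visible stages. It is a toy sketch, not the server's implementation: the helper names addFields and projectInclude are hypothetical, and the computed value is hard-coded to null to match the observed output for the missing "$tag.none" path.

```javascript
// Toy model of $addFields followed by an inclusion $project.
// Hypothetical helpers for illustration; not MongoDB source code.

// $addFields: copy the document and set each computed field.
function addFields(doc, computed) {
  const out = { ...doc };
  for (const [name, compute] of Object.entries(computed)) {
    out[name] = compute(doc);
  }
  return out;
}

// Inclusion $project: keep _id (included by default) plus the listed fields.
function projectInclude(doc, fields) {
  const out = {};
  for (const name of ["_id", ...fields]) {
    if (name in doc) out[name] = doc[name];
  }
  return out;
}

const doc = { _id: 0, time: new Date("2024-01-01T00:00:00.000Z"), tag: "A" };

// Assumption: the computed time is null, as in the observed output.
const afterAddFields = addFields(doc, { time: () => null });
const expected = projectInclude(afterAddFields, ["tag"]);

console.log(JSON.stringify(expected)); // {"_id":0,"tag":"A"}
```

      Under this model the result contains only _id and tag, whereas the repro returns an additional "time" : null.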

      Ideally, pipeline optimization would remove the $addFields stage entirely, since the only field it computes is excluded by the subsequent $project.
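
      The ideal rewrite could look roughly like the following sketch. pruneDeadAddFields and its helpers are hypothetical names, and the pass is deliberately conservative: it only handles top-level field names and pure inclusion projections, and leaves everything else untouched. A real optimizer would have more cases to consider.

```javascript
// Hypothetical optimizer pass, for illustration only (not MongoDB source code).
// It drops fields computed by $addFields when the immediately following
// inclusion $project discards them.

const isFlag = (v) => v === 1 || v === true;
const hasDottedKey = (spec) => Object.keys(spec).some((k) => k.includes("."));

// True for specs such as {tag: 1} or {_id: 0, tag: 1}; false for computed fields.
function isPureInclusion(spec) {
  const idOk = !("_id" in spec) || [0, 1, true, false].includes(spec._id);
  const entries = Object.entries(spec).filter(([k]) => k !== "_id");
  return idOk && entries.length > 0 && entries.every(([, v]) => isFlag(v));
}

function isIncluded(spec, name) {
  if (name === "_id") return spec._id !== 0 && spec._id !== false;
  return isFlag(spec[name]);
}

function pruneDeadAddFields(pipeline) {
  const out = [];
  pipeline.forEach((stage, i) => {
    const next = pipeline[i + 1];
    const prunable =
      stage.$addFields && next && next.$project &&
      isPureInclusion(next.$project) &&
      !hasDottedKey(next.$project) && !hasDottedKey(stage.$addFields);
    if (!prunable) { out.push(stage); return; }
    // Keep only the computed fields that survive the projection.
    const kept = Object.fromEntries(
      Object.entries(stage.$addFields).filter(([name]) => isIncluded(next.$project, name))
    );
    if (Object.keys(kept).length > 0) out.push({ $addFields: kept });
  });
  return out;
}

const pipeline = [
  { $addFields: { time: { $dateFromParts: { year: "$tag.none" } } } },
  { $project: { tag: 1 } },
];
console.log(JSON.stringify(pruneDeadAddFields(pipeline))); // [{"$project":{"tag":1}}]
```

      For the repro pipeline, the whole $addFields stage disappears and only the $project remains.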

      The simplest fix would be to not push down $addFields when it overwrites the special timeseries field timeField.
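
      A minimal sketch of such a guard is shown below. canPushDownAddFields and referencedRoots are hypothetical names, not server functions. The sketch assumes that the pushdown is otherwise allowed only when every computed value depends on metaField alone, which matches the computedMetaProjFields entry in the explain output but is not confirmed here. The region field in the usage lines is also made up for illustration.

```javascript
// Hypothetical pushdown guard, for illustration only (not MongoDB source code).

// Collect the root field of every "$field.path" string in an expression.
// Returns null if the expression uses a "$$variable", which this sketch
// conservatively treats as depending on the whole document.
function referencedRoots(expr, roots = new Set()) {
  if (typeof expr === "string") {
    if (expr.startsWith("$$")) return null;
    if (expr.startsWith("$")) roots.add(expr.slice(1).split(".")[0]);
    return roots;
  }
  if (expr !== null && typeof expr === "object") {
    // Handles both arrays and operator objects such as {$dateFromParts: {...}}.
    for (const value of Object.values(expr)) {
      if (referencedRoots(value, roots) === null) return null;
    }
  }
  return roots;
}

function canPushDownAddFields(spec, { timeField, metaField }) {
  for (const [name, expr] of Object.entries(spec)) {
    // Rule proposed in this ticket: never push down a write to timeField.
    if (name === timeField || name.startsWith(timeField + ".")) return false;
    // Assumed precondition: the value is computed from metaField alone.
    const roots = referencedRoots(expr);
    if (roots === null || [...roots].some((r) => r !== metaField)) return false;
  }
  return true;
}

const ts = { timeField: "time", metaField: "tag" };
console.log(canPushDownAddFields({ time: { $dateFromParts: { year: "$tag.none" } } }, ts)); // false
console.log(canPushDownAddFields({ region: "$tag.none" }, ts)); // true
```

      With this check, the repro's $addFields would stay after $_internalUnpackBucket, so the later $project could exclude time in the usual way.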

            Assignee:
            erin.zhu@mongodb.com Erin Zhu
            Reporter:
            yoonsoo.kim@mongodb.com Yoon Soo Kim
            Votes:
            0
            Watchers:
            5
