Core Server: SERVER-101271

Timeseries query can erroneously include excluded fields after including the meta field

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • ALL

      Repro (can be run in any suite):

      const tsColl = db.tsColl;
      assert(tsColl.drop());
      assert.commandWorked(db.createCollection(tsColl.getName(), {
          timeseries: {timeField: 't', metaField: 'm'},
      }));
      assert.commandWorked(tsColl.insert([
          {t: new Date(0), m: 1, x: 0},
          {t: new Date(10000), m: 2, x: 1},
      ]));
      
      const inputQuery = [{"$project": {"_id": 0, "m": 1}}, {"$group": {"_id": "$x"}}];
      
      // `inputQuery` against the timeseries view will be optimized to this query against the bucket
      // collection.
      const optimizedBucketQuery = [
          {
              "$_internalUnpackBucket": {
                  "include": ["m"],
                  "timeField": "t",
                  "metaField": "m",
                  "bucketMaxSpanSeconds": NumberInt(3600),
                  "assumeNoMixedSchemaData": true
              }
          },
          {"$group": {"_id": "$x"}}
      ];
      
      // `optimizedBucketQuery` against the timeseries bucket collection will then incorrectly be
      // optimized to this query against the bucket collection.
      // {
      //     "$_internalUnpackBucket": {
      //         "include": ["x"],
      //         "timeField": "t",
      //         "metaField": "m",
      //         "bucketMaxSpanSeconds": NumberInt(3600),
      //         "assumeNoMixedSchemaData": true
      //     }
      // },
      // {"$group": {"_id": "$x"}}
      
      const viewResults = tsColl.aggregate(inputQuery).toArray();
      const directResults = db.system.buckets[tsColl.getName()].aggregate(optimizedBucketQuery).toArray();
      jsTestLog(viewResults);
      jsTestLog(directResults);
      assert(viewResults.length === 1, viewResults);
      assert(directResults.length === 1, directResults);
      assert.sameMembers(viewResults, directResults);
      
      

      The issue appears to be the special handling of an absorbed include spec that contains only the meta field. When we create the BucketUnpacker for this spec, we remove the meta field from the include/exclude field set and instead set a special _includeMetaField flag on the BucketUnpacker. Later, when deciding whether a projection can be absorbed, we check whether the include/exclude field set is empty. If the spec included only the meta field, that set is empty, so the already-absorbed projection appears to be overlooked and the dependencies of the following stage (here, x from the $group) are absorbed in its place.

      In production, this can occur when a query is optimized more than once. Queries that run during resharding appear to be re-routed after being optimized, which is why this bug was discovered in a resharding passthrough suite.
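
The mechanism described above can be sketched as a small standalone model in plain JavaScript. This is a simplified illustration and not the server's code: the function names (makeUnpacker, canAbsorbBuggy, canAbsorbGuarded, optimize) and the guarded check are assumptions made for the example, and the guard is only one possible way to avoid the problem, not necessarily the eventual fix.

```javascript
// Hypothetical, simplified model of the behavior described in this ticket.
// None of these names are the server's actual API.

// Build an unpacker from an include spec. As described above, the meta field
// is removed from the field set and tracked with a separate flag instead.
function makeUnpacker(spec) {
    const fieldSet = new Set(spec.include);
    const includeMetaField = fieldSet.delete(spec.metaField);  // true if "m" was listed
    return {fieldSet, includeMetaField};
}

// Modeled buggy check: an empty field set is taken to mean that no
// projection has been absorbed yet.
function canAbsorbBuggy(unpacker) {
    return unpacker.fieldSet.size === 0;
}

// One possible guard (an assumption for illustration): also require that the
// meta field flag is unset before absorbing.
function canAbsorbGuarded(unpacker) {
    return unpacker.fieldSet.size === 0 && !unpacker.includeMetaField;
}

// Try to absorb the fields that the next stage depends on into the unpack spec.
function optimize(spec, dependencies, canAbsorb) {
    const unpacker = makeUnpacker(spec);
    if (!canAbsorb(unpacker)) {
        return spec;  // A projection was already absorbed; leave the spec alone.
    }
    return {...spec, include: [...dependencies]};
}

// Spec produced by the first optimization of the repro's inputQuery.
const absorbedSpec = {include: ["m"], metaField: "m"};
// {$group: {_id: "$x"}} depends on the field "x".
const groupDependencies = ["x"];

// Second optimization pass over the already-optimized spec.
const buggy = optimize(absorbedSpec, groupDependencies, canAbsorbBuggy);
const guarded = optimize(absorbedSpec, groupDependencies, canAbsorbGuarded);

console.log(buggy.include);    // [ 'x' ]: the excluded field is now included
console.log(guarded.include);  // [ 'm' ]: the original projection is preserved
```

In this model, the second pass with the buggy check turns include: ["m"] into include: ["x"], which mirrors the incorrectly optimized pipeline shown in the repro comments.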


          Assignee: Gil Alon (gil.alon@mongodb.com)
          Reporter: Matt Boros (matt.boros@mongodb.com)
          Votes: 0
          Watchers: 12
