Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-64339

Aggregation with sort+limit against timeseries collection produces wrong result

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • ALL
    • Hide

       

      
      const documentList = [
          {_id: 28, time: new Date("2019-07-04T14:00:00.000Z"), zzz: 15, "obj": {_id: 29, "array": [] }, },
          {_id: 316, time: new Date("2019-08-05T12:00:00.000Z"), zzz: 17, "obj": {_id: 317, "array": [[]] }, },
          {_id: 354, time: new Date("2019-08-10T17:00:00.000Z"), zzz: 21, "array": [[]], "obj": {_id: 356, "array": [null], }, },
      ];
      var coll1 = db.fuzzer_coll1;
      coll1.insertMany(documentList);
      
      db.createCollection('fuzzer_coll2', {timeseries: {timeField: 'time', metaField: 'tag'}});
      var coll2 = db.fuzzer_coll2;
      coll2.insertMany(documentList);
      
      const pl1 =
          [{$densify: {field: "time", range: {step: 10, unit: "day", bounds: [new Date("2019-09-26T14:21:58.686Z"),new Date("2020-07-01T12:06:39.484Z")]}}},
           {$sort: {_id: 1}},
           {$limit: 10},
           {$graphLookup: {from: "fuzzer_coll1",
                           startWith: {$unsetField: {field: "yyy", input: "$obj"}},
                           connectFromField: "array",
                           connectToField: "obj.array",
                           as: "ArrayHierarchy",
                           depthField: "zzz"}}
          ];
      
      const pl2 =
          [{$densify: {field: "time", range: {step: 10, unit: "day", bounds: [new Date("2019-09-26T14:21:58.686Z"),new Date("2020-07-01T12:06:39.484Z")]}}},
           {$sort: {_id: 1}},
           {$limit: 10},
           {$graphLookup: {from: "fuzzer_coll2",
                           startWith: {$unsetField: {field: "yyy", input: "$obj"}},
                           connectFromField: "array",
                           connectToField: "obj.array",
                           as: "ArrayHierarchy",
                           depthField: "zzz"}}
          ];
      
      var res1 = coll1.aggregate(pl1);
      var res2 = coll2.aggregate(pl2);
      
      var expl1 = coll1.explain().aggregate(pl1);
      var expl2 = coll2.explain().aggregate(pl2);
      
      

       

      Show
        const documentList = [ {_id: 28, time: new Date( "2019-07-04T14:00:00.000Z" ), zzz: 15, "obj" : {_id: 29, "array" : [] }, }, {_id: 316, time: new Date( "2019-08-05T12:00:00.000Z" ), zzz: 17, "obj" : {_id: 317, "array" : [[]] }, }, {_id: 354, time: new Date( "2019-08-10T17:00:00.000Z" ), zzz: 21, "array" : [[]], "obj" : {_id: 356, "array" : [ null ], }, }, ]; var coll1 = db.fuzzer_coll1; coll1.insertMany(documentList); db.createCollection( 'fuzzer_coll2' , {timeseries: {timeField: 'time' , metaField: 'tag' }}); var coll2 = db.fuzzer_coll2; coll2.insertMany(documentList); const pl1 = [{$densify: {field: "time" , range: {step: 10, unit: "day" , bounds: [ new Date( "2019-09-26T14:21:58.686Z" ), new Date( "2020-07-01T12:06:39.484Z" )]}}}, {$sort: {_id: 1}}, {$limit: 10}, {$graphLookup: {from: "fuzzer_coll1" , startWith: {$unsetField: {field: "yyy" , input: "$obj" }}, connectFromField: "array" , connectToField: "obj.array" , as: "ArrayHierarchy" , depthField: "zzz" }} ]; const pl2 = [{$densify: {field: "time" , range: {step: 10, unit: "day" , bounds: [ new Date( "2019-09-26T14:21:58.686Z" ), new Date( "2020-07-01T12:06:39.484Z" )]}}}, {$sort: {_id: 1}}, {$limit: 10}, {$graphLookup: {from: "fuzzer_coll2" , startWith: {$unsetField: {field: "yyy" , input: "$obj" }}, connectFromField: "array" , connectToField: "obj.array" , as: "ArrayHierarchy" , depthField: "zzz" }} ]; var res1 = coll1.aggregate(pl1); var res2 = coll2.aggregate(pl2); var expl1 = coll1.explain().aggregate(pl1); var expl2 = coll2.explain().aggregate(pl2);  
    • QO 2022-04-04
    • 35

      A an aggregation query extracted from a fuzzer test from BF-24419 produces different result if evaluated against a timeseries collection vs a regular one.

      The attached file "reduced-fuzzer.3.js" is a highly reduced fuzzer test with just the problem pipeline, and the minimum reduced documents. If the limit in the query is high enough to produce all documents, then the test passes. However if the limit is reduced, then different documents end up in the final result. The most likely reason is the different plans for the two cases where sorting is performed at a different time wrt other operators.

      I extended the test file to print both results into the test log. The results are marked with "Aggregation result 1" and "Aggregation result 2".

      Above I also added a standalone test that attempts to reproduce the same problem. Unfortunately I cannot get exactly the same problem. With the above standalone test the resulting documents differ only in the way the result documents and the individual fields are ordered, while if I run the attached reduced fuzzer I can see documents with "time" field values that are not present in the other result.

      I also noticed that the explains printed by the attached fuzzer file are the same for both variants of the query (and I am suspicious that I am not printing the right exaplains), while the explains produced by the standalone test above are clearly different.

      Below are the explains resulting from the standalone test above:

      {
        explainVersion: '2',
        stages: [
          {
            '$cursor': {
              queryPlanner: {
                namespace: 'test.fuzzer_coll1',
                indexFilterSet: false,
                parsedQuery: {},
                queryHash: '3956A6F7',
                planCacheKey: '3956A6F7',
                maxIndexedOrSolutionsReached: false,
                maxIndexedAndSolutionsReached: false,
                maxScansToExplodeReached: false,
                winningPlan: {
                  queryPlan: {
                    stage: 'SORT',
                    planNodeId: 2,
                    sortPattern: { time: 1 },
                    memLimit: 104857600,
                    type: 'simple',
                    inputStage: {
                      stage: 'COLLSCAN',
                      planNodeId: 1,
                      filter: {},
                      direction: 'forward'
                    }
                  },
                  slotBasedPlan: {
                    slots: '$$RESULT=s4 $$RID=s5 env: { s2 = Nothing (SEARCH_META), s1 = TimeZoneDatabase(Asia/Thimphu...America/El_Salvador) (timeZoneDB), s3 = 1646990233082 (NOW) }',
                    stages: '[2] sort [s9] [asc] [s4, s5] \n' +
                      '[2] project [s9 = fillEmpty (s8, undefined)] \n' +
                      '[2] traverse s8 s7 s6 {if (s7 <=> s8 < 0, s7, s8)} {} \n' +
                      'from \n' +
                      '    [2] project [s6 = fillEmpty (getField (s4, "time"), null)] \n' +
                      '    [1] scan s4 s5 none none none none [] @"f629d175-7149-4e1d-9e10-c98027708b7d" true false \n' +
                      'in \n' +
                      '    [2] project [s7 = s6] \n' +
                      '    [2] limit 1 \n' +
                      '    [2] coscan \n'
                  }
                },
                rejectedPlans: []
              }
            }
          },
          {
            '$_internalDensify': {
              field: 'time',
              partitionByFields: [],
              range: {
                step: 10,
                bounds: [
                  ISODate("2019-09-26T14:21:58.686Z"),
                  ISODate("2020-07-01T12:06:39.484Z")
                ],
                unit: 'day'
              }
            }
          },
          { '$sort': { sortKey: { _id: 1 }, limit: Long("3") } },
          {
            '$graphLookup': {
              from: 'fuzzer_coll1',
              as: 'ArrayHierarchy',
              connectToField: 'obj.array',
              connectFromField: 'array',
              startWith: {
                '$setField': {
                  field: { '$const': 'yyy' },
                  input: '$obj',
                  value: '$$REMOVE'
                }
              },
              depthField: 'zzz'
            }
          }
        ],
        serverInfo: {
          host: 'extreme',
          port: 27017,
          version: '6.0.0-alpha',
          gitVersion: 'ee5dacd18fd01e61b449066e2081515da79a8d66'
        },
        serverParameters: {
          internalQueryFacetBufferSizeBytes: 104857600,
          internalQueryFacetMaxOutputDocSizeBytes: 104857600,
          internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
          internalDocumentSourceGroupMaxMemoryBytes: 104857600,
          internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
          internalQueryProhibitBlockingMergeOnMongoS: 0,
          internalQueryMaxAddToSetBytes: 104857600,
          internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
        },
        command: {
          aggregate: 'fuzzer_coll1',
          pipeline: [
            {
              '$densify': {
                field: 'time',
                range: {
                  step: 10,
                  unit: 'day',
                  bounds: [
                    ISODate("2019-09-26T14:21:58.686Z"),
                    ISODate("2020-07-01T12:06:39.484Z")
                  ]
                }
              }
            },
            { '$sort': { _id: 1 } },
            { '$limit': 3 },
            {
              '$graphLookup': {
                from: 'fuzzer_coll1',
                startWith: { '$unsetField': { field: 'yyy', input: '$obj' } },
                connectFromField: 'array',
                connectToField: 'obj.array',
                as: 'ArrayHierarchy',
                depthField: 'zzz'
              }
            }
          ],
          cursor: {},
          '$db': 'test'
        },
        ok: 1
      }
      
      {
        explainVersion: '1',
        stages: [
          {
            '$cursor': {
              queryPlanner: {
                namespace: 'test.system.buckets.fuzzer_coll2',
                indexFilterSet: false,
                parsedQuery: {},
                queryHash: '17830885',
                planCacheKey: '17830885',
                maxIndexedOrSolutionsReached: false,
                maxIndexedAndSolutionsReached: false,
                maxScansToExplodeReached: false,
                winningPlan: { stage: 'COLLSCAN', direction: 'forward' },
                rejectedPlans: []
              }
            }
          },
          {
            '$_internalUnpackBucket': {
              exclude: [],
              timeField: 'time',
              metaField: 'tag',
              bucketMaxSpanSeconds: 3600
            }
          },
          { '$sort': { sortKey: { time: 1 } } },
          {
            '$_internalDensify': {
              field: 'time',
              partitionByFields: [],
              range: {
                step: 10,
                bounds: [
                  ISODate("2019-09-26T14:21:58.686Z"),
                  ISODate("2020-07-01T12:06:39.484Z")
                ],
                unit: 'day'
              }
            }
          },
          { '$sort': { sortKey: { _id: 1 }, limit: Long("3") } },
          {
            '$graphLookup': {
              from: 'fuzzer_coll2',
              as: 'ArrayHierarchy',
              connectToField: 'obj.array',
              connectFromField: 'array',
              startWith: {
                '$setField': {
                  field: { '$const': 'yyy' },
                  input: '$obj',
                  value: '$$REMOVE'
                }
              },
              depthField: 'zzz'
            }
          }
        ],
        serverInfo: {
          host: 'extreme',
          port: 27017,
          version: '6.0.0-alpha',
          gitVersion: 'ee5dacd18fd01e61b449066e2081515da79a8d66'
        },
        serverParameters: {
          internalQueryFacetBufferSizeBytes: 104857600,
          internalQueryFacetMaxOutputDocSizeBytes: 104857600,
          internalLookupStageIntermediateDocumentMaxSizeBytes: 104857600,
          internalDocumentSourceGroupMaxMemoryBytes: 104857600,
          internalQueryMaxBlockingSortMemoryUsageBytes: 104857600,
          internalQueryProhibitBlockingMergeOnMongoS: 0,
          internalQueryMaxAddToSetBytes: 104857600,
          internalDocumentSourceSetWindowFieldsMaxMemoryBytes: 104857600
        },
        command: {
          aggregate: 'system.buckets.fuzzer_coll2',
          pipeline: [
            {
              '$_internalUnpackBucket': {
                timeField: 'time',
                metaField: 'tag',
                bucketMaxSpanSeconds: 3600,
                assumeNoMixedSchemaData: false
              }
            },
            {
              '$densify': {
                field: 'time',
                range: {
                  step: 10,
                  unit: 'day',
                  bounds: [
                    ISODate("2019-09-26T14:21:58.686Z"),
                    ISODate("2020-07-01T12:06:39.484Z")
                  ]
                }
              }
            },
            { '$sort': { _id: 1 } },
            { '$limit': 3 },
            {
              '$graphLookup': {
                from: 'fuzzer_coll2',
                startWith: { '$unsetField': { field: 'yyy', input: '$obj' } },
                connectFromField: 'array',
                connectToField: 'obj.array',
                as: 'ArrayHierarchy',
                depthField: 'zzz'
              }
            }
          ],
          cursor: {},
          collation: { locale: 'simple' }
        },
        ok: 1
      }
      

            Assignee:
            timour.katchaounov@mongodb.com Timour Katchaounov
            Reporter:
            timour.katchaounov@mongodb.com Timour Katchaounov
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

              Created:
              Updated:
              Resolved: