  1. Core Server
  2. SERVER-98423

Results differ depending on the order of operands in $sum inside $group with optimizations on

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • ALL
    • 0

      This came up while investigating BF-35904. The minimized query looks like this:

      {
          "$group": {
              "_id": null,
              "num": {
                  "$stdDevSamp": {
                      "$sum": [
                          NumberLong("20439413136"),
                          "$measurement1"
                      ]
                  }
              }
          }
      }

      The explain output with optimizations turned on has the operands swapped:

      "$group" : {
          "_id" : {
              "$const" : null
          },
          "num" : {
              "$stdDevSamp" : {
                  "$sum" : [
                      "$measurement1",
                      {
                          "$const" : NumberLong("20439413136")
                      }
                  ]
              }
          }
      }, 

      With optimizations off, the operands remain in their original order:

      "$group" : {
          "_id" : {
              "$const" : null
          },
          "num" : {
              "$stdDevSamp" : {
                  "$sum" : [
                      {
                          "$const" : NumberLong("20439413136")
                      },
                      "$measurement1"
                  ]
              }
          }
      }, 

      This change comes from ExpressionNary::optimize(), which holds the ExpressionConstant operand in a separate array and pushes it back onto the operand array at the end. This simple reordering causes the _addend inside DoubleDoubleSummation to differ in the 5th decimal place. That difference propagates: the result of the sum (which is pulled into AccumulatorStdDev::processInternal()) differs in the ~16th significant figure, and _m2 then differs in the ~10th significant figure.
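
      To illustrate how a difference in the last bit of an input can surface many digits earlier in a variance-style accumulator, here is a minimal sketch. It assumes a Welford-style online update for the sum of squared deviations; it is not the server's AccumulatorStdDev code, and the names welfordM2, base, and ulp as well as the sample values are made up for the illustration.

```javascript
// Welford-style online update for the sum of squared deviations (M2).
// This is a generic textbook formulation, assumed here for illustration only.
function welfordM2(values) {
  let count = 0;
  let mean = 0;
  let m2 = 0;
  for (const x of values) {
    count += 1;
    const delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  }
  return m2;
}

// Hypothetical inputs: the constant from the ticket plus small measurements.
const base = 20439413136;
// Spacing between adjacent doubles in [2^34, 2^35), which contains base.
const ulp = 2 ** -18;

const original = [base + 1, base + 2, base + 4];
// Same data, except the first value is moved to the next representable double.
const perturbed = [base + 1 + ulp, base + 2, base + 4];

const inputRelDiff = ulp / (base + 1);
const m2Original = welfordM2(original);
const m2Perturbed = welfordM2(perturbed);
const m2RelDiff = Math.abs(m2Original - m2Perturbed) / m2Original;

console.log({ inputRelDiff, m2Original, m2Perturbed, m2RelDiff });
```

      With these made-up values the inputs differ only around the 16th significant figure, while the two M2 values diverge many digits earlier. Values near 2e10 with a spread of only a few units lose most of their precision when the mean is subtracted, so the exact digit at which M2 diverges depends on the data.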

      I also confirmed that the results don't differ if the input query has the field path before the constant, like this:

      {
          "$group": {
              "_id": null,
              "num": {
                  "$stdDevSamp": {
                      "$sum": [
                          "$measurement1",
                          NumberLong("20439413136"),
                      ]
                  }
              }
          }
      }, 
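
      A toy model of the reordering may help explain why the field-path-first query is unaffected. The sketch below assumes only the behavior described in this ticket (constants are set aside and appended after the other operands); moveConstantsToEnd and isConstant are hypothetical names, not functions from the server code base.

```javascript
// Operands are modeled as field-path strings such as "$measurement1"
// or as {$const: value} objects, mirroring the explain output.
function isConstant(operand) {
  return typeof operand === "object" && operand !== null && "$const" in operand;
}

// Keep non-constant operands in their original order and append the constants.
function moveConstantsToEnd(operands) {
  const nonConstants = operands.filter((op) => !isConstant(op));
  const constants = operands.filter(isConstant);
  return [...nonConstants, ...constants];
}

const constantFirst = [{ $const: 20439413136 }, "$measurement1"];
const fieldPathFirst = ["$measurement1", { $const: 20439413136 }];

// Both lines print ["$measurement1",{"$const":20439413136}]
console.log(JSON.stringify(moveConstantsToEnd(constantFirst)));
console.log(JSON.stringify(moveConstantsToEnd(fieldPathFirst)));
```

      Under this model the field-path-first form is already in the order the optimizer would produce, so the optimized and unoptimized plans evaluate the operands in the same order.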

      This failure was reproduced deterministically in a VWS running Ubuntu on ARM, on a regular collection with no indexes. It might also happen with other accumulators, but I did not test that.

      This could be considered not a bug, but we might want to add a rule to the fuzzer that accepts this as a close-enough result, probably inside assertAggFuzzerResultDivergenceIsAcceptable().
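
      One possible shape for such a rule is a relative-tolerance comparison. The sketch below is only an assumption about what "close enough" could mean; approxEqual and its default tolerance are hypothetical and do not describe the existing body of assertAggFuzzerResultDivergenceIsAcceptable().

```javascript
// Hypothetical helper: returns true when two numeric results agree to within
// a relative tolerance. The default of 1e-9 is an illustrative choice that
// tolerates differences from roughly the 10th significant figure onward.
function approxEqual(expected, actual, relTol = 1e-9) {
  // Treat NaN as matching only NaN.
  if (Number.isNaN(expected) || Number.isNaN(actual)) {
    return Number.isNaN(expected) && Number.isNaN(actual);
  }
  // Exact matches, including infinities of the same sign.
  if (expected === actual) {
    return true;
  }
  // A finite value never matches an infinite one.
  if (!Number.isFinite(expected) || !Number.isFinite(actual)) {
    return false;
  }
  const scale = Math.max(Math.abs(expected), Math.abs(actual));
  return Math.abs(expected - actual) <= relTol * scale;
}

console.log(approxEqual(1.0000000001, 1.0000000002)); // true
console.log(approxEqual(1.0, 1.1)); // false
```

      A relative tolerance scales with the magnitude of the result, which matters here because the divergence is described in significant figures rather than in absolute terms.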

            Assignee: Unassigned
            Reporter: Mariano Shaar (mariano.shaar@mongodb.com)
            Votes: 0
            Watchers: 4
