[SERVER-77666] $project and $setWindowFields accept "continuous" and "discrete" values. Created: 31/May/23  Updated: 29/Oct/23  Resolved: 08/Jun/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.0-rc1
Fix Version/s: 7.1.0-rc0, 7.0.0-rc4

Type: Bug Priority: Major - P3
Reporter: Slav Babanin Assignee: Irina Yatsenko (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Operating System: OS X
Backport Requested:
v7.0
Sprint: QI 2023-06-12
Participants:

 Description   

I’ve been experimenting with the different ‘method’ values mentioned in the design documentation. I tested each value of the method parameter in the $group stage using the following code snippet:

test.aggregate( [
  {
    $group: {
      _id: "$t",
      Percentile: {
        $percentile: {
          p: [
            0.9
          ],
          method: "discrete",
          input: "$a"
        }
      }
    }
  }
]) 

For both ‘continuous’ and ‘discrete’ values, I received the error message 'MongoServerError: Currently only approximate percentiles are supported'.
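To repeat the $group check for every method value, the pipeline can be generated per method. This is only a sketch: the helper name buildGroupPercentilePipeline is hypothetical, and the commented-out loop assumes a mongosh session where the collection is reachable as db.test.

```javascript
// Hypothetical helper (not part of the original report): builds the $group
// pipeline from the snippet above for a given percentile method.
function buildGroupPercentilePipeline(method) {
  return [
    {
      $group: {
        _id: "$t",
        Percentile: {
          $percentile: { p: [0.9], method: method, input: "$a" },
        },
      },
    },
  ];
}

const methods = ["approximate", "discrete", "continuous"];
const pipelines = methods.map(buildGroupPercentilePipeline);

// In mongosh, each pipeline could then be run as follows (assumption: the
// collection is db.test). Left commented out so the block runs without a server.
// for (const pipeline of pipelines) {
//   try {
//     printjson(db.test.aggregate(pipeline).toArray());
//   } catch (e) {
//     print(e.message);
//   }
// }
```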

As I understand it, this aligns with the comment in the design documentation provided here “https://docs.google.com/document/d/1NVQ6hiD3rvt03Eegb9JZj5mtWMHnX1ueumAbejwdOWY/edit?disco=AAAAvPx8do4”. It seems that only ‘approximate’ is supported as of now.

However, I noticed that when I used the $project stage with 'continuous', 'discrete' and 'approximate' values, they all worked fine if I used a field reference in the input parameter. Here’s an example:

test.aggregate([
  {
    $project: {
      percentile: {
        $percentile: {
          input: "$a",
          p: [
            0.5,
            0.9,
            0.95
          ],
          method: "continuous"
        }
      }
    }
  }
]) 

The same holds for the $setWindowFields stage: it accepts all three methods without any errors. For example:

test.aggregate( {
  $setWindowFields: {
    partitionBy: "$t",
    sortBy: {
      t: 1
    },
    output: {
      sat_p95: {
        $percentile: {
          input: "$a",
          p: [
            0.95
          ],
          method: "continuous"
        },
        window: {
          documents: [
            -1,
            0
          ]
        }
      }
    }
  }
}) 

Collection test data: 

[
  { _id: ObjectId("6477bc4707718501f4056fab"), t: 0, a: 1 },
  { _id: ObjectId("6477bc4707718501f4056fac"), t: 0, a: 2 },
  { _id: ObjectId("6477bc4707718501f4056fad"), t: 1, a: 2 },
  { _id: ObjectId("6477bc4707718501f4056fae"), t: 1, a: 4 },
  { _id: ObjectId("6477bc4707718501f4056faf"), t: 1, a: 5 },
  { _id: ObjectId("6477bc4707718501f4056fb0"), t: 1, a: 6 },
  { _id: ObjectId("6477bc4707718501f4056fb1"), t: 1, a: 4 },
  { _id: ObjectId("6477bc4707718501f4056fb2"), t: 1, a: 1 },
  { _id: ObjectId("6477bc4707718501f4056fb3"), t: 1, a: 2 },
  { _id: ObjectId("6477bc4707718501f4056fb4"), t: 1, a: 3 },
  { _id: ObjectId("6477bc4707718501f4056fb5"), t: 1, a: 1 },
  { _id: ObjectId("6477bc4707718501f4056fb6"), t: 1, a: 100 }
] 



 Comments   
Comment by Githook User [ 13/Jun/23 ]

Author:

{'name': 'Irina Yatsenko', 'email': 'irina.yatsenko@mongodb.com', 'username': 'IrinaYatsenko'}

Message: SERVER-77666 Refactor how method field in $percentile is checked
Branch: v7.0
https://github.com/mongodb/mongo/commit/cba0714ce00253155a3c6374f48fea5acf4dbdd0

Comment by Githook User [ 08/Jun/23 ]

Author:

{'name': 'Irina Yatsenko', 'email': 'irina.yatsenko@mongodb.com', 'username': 'IrinaYatsenko'}

Message: SERVER-77666 Refactor how method field in $percentile is checked
Branch: master
https://github.com/mongodb/mongo/commit/5f97bdf2b3b2cbf2fd817fc92ea3bc60cc845582

Comment by Irina Yatsenko (Inactive) [ 07/Jun/23 ]

Requesting backport because if an expression with "continuous" method is run over array input, it will bypass the initial parsing checks and run into an internal tassert about "continuous" algo not being implemented, which means the user can intentionally crash the server by crafting their own input.

Comment by Irina Yatsenko (Inactive) [ 07/Jun/23 ]

Unfortunately, if the input isn't a scalar and the method is specified as "continuous" we'll attempt to create an accumulator and hit internal tassert, which would kill the server.
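One way to picture the failure mode is a method check that lives in a single parsing step shared by every entry point, so that an unimplemented method is rejected before any accumulator or algorithm is created. The following is a minimal JavaScript sketch of that idea, not the server's C++ implementation; all function and constant names are hypothetical, and only the error text for unsupported methods is taken from this ticket.

```javascript
// Hypothetical names throughout; this only illustrates the idea of one
// shared check and is not the actual server code.
const KNOWN_METHODS = new Set(["approximate", "discrete", "continuous"]);
const IMPLEMENTED_METHODS = new Set(["approximate"]);

function validatePercentileMethod(method) {
  if (!KNOWN_METHODS.has(method)) {
    // Wording of this message is an assumption.
    throw new Error(`Unknown percentile method: ${method}`);
  }
  if (!IMPLEMENTED_METHODS.has(method)) {
    // Message text as quoted in the description of this ticket.
    throw new Error("Currently only approximate percentiles are supported");
  }
  return method;
}

// Every caller (accumulator, expression, window function) would parse the
// spec through the same function, so none of them can skip the check.
function parsePercentileSpec(spec) {
  return {
    input: spec.input,
    p: spec.p,
    method: validatePercentileMethod(spec.method),
  };
}
```

With a check like this, a spec such as { input: "$a", p: [0.95], method: "continuous" } fails at parse time with a user-facing error regardless of whether the input is a scalar or an array.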

Comment by Irina Yatsenko (Inactive) [ 02/Jun/23 ]

ExpressionFromAccumulatorQuantile might create a percentile algorithm directly, without going through the accumulator, which would reject unimplemented methods. It would return a correct result (e.g. when the input is a scalar, all methods should return that scalar as the answer for any percentile); however, it might confuse users that we sometimes complain about unimplemented methods and sometimes don't. We should consistently reject "discrete" and "continuous" percentiles for now.
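The scalar case can be illustrated with textbook definitions of the two exact methods. The sketch below assumes nearest-rank for "discrete" and linear interpolation between neighbouring ranks for "continuous"; these are common definitions and not necessarily the ones the server will eventually implement. The function names are hypothetical.

```javascript
// Assumed definition: nearest-rank percentile on the sorted values.
function discretePercentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(p * sorted.length));
  return sorted[rank - 1];
}

// Assumed definition: linear interpolation between the two closest ranks.
function continuousPercentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = p * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

// A scalar input behaves like a one-element data set, so both definitions
// return that element for every p in [0, 1].
const scalarAsArray = [7];
const scalarResults = [0, 0.5, 0.9, 1].map((p) => [
  discretePercentile(scalarAsArray, p),
  continuousPercentile(scalarAsArray, p),
]);
```

Under these definitions the methods only diverge once there is more than one value, for example on the two documents with t: 0 in the test data.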
