[SERVER-74302] issues with GROUP_BY stage in aggregation (6.2.0) Created: 23/Feb/23  Updated: 27/Oct/23  Resolved: 17/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: John Anderson Assignee: Milena Ivanova
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Operating System: ALL
Steps To Reproduce:

example data:

 

[
{ metadata: { testId: "foo", publicFacetLabels: {difficulty: "easy"}, score: 2 } },
{ metadata: { testId: "foo", publicFacetLabels: {difficulty: "easy"}, score: 2 } },
{ metadata: { testId: "foo", publicFacetLabels: {difficulty: "easy"}, score: 2 } },
{ metadata: { testId: "foo", publicFacetLabels: {difficulty: "hard"}, score: 2 } },
{ metadata: { testId: "foo", publicFacetLabels: {difficulty: "hard"}, score: 2 } },
]

Execute pipeline:

[{
 $match: {
  'metadata.testId': 'foo'
 }
}, {
 $group: {
  _id: {
   difficulty: '$metadata.publicFacetLabels.difficulty'
  },
  responseCount: {
 $count: {}
  },
  sumScore: {
   $sum: '$metadata.score'
  }
 }
}, {
 $project: {
  _id: 0,
  sumScore: 1,
  responseCount: 1,
  difficulty: '$_id.difficulty'
 }
}] 

Expected Result:

[
{difficulty: "easy", sumScore: 6, responseCount: 3},
{difficulty: "hard", sumScore: 4, responseCount: 2}
]

 

Sprint: QO 2023-04-03
Participants:

 Description   

I have pipelines that previously worked but no longer do since upgrading to 6.2.0. They all specifically have a GROUP_BY stage that is no longer working as expected. e.g.

[{
 $match: {
  'metadata.testId': 'foo'
 }
}, {
 $group: {
  _id: {
   difficulty: '$metadata.publicFacetLabels.difficulty'
  },
  responseCount: {
   $count: {}
  },
  sumScore: {
   $sum: '$metadata.score'
  }
 }
}, {
 $project: {
  _id: 0,
  sumScore: 1,
  responseCount: 1,
  difficulty: '$_id.difficulty'
 }
}]

In the above example I was previously getting the following result

 

{
 responseCount:166,       
 sumScore: 150,
 difficulty: "Hard"
} 

Since updating to version 6.2.0 I am getting:

 

{
sumScore: 150,

Note: these pipelines are being run against a timeseries collection

 
 



 Comments   
Comment by Milena Ivanova [ 17/Mar/23 ]

The issue was fixed in revision 6.2.1

Comment by Milena Ivanova [ 17/Mar/23 ]

I can confirm, that the bug exists in 6.2.0-rc0

{ "sumScore" : 2 }
{ "sumScore" : 2 }

It is fixed in MongoDB server version: 6.2.1

{ "responseCount" : 3, "sumScore" : 6, "difficulty" : "easy" }
{ "responseCount" : 2, "sumScore" : 4, "difficulty" : "hard" }

There were a number of tickets last year touching the optimization of $group over timeseries, I am not sure when exactly it was introduced. After the following commit, the aggregation works correctly.

commit 5a20629d209ea93d3685872c0d18fa508b7c7146 (tag: r6.2.1-rc0)
Date:   Fri Feb 10 22:25:37 2023 +0000
    SERVER-73822 Time-series $group rewrite ignores certain accumulators

An update to the latest 6.2 revision should fix the issue. 

 

 

Generated at Thu Feb 08 06:27:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.