[SERVER-59951] Make object form of the '_id' group-by expression work to handle multiple group-by keys. Created: 14/Sep/21  Updated: 29/Oct/23  Resolved: 19/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0

Type: Improvement Priority: Major - P3
Reporter: Yoon Soo Kim Assignee: Yoon Soo Kim
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes SERVER-64812 SBE interprets objects containing agg... Closed
Backwards Compatibility: Fully Compatible
Sprint: QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-01-24

 Description   

Currently, the $group SBE stage builder (SlotBasedStageBuilder::buildGroup) does not support the object form of the '_id' group-by expression, such as _id: {a: "$a", b: "$b"}. Two ideas have been proposed so far.

1) One idea is to pass the ExpressionObject through from DocumentSourceGroup to the GroupNode QSN and on to SlotBasedStageBuilder::buildGroup, and to implement the expression walker for ExpressionObject.

  • We would also get an $object implementation almost for free.
  • The group stage's group key becomes a full document with field names, which would perform worse than the multiple group-by keys of the second idea.
  • We would get the same behavior as the classic engine for generated _id documents.
  • We need to modify DocumentSourceGroup and GroupNode, which would cause a chain of follow-on changes.
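The following is a minimal Python sketch of what idea 1 means for the hash key. It rests on several assumptions: documents are plain dicts, the _id spec references only top-level fields, and the helper names (eval_id_document, group_by_named_key) are hypothetical. They are not SBE or DocumentSourceGroup APIs. Because the whole evaluated _id document, field names included, serves as the group key, every hash and equality check also covers the names.

```python
from collections import defaultdict


def eval_id_document(spec, doc):
    """Evaluate an object-form _id spec such as {"a": "$a", "b": "$b"}.

    Hypothetical stand-in for ExpressionObject evaluation: the result keeps
    the spec's field names and field order. Only top-level "$field" paths
    are supported, and a missing field becomes None (a simplification that
    does not model MongoDB's handling of missing values).
    """
    return {name: doc.get(path[1:]) for name, path in spec.items()}


def group_by_named_key(spec, docs):
    """Idea 1 sketch: hash on the whole _id document, names included."""
    counts = defaultdict(int)
    ids = {}
    for doc in docs:
        id_doc = eval_id_document(spec, doc)
        # The key carries every field name next to every value, so hashing
        # and comparing keys also touches the names.
        key = tuple(id_doc.items())
        ids[key] = id_doc
        counts[key] += 1
    return [{"_id": ids[k], "count": n} for k, n in counts.items()]
```

In this sketch the output _id field order matches the spec, which corresponds to the classic-engine behavior claimed for idea 1.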

2) Another idea is to manually insert a mkbson stage that composes an _id document out of multiple group-by key expressions before returning a result document.

  • We don't need to modify DocumentSourceGroup and GroupNode. We just need to add some logic to SlotBasedStageBuilder::buildGroup.
  • The group stage's group key consists of multiple group-by keys without field names, which would perform better than the first idea.
  • We would not get the same behavior as the classic engine, since the passed-through group-by key expressions do not follow the original field order of the _id document specification.
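The following hypothetical Python sketch models idea 2. The group key is a bare tuple of values, and a mkbson-like step attaches the field names only when each result is produced. To illustrate the ordering concern, it assumes that the builder collects the group-by expressions sorted by field name. That is one conceivable way the spec order could be lost. It is not a description of what SlotBasedStageBuilder::buildGroup actually does. The function name group_by_value_tuple is likewise hypothetical.

```python
from collections import defaultdict


def group_by_value_tuple(spec, docs):
    """Idea 2 sketch: hash on bare values, attach names afterwards."""
    # Hypothetical assumption: the group-by expressions end up sorted by
    # field name, so the spec's original field order is not preserved.
    keys = sorted(spec.items())
    counts = defaultdict(int)
    for doc in docs:
        # Only values go into the key: no field names are hashed or compared.
        key = tuple(doc.get(path[1:]) for _, path in keys)
        counts[key] += 1
    results = []
    for values, n in counts.items():
        # mkbson-like step: compose the _id document from names and values.
        id_doc = {name: v for (name, _), v in zip(keys, values)}
        results.append({"_id": id_doc, "count": n})
    return results
```

With the spec {b: "$b", a: "$a"}, this sketch emits _id documents as {a: ..., b: ...}. An engine that preserves the spec order would emit {b: ..., a: ...}.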

Alternatively, we could follow a hybrid approach that gets the best of both approaches above, although the code changes would be larger.

1. Pass the ExpressionObject through from DocumentSourceGroup to the GroupNode QSN and on to SlotBasedStageBuilder::buildGroup, and implement the expression walker for ExpressionObject.
2. Extract the slots and expressions for the _id document fields from the SBE PlanStage tree generated for the ExpressionObject, and pass them to makeHashAgg inside SlotBasedStageBuilder::buildGroup.
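The following hedged Python sketch illustrates the hybrid plan. The field names are read once, in spec order, from the object-form spec, which stands in for the ExpressionObject. Only the values feed the hash table, which stands in for makeHashAgg. The _id document is then rebuilt in the original field order. The names build_group and run are hypothetical; the real change lives in the C++ stage builder.

```python
from collections import defaultdict


def build_group(spec):
    """Hybrid sketch: split an object-form _id spec at build time."""
    # Build-time step: keep the field names aside in spec order
    # (Python dicts preserve insertion order) and strip the leading '$'.
    names = list(spec)
    paths = [spec[name][1:] for name in names]

    def run(docs):
        counts = defaultdict(int)
        for doc in docs:
            # The hash key holds bare values, as in idea 2.
            counts[tuple(doc.get(p) for p in paths)] += 1
        # Re-attach the names in the original spec order, as in idea 1.
        return [{"_id": dict(zip(names, values)), "count": n}
                for values, n in counts.items()]

    return run
```

The design choice is to pay for the field names once per query at build time, rather than once per hashed key.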

Pros:

  • We would get the same behavior as the classic engine for generated _id documents.
  • We would get better performance for the group stage.
  • We would get $object implementation almost for free too.

Cons:

  • The code changes would be larger: more changes to SlotBasedStageBuilder::buildGroup, DocumentSourceGroup, and GroupNode, plus a chain of follow-on changes.


 Comments   
Comment by Githook User [ 19/Jan/22 ]

Author:

Yoonsoo Kim (yoonsoo.kim@mongodb.com, username: yun-soo)

Message: SERVER-59951 Support document _id expression
Branch: master
https://github.com/mongodb/mongo/commit/ab99324a4ac6e15f1805fbfce29f90832f471330

Generated at Thu Feb 08 05:48:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.