[SERVER-81868] SBE $group implementation still scales poorly with number of accumulators Created: 04/Oct/23  Updated: 05/Oct/23

Status: Needs Scheduling
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ian Boros Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

If we run a simple test of a $group query with many accumulators, SBE performs worse than the classic engine, and the gap appears to widen as the number of accumulators grows.
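A minimal sketch of how such a test pipeline could be generated, for anyone trying to reproduce this. The field names (`k`, `f_0`, `f_1`, ...) and the helper name `build_group_pipeline` are hypothetical and do not come from the original test; the code only builds the pipeline document and does not talk to a server.

```python
def build_group_pipeline(num_accumulators, op="$sum", group_key="$k"):
    """Return a pipeline with a single $group stage carrying N accumulators.

    Field names (k, f_0, f_1, ...) are placeholders for illustration only.
    """
    group_spec = {"_id": group_key}
    for i in range(num_accumulators):
        # Output field acc_i accumulates the (hypothetical) input field f_i.
        group_spec[f"acc_{i}"] = {op: f"$f_{i}"}
    return [{"$group": group_spec}]


# 20 accumulators, roughly the point where the ticket says the gap shows up.
pipeline = build_group_pipeline(20, op="$avg")
```

The resulting list can be passed to a driver's aggregate call and timed once under each engine. On server versions that expose it, the `internalQueryFrameworkControl` server parameter is the usual way to switch between the classic engine and SBE; treat that as an assumption to check against the build under test.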

For many queries the runtime is dominated by work other than the accumulators (reading data, evaluating other expressions, etc.). In these cases, the "regression" in time spent accumulating may not be visible at all. On the other hand, when running the accumulators is a large fraction of the query runtime, there is a clear difference.

Currently the only way to see this issue is through queries with a large number (20+) of accumulators. However, when running a time series $group query in SBE, we see similar behavior. This is because with time series, the amount of work done to read each document is relatively small, so the $group processing represents a greater fraction of the runtime.

The issue appears to be most severe with $avg, presumably because the SBE implementation decomposes this into two separate accumulators (sum and count).
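To illustrate why the decomposition could matter, here is a toy model, not SBE's actual implementation. It assumes each `$avg` expands into two internal accumulators (sum and count) while every other operator stays at one, and counts the resulting accumulator slots per group. The function name and the `decomposed` parameter are hypothetical.

```python
def count_accumulator_slots(group_spec, decomposed=("$avg",)):
    """Count internal accumulator slots for a $group spec in a toy model.

    Assumption: operators listed in `decomposed` expand into two internal
    accumulators (sum and count); every other operator uses one.
    """
    slots = 0
    for field, expr in group_spec.items():
        if field == "_id":
            continue  # the group key is not an accumulator
        op = next(iter(expr))  # the accumulator operator, e.g. "$avg"
        slots += 2 if op in decomposed else 1
    return slots


spec = {"_id": "$k"}
spec.update({f"acc_{i}": {"$avg": f"$f_{i}"} for i in range(20)})
slots_for_avg = count_accumulator_slots(spec)  # 20 $avg fields in the spec
```

Under this model, a $group with 20 `$avg` fields updates 40 accumulators per input document, so if per-accumulator overhead is the bottleneck, `$avg` would be expected to show the effect at roughly half the field count of `$sum`.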


Generated at Thu Feb 08 06:47:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.