[SERVER-55624] Track materialized counts and sums for time-series measurements Created: 30/Mar/21  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-55625 Optimize $group on meta $count/$sum/$... Backlog
Related
related to SERVER-55575 Optimize $group on meta $min/$max on ... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2021-10-04, Execution Team 2021-10-18, Execution Team 2021-11-01
Participants:

 Description   

This ticket tracks an idea proposed in early conversations about time-series collections. We already record and aggregate a min and max for each measurement field per bucket. If we wanted to (perhaps with an opt-in from the user), we could extend this to other metrics such as counts and sums.
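As a rough illustration of the idea, the sketch below keeps a running count and sum next to the min and max for each numeric measurement field in a bucket. The `Bucket` class, its `control` layout, and the `insert` method are hypothetical names made up for this example and do not reflect the server's actual bucket format.

```python
from collections import defaultdict


class Bucket:
    """Hypothetical in-memory bucket; not the server's real storage format."""

    def __init__(self, meta):
        self.meta = meta
        self.measurements = []
        # min/max are the summaries the ticket says already exist;
        # count/sum are the proposed (assumed) extension.
        self.control = {
            "min": {},
            "max": {},
            "count": defaultdict(int),
            "sum": defaultdict(float),
        }

    def insert(self, measurement):
        """Append a measurement and update every per-field summary."""
        self.measurements.append(measurement)
        for field, value in measurement.items():
            if not isinstance(value, (int, float)):
                continue  # this sketch only summarizes numeric fields
            ctl = self.control
            ctl["min"][field] = min(ctl["min"].get(field, value), value)
            ctl["max"][field] = max(ctl["max"].get(field, value), value)
            ctl["count"][field] += 1
            ctl["sum"][field] += value
```

With summaries like these, an average over a bucket could be answered from `control["sum"]` and `control["count"]` without reading the individual measurements, at the cost of extra storage and extra work on every insert.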



 Comments   
Comment by Charlie Swanson [ 15/Sep/21 ]

Just wanted to bump this back to "Needs Scheduling" based on Rushan's latest comment, which describes a similar optimization with a different approach. I think Rushan's approach would best live on the QO backlog, but we may not yet be at a point where we know whether we want to do one or the other (or both). Feel free to send this back to the backlog if we still believe materialization is the way to go, or if figuring this out remains unimportant.

Comment by Rushan Chen [ 15/Sep/21 ]

This could also be implemented without requiring additional storage. Sum and count could be pushed inside unpack (as is done for sample), which would then output one sum and/or one count for each bucket.
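A minimal sketch of this alternative, assuming a simplified bucket shape (a dict with a `meta` value and a column-style `data` mapping of field name to a list of values). The functions `bucket_partial` and `avg_by_meta` are hypothetical and only show the shape of the computation: one partial sum and count per bucket, computed at read time, then merged per meta value.

```python
def bucket_partial(bucket, field):
    """Return (sum, count) for one field of one bucket, computed on read."""
    values = bucket["data"].get(field, [])
    return sum(values), len(values)


def avg_by_meta(buckets, field):
    """Merge per-bucket partials into one average per meta value."""
    totals = {}
    for bucket in buckets:
        s, c = bucket_partial(bucket, field)
        prev_s, prev_c = totals.get(bucket["meta"], (0, 0))
        totals[bucket["meta"]] = (prev_s + s, prev_c + c)
    # Skip meta values with no measurements to avoid dividing by zero.
    return {meta: s / c for meta, (s, c) in totals.items() if c}
```

Nothing is stored ahead of time here. The saving comes from emitting one partial result per bucket instead of one document per measurement.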

A more advanced rewrite could also handle grouping on meta and time: it would detect group boundaries and output one sum and count for each group (in this case one bucket may contain multiple groups).
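The boundary detection could look roughly like the sketch below. It assumes timestamps are plain numbers in seconds and that each group covers a fixed `interval`. The function name `partials_per_group` and its arguments are hypothetical. The measurements are sorted first because this sketch does not assume any ordering inside a bucket.

```python
def partials_per_group(meta, times, values, interval):
    """Emit one (meta, group_start, sum, count) tuple per time group in a bucket."""
    out = []
    current = None  # [group_start, running_sum, running_count]
    for t, v in sorted(zip(times, values)):
        start = t - t % interval  # floor the timestamp to its group start
        if current is None or start != current[0]:
            # Group boundary detected: flush the previous group, if any.
            if current is not None:
                out.append((meta, current[0], current[1], current[2]))
            current = [start, 0, 0]
        current[1] += v
        current[2] += 1
    if current is not None:
        out.append((meta, current[0], current[1], current[2]))
    return out
```

For example, a bucket spanning timestamps 0 to 130 with a 60-second interval yields three partial results, one for each of the groups starting at 0, 60, and 120.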

 

The TSBS benchmark "devops" workload has 6 queries implemented for MongoDB, and 4 of them share the following pattern:

(1) min/max/avg (sum over count)

(2) group over time interval (minutes or hours)

(3) none of the 4 groups only on a meta field

If buckets do not align with the group boundaries, this optimization would need to detect group boundaries. (The same applies to the min/max optimization.)

We could choose to optimize only the cases where buckets are aligned with the group <time unit>, which we could tell from the granularity. In these cases, the optimization could also work when grouping on an aligned time unit and/or a meta field.
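One way to picture the aligned case: a bucket's summary can be used as-is only if every measurement in the bucket falls into the same group. The check below is a hypothetical sketch that uses the bucket's minimum and maximum timestamps (numeric seconds) at read time. It does not model how the server derives alignment from the granularity setting.

```python
def bucket_fits_one_group(min_time, max_time, interval):
    """True if the whole bucket time range lies inside a single group interval."""
    return min_time // interval == max_time // interval


def choose_plan(min_time, max_time, interval):
    """Pick between the per-bucket summary and per-measurement boundary detection."""
    if bucket_fits_one_group(min_time, max_time, interval):
        return "use bucket summary"
    return "detect group boundaries"
```

A bucket covering 3600 to 7199 sits entirely inside one hourly group, while a bucket covering 3500 to 3700 straddles a boundary and would need the fallback path.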
