[SERVER-81911] Reduce memory checking frequency in the window stage Created: 05/Oct/23  Updated: 25/Oct/23  Resolved: 25/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Task Priority: Major - P3
Reporter: Rui Liu Assignee: Rui Liu
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Participants:
Linked BF Score: 135

 Description   

We think the estimation code for the hash_agg stage is not suitable (or at least unwieldy) for window functions, for a few reasons:

1. It defines a memory checkpoint for spilling, and the checkpoint counter is incremented per record processed. However, in window function project, we have two parts of memory (window buffer, and window states) and they are updated at different paces. This makes it hard to define memory checking that is checkpoint-based.

2. The checkpoint calculation references the spilling threshold, we cannot easily transfer this to our context, because the memory is divided into two parts.

3. Most importantly, in the hash_agg case, we assume the hash table size is either stable or linearly growing. We don't have this guarantee in our context. For example, for a range-based window, the window frame size might differ drastically for each record. I'm not sure how to define a reasonable checkpoint in this case.

We should come up with an actual model of the memory estimation, as a function of the window frame / window buffer size. It's reasonable to assume the model is linear (including those states with constant size). We propose the following:

We estimate memory for the window buffer and window states differently. The window buffer memory is estimated by the average of each record sample. We simply multiply the average by the number of records in the window buffer.

We don't have access to the delta memory change for the window state, and we can only get the memory for the entire state. So instead we use each memory sample to calculate a linear regression model, where x is the size of the window frame, and y is the memory size of the window state.

The memory samples are taken in an exponential backoff way, every one record / frame size, then every two record / frame size, every four and so on, up to a certain maximum interval. This is also what hash_agg stage does with a configurable query knob.



 Comments   
Comment by Githook User [ 25/Oct/23 ]

Author:

{'name': 'Rui Liu', 'email': 'lriuui0x0@gmail.com', 'username': 'lriuui0x0'}

Message: SERVER-81911 Reduce memory checking frequency in the window stage
Branch: master
https://github.com/mongodb/mongo/commit/c9a24ae2c97ed4865aa9bdb15a0bf31abf4554d4

Generated at Thu Feb 08 06:47:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.