Core Server / SERVER-81911

Reduce memory checking frequency in the window stage

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Fully Compatible
    • 135

      We think the memory estimation code for the hash_agg stage is not suitable (or is at least unwieldy) for window functions, for a few reasons:

      1. It defines a memory checkpoint for spilling, and the checkpoint counter is incremented per record processed. However, in the window function project we have two separate parts of memory (the window buffer and the window states), and they are updated at different paces. This makes it hard to define checkpoint-based memory checking.

      2. The checkpoint calculation references the spilling threshold. We cannot easily transfer this to our context, because the memory is divided into two parts.

      3. Most importantly, in the hash_agg case, we assume the hash table size is either stable or linearly growing. We don't have this guarantee in our context. For example, for a range-based window, the window frame size might differ drastically for each record. I'm not sure how to define a reasonable checkpoint in this case.

      We should come up with an actual model for memory estimation, as a function of the window frame or window buffer size. It's reasonable to assume the model is linear (states with constant size are simply the case where the slope is zero). We propose the following:

      We estimate memory for the window buffer and the window states differently. The window buffer memory is estimated from the average size of the sampled records: we simply multiply that average by the number of records in the window buffer.
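The window buffer estimate above can be sketched as follows. This is only an illustration in Python, not the server implementation; the class and method names (`WindowBufferEstimator`, `add_sample`, `estimate`) are hypothetical.

```python
class WindowBufferEstimator:
    """Hypothetical helper: estimates buffer memory from sampled record sizes."""

    def __init__(self) -> None:
        self.sample_count = 0  # number of records whose size was sampled
        self.sample_bytes = 0  # total size of the sampled records, in bytes

    def add_sample(self, record_bytes: int) -> None:
        """Record the measured size of one sampled record."""
        self.sample_count += 1
        self.sample_bytes += record_bytes

    def estimate(self, records_in_buffer: int) -> float:
        """Average sampled record size times the number of buffered records."""
        if self.sample_count == 0:
            return 0.0  # no samples yet, so nothing to extrapolate from
        average = self.sample_bytes / self.sample_count
        return average * records_in_buffer
```

For instance, after sampling records of 100, 200 and 300 bytes, a buffer holding 10 records would be estimated at 10 times the 200-byte average.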

      We don't have access to the incremental memory change of the window state; we can only get the memory of the entire state. So instead we fit a linear regression model to the memory samples, where x is the size of the window frame and y is the memory size of the window state.
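One way to fit such a model incrementally is ordinary least squares over running sums, so that no individual sample needs to be stored. The sketch below assumes that approach; the ticket does not say how the regression is actually computed, and all names are hypothetical.

```python
class WindowStateEstimator:
    """Hypothetical helper: fits y = intercept + slope * x by least squares,
    where x is the window frame size and y is the window state memory."""

    def __init__(self) -> None:
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def add_sample(self, frame_size: float, state_bytes: float) -> None:
        """Fold one (frame size, state memory) sample into the running sums."""
        self.n += 1
        self.sum_x += frame_size
        self.sum_y += state_bytes
        self.sum_xx += frame_size * frame_size
        self.sum_xy += frame_size * state_bytes

    def estimate(self, frame_size: float) -> float:
        """Predict the state memory for the given frame size."""
        if self.n == 0:
            return 0.0
        denom = self.n * self.sum_xx - self.sum_x * self.sum_x
        if denom == 0:
            # Every sample had the same frame size, so no slope can be
            # determined; fall back to the mean of the observed memory.
            return self.sum_y / self.n
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept + slope * frame_size
```

A state with constant size yields a slope of zero, so the same code covers both the growing and the constant-size cases.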

      The memory samples are taken with exponential backoff: first after every one record (or unit of frame size), then every two, then every four, and so on, up to a certain maximum interval. This is also what the hash_agg stage does, with a configurable query knob.
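The sampling schedule can be sketched as a small counter that doubles its interval after each sample until it reaches a cap. The `max_interval` parameter is a stand-in for whatever configurable limit is used; its name and the class name are hypothetical.

```python
class BackoffSampler:
    """Hypothetical helper: decides when to take a memory sample,
    doubling the gap between samples up to max_interval."""

    def __init__(self, max_interval: int) -> None:
        self.max_interval = max_interval
        self.interval = 1   # current gap between samples
        self.countdown = 1  # events remaining until the next sample

    def should_sample(self) -> bool:
        """Call once per record (or per unit of frame size growth)."""
        self.countdown -= 1
        if self.countdown > 0:
            return False
        # Take a sample now, then wait twice as long (capped) for the next.
        self.interval = min(self.interval * 2, self.max_interval)
        self.countdown = self.interval
        return True
```

With `max_interval=8`, samples are taken on the 1st, 3rd, 7th, 15th and 23rd calls: the gaps grow as 2, 4, 8 and then stay at 8.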

            Assignee: Rui Liu (rui.liu@mongodb.com)
            Reporter: Rui Liu (rui.liu@mongodb.com)