Track, expose and test memory usage in AsyncResultsMerger

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • None
    • 3
    • TBD
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      The AsyncResultsMerger is used on mongos instances to aggregate the query results from the different shards.

      The AsyncResultsMerger will send getMore requests to the shards asynchronously. When a response from a shard comes in, the AsyncResultsMerger will extract the individual documents from it and buffer them in a queue specific to the particular shard.
      Buffering is necessary because some variants of the AsyncResultsMerger cannot just return the received documents to the client, but need to return a sorted stream of results. For this, the AsyncResultsMerger uses a merge queue.

      Buffering the received documents in memory consumes resources. We currently have no insights into how much memory is used for buffering the documents in a single AsyncResultsMerger instance nor for all AsyncResultsMerger instances combined in a mongos instance.
      It would be good to track the memory usage for buffered documents in each single AsyncResultsMerger instance, so that we can query it programmatically (e.g. from inside unittests) to guard against regressions. It will also help us to reason about better and more efficient prefetching strategies for the AsyncResultsMerger in the future. More efficient prefetching may come as part of SPM-4203.
      It is currently unclear if and how we should expose the memory usage of a single AsyncResultsMerger instance to end users. It is probably not necessary.

      It would also be good to add a global, observable metric that combines the memory usage of all AsyncResultsMerger instances on a mongos instance. Exposing the total memory usage for buffered documents will lead to better observability, may help us to detect the root cause of out-of-memory failures on mongos, and could also help us in the future to prevent them (e.g. by throttling workloads).

      We also discovered by accident that the AsyncResultsMerger currently buffers the full BSON responses in memory yet another time in order to track additional transaction participants that were added by other shards. This full buffering is currently necessary because handling of the shards' getMore responses is performed by the network thread on mongos, which cannot assume ownership of the OperationContext and cannot assume thread-safe access to the TransactionRouter instance.
      The buffering of the full BSON responses will be removed by SERVER-105552, but even after that, the AsyncResultsMerger will buffer a small amount of metadata for every received response until it can be processed. This amount of memory should also be accounted for in the above memory usage metrics.

      Once the memory usage tracking for a single AsyncResultsMerger instance is in place, we unittest it and guard against future regressions w.r.t. memory usage.

            Assignee:
            Unassigned
            Reporter:
            Jan Steemann
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: