Core Server / SERVER-87828

Memory tracking undercounts by 33%

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams
    • Fully Compatible
    • Sprint 47, Sprint 48, Sprint 49

      https://docs.google.com/document/d/1zPurErldtRGkl9COOM8jv_R_SAkfn4KW2Zd0temb71I/edit

      1. Our streams memory tracking undercounts by 33%. This undercounting can lead to pod/OS “out of memory” errors if we’re not careful.
      2. As part of this work, we should extend the testing to other common memory-intensive pipelines, since we might be missing other important allocation sites. Our “above the allocator” approach to memory tracking is hard to make fully accurate. We could also consider multiplying the MemoryTracker numbers by 1.X to account for the undercounting.
      3. One important aspect of this issue: for an $unwind that duplicates strings, we save memory through reference counting. However, when restoring from a checkpoint, we duplicate the string memory. In my opinion this scenario is not worth optimizing in checkpointing, but I’ve been able to cause a pod/OS OOM with this sort of pipeline during checkpoint restore. We should identify why the MemoryTracker is not catching this scenario.
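Item 2 suggests scaling the MemoryTracker numbers by 1.X. A minimal sketch of that idea, under stated assumptions: `CorrectedMemoryTracker` and its methods are hypothetical names, not the server's MemoryTracker API, and the factor passed in is illustrative only. The right factor depends on how "undercounts by 33%" is read: if tracked ≈ 0.67 × actual, the factor is about 1 / 0.67 ≈ 1.49; if actual ≈ 1.33 × tracked, it is 1.33.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper illustrating a correction factor; NOT mongo::MemoryTracker.
class CorrectedMemoryTracker {
public:
    CorrectedMemoryTracker(std::int64_t limitBytes, double undercountFactor)
        : _limitBytes(limitBytes), _factor(undercountFactor) {
        // A factor below 1.0 would make the estimate smaller than the raw count.
        assert(undercountFactor >= 1.0);
    }

    // Record bytes as seen by the "above the allocator" call-site accounting.
    void add(std::int64_t bytes) { _trackedBytes += bytes; }

    // Raw number reported by call-site accounting, left unscaled.
    std::int64_t trackedBytes() const { return _trackedBytes; }

    // Estimate of real usage after applying the assumed undercount factor.
    std::int64_t estimatedBytes() const {
        return static_cast<std::int64_t>(static_cast<double>(_trackedBytes) * _factor);
    }

    // Compare the corrected estimate, not the raw count, against the limit.
    bool wouldExceed(std::int64_t extraBytes) const {
        return static_cast<double>(_trackedBytes + extraBytes) * _factor >
            static_cast<double>(_limitBytes);
    }

private:
    std::int64_t _limitBytes;
    double _factor;
    std::int64_t _trackedBytes = 0;
};
```

For example, with a 1000-byte limit and a factor of 1.5, 600 tracked bytes are estimated at 900, and adding another 100 tracked bytes would cross the limit (700 × 1.5 = 1050). Keeping the raw count and applying the factor only at the limit check means a recalibrated factor, or newly instrumented allocation sites, doesn't require touching every call site.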
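A toy model of item 3, assuming nothing about the server's actual Value or string storage: `Doc`, `unwindShared`, `restoreFromCheckpoint`, and `distinctStringBytes` are hypothetical. After an $unwind-like fan-out, N documents share one reference-counted string buffer, so the payload costs roughly one copy. A restore that rebuilds each document from its own serialized bytes materializes N copies. Any accounting that charges the shared buffer once, or skips the restore path, reports the first number while the process holds the second.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical document holding one string field through a ref-counted pointer.
struct Doc {
    std::shared_ptr<const std::string> field;
};

// $unwind-like fan-out: every output doc references the same string buffer.
std::vector<Doc> unwindShared(const std::string& value, std::size_t n) {
    std::shared_ptr<const std::string> shared = std::make_shared<std::string>(value);
    return std::vector<Doc>(n, Doc{shared});
}

// Checkpoint-restore-like path: each doc is rebuilt from its own serialized
// bytes, so every doc ends up with a private copy of the string.
std::vector<Doc> restoreFromCheckpoint(const std::vector<Doc>& saved) {
    std::vector<Doc> restored;
    restored.reserve(saved.size());
    for (const Doc& d : saved) {
        std::string serialized = *d.field;  // stand-in for write-then-read of a checkpoint
        restored.push_back(Doc{std::make_shared<std::string>(std::move(serialized))});
    }
    return restored;
}

// Sum payload bytes over distinct string buffers (a shared buffer counts once).
std::size_t distinctStringBytes(const std::vector<Doc>& docs) {
    std::unordered_set<const std::string*> seen;
    std::size_t total = 0;
    for (const Doc& d : docs) {
        if (seen.insert(d.field.get()).second) {
            total += d.field->size();
        }
    }
    return total;
}
```

With a 1000-byte string unwound into 100 documents, `distinctStringBytes` reports 1000 bytes before the round trip and 100000 after it, a 100× jump that happens entirely during restore.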

      See the attached results.json[0]["heapProfileBeforeCheckpoint"] for the stacks reported by the heap profiler. This is partly a guess, but we might be undercounting in this string allocation stack:

                  "0": "tcmalloc::tcmalloc_internal::SampleifyAllocation<>()",
                  "1": "slow_alloc<>()",
                  "2": "mongo::ValueStorage::putString()",
                  "3": "mongo::ExpressionConcat::evaluate()",
                  "4": "mongo::projection_executor::ProjectionNode::applyExpressions()",
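One way to act on item 2 is to check the tracker against a ground truth measured below it. A minimal sketch, assuming a dedicated test binary where replacing the global `operator new` is acceptable: it counts the bytes requested from the allocator while building a concatenated string (loosely mirroring the `ExpressionConcat::evaluate()` → `ValueStorage::putString()` frames above) and compares them with what a hypothetical call-site tracker would charge, namely the string's logical size. `g_allocatedBytes`, `ConcatMeasurement`, and `concatAndTrack` are illustrative names, not server code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Test-only running total of bytes requested through the global operator new.
static std::size_t g_allocatedBytes = 0;

// Replacement global allocation functions (these affect the whole binary).
void* operator new(std::size_t n) {
    g_allocatedBytes += n;
    if (void* p = std::malloc(n == 0 ? 1 : n)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct ConcatMeasurement {
    std::string result;          // keeps the buffer alive so the allocation is observable
    std::size_t trackedBytes;    // what a hypothetical call-site tracker would charge
    std::size_t allocatorBytes;  // bytes operator new handed out during the concat
};

// Hypothetical concat evaluation that charges only the logical string length.
ConcatMeasurement concatAndTrack(const std::string& a, const std::string& b) {
    const std::size_t before = g_allocatedBytes;
    std::string result = a + b;
    const std::size_t after = g_allocatedBytes;
    const std::size_t tracked = result.size();  // logical length only
    return ConcatMeasurement{std::move(result), tracked, after - before};
}
```

For two 1000-character inputs the tracked figure is 2000 bytes, while the allocator sees more: at least the terminating byte and any capacity rounding, plus intermediate buffers if the implementation grows the string in steps. This harness still sits above tcmalloc, so it also misses size-class rounding and allocator overhead; comparing against tcmalloc's own statistics would be the stricter check.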

            Assignee: Harendra Chawla (harendra.chawla@mongodb.com)
            Reporter: Matthew Normyle (matthew.normyle@mongodb.com)
            Votes: 0
            Watchers: 2
