  Core Server / SERVER-17424

WiredTiger uses substantially more memory than accounted for by cache

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.3
    • Component/s: WiredTiger
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Running sysbench shows the following timeline:

      • At A the cache has reached its configured limit (actually 85%, I believe). At B the insert phase of the benchmark ends and the update phase begins.
      • Second row shows allocated bytes as reported by tcmalloc. Per the documentation, I believe this is the number of bytes requested by the app, so it does not include any tcmalloc overhead. This reaches approximately 6 GB.
      • Third row shows bytes in cache as reported by WT, which reaches about 5 GB, roughly 20% less than the allocated bytes.
      • Last row shows the difference between bytes in cache and allocated bytes. This grows in proportion to the number of bytes in the cache, at a rate of about 20%, and stops growing at the point where bytes in cache stops growing. This suggests that WT allocates about 20% more bytes than are accounted for in the "bytes currently in cache" statistic when it is doing inserts (see the sampling sketch after this list).
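
      As a point of reference, here is a minimal sketch (Python with pymongo) of how the two statistics can be sampled side by side from serverStatus. The tcmalloc field path assumes a mongod built with tcmalloc that reports allocator stats in serverStatus; verify both field paths against the running version.

          import time
          from pymongo import MongoClient

          client = MongoClient("localhost", 27017)  # hypothetical local mongod

          while True:
              status = client.admin.command("serverStatus")
              # WT's own accounting of cache contents.
              in_cache = status["wiredTiger"]["cache"]["bytes currently in the cache"]
              # Bytes requested from tcmalloc by the application (assumed path).
              allocated = status["tcmalloc"]["generic"]["current_allocated_bytes"]
              print("cache=%d MB  alloc=%d MB  alloc/cache=%.3f"
                    % (in_cache >> 20, allocated >> 20, allocated / float(in_cache)))
              time.sleep(10)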

      Call stack data was obtained by using perf to monitor calls to all tcmalloc entry points. Here's a partially expanded reversed call tree for a shorter run, up to 449 MB allocated by WT. The timeline shows currently active memory (i.e. accounting for both allocate and free) charged to each call site over the course of the run. Read the max bytes active throughout the run for each call site from the "max.MB" column. The full call tree is attached as well.
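
      To make that accounting concrete, here is a minimal sketch of the bookkeeping implied above: credit a call site on allocation, debit it when the pointer is freed, and track the per-site peak (the "max.MB" value). The event-handling entry points here are hypothetical; in practice the events would be parsed out of the recorded perf trace.

          from collections import defaultdict

          live = {}                      # ptr -> (size, call_site)
          active = defaultdict(int)      # call_site -> currently active bytes
          max_active = defaultdict(int)  # call_site -> peak active bytes ("max.MB")

          def on_alloc(ptr, size, call_site):
              live[ptr] = (size, call_site)
              active[call_site] += size
              max_active[call_site] = max(max_active[call_site], active[call_site])

          def on_free(ptr):
              size, call_site = live.pop(ptr, (0, None))
              if call_site is not None:
                  active[call_site] -= size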

      Attachments

      • perf-wt-reverse.html.gz (2.72 MB, Bruce Lucas)
      • ss.log (20.26 MB, Matt Parlane)
      • cache_overhead=30.png (74 kB)
      • cache.png (379 kB)
      • noncache.png (87 kB)
      • perf-wt-reverse.png (290 kB)
      • repro.png (70 kB)
      • simple.png (64 kB)
      • sysbench.png (105 kB)


          Activity

          Matt Parlane added a comment -

          I've just started the logging now.

          I'm not sure why I didn't think of this before, but I just realised that the machine is also running both the MMS backup and monitoring daemons. I was thinking about what would cause the increase in open cursors and realised they were both still running. Apologies for not clicking earlier; sorry if I've wasted your time.

          I've just stopped the daemons, I'll see if memory usage still grows.

          Matt

          Paulo added a comment -

          Hi,
          Any news about this issue?

          Paulo

          Ramon Fernandez added a comment -

          Paulo Pereira, this issue has been scheduled for the current development cycle. Updates will be posted here as they become available; feel free to watch this ticket to be notified of updates.

          Regards,
          Ramón.

          Bruce Lucas added a comment -

          Behavior is considerably improved in 3.0.3. Running the repro above on a large machine until the amount of memory in the cache reaches 40 GB:

                 cache   alloc   virtual   alloc/   virtual/
                 (MB)    (MB)    (MB)      cache    cache
          3.0.3  41061   42768   43538     1.042    1.060
          3.0.0  40980   48551   49269     1.185    1.202
          

          In 3.0.0, the excess memory consumed was about 19% (tcmalloc-reported allocated memory) or 20% (virtual memory). In 3.0.3 that has been reduced to about 4% and 6% respectively.

          The 6% virtual memory overage will include memory in free pools and non-allocated overhead such as stacks that WT cannot be expected to account for.

          The 4% allocated memory overage will include some fixed overhead allocated outside of WT, but that should be negligible relative to the 40 GB. Presumably this mostly represents the remaining underaccounting of memory by WT, which I believe includes things like internal fragmentation and other allocator overhead.
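
          For reference, the alloc/cache and virtual/cache ratios above can be recomputed directly from the table; a minimal sketch:

              # cache, alloc, virtual (MB), from the table above
              rows = {"3.0.3": (41061, 42768, 43538),
                      "3.0.0": (40980, 48551, 49269)}
              for version in ("3.0.3", "3.0.0"):
                  cache, alloc, virt = rows[version]
                  print("%s  alloc/cache=%.3f  virtual/cache=%.3f"
                        % (version, alloc / float(cache), virt / float(cache)))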

          Bruce Lucas added a comment -

          With the accounting change that went into 3.0.3, the overuse of memory fell from about 18% to about 4%, which is now on the same order as other non-cache uses of memory and is no longer substantial overuse.


            People

            • Votes: 8
            • Watchers: 38
