Core Server / SERVER-30114

Monitor cumulative time spent in tcmalloc spin lock in serverStatus


    Details

    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v3.6, v3.4, v3.2
    • Sprint:
      Platforms 2018-01-29, Platforms 2018-02-12

      Description

      We sometimes encounter allocator bottlenecks that can currently be diagnosed only by asking users to collect stack traces with gdb. We could diagnose these bottlenecks from FTDC data if we counted the cumulative time spent waiting for the primary lock involved. Typical stacks for such a bottleneck look like this:

      #0  0x00000000014a6f85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #1  0x00000000014a6e57 in SpinLock::SlowLock() ()
      #2  0x00000000014a9393 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) ()
      #3  0x00000000014b4d2a in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
      

      #0  0x00000000014a6f85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #1  0x00000000014a6e57 in SpinLock::SlowLock() ()
      #2  0x00000000014a8efd in tcmalloc::CentralFreeList::InsertRange(void*, void*, int) ()
      #3  0x00000000014b4ed8 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) ()
      #4  0x00000000014b4f7d in tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned long) ()
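
To illustrate the kind of counter proposed here, the following is a minimal sketch of a spin lock that accumulates the time spent on its contended (slow) path. It is not tcmalloc's SpinLock and not the patch attached to this ticket: `TimedSpinLock`, `totalWaitNanos()`, and `runContendedDemo()` are hypothetical names chosen for illustration, and the sketch assumes that timing only the slow path is acceptable, so that uncontended acquisitions pay nothing extra.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical instrumented spin lock; an illustration, not tcmalloc's SpinLock.
class TimedSpinLock {
public:
    void lock() {
        // Fast path: an uncontended acquisition records no wait time.
        if (!_held.test_and_set(std::memory_order_acquire))
            return;
        // Slow path (the analogue of the SlowLock frames in the stacks above):
        // measure how long this thread waits and add it to the running total.
        const auto start = std::chrono::steady_clock::now();
        while (_held.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
        const auto waited = std::chrono::steady_clock::now() - start;
        _totalWaitNanos.fetch_add(
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count(),
            std::memory_order_relaxed);
    }

    void unlock() { _held.clear(std::memory_order_release); }

    // Cumulative and never decreasing, so it can be sampled periodically.
    std::int64_t totalWaitNanos() const {
        return _totalWaitNanos.load(std::memory_order_relaxed);
    }

private:
    std::atomic_flag _held = ATOMIC_FLAG_INIT;
    std::atomic<std::int64_t> _totalWaitNanos{0};
};

// Runs `threads` workers that each increment a shared counter `iters` times
// under the lock, and returns the final counter value.
std::int64_t runContendedDemo(TimedSpinLock& lock, int threads, int iters) {
    assert(threads > 0 && iters > 0);
    std::int64_t shared = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++shared;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers)
        w.join();
    return shared;
}
```

Because the counter is cumulative, a periodic sampler only needs to store the raw value. The difference between two samples divided by the sampling interval gives the fraction of wall-clock time that threads spent waiting on the lock, summed across threads.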
      

        Attachments

        1. load.java (15 kB)
        2. run3.png (142 kB)
        3. spinlock.diff (3 kB)
        4. spinlock.png (99 kB)
        5. spinlock2.diff (3 kB)
