SERVER-574

Detrimental performance when paging (need to reduce concurrency, use madvise and mincore)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 1.3.1
    • Fix Version/s: planned but not scheduled
    • Component/s: Performance
    • Labels: None
    • Environment: Ubuntu 9.10, 64 bit
    • Backport: No
    • # Replies: 3
    • Last comment by Customer: true

      Description

      If the memory-mapped pages of a file are not present in RAM, a page fault is taken to fetch the contents. If there is concurrency and the requests do not have much in common, the situation gets worse: many different (random) areas take page faults, causing lots of disk accesses, including random seeks, which drops throughput considerably. Since this slows the completion of all executing requests, it increases the chance of another request coming in, and if that starts executing it makes things even worse. What you end up seeing is an essentially idle CPU, the I/O subsystem at 100% capacity, and hard disks seeking their hearts out.

      The consequence is that when MongoDB hits this capacity limit, performance falls off a cliff. There are several things that can be done to correct this:

      • Reduce concurrency as saturation is approached, so that requests complete quickly instead of piling up as lots of very slow, long-running requests.
      • Under POSIX the madvise system call can be used. For example, if an index or data file is being read sequentially, madvise with MADV_SEQUENTIAL and MADV_WILLNEED suggests that the kernel fill those pages in. MADV_DONTNEED can be used on pages that won't be needed again in the near future, which helps the kernel decide which pages to evict to make room for new ones.
      • The mincore system call can determine whether touching a memory range would take a page fault. This is probably the best test for available concurrency (i.e. throttle how often you proceed when it reports the pages as non-resident). A combined sketch of both calls follows this list.
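
      A minimal sketch of how these calls could fit together, assuming a file mapped read-only with mmap; this is illustrative POSIX/Linux usage, not MongoDB code. Note that madvise advice values are distinct constants rather than OR-able bit flags, so MADV_SEQUENTIAL and MADV_WILLNEED must be issued as two calls:

      {code}
      #define _DEFAULT_SOURCE
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <fcntl.h>
      #include <sys/mman.h>
      #include <sys/stat.h>

      /* Fraction of pages in [addr, addr+len) that are resident in RAM,
       * via mincore(). addr must be page-aligned (an mmap base is). */
      static double resident_fraction(void *addr, size_t len)
      {
          long page = sysconf(_SC_PAGESIZE);
          size_t npages = (len + page - 1) / page;
          unsigned char *vec = malloc(npages);
          size_t i, resident = 0;

          if (vec == NULL || mincore(addr, len, vec) != 0) {
              free(vec);
              return 0.0;               /* on error, assume nothing resident */
          }
          for (i = 0; i < npages; i++)
              resident += vec[i] & 1;   /* low bit set => page is in core */
          free(vec);
          return (double)resident / (double)npages;
      }

      int main(int argc, char **argv)
      {
          struct stat st;
          void *base;
          int fd;

          if (argc != 2) {
              fprintf(stderr, "usage: %s <file>\n", argv[0]);
              return 1;
          }
          fd = open(argv[1], O_RDONLY);
          if (fd < 0 || fstat(fd, &st) != 0) {
              perror(argv[1]);
              return 1;
          }
          base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
          if (base == MAP_FAILED) {
              perror("mmap");
              return 1;
          }

          /* Two separate hints: expect sequential access, and start
           * read-ahead now so the scan takes fewer page faults. */
          madvise(base, st.st_size, MADV_SEQUENTIAL);
          madvise(base, st.st_size, MADV_WILLNEED);

          printf("%.1f%% of pages resident\n",
                 100.0 * resident_fraction(base, st.st_size));

          /* After the scan, mark the pages as good eviction candidates. */
          madvise(base, st.st_size, MADV_DONTNEED);

          munmap(base, st.st_size);
          close(fd);
          return 0;
      }
      {code}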

        Activity

        Roger Binns added a comment -

        In a benchmark run, going from 5 concurrent worker processes to 3 cut the run time from 10h1m to 7h12m. The concurrency was killing performance! Conversely, with CouchDB, 5 workers took 8h1m and 3 took 10h.

        Matthias Götzke added a comment -

        It might be possible to achieve the same effect by watching deviations of query time over the last x seconds/minutes. That way OS-specific functions would not be needed (especially since they might be difficult to work into the access code).

        e.g.

        Have x concurrent workers; watch speed over the last x queries or seconds.
        Try with x-1 -> compare.
        Try with x+1 -> compare.
        Adjust x to the best speed.

        Do it again after x seconds or after a deviation is detected.

        Limit x to be within a min/max range.

        It would be an auto-detection mechanism similar to the one used for indices, automatically optimizing for the best worker-pool size.
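
        A minimal sketch of that hill-climbing loop, assuming a hypothetical measure_window() hook that reports average query latency for a given worker count; nothing here is MongoDB code:

        {code}
        /* Hypothetical hook: average query latency (ms) observed over the
         * most recent measurement window while running `workers` workers. */
        extern double measure_window(int workers);

        /* One round of the tuning scheme described above: probe x-1 and
         * x+1 workers, keep whichever window was fastest, and clamp to
         * [min_w, max_w]. The caller would invoke this again after a
         * fixed interval or when a latency deviation is detected. */
        static int tune_workers(int x, int min_w, int max_w)
        {
            double cur  = measure_window(x);
            double down = (x - 1 >= min_w) ? measure_window(x - 1) : cur;
            double up   = (x + 1 <= max_w) ? measure_window(x + 1) : cur;

            if (down < cur && down <= up)
                return x - 1;
            if (up < cur && up < down)
                return x + 1;
            return x;  /* current worker count is the best of the three */
        }
        {code}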

        Roger Binns added a comment -

        @Matthias: While just watching query completion times will help, the problem with that approach is that throttling will affect all queries, including those that could have been served out of memory immediately.

        For example, let's say that half of the queries are over the same range of data, which is consequently in memory, and the other half are very random. Throttling will affect both, whereas only the random ones need to be throttled.

        Taking advantage of operating system calls in order to be smart about throttling is a good thing.
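
        As a sketch of that selective throttling, assuming the server can estimate the (page-aligned) byte range a query will touch; the function name is illustrative, not MongoDB code:

        {code}
        #define _DEFAULT_SOURCE
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mman.h>

        /* Returns nonzero if every page in [addr, addr+len) is resident,
         * i.e. a query touching this range can run without taking page
         * faults. Queries that fail the check would go to a bounded queue
         * of "slow" workers instead of being throttled alongside queries
         * that can be served from memory. addr must be page-aligned. */
        static int runs_without_faulting(void *addr, size_t len)
        {
            long page = sysconf(_SC_PAGESIZE);
            size_t npages = (len + page - 1) / page;
            unsigned char *vec = malloc(npages);
            size_t i;
            int ok = 1;

            if (vec == NULL || mincore(addr, len, vec) != 0) {
                free(vec);
                return 0;             /* be conservative on error */
            }
            for (i = 0; i < npages; i++)
                if (!(vec[i] & 1)) {  /* low bit clear => not in core */
                    ok = 0;
                    break;
                }
            free(vec);
            return ok;
        }
        {code}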


    People

    • Votes: 38
    • Watchers: 36

    Dates

    • Created:
    • Updated:
    • Days since reply: 3 years, 28 weeks, 6 days ago
    • Date of 1st Reply: