Core Server / SERVER-10362

yielding during read queries waiting too long for fair locking

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.6, 2.5.2
    • Component/s: Concurrency, Replication
    • Labels:
      None
    • Operating System:
      ALL

      Description

      MongoDB Status as of January 3rd, 2014

      ISSUE SUMMARY

      Starting in version 2.4, a call to sleepmicros(1) was added to the yielding logic, which runs after every 256 documents scanned while finding the start position for the tailable cursor used in replication. On some hypervisors, the sleepmicros() call sleeps for much longer than a microsecond (it is only guaranteed to "sleep at least 1 microsecond"); at times it can sleep for one millisecond or longer.
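      A minimal sketch of the effect, assuming a POSIX system: it asks nanosleep() for 1 microsecond, the same request sleepmicros(1) makes, and measures with CLOCK_MONOTONIC how long the thread was actually away. The helper names are hypothetical and this is not MongoDB's sleepmicros() implementation; it only shows that a "1 microsecond" sleep is a lower bound, not a duration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static long long elapsed_ns(struct timespec a, struct timespec b) {
    return (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (b.tv_nsec - a.tv_nsec);
}

/* Hypothetical helper: request a 1 us sleep (what sleepmicros(1) asks for)
   and return how long the thread was really off the CPU, in nanoseconds. */
static long long measure_one_micro_sleep(void) {
    struct timespec req = { .tv_sec = 0, .tv_nsec = 1000 }; /* 1 us */
    struct timespec rem;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    /* If a signal interrupts the sleep, resume with the remaining time so
       the full request is honored. */
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_ns(start, end);
}
```

      The returned value is always at least 1000 ns. On Linux it is often tens of microseconds (the default timer slack for ordinary threads is 50 us), and on an affected hypervisor it can approach the millisecond range described above.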

      USER IMPACT
      On certain hypervisors, the change to the yielding logic affects FindingStartCursor, the method that replica set members use to find their position in their sync source's oplog. On a busy system this query could take much longer than expected, sometimes resulting in timeouts.
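      The cost compounds because the yield runs once per 256 documents for the whole scan. The sketch below models only that counting; scan_with_yields, added_delay_us and no_op_yield are hypothetical names, and the loop body stands in for FindingStartCursor's real per-entry work rather than reproducing it.

```c
#include <stdint.h>

#define YIELD_EVERY_N_DOCS 256 /* yield period described in this issue */

/* Hypothetical model of the scan: walk n_docs oplog entries and call
   yield_hook after every YIELD_EVERY_N_DOCS of them. Returns the number
   of yields performed. */
static uint64_t scan_with_yields(uint64_t n_docs, void (*yield_hook)(void)) {
    uint64_t yields = 0;
    for (uint64_t scanned = 1; scanned <= n_docs; ++scanned) {
        /* ...compare this entry with the target position here... */
        if (scanned % YIELD_EVERY_N_DOCS == 0) {
            yield_hook();
            ++yields;
        }
    }
    return yields;
}

/* Wall-clock time, in microseconds, added by the yields if each one
   actually lasts sleep_us microseconds. */
static uint64_t added_delay_us(uint64_t yields, uint64_t sleep_us) {
    return yields * sleep_us;
}

/* Placeholder hook for the model: it does not sleep. */
static void no_op_yield(void) {}
```

      For a hypothetical scan of 10,000,000 oplog entries, the hook runs 39,062 times. At a true 1 us per sleep that adds about 39 ms; if each sleep lasts 1 ms, it adds about 39 seconds, which is enough to push the query into a timeout.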

      SOLUTION
      Replace the call to sleepmicros(1) with a call to pthread_yield() on Linux.
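      A sketch of a yield helper in the spirit of this fix, not the actual patch. It calls sched_yield(), the standard POSIX function that glibc's nonstandard pthread_yield() wraps (glibc 2.34 deprecated pthread_yield() in its favor). The helper name yield_cpu and the fallback to the old 1 us sleep on other platforms are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <time.h>

/* Hypothetical helper: briefly give up the CPU during a yield.
   On Linux, sched_yield() relinquishes the CPU and returns almost
   immediately if no other thread is waiting, so no timer is armed and
   hypervisor timer granularity does not matter. Other platforms keep
   the original 1 us sleep in this sketch. */
static int yield_cpu(void) {
#if defined(__linux__)
    return sched_yield();
#else
    struct timespec req = { .tv_sec = 0, .tv_nsec = 1000 };
    return nanosleep(&req, NULL);
#endif
}
```

      Both paths return 0 on success. The design point is that a scheduler yield still lets writers waiting on the lock run, but does not pay for a timed sleep when there is nothing else to run.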

      WORKAROUNDS
      There is no workaround.

      PATCHES
      Production release v2.4.6 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

      Original Description

      In 2.4, sleepmicros(1) was added to the yield code, which is called after every 256 documents scanned when finding the start position for the tailable cursor used in replication.
      On some platforms (in particular, virtualized ones), the sleepmicros() call actually sleeps for much longer than a microsecond (it is only guaranteed to "sleep at least 1 microsecond").
      At times it can sleep for one millisecond or longer.

      This hit FindingStartCursor particularly hard; it is the method that replica sets use to find their position in their sync source's oplog. Thus, on a busy system this query was taking much longer than it should have, sometimes resulting in timeouts.
