Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.4.6, 2.5.2
Affects Version/s: 2.4.0
Component/s: Concurrency, Replication
Labels: None
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

MongoDB Status as of January 3rd, 2014

ISSUE SUMMARY

Starting in version 2.4, a call to sleepmicros(1) was added to the yielding logic, which runs after every 256 documents scanned while finding the start position for the tailable cursor used in replication. On some hypervisors, the sleepmicros() call sleeps for much longer than a microsecond (it is only guaranteed to "sleep at least 1 microsecond"). At times it can sleep for a millisecond or longer.
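
The overshoot described above can be observed with a small timing sketch. This is not the server's code: std::this_thread::sleep_for stands in for sleepmicros(), std::this_thread::yield stands in for a plain scheduler yield, and the function names averageSleepNanos and averageYieldNanos are hypothetical.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Average wall-clock cost, in nanoseconds, of one requested 1-microsecond sleep.
// Assumption: sleep_for(1us) behaves like sleepmicros(1), i.e. it sleeps
// "at least" the requested time, with no upper bound on the overshoot.
std::int64_t averageSleepNanos(int iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::this_thread::sleep_for(std::chrono::microseconds(1));
    }
    const auto elapsed = Clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count() / iterations;
}

// Average wall-clock cost, in nanoseconds, of one scheduler yield.
std::int64_t averageYieldNanos(int iterations) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::this_thread::yield();
    }
    const auto elapsed = Clock::now() - start;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count() / iterations;
}
```

Calling both functions with the same iteration count on the affected host and comparing the two averages gives a rough idea of how far a 1-microsecond sleep overshoots there. The numbers depend on the kernel, timer configuration, and hypervisor.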

USER IMPACT
On certain hypervisors, the change to the yielding logic affects "FindingStartCursor", the method that replica sets use to find their position in their sync source's oplog. On a busy system this query can take much longer than it should, sometimes resulting in timeouts.
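
A back-of-the-envelope estimate shows how the per-yield sleep adds up. Only the 256-document interval and the 1 microsecond versus 1 millisecond sleep times come from this ticket. The function name yieldOverheadMicros and the oplog size used below are hypothetical, chosen for illustration.

```cpp
#include <cstdint>

// Total time spent sleeping during a scan, in microseconds, assuming one
// sleep after every `docsPerYield` documents scanned.
std::uint64_t yieldOverheadMicros(std::uint64_t documentsScanned,
                                  std::uint64_t docsPerYield,
                                  std::uint64_t sleepMicrosPerYield) {
    const std::uint64_t yields = documentsScanned / docsPerYield;
    return yields * sleepMicrosPerYield;
}
```

For an assumed scan of 10,000,000 oplog documents, yieldOverheadMicros(10000000, 256, 1) is 39,062 microseconds (about 39 ms), while yieldOverheadMicros(10000000, 256, 1000) is 39,062,000 microseconds (about 39 seconds). A difference of that size can push the query past a timeout.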

SOLUTION
Replace the call to sleepmicros(1) with a call to pthread_yield() on Linux.
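
The shape of the change can be sketched as a scan loop that yields the CPU every 256 documents instead of sleeping. This is a simplified stand-in, not the actual FindingStartCursor implementation: the linear scan, the ScanResult struct, and the findStart function are hypothetical. The sketch calls sched_yield(), the POSIX function that pthread_yield() is implemented with on Linux, rather than pthread_yield() itself, which is non-standard.

```cpp
#include <sched.h>

#include <cstddef>
#include <vector>

// Interval taken from the ticket: yield after every 256 documents scanned.
constexpr std::size_t kDocsPerYield = 256;

struct ScanResult {
    std::size_t position;  // index of the first entry with timestamp >= target
    std::size_t yields;    // number of times the scan yielded the CPU
};

// Hypothetical linear scan over oplog timestamps. Yields instead of sleeping,
// so the thread gives up the CPU without waiting on a timer.
ScanResult findStart(const std::vector<long long>& timestamps, long long target) {
    std::size_t yields = 0;
    for (std::size_t i = 0; i < timestamps.size(); ++i) {
        if (timestamps[i] >= target) {
            return {i, yields};
        }
        if ((i + 1) % kDocsPerYield == 0) {
            sched_yield();  // previously: sleepmicros(1)
            ++yields;
        }
    }
    return {timestamps.size(), yields};
}
```

A yield returns as soon as the scheduler hands the CPU back, so its cost does not depend on timer granularity the way a short sleep does.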

WORKAROUNDS
There is no workaround.

PATCHES
Production release v2.4.6 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Original Description

In 2.4, sleepmicros(1) was added to the yield code, which is called after every 256 documents scanned while finding the start position for the tailable cursor used in replication.
On some platforms (in particular, virtualized ones), the sleepmicros() call sleeps for much longer than a microsecond (it is only guaranteed to "sleep at least 1 microsecond").
At times it can sleep for a millisecond or longer.

This hit FindingStartCursor particularly hard, since it is the method that replica sets use to find their position in their sync source's oplog. As a result, on a busy system this query took much longer than it should have, sometimes resulting in timeouts.

duplicates: SERVER-8939 consider making server not use sleepmicros() (Closed)
is related to: SERVER-9707 Make oplog timeout configurable (Closed)

Assignee: Eliot Horowitz (Inactive)
Reporter: Daniel Pasette (Inactive)
Participants: auto, Daniel Pasette, Eliot Horowitz
Votes: 0
Watchers: 8

Created: Jul 27 2013 06:07:03 PM UTC
Updated: Jul 11 2016 05:37:35 PM UTC
Resolved: Jul 30 2013 01:51:44 PM UTC
