- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.4.0
- Component/s: Concurrency, Replication
- None
- ALL
ISSUE SUMMARY
Starting in version 2.4, a call to sleepmicros(1) was added to the yielding logic, which runs after every 256 documents scanned while finding the start position for the tailable cursor used in replication. On some hypervisors, the sleepmicros() call sleeps for much longer than a microsecond (it is only guaranteed to "sleep at least 1 microsecond"), at times one millisecond or more.
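To get a feel for the scale of the problem, the following is a minimal Python sketch, not the server's C++ code. It measures how long a requested 1 microsecond sleep actually takes on the current machine, and estimates the total delay if every yield point (one per 256 documents) sleeps that long. The function names and the ten-million-document scan size are hypothetical and chosen for illustration only.

```python
import time

YIELD_INTERVAL = 256  # documents scanned between yields, per the summary above


def measure_sleep_overshoot(requested_us=1, samples=200):
    """Return the worst observed duration, in microseconds, of a short sleep."""
    worst_ns = 0
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(requested_us / 1_000_000)
        elapsed = time.perf_counter_ns() - start
        worst_ns = max(worst_ns, elapsed)
    return worst_ns / 1_000


def yields_during_scan(documents_scanned, interval=YIELD_INTERVAL):
    """Number of yield points hit while scanning the given number of documents."""
    return documents_scanned // interval


def added_latency_ms(documents_scanned, sleep_us):
    """Total milliseconds spent sleeping if every yield point sleeps for sleep_us."""
    return yields_during_scan(documents_scanned) * sleep_us / 1_000


if __name__ == "__main__":
    worst = measure_sleep_overshoot()
    print(f"worst observed duration of a 1 us sleep: {worst:.1f} us")
    # Ten million scanned documents is a made-up figure for illustration.
    print(f"estimated added latency: {added_latency_ms(10_000_000, worst):.1f} ms")
```

Under these assumptions, a scan of ten million documents hits 39,062 yield points. If each sleep takes one millisecond instead of one microsecond, that is roughly 39 seconds of sleeping, which is the kind of delay that can turn into a timeout.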
USER IMPACT
On certain hypervisors, this change to the yielding logic slows down "FindingStartCursor", the method that replica sets use to find their position in their sync source's oplog. On a busy system this query could take much longer than it should, sometimes resulting in timeouts.
SOLUTION
Replace the call to sleepmicros(1) with a call to pthread_yield() on Linux.
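The idea behind the fix can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual patch: os.sched_yield() stands in for pthread_yield(), and scan(), yield_cpu() and timed_scan() are hypothetical helpers. The scan loop calls a pluggable yield function after every 256 documents, so the sleep-based and yield-based variants can be timed side by side.

```python
import os
import time


def yield_cpu():
    """Give up the CPU without requesting a timed sleep."""
    if hasattr(os, "sched_yield"):
        os.sched_yield()  # Unix only; stand-in for pthread_yield() in this sketch
    else:
        time.sleep(0)  # fallback on platforms without sched_yield


def scan(documents, on_yield, interval=256):
    """Scan documents, calling on_yield after every `interval` documents."""
    scanned = 0
    for _ in documents:
        scanned += 1
        if scanned % interval == 0:
            on_yield()
    return scanned


def timed_scan(n, on_yield):
    """Return the seconds taken to scan n dummy documents with the given yield."""
    start = time.perf_counter()
    scan(range(n), on_yield)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 256 * 2000  # 2000 yield points; an arbitrary size for the comparison
    print("sleep-based yield:", timed_scan(n, lambda: time.sleep(1e-6)))
    print("cpu-yield variant:", timed_scan(n, yield_cpu))
```

The measured difference depends heavily on the platform and hypervisor. On a machine where short sleeps are accurate, the two timings may be close.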
WORKAROUNDS
There is no workaround.
PATCHES
Production release v2.4.6 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.
Original Description
In 2.4, sleepmicros(1) was added to the yield code which is called after scanning 256 documents when finding the start position for the tailable cursor used in replication.
On some platforms (in particular, virtualized ones), the sleepmicros() call actually sleeps for much more than a microsecond (it is only guaranteed to "sleep at least 1 microsecond").
It can sleep as long as one millisecond or longer, at times.
This hit FindingStartCursor particularly hard, since it is the method that replica sets use to find their position in their sync source's oplog. On a busy system this query was therefore taking much longer than it should have, sometimes resulting in timeouts.
- duplicates: SERVER-8939 consider making server not use sleepmicros() (Closed)
- is related to: SERVER-9707 Make oplog timeout configurable (Closed)