[SERVER-10362] yielding during read queries waiting too long for fair locking Created: 27/Jul/13  Updated: 11/Jul/16  Resolved: 30/Jul/13

Status: Closed
Project: Core Server
Component/s: Concurrency, Replication
Affects Version/s: 2.4.0
Fix Version/s: 2.4.6, 2.5.2

Type: Bug Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-8939 consider making server not use sleepm... Closed
Related
is related to SERVER-9707 Make oplog timeout configurable Closed
Operating System: ALL
Participants:

 Description   
MongoDB Status as of January 3rd, 2014

ISSUE SUMMARY

Starting in version 2.4, a call to sleepmicros(1) was added to the yielding logic, which runs after every 256 documents scanned while finding the start position for the tailable cursor used in replication. On some hypervisors, sleepmicros() sleeps for much longer than a microsecond (it is only guaranteed to "sleep at least 1 microsecond"); at times it can sleep for one millisecond or longer.
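The overshoot described above is easy to observe on a given host. The following is a sketch, assuming `std::this_thread::sleep_for` behaves like the server's `sleepmicros()` in that both promise only a minimum sleep time. `measureOneMicroSleep` and `worstOneMicroSleep` are hypothetical names for illustration, not MongoDB functions.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not part of MongoDB): returns how many microseconds a
// request to sleep for 1 microsecond actually took. std::this_thread::sleep_for
// stands in for sleepmicros(1); it guarantees only a minimum sleep, so the
// result is at least 1.0 and may be far larger on some hosts.
double measureOneMicroSleep() {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    std::this_thread::sleep_for(std::chrono::microseconds(1));
    const auto elapsed = Clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count();
}

// Hypothetical helper: repeats the measurement `trials` times and returns the
// largest pause observed, in microseconds.
double worstOneMicroSleep(int trials) {
    double worst = 0.0;
    for (int i = 0; i < trials; ++i) {
        const double us = measureOneMicroSleep();
        if (us > worst) worst = us;
    }
    return worst;
}
```

The worst case matters more than the average here, because the yield runs thousands of times during a single scan. The actual figures depend entirely on the kernel and hypervisor the code runs on.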

USER IMPACT
On certain hypervisors, the change to the yielding logic can slow down "FindingStartCursor", the method that replica set members use to find their position in their sync source's oplog. On a busy system this query could take much longer than it should, sometimes resulting in timeouts.
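A rough model makes the impact concrete. It assumes, hypothetically, that the scan yields exactly once per 256 documents and that each yield pays one pause. Under that assumption, the time spent pausing grows linearly with the number of oplog entries scanned. The function name and the example oplog size are illustrative only.

```cpp
#include <cstdint>

// Hypothetical model: a FindingStartCursor-style scan yields once per
// kDocsPerYield documents, and each yield pays one pause of pauseMicros.
constexpr std::uint64_t kDocsPerYield = 256;

// Returns the total time, in milliseconds, spent in yield pauses while
// scanning docsScanned documents (partial batches do not trigger a yield).
double yieldOverheadMillis(std::uint64_t docsScanned, double pauseMicros) {
    const std::uint64_t yields = docsScanned / kDocsPerYield;
    return static_cast<double>(yields) * pauseMicros / 1000.0;
}
```

For example, scanning a hypothetical 10,000,000 oplog entries means 39,062 yields. That is about 39 ms of pausing if each sleep really lasts 1 microsecond, but about 39 seconds if each one stretches to 1 millisecond.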

SOLUTION
Replace the call to sleepmicros(1) with a call to pthread_yield() on Linux. On OS X, sleepmicros(1) is still used.
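The following is a minimal sketch of this approach. It assumes a reader drops its shared lock, pauses briefly, and re-acquires the lock so that a waiting writer can get in. `yieldOrSleepOneMicro` and `yieldReadLock` are hypothetical names, and `std::shared_mutex` stands in for the server's own lock. `pthread_yield()` is a nonstandard GNU extension that glibc implements as a call to `sched_yield()`, so the sketch calls POSIX `sched_yield()` directly on Linux. This is not the server's actual code.

```cpp
#include <chrono>
#include <shared_mutex>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper modeled on the fix: on Linux, give up the CPU with a
// scheduler yield; on other platforms (e.g. OS X), fall back to a
// 1-microsecond sleep.
void yieldOrSleepOneMicro() {
#if defined(__linux__)
    sched_yield();  // glibc's pthread_yield() forwards to this call
#else
    std::this_thread::sleep_for(std::chrono::microseconds(1));
#endif
}

// Hypothetical read-lock yield: drop the shared lock, pause briefly so a
// waiting writer has a chance to take the exclusive lock, then re-acquire.
void yieldReadLock(std::shared_lock<std::shared_mutex>& lock) {
    lock.unlock();
    yieldOrSleepOneMicro();
    lock.lock();
}
```

A scheduler yield typically returns as soon as the scheduler has had a chance to run another thread. Its cost therefore does not hinge on the timer resolution that makes very short sleeps overshoot on some hypervisors.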

WORKAROUNDS
There is no workaround.

PATCHES
Production release v2.4.6 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

Original Description

In 2.4, sleepmicros(1) was added to the yield code which is called after scanning 256 documents when finding the start position for the tailable cursor used in replication.
On some platforms (in particular, virtualized ones), the sleepmicros() call actually sleeps for much more than a microsecond (it is only guaranteed to "sleep at least 1 microsecond").
It can sleep as long as one millisecond or longer, at times.

This affected FindingStartCursor particularly hard; it is the method that replica sets use to find their position in their sync source's oplog. Thus, on a busy system, this query was taking much longer than it should have, sometimes resulting in timeouts.



 Comments   
Comment by auto [ 30/Aug/13 ]

Author: Eric Milkie (username milkie, milkie@10gen.com)

Message: SERVER-10362 actually sleep when writers yield
Branch: master
https://github.com/mongodb/mongo/commit/c0d3a2fc9969c36045c5b753a3a7a7a26bb990be

Comment by auto [ 02/Aug/13 ]

Author: Dan Pasette (username monkey101, dan@10gen.com)

Message: SERVER-10362: call ClientCursor::staticYield to do a pthread_yield rather than sleepmicros.
Branch: v2.4
https://github.com/mongodb/mongo/commit/d502de0e89f72e6a912d68dfa34b14354d3210d1

Comment by auto [ 02/Aug/13 ]

Author: Eliot Horowitz (username erh, eliot@10gen.com)

Message: SERVER-10362: only use pthread_yield on linux, on osx use sleepmicros(1)
Branch: v2.4
https://github.com/mongodb/mongo/commit/12ecb7dbc765df9544d21bb3823a5fbde0efe0e6

Comment by auto [ 02/Aug/13 ]

Author: Eliot Horowitz (username erh, eliot@10gen.com)

Message: SERVER-10362: add option to ClientCursor::staticYield to do a pthread_yield instead of sleepmicros
this is mostly because of some platforms (like xen) where sleepmicros is too inaccurate
Branch: v2.4
https://github.com/mongodb/mongo/commit/6d738db1f45ec0f7447e2a14e0216af2410074c8

Comment by auto [ 29/Jul/13 ]

Author: Eliot Horowitz (username erh, eliot@10gen.com)

Message: SERVER-10362: only use pthread_yield on linux, on osx use sleepmicros(1)
Branch: master
https://github.com/mongodb/mongo/commit/9498d1148cb14c862b615a528e65aa757705aa00

Comment by auto [ 29/Jul/13 ]

Author: Eliot Horowitz (username erh, eliot@10gen.com)

Message: SERVER-10362: tell ClientCursor::staticYield to do a pthread_yield rather than sleepmicros
this ensures this call doesn't take too long on some platforms (xen)
Branch: master
https://github.com/mongodb/mongo/commit/11f47af7789e8825d92bbcf76b6930d2962fbb7e

Comment by auto [ 29/Jul/13 ]

Author: Eliot Horowitz (username erh, eliot@10gen.com)

Message: SERVER-10362: add option to ClientCursor::staticYield to do a pthread_yield instead of sleepmicros
this is mostly because of some platforms (like xen) where sleepmicros is too inaccurate
Branch: master
https://github.com/mongodb/mongo/commit/d2ce20775779a3116f693b9e60744234bdbc5f50

Generated at Thu Feb 08 03:22:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.