[SERVER-10085] Heartbeats can time out due to high network latency while fetching oplog batches
Created: 03/Jul/13 Updated: 11/Jul/16 Resolved: 09/Jul/13
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 2.4.6, 2.5.1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Matt Kangas | Assignee: | Eric Milkie |
| Resolution: | Done | Votes: | 0 |
| Labels: | buildbot |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | buildbot: Linux 64-bit DEBUG, Linux 64-bit debug dur off |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Description |
MongoDB Status as of September 30th, 2013
Detailed description: bgsync::produce holds BackgroundSync::_mutex across the call to r.tailingQueryGTE, which fetches the next batch of entries from the primary's oplog. If the primary is slow to respond (for example, because of high network latency), heartbeats can start timing out, because heartbeat processing must also acquire BackgroundSync::_mutex and therefore waits until the fetch returns. The fix changes bgsync::produce to call r.tailingQueryGTE without holding _mutex.
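The lock-scope problem and its fix can be illustrated with a small Python model. This is a minimal, hypothetical sketch, not MongoDB's C++ code: `BackgroundSyncSketch`, `produce_buggy`, `produce_fixed`, `heartbeat`, and `heartbeat_succeeds` are illustrative names, a `time.sleep` stands in for the network round trip of r.tailingQueryGTE, and the heartbeat is assumed to give up if it cannot acquire the mutex within a fixed timeout.

```python
import threading
import time


class BackgroundSyncSketch:
    """Hypothetical model of the locking pattern; names are illustrative only."""

    def __init__(self, fetch_delay):
        self._mutex = threading.Lock()        # stands in for BackgroundSync::_mutex
        self._fetch_delay = fetch_delay       # simulated network latency, in seconds
        self._buffer = []                     # shared state guarded by _mutex
        self.fetch_started = threading.Event()

    def _tailing_query(self):
        # Stands in for r.tailingQueryGTE: a slow round trip to the primary.
        self.fetch_started.set()
        time.sleep(self._fetch_delay)
        return ["op"]

    def produce_buggy(self):
        # Holds the mutex for the whole network call (the reported bug).
        with self._mutex:
            batch = self._tailing_query()
            self._buffer.extend(batch)

    def produce_fixed(self):
        # Fetches first without the lock, then locks only to update shared state.
        batch = self._tailing_query()
        with self._mutex:
            self._buffer.extend(batch)

    def heartbeat(self, timeout):
        # A heartbeat also needs the mutex; report failure if it cannot get it in time.
        if not self._mutex.acquire(timeout=timeout):
            return False
        self._mutex.release()
        return True


def heartbeat_succeeds(fixed, fetch_delay=1.0, hb_timeout=0.2):
    sync = BackgroundSyncSketch(fetch_delay)
    target = sync.produce_fixed if fixed else sync.produce_buggy
    producer = threading.Thread(target=target)
    producer.start()
    sync.fetch_started.wait()  # wait until the simulated fetch is in progress
    ok = sync.heartbeat(hb_timeout)
    producer.join()
    return ok
```

With these parameters, `heartbeat_succeeds(fixed=False)` should return `False`, because the heartbeat waits 0.2 s for a lock held for the full 1 s fetch, while `heartbeat_succeeds(fixed=True)` should return `True`, because the lock is free while the fetch is in flight.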
Initial description by Eric on June 28:
This failure has been visible since Linux 64-bit DEBUG Build #2260 on June 27, but was likely masked by simpler bugs. The last green Linux 64-bit DEBUG build was #2200 on June 13 (SHA1 86e76e34e88c). http://buildbot.10gen.cc/builders/Linux%2064-bit%20DEBUG?numbuilds=100 It is also visible in Linux 64-bit debug dur off builds since #2440 on June 29. The last green build on this builder was #2438 (SHA1 babd275f8818). http://buildbot.10gen.cc/builders/Linux%2064-bit%20debug%20dur%20off?numbuilds=50
| Comments |
| Comment by auto [ 15/Jul/13 ] |
Author: Eric Milkie (milkie) <milkie@10gen.com>
Message:
| Comment by auto [ 09/Jul/13 ] |
Author: Eric Milkie (milkie) <milkie@10gen.com>
Message: