[SERVER-29937] Make sure liveness timeouts cannot be missed Created: 30/Jun/17 Updated: 30/Oct/23 Resolved: 15/Sep/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.18, 3.4.11, 3.6.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v3.4, v3.2 | ||||||||
| Sprint: | Repl 2017-10-02 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
In ReplicationCoordinatorImpl::_scheduleNextLivenessUpdate_inlock(), we do not schedule a new liveness update if nextTimeout would be in the past. This is wrong; we should schedule an immediate liveness update in that case. One scenario: we have just run our liveness check and the earliest live member was just barely fresh ("almost stale"), so we do nothing. A short time passes before we schedule the next update, and by then that member is stale, so the next timeout is in the past. No update is scheduled, and liveness checks stop entirely. |
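The proposed behavior can be illustrated with a minimal sketch. This is not the actual server code: the function name nextLivenessCheckTime, its parameters, and the use of std::chrono are assumptions made for illustration. It only shows the idea of clamping a timeout that has already passed to "now", so that a check is always scheduled.

```cpp
#include <algorithm>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Hypothetical helper (not part of the MongoDB code base).
// earliestUpdate: time of the oldest liveness update among live members.
// timeout:        how long a member may go without an update before it is stale.
// now:            the current time when the next check is being scheduled.
//
// Returns the time at which the next liveness check should run. If the
// computed timeout is already in the past, the check is scheduled for
// `now` (that is, immediately) instead of being skipped.
Clock::time_point nextLivenessCheckTime(Clock::time_point earliestUpdate,
                                        Millis timeout,
                                        Clock::time_point now) {
    Clock::time_point nextTimeout = earliestUpdate + timeout;
    // The behavior described in this ticket corresponds to returning early
    // without scheduling anything when nextTimeout <= now. Clamping to `now`
    // keeps the chain of liveness checks alive.
    return std::max(nextTimeout, now);
}
```

In the "almost stale" scenario, now has moved slightly past earliestUpdate + timeout, so the sketch returns now and the stale member is detected on the immediately scheduled check.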
| Comments |
| Comment by Githook User [ 15/Nov/17 ] |
|
Author: Judah Schvimer (judahschvimer, judah@mongodb.com) Message: (cherry picked from commit f1bf0b33b4f1ce7bb50f208ef5e2d736ef5eba68) |
| Comment by Githook User [ 30/Oct/17 ] |
|
Author: Judah Schvimer (judahschvimer, judah@mongodb.com) Message: (cherry picked from commit f1bf0b33b4f1ce7bb50f208ef5e2d736ef5eba68) |
| Comment by Ramon Fernandez Marina [ 15/Sep/17 ] |
|
Author: Judah Schvimer (judahschvimer, judah@mongodb.com) Message: |