[SERVER-42385] election timeout callbacks are scheduled too frequently Created: 24/Jul/19 Updated: 29/Oct/23 Resolved: 24/Oct/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Repl 2019-08-26, Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07, Repl 2019-10-21, Repl 2019-11-04 | ||||||||
| Participants: | |||||||||
| Description |
|
In the process of investigating a BF, I discovered this behavior: If you jump to line 5707 in that log, you can see that "Scheduling election timeout callback at " is happening on average every 2-3 milliseconds, for many seconds. Additionally, even within one thread, the callback is not being repeatedly scheduled for times exclusively in the future of the previous scheduling; sometimes it goes backwards in time. This doesn't seem correct nor efficient. |
| Comments |
| Comment by Githook User [ 23/Oct/19 ] |
|
Author: {'name': 'Matthew Russotto', 'username': 'mtrussotto', 'email': 'matthew.russotto@mongodb.com'}Message: |
| Comment by Matthew Russotto [ 10/Sep/19 ] |
|
Eric: Each time we reschedule the timeout, we reschedule it for now() + the election timeout + some random jitter. The random jitter is independent of the last scheduled timeout, so if we're rescheduling quickly, the timeout can go backwards. Judah: We currently don't know if we're scheduling the election timeout after it actually fired or just pushing it forward, but we could do so (I think it may actually be a bug that we don't) and then bump the reschedule case up to LOG_FOR_ELECTION(5). We probably also want to use LOG_FOR_ELECTION(5) instead of LOG(4) here |
| Comment by Eric Milkie [ 04/Sep/19 ] |
|
Do we have an answer for why the rescheduling isn't monotonically increasing? |
| Comment by Judah Schvimer [ 03/Sep/19 ] |
|
Should we improve the logging to make it clear this is rescheduling a timeout, and not just scheduling a new one? Or is that always the case and not actually helpful? |
| Comment by Matthew Russotto [ 30/Aug/19 ] |
|
This actually occurs because we're doing a bulk write, and each batch causes us to reschedule the election timeout. So reducing the logging is probably the right answer. |
| Comment by Matthew Russotto [ 30/Aug/19 ] |
|
Oops, I got fooled, there's actually a wrapper function called ReplicationCoordinatorImpl::_scheduleWorkAt which should prevent this from happening. |
| Comment by Matthew Russotto [ 29/Aug/19 ] |
|
Also the backwards-in-time is because we add jitter to the election timeout; it's normal. |
| Comment by Matthew Russotto [ 29/Aug/19 ] |
|
For heartbeats, we check to see whether the timeout was canceled before attempting to reschedule it. For elections, we do not. We unconditionally cancel and reschedule the election timeout. This means we can get into a loop T1: Start election timeout (because we use thread pools, it actually isn't necessary that T1 and T2 are different threads) If the callback is canceled, we should not reschedule it in the callback function, nor execute an election. It appears a similar bug also exists in priority takeover and catchup takeover, where canceling the takeover starts an election; we should fix this and add testing to make sure it doesn't happen. Somewhat mystifying is the lack of logs indicating that the election started or failed to start. It is not clear if this is a real thing or a log artifact. |