[SERVER-42385] election timeout callbacks are scheduled too frequently Created: 24/Jul/19  Updated: 29/Oct/23  Resolved: 24/Oct/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-43089 Various timeouts should not start ele... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-08-26, Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07, Repl 2019-10-21, Repl 2019-11-04
Participants:

 Description   

While investigating a BF (build failure), I discovered this behavior:
https://logkeeper.mongodb.org/lobster/build/0b38b5af3083b9d185100bc1ce94b3a5/test/5d379645f84ae81c0108232f#bookmarks=0%2C8967&l=1

If you jump to line 5707 in that log, you can see "Scheduling election timeout callback at " being logged on average every 2-3 milliseconds, for many seconds. Additionally, even within one thread, the callback is not always rescheduled for a time later than the previous scheduling; sometimes the scheduled time moves backwards. This seems neither correct nor efficient.
At the very least, can we change this log message to log level 5 so that it doesn't swamp the log system and affect the performance of the test?



 Comments   
Comment by Githook User [ 23/Oct/19 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@mongodb.com>

Message: SERVER-42385 Reduce log spam associated with election timeout resets.
Branch: master
https://github.com/mongodb/mongo/commit/2070d8ca09576b858c61677bc366a25e7ecfecad

Comment by Matthew Russotto [ 10/Sep/19 ]

Eric: Each time we reschedule the timeout, we reschedule it for now() + the election timeout + some random jitter. The jitter is drawn independently of the last scheduled timeout, so if we're rescheduling quickly, the newly scheduled time can be earlier than the previous one, i.e. it goes backwards.

https://github.com/mongodb/mongo/blob/96a1a78772038fa29b94dbe92b51b9e9957d3f6f/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L853
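
To illustrate the non-monotonicity, here is a standalone C++ sketch (not the server code; the timeout and jitter values are made up) of a policy that recomputes now() + timeout + independent jitter on every reschedule:

#include <chrono>
#include <iostream>
#include <random>

int main() {
    using namespace std::chrono;
    const auto electionTimeout = milliseconds(10000);  // made-up value
    const int64_t maxJitterMs = 2000;                  // made-up jitter bound

    std::mt19937_64 gen(42);
    std::uniform_int_distribution<int64_t> jitter(0, maxJitterMs);

    const auto start = steady_clock::now();
    auto prev = steady_clock::time_point::min();
    for (int i = 0; i < 5; ++i) {
        // Each reschedule ignores the previously scheduled time entirely,
        // so a large jitter followed by a small one moves the deadline back.
        const auto when =
            steady_clock::now() + electionTimeout + milliseconds(jitter(gen));
        std::cout << "reschedule " << i << ": fires at +"
                  << duration_cast<milliseconds>(when - start).count() << "ms"
                  << (when < prev ? "  <-- earlier than the previous schedule" : "")
                  << '\n';
        prev = when;
    }
}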

Judah: We currently don't track whether we're scheduling the election timeout because it actually fired or because we're just pushing it forward, but we could (I think it may actually be a bug that we don't), and then bump the reschedule case up to LOG_FOR_ELECTION(5). We probably also want to use LOG_FOR_ELECTION(5) instead of LOG(4) here:

https://github.com/mongodb/mongo/blob/96a1a78772038fa29b94dbe92b51b9e9957d3f6f/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp#L831
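
A self-contained sketch of that suggestion, with hypothetical names throughout (the real code would go through the LOG_FOR_ELECTION macro; this toy just prints the verbosity level): track whether the previous timeout actually fired, and demote plain "pushed forward" reschedules to verbosity 5.

#include <iostream>
#include <string>

struct ElectionTimeoutScheduler {
    bool lastTimeoutFired = false;  // would be set by the timeout callback

    void logSchedule(const std::string& when) {
        // Plain "pushed forward" reschedules are by far the most frequent
        // case, so they drop to verbosity 5; scheduling after a real fire
        // is rare and stays at the old, more visible level.
        const int level = lastTimeoutFired ? 4 : 5;
        std::cout << "[v" << level << "] Scheduling election timeout callback at "
                  << when << '\n';
    }
};

int main() {
    ElectionTimeoutScheduler s;
    s.logSchedule("2019-07-24T12:00:10.123");  // pushed forward -> v5
    s.lastTimeoutFired = true;
    s.logSchedule("2019-07-24T12:00:20.456");  // after the timeout fired -> v4
}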

Comment by Eric Milkie [ 04/Sep/19 ]

Do we have an answer for why the rescheduling isn't monotonically increasing?

Comment by Judah Schvimer [ 03/Sep/19 ]

Should we improve the logging to make it clear this is rescheduling a timeout, and not just scheduling a new one? Or is that always the case and not actually helpful?

Comment by Matthew Russotto [ 30/Aug/19 ]

This actually occurs because we're doing a bulk write, and each batch causes us to reschedule the election timeout. So reducing the logging is probably the right answer.

Comment by Matthew Russotto [ 30/Aug/19 ]

Oops, I got fooled; there's actually a wrapper function called ReplicationCoordinatorImpl::_scheduleWorkAt that should prevent this from happening.

Comment by Matthew Russotto [ 29/Aug/19 ]

Also, the backwards-in-time behavior is because we add jitter to the election timeout; it's normal.

Comment by Matthew Russotto [ 29/Aug/19 ]

For heartbeats, we check whether the timeout was canceled before attempting to reschedule it. For elections, we do not: we unconditionally cancel and reschedule the election timeout. This means we can get into a loop:

T1: Start election timeout
T2: Cancel election timeout, start election timeout
T1: Receive cancel from T2, cancel election timeout, start election timeout
T2: Receive cancel from T1, cancel election timeout, start election timeout

(Because we use thread pools, T1 and T2 don't actually have to be different threads.)

If the callback is canceled, we should not reschedule it in the callback function, nor execute an election.
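
A minimal, self-contained sketch of that guard, with hypothetical names throughout (an illustration of the idea, not the actual patch): a callback that observes it was canceled returns immediately instead of rescheduling or starting an election.

#include <functional>
#include <iostream>

struct CallbackArgs {
    bool canceled;  // the executor would set this when the work item is canceled
};

void onElectionTimeout(const CallbackArgs& args,
                       const std::function<void()>& startElection,
                       const std::function<void()>& rescheduleTimeout) {
    if (args.canceled) {
        // Whoever canceled this callback has already scheduled a replacement;
        // rescheduling (or electing) from here as well is what creates the
        // cancel/reschedule loop between T1 and T2 above.
        return;
    }
    startElection();
    rescheduleTimeout();
}

int main() {
    const auto elect = [] { std::cout << "starting election\n"; };
    const auto resched = [] { std::cout << "rescheduling timeout\n"; };
    onElectionTimeout({/*canceled=*/true}, elect, resched);   // prints nothing
    onElectionTimeout({/*canceled=*/false}, elect, resched);  // runs both actions
}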

It appears a similar bug also exists in priority takeover and catchup takeover, where canceling the takeover starts an election; we should fix this and add testing to make sure it doesn't happen.

Somewhat mystifying is the lack of logs indicating that the election started or failed to start. It is not clear whether this is real behavior or a logging artifact.
