[SERVER-32906] Improve logging around elections Created: 25/Jan/18 Updated: 30/Oct/23 Resolved: 27/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Tess Avitabile (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | elections, neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.0 |
| Sprint: | Repl 2018-07-02, Repl 2018-07-16, Repl 2018-07-30 |
| Participants: |
| Description |
|
To aid in diagnosing test failures, we should add logging any time we take an election-related action (election timeout expiring, calling for an election, voting in an election, etc.), with all possible inputs to that decision (dump of heartbeat and spanning tree liveness tables, etc.). This could be done with a new log sub-component, so that we can have higher verbosity in our tests than in prod. |
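As an illustration of the per-component verbosity idea in the description, here is a minimal sketch that builds a `setParameter` command document raising verbosity for an election sub-component only. It assumes the sub-component is exposed as `replication.election` under `logComponentVerbosity`, as in later MongoDB releases; the helper name `election_verbosity_command` is hypothetical and not part of this ticket.

```python
def election_verbosity_command(level: int) -> dict:
    """Build a setParameter command that sets election log verbosity.

    Assumption: the election sub-component lives at replication.election.
    A level of -1 means "inherit from the parent component"; 0-5 are
    explicit verbosity levels.
    """
    if not -1 <= level <= 5:
        raise ValueError("verbosity must be between -1 and 5")
    # Key order matters: the command name must be the first key.
    return {
        "setParameter": 1,
        "logComponentVerbosity": {
            "replication": {"election": {"verbosity": level}},
        },
    }


if __name__ == "__main__":
    # Tests could run with a high election verbosity while production
    # keeps the default.
    print(election_verbosity_command(4))
```

With a driver such as PyMongo, the resulting document could be sent to a node via `client.admin.command(election_verbosity_command(4))`; that call requires a running server and is not exercised here.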
| Comments |
| Comment by Githook User [ 27/Jul/18 ] |
|
Author: Tess Avitabile <tess.avitabile@mongodb.com> (username: tessavitabile) Message: |
| Comment by Tess Avitabile (Inactive) [ 10/Jul/18 ] |
|
Thank you! |
| Comment by William Schultz (Inactive) [ 09/Jul/18 ] |
|
tess.avitabile, you should now be able to change these settings to configure Evergreen-specific logging levels. |
| Comment by William Schultz (Inactive) [ 05/Jul/18 ] |
|
The ability to configure log verbosity defaults differently for local testing and tests that run in Evergreen should be enabled by |
| Comment by Tess Avitabile (Inactive) [ 21/Jun/18 ] |
|
Sure, I can add that at the default log level. It's related to elections, and it seems easier than scheduling separate work. |
| Comment by Spencer Brody (Inactive) [ 21/Jun/18 ] |
|
Hmm... I'd actually like something at default log level for the decision on whether or not to go into catchup mode, but maybe the full replSetGetStatus output is too verbose to log at default level. I think just a line with the target optime and the current optimes of each other node would suffice. Maybe that's out of scope for this ticket though. |
| Comment by Tess Avitabile (Inactive) [ 21/Jun/18 ] |
|
For For Yes, we do have logging at level 0 when we fail to receive a response from a node. I believe the ThreadPoolTaskExecutor calls the callback with a non-ok RemoteCommandResponse when there is a timeout. Sure, I can add logging of replSetGetStatus output when we decide whether to go into catchup mode. I'll put this in the election sub-component at level 4. |
| Comment by Spencer Brody (Inactive) [ 21/Jun/18 ] |
|
Everything laid out sounds good. The two linked duplicate tickets ( |
| Comment by Tess Avitabile (Inactive) [ 20/Jun/18 ] |
|
We have the following logging for election events, with the following log levels:
I think we should do the following for this ticket:
spencer, can you please review this plan? |