[SERVER-81984] Use consistent tags for continuous stepdown thread logs Created: 09/Oct/23  Updated: 23/Nov/23  Resolved: 23/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Vishnu Kaushik
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2023-10-30, Repl 2023-11-13, Repl 2023-11-27
Participants:

 Description   

The ContinuousStepdown thread which runs in resmoke calls a replicaset.ReplicaSetFixture.stepup_node() method which uses the fixture's logger

The result is that the log emitted in this function are tagged with the replicaset tag (e.g. [j0:s0] )

This is very confusing for two main reasons:

  • The log is emitted by the stepdown thread and not from the replicaset itself. This makes very complicated to grep for all the logs line emitted by the stepdown thread.
  • The tag should belong to a shard but it doesn't contain the replicaset node number (e.g. [j0:s0:n1] ), thus it is exactly the same tag as the first mongos.

One way to solve this could be to have these methods accept a logging.Logger and having the ContinuousStepdown thread pass in its logger would be one way to associate the work being done on behalf of the ContinuousStepdown thread.



 Comments   
Comment by Githook User [ 23/Nov/23 ]

Author:

{'name': 'kauboy26', 'email': 'vishnu.kaushik@mongodb.com', 'username': 'kauboy26'}

Message: SERVER-81984 have ContinuousStepdown log that it is attempting to step up a node.
Branch: master
https://github.com/mongodb/mongo/commit/4087dbb4c898744aceac9af4c8aa3dc6b9f1f6ff

Comment by Tommaso Tocci [ 24/Oct/23 ]

Do you think maybe having a log line before it attempts to step any node up would be sufficient? For example, "Attempting to step up a new node from the list of secondaries..."

No, I don't think that would be sufficient because it would solve the problem only for the stepup action but not for all the other replicaset operation the ContinuousStepdown thread is performing. For instance, we have the exact same problem with step-down or kill primary.

Comment by Vishnu Kaushik [ 17/Oct/23 ]

I was going to also change the log prefixes, as we did in SERVER-54723. But it looks like there's a possibility that this would break the BF / BFG parsing system, as mentioned here. So instead I've just made a very small change to the hook to add additional log lines.

If we want to make the prefix change too we can file a separate ticket for SDP to do!

Comment by Vishnu Kaushik [ 11/Oct/23 ]

Ok, that naming convention sounds good. I think the ContinuousStepdown hook logs when it has finished stepping a node up. For example,

[ContinuousStepdown:job0] Starting stepdown of all primaries
...
[ContinuousStepdown:job0] Successfully stepped up the secondary on port 20000 of replica set 'shard-rs0'.
...
[ContinuousStepdown:job0] Primary on port 20001 of replica set 'shard-rs0' stepped down.

Do you think maybe having a log line before it attempts to step any node up would be sufficient? For example, "Attempting to step up a new node from the list of secondaries..."

And then by filtering on "ContinuousStepdown" you will see the all the above logs including the new one above, and you can zoom in on the exact window of log lines to pay attention to.

Comment by Tommaso Tocci [ 11/Oct/23 ]

Do you think changing the name of the mongos / replica set would be better? For example, naming the replica set j0:sh0 ("sh" stands for shard) instead of j0:s0, to distinguish it from mongos.

vishnu.kaushik@mongodb.com what about j0:rs<X>:n<Y> where X is the shard/replicaset and Y is the node in the replicaset?

This would reduce confusion for sure but does not completely address the issue explained in this ticket. ContinuosStepdown thread logs are still difficult to grep since they do not have any common tag/prefix.

Comment by Vishnu Kaushik [ 10/Oct/23 ]

Hi tommaso.tocci@mongodb.com, it's unfortunate that the replica set tag [j0:s0] and the mongos name are exactly the same in Resmoke. There are other lines too that are emitted by the replica set that can be mistakenly thought to be emitted by mongos. For example:

[j0:s0] Issuing replSetInitiate command: {'_id': 'shard-rs0', 'protocolVersion': 1, 'settings': HistoryDict({'catchUpTimeoutMillis': 0, 'electionTimeoutMillis': 86400000}), 'members': [{'_id': 0, 'host': 'localhost:20000'}]}

Do you think changing the name of the mongos / replica set would be better? For example, naming the replica set j0:sh0 ("sh" stands for shard) instead of j0:s0, to distinguish it from mongos.

Generated at Thu Feb 08 06:47:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.