-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Minor - P4
-
None
-
Affects Version/s: None
-
Component/s: None
-
Networking & Observability
-
None
-
3
-
TBD
-
None
-
None
-
None
-
None
-
None
-
None
-
None
In a recent customer escalation, there is the question of whether the system clock is causing time to jump backwards, or if there is something wrong with the collection of log lines from the log.
To help distinguish between the two possibilities, I went looking for duplicated log lines: lines in the log that happened at the same millisecond and have exactly the same fields.
I found several examples of duplicated lines, where duplicated were not always adjacent to each other, but looking at the code that produces these lines I realized that they might actually be distinct events. It's just that there is not enough context in the log attributes to distinguish between the events.
The list of duplicated log lines is attached. The format is "<count> <timestamp> <log line>".
Here are the categories, ids, and messages of the duplicated lines:
--------------
NETWORK 4615610 ReplicaSetMonitor-TaskExecutor Failed to check socket connectivity
CONNPOOL 22561 ReplicaSetMonitor-TaskExecutor Dropping unhealthy pooled connection
NETWORK 4615610 ReplicaSetMonitor-TaskExecutor Failed to check socket connectivity
CONNPOOL 22561 ReplicaSetMonitor-TaskExecutor Dropping unhealthy pooled connection
NETWORK 4333208 ReplicaSetMonitor-TaskExecutor RSM host selection timeout
NETWORK 4712102 ReplicaSetMonitor-TaskExecutor Host failed in replica set
NETWORK 4615610 PeriodicTaskRunner Failed to check socket connectivity
REPL 3873113 BackgroundSync Cannot select sync source which is not ahead of me
REPL 3873106 BackgroundSync Cannot select sync source because it is not up
REPL 3873113 BackgroundSync Cannot select sync source which is not ahead of me
-------------------
Consider adding event-specific context to those log messages: for example, the raw pointer value of the relevant `ConnectionInterface*`.
I don't think we will be able to uniquely identify each log event independently of the timestamp (not without a counter, anyway), but at least some of these log messages could have event-specific context added in order to distinguish them.