Add unique context to specific networking log messages

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • None
    • 3
    • TBD
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      In a recent customer escalation, there is the question of whether the system clock is causing time to jump backwards, or if there is something wrong with the collection of log lines from the log.

      To help distinguish between the two possibilities, I went looking for duplicated log lines: lines in the log that happened at the same millisecond and have exactly the same fields.

      I found several examples of duplicated lines, where duplicated were not always adjacent to each other, but looking at the code that produces these lines I realized that they might actually be distinct events. It's just that there is not enough context in the log attributes to distinguish between the events.

      The list of duplicated log lines is attached. The format is "<count> <timestamp> <log line>".

      Here are the categories, ids, and messages of the duplicated lines:

      --------------

      NETWORK   4615610  ReplicaSetMonitor-TaskExecutor  Failed to check socket connectivity
      CONNPOOL  22561    ReplicaSetMonitor-TaskExecutor  Dropping unhealthy pooled connection
      NETWORK   4615610  ReplicaSetMonitor-TaskExecutor  Failed to check socket connectivity
      CONNPOOL  22561    ReplicaSetMonitor-TaskExecutor  Dropping unhealthy pooled connection
      NETWORK   4333208  ReplicaSetMonitor-TaskExecutor  RSM host selection timeout
      NETWORK   4712102  ReplicaSetMonitor-TaskExecutor  Host failed in replica set
      NETWORK   4615610  PeriodicTaskRunner              Failed to check socket connectivity
      REPL      3873113  BackgroundSync                  Cannot select sync source which is not ahead of me
      REPL      3873106  BackgroundSync                  Cannot select sync source because it is not up
      REPL      3873113  BackgroundSync                  Cannot select sync source which is not ahead of me

      -------------------

      Consider adding event-specific context to those log messages: for example, the raw pointer value of the relevant `ConnectionInterface*`.

      I don't think we will be able to uniquely identify each log event independently of the timestamp (not without a counter, anyway), but at least some of these log messages could have event-specific context added in order to distinguish them.

              Assignee:
              Unassigned
              Reporter:
              David Goffredo
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated: