Determine whether system clock drift corrupts ReplicaSetMonitor

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • 3
    • TBD

      In HELP-77706, a three-member replica set (instances 0, 1, and 2) got into a state during maintenance in which instances 0 and 2 thought that instance 1 was primary, while instance 1 responded to queries as if it were secondary. Instance 1's FTDC replication data indicated that it thought it was primary, so the issue lay in a disagreement between instance 1's replication coordinator (which was correct) and instance 1's replica set monitor (which was incorrect).

      During the investigation, we noticed that the log lines on all instances, but especially on instance 1, were not in chronological order by timestamp (.t.$date). The logs would regularly jump backwards in time. In the logs for instances 0 and 2, the jumps were infrequent and almost always exactly one millisecond. On instance 1, the jumps were more frequent and often as large as four minutes.
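      For anyone repeating this check on other logs, a minimal sketch of how the backward jumps can be counted follows. It assumes the structured JSON log format (one JSON object per line, with the timestamp at .t.$date in ISO 8601 form with a numeric UTC offset). The names parse_ts and backward_jumps are hypothetical helpers written for this ticket, not part of any server tooling.

```python
import json
from datetime import datetime, timedelta


def parse_ts(line):
    """Return the .t.$date timestamp of one JSON log line, or None if it is missing or malformed."""
    try:
        entry = json.loads(line)
        # Assumption: $date is an ISO 8601 string such as "2020-05-01T15:16:17.180+00:00".
        return datetime.fromisoformat(entry["t"]["$date"])
    except (ValueError, KeyError, TypeError):
        return None


def backward_jumps(lines, threshold=timedelta(0)):
    """Yield (line_number, previous_ts, current_ts) wherever a line's timestamp
    is earlier than the previous parsed line's timestamp by more than threshold."""
    prev = None
    for number, line in enumerate(lines, start=1):
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines that are not structured log entries
        if prev is not None and prev - ts > threshold:
            yield number, prev, ts
        prev = ts  # always compare against the immediately preceding entry
```

      Passing threshold=timedelta(milliseconds=1) filters out the one-millisecond jumps, which leaves only the larger jumps of the kind seen on instance 1. For example, list(backward_jumps(open(path), timedelta(milliseconds=1))) for a given log file path.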

      I (david.goffredo@mongodb.com) read through the replica set monitor (SDAM) server code to try to find a way that these large backward jumps in time could get the server into a state where the replica set monitor thinks that the instance is secondary and ignores replication coordinator evidence to the contrary. I was unable to find a possible cause.

      This ticket is to further the investigation into that hypothesis and, if possible, to create a test that reproduces the pathological state seen in HELP-77706.

        1. DBClientReplicaSet.jpg (68 kB)
        2. RSM topology listeners.jpg (80 kB)
        3. unsatisfied read preference.jpg (76 kB)

              Assignee: Unassigned
              Reporter: David Goffredo
              Votes: 0
              Watchers: 3
