Type: Task
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Networking & Observability
Confidence Status: None
Work Order: 3
Size Category: TBD
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In HELP-77706, a three-member replica set (instances 0, 1, and 2) entered a state during maintenance in which instances 0 and 2 believed that instance 1 was primary, while instance 1 responded to queries as if it were secondary. Instance 1's FTDC replication data indicated that it considered itself primary, so the issue lay in a disagreement between instance 1's replication coordinator (which was correct) and instance 1's replica set monitor (which was incorrect).

During the investigation, we noticed that the log lines on all instances, but especially on instance 1, were not in chronological order by timestamp (.t.$date). The logs would regularly jump backwards in time. In the logs for instances 0 and 2, the jumps were infrequent and almost always exactly one millisecond. On instance 1, the jumps were more frequent and often as large as four minutes.
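The backward jumps described above can be located mechanically. The following is a minimal sketch, not the tooling used in the investigation. It assumes each log line is a JSON object whose timestamp is an ISO 8601 string at .t.$date, and the function names (parse_ts, backward_jumps) and the threshold parameter are hypothetical.

```python
import json
from datetime import datetime, timedelta


def parse_ts(line):
    """Return the .t.$date timestamp of one JSON log line as a datetime."""
    stamp = json.loads(line)["t"]["$date"]
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    return datetime.fromisoformat(stamp)


def backward_jumps(lines, threshold=timedelta(0)):
    """Yield (line_number, jump) for each line whose timestamp is earlier
    than the previous line's timestamp by more than `threshold`."""
    previous = None
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines; they carry no timestamp
        current = parse_ts(line)
        if previous is not None and previous - current > threshold:
            yield number, previous - current
        previous = current
```

Passing an open log file as `lines` reports every backward jump. Passing, for example, `threshold=timedelta(seconds=1)` would filter out the one-millisecond jumps and leave only the large ones of the kind seen on instance 1.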

I (david.goffredo@mongodb.com) read through the replica set monitor (SDAM) server code, looking for a way that these large backward jumps in time could put the server into a state where the replica set monitor thinks that the instance is secondary and ignores the replication coordinator's evidence to the contrary. I was unable to find a possible cause.

This ticket is to continue investigating that hypothesis and, if possible, to create a test that reproduces the pathological state seen in HELP-77706.

Attachments:
DBClientReplicaSet.jpg (68 kB, Jul 16 2025 09:45:23 PM UTC)
RSM topology listeners.jpg (80 kB, Jul 16 2025 09:45:23 PM UTC)
unsatisfied read preference.jpg (76 kB, Jul 16 2025 09:45:23 PM UTC)

Related to: SERVER-107659 Test RSM when host clock experience major jump back and forth in time (Closed)

Assignee: Unassigned
Reporter: David Goffredo
Participants: David Goffredo
Votes: 0
Watchers: 3

Created: Jul 16 2025 09:27:01 PM UTC
Updated: Jul 21 2025 06:24:01 PM UTC
