Type: New Feature
Resolution: Fixed
Priority: Minor - P4
Fix Version/s: 4.4.1
Affects Version/s: None
Component/s: None
Labels: None
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In HELP-16677, two of the three nodes in a replica set crashed, and the third node was left in an unknown failed state. The oplogs of the two crashed members were repaired, and during the repair the nodes were started in standalone mode. The SDAM (Server Discovery and Monitoring) spec requires that a node self-reporting as standalone be removed from the TopologyDescription. Once the two nodes were restored as replica set members, mongos did not route traffic to them, and a core dump showed that the only host still being monitored was the third node (with a server description of type=Unknown), which could not be contacted. Restarting mongos fixed the problem. This ticket is a placeholder to investigate ways to mitigate this situation without manual intervention.
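The failure mode described above can be reproduced with a toy model. This is a minimal sketch, not the server's RSM code: the `Topology` class, the `on_hello` method, and the host names are hypothetical, and the only rule modeled is the SDAM one that drops a server reporting itself as standalone from a replica set topology.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Toy stand-in for a TopologyDescription: host -> server type."""
    servers: dict = field(default_factory=dict)

    def on_hello(self, host: str, server_type: str) -> None:
        # Responses from hosts that are no longer monitored are ignored.
        if host not in self.servers:
            return
        if server_type == "Standalone":
            # SDAM rule modeled here: a standalone does not belong in a
            # replica set topology, so it is removed and no longer monitored.
            del self.servers[host]
        else:
            self.servers[host] = server_type


topology = Topology(
    {"a:27017": "RSPrimary", "b:27017": "RSSecondary", "c:27017": "Unknown"}
)

# Nodes a and b are restarted as standalones for oplog repair.
topology.on_hello("a:27017", "Standalone")
topology.on_hello("b:27017", "Standalone")

# Nodes a and b rejoin the replica set, but nothing is monitoring them,
# so these responses never update the topology.
topology.on_hello("a:27017", "RSPrimary")
topology.on_hello("b:27017", "RSSecondary")

monitored = sorted(topology.servers)
print(monitored)
```

In this model only the unreachable third host remains in the monitored set, which matches what the core dump showed.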

While there is no guarantee that this would improve liveness, the host lists stored in the config server and/or the initial connection string can be used to instruct the RSM (ReplicaSetMonitor) to monitor nodes that may have been in the replica set in the past, in the case that all current members of the replica set have been down for a configured period of time. Adding these nodes to the TopologyDescription as type=Unknown would cause the RSM to contact them at least once, without negative effects on the rest of the protocol.

Update: In HELP-16677, we decided to do the following:
1. If all nodes are down for some configurable time period, add in the initial replica set members as type=Unknown.
2. Do not remove type=Standalone servers from the topology description.
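The two decisions above can be sketched as follows. This is an illustrative model only, not the implementation that shipped: `ReplicaSetMonitorSketch`, `all_down_timeout_secs`, and `tick` are hypothetical names, and treating "down" as "no server currently reports RSPrimary or RSSecondary" is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

USABLE_TYPES = ("RSPrimary", "RSSecondary")  # assumption: what counts as "up"


@dataclass
class ReplicaSetMonitorSketch:
    seed_hosts: list                    # initial replica set members
    all_down_timeout_secs: float        # hypothetical configurable period
    servers: dict = field(default_factory=dict)  # host -> server type
    all_down_since: Optional[float] = None

    def __post_init__(self):
        for host in self.seed_hosts:
            self.servers.setdefault(host, "Unknown")

    def on_hello(self, host, server_type):
        # Decision 2: a Standalone is recorded but NOT removed, so the host
        # keeps being monitored and can be seen rejoining the set later.
        if host in self.servers:
            self.servers[host] = server_type

    def tick(self, now):
        """Periodic check; `now` is a timestamp in seconds."""
        if any(t in USABLE_TYPES for t in self.servers.values()):
            self.all_down_since = None
            return
        if self.all_down_since is None:
            self.all_down_since = now
            return
        if now - self.all_down_since >= self.all_down_timeout_secs:
            # Decision 1: re-add the initial members as type=Unknown so
            # each of them is contacted at least once more.
            for host in self.seed_hosts:
                self.servers.setdefault(host, "Unknown")
            self.all_down_since = now


seeds = ["a:27017", "b:27017", "c:27017"]
rsm = ReplicaSetMonitorSketch(seed_hosts=seeds, all_down_timeout_secs=30.0)

# Simulate a topology that has already shrunk to one unreachable host.
rsm.servers = {"c:27017": "Unknown"}
rsm.tick(now=0.0)
rsm.tick(now=10.0)
before = sorted(rsm.servers)

rsm.tick(now=30.0)  # timeout reached: initial members are re-added
after = sorted(rsm.servers)

rsm.on_hello("a:27017", "Standalone")
still_monitored = "a:27017" in rsm.servers
rsm.on_hello("a:27017", "RSPrimary")
```

Passing `now` explicitly keeps the sketch deterministic; a real monitor would read a clock and drive this check from its own scheduling.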

Assignee: Lamont Nelson
Reporter: Lamont Nelson
Participants: Githook User, Lamont Nelson
Votes: 0
Watchers: 12

Created: Jul 02 2020 05:08:05 PM UTC
Updated: Oct 29 2023 10:06:09 PM UTC
Resolved: Sep 16 2020 02:53:47 PM UTC
Confidence Status Last Update: 17/Aug/20 9:14 PM
