[DRIVERS-1954] SDAM should give priority to electionId over setVersion when updating topology Created: 14/Oct/21 Updated: 07/Oct/22 Resolved: 05/Oct/22 |
|
| Status: | Closed |
| Project: | Drivers |
| Component/s: | SDAM |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Shuvalov (Inactive) | Assignee: | Neal Beeken |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Driver Changes: | Needed |
| Server Compat: | 4.4, 5.0, 5.1, 5.3 |
| Quarter: | FY23Q3 |
| Upstream Changes Summary: | Filed |
| Downstream Changes Summary: |
|
| Driver Compliance: |
|
| Description |
In progress...

Summary
The SDAM spec specifies that the replica set monitor (RSM) uses the tuple { setVersion, electionId }, in that order, to detect stale primaries. The motivation is that if the protocol version changes (as happened in 3.2.0), electionId values might not be directly comparable, whereas setVersion is guaranteed to increment. Details: https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst#using-setversion-and-electionid-to-detect-stale-primaries

The problem is that if a failover happens before the former primary was able to reach consensus on a setVersion increment, the new primary will report a decremented setVersion together with an incremented electionId. The existing SDAM logic treats this new primary as stale, which leads to a full cluster outage and requires manual intervention. Details in

Drawback: if we need to introduce incompatible protocol versions in the future that make electionId non-monotonic, an additional contingency plan will be required.

Tests: SDAM in head has been updated to match the new behavior: https://github.com/mongodb/mongo/tree/master/src/mongo/client/sdam/json_tests/sdam_tests

Motivation
Who is the affected end user?
Who are the stakeholders? Drivers team, server teams.
How does this affect the end user? A full cluster outage is possible.
How likely is it that this problem or use case will occur? It happens in tests all the time.
If the problem does occur, what are the consequences and how severe are they? Outage.
Is this issue urgent? Not urgent, but high priority.
Is this ticket required by a downstream team? TBD, might be just the normal upgrade path.
Is this ticket only for tests? No. |
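The ordering change requested in the description can be sketched as a tuple comparison. This is a minimal Python sketch, not the spec's pseudocode or any driver's implementation: `PrimaryReport`, `is_stale_current`, and `is_stale_proposed` are hypothetical names, electionId is assumed to be the raw 12 bytes of an ObjectId compared bytewise, the election id values are illustrative only, and the handling of missing (null) setVersion or electionId is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryReport:
    """Hypothetical record of what a server claiming to be primary reports."""
    set_version: int
    election_id: bytes  # assumption: 12-byte ObjectId, compared bytewise


def is_stale_current(report: PrimaryReport, max_set_version: int,
                     max_election_id: bytes) -> bool:
    # Current SDAM order: setVersion first; electionId only breaks ties.
    return (report.set_version, report.election_id) < (max_set_version, max_election_id)


def is_stale_proposed(report: PrimaryReport, max_set_version: int,
                      max_election_id: bytes) -> bool:
    # Order requested by this ticket: electionId first; setVersion only breaks ties.
    return (report.election_id, report.set_version) < (max_election_id, max_set_version)


# Failover scenario from the description (illustrative values only):
# the old primary bumped setVersion to 5 before losing its election, and the
# new primary won a newer election but still reports setVersion 4.
OLD_ELECTION = bytes.fromhex("7fffffff0000000000000001")
NEW_ELECTION = bytes.fromhex("7fffffff0000000000000002")
max_set_version, max_election_id = 5, OLD_ELECTION
new_primary = PrimaryReport(set_version=4, election_id=NEW_ELECTION)

print(is_stale_current(new_primary, max_set_version, max_election_id))   # True: rejected as stale
print(is_stale_proposed(new_primary, max_set_version, max_election_id))  # False: accepted as primary
```

Under the current order the new primary is rejected because of its lower setVersion, which is the outage scenario described above; under the proposed order its newer electionId wins. The stated drawback is visible here too: the proposed rule is only sound while electionId values stay monotonic and comparable across protocol versions.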
| Comments |
| Comment by Githook User [ 13/Sep/22 ] |
|
Author: Shane Harvey (ShaneHarvey, shnhrv@gmail.com) Message: |
| Comment by Githook User [ 01/Apr/22 ] |
|
Author: Boris (BorisDog, boris.dogadov@mongodb.com) Message: |
| Comment by Githook User [ 07/Mar/22 ] |
|
Author: Neal Beeken (nbbeeken, neal.beeken@mongodb.com) Message: |
| Comment by Shane Harvey [ 07/Feb/22 ] |