[SERVER-43847] Make ReplSetTest's stepUp function resilient to slow machines Created: 04/Oct/19  Updated: 29/Oct/23  Resolved: 31/Jan/20

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.3.4, 4.0.24

Type: Bug
Priority: Major - P3
Reporter: Samyukta Lanka
Assignee: Samyukta Lanka
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2019-10-21, Repl 2019-11-04, Repl 2019-11-18, Repl 2019-12-02, Repl 2019-12-16, Repl 2019-12-30, Repl 2020-01-13, Repl 2020-01-27, Repl 2020-02-10
Participants:
Linked BF Score: 15

 Description   

If a node experiences machine slowness while a replica set is being initiated, an unplanned election can occur. If this happens during a stepUp, specifically after the check that all nodes agree on the applied opTime, the stepUp can fail because the node stepping up may not have applied the new-term oplog entry written by the new primary. The intended primary then loses its election: the new primary is ahead of it and votes no (this applies only to replica sets in which the candidate needs the primary's vote to reach a majority).

The stepUp function in ReplSetTest should have logic to retry the step up in such a case.
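
A minimal sketch of the retry idea, written in Python for illustration only. It is not the ReplSetTest implementation (which is JavaScript) and does not reflect the actual fix in the linked commits. All names here (`step_up_with_retry`, `FakeReplSet`, `ElectionFailed`, `max_attempts`) are hypothetical, and the replica set is simulated with plain integers standing in for opTimes. The assumption is that each attempt first waits for the candidate to catch up and then tries the election again.

```python
class ElectionFailed(Exception):
    """Hypothetical error: the candidate did not win its election."""


def step_up_with_retry(await_caught_up, try_step_up, max_attempts=5):
    """Retry a step-up until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        # Re-check replication before every attempt, since an unplanned
        # election may have added a new-term oplog entry in the meantime.
        await_caught_up()
        try:
            try_step_up()
            return attempt
        except ElectionFailed:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


class FakeReplSet:
    """Toy stand-in for a replica set; opTimes are plain integers."""

    def __init__(self, unplanned_elections=1):
        self.primary_optime = 1
        self.candidate_optime = 1
        self.unplanned_elections_left = unplanned_elections

    def await_caught_up(self):
        # Simulates waiting until the candidate has applied everything
        # the current primary has written.
        self.candidate_optime = self.primary_optime

    def try_step_up(self):
        if self.unplanned_elections_left > 0:
            # Simulates the race from the description: after the opTime
            # check, a new primary writes a new-term oplog entry.
            self.unplanned_elections_left -= 1
            self.primary_optime += 1
        if self.candidate_optime < self.primary_optime:
            # The primary is ahead of the candidate and votes no.
            raise ElectionFailed("candidate is behind the primary")


rs = FakeReplSet(unplanned_elections=1)
attempts = step_up_with_retry(rs.await_caught_up, rs.try_step_up)
```

In this simulation the first attempt fails because of the simulated unplanned election, and the second succeeds once the candidate has caught up. Bounding the number of attempts is a design choice of the sketch, so that a persistent failure surfaces as an error rather than as a hang.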



 Comments   
Comment by Githook User [ 29/Mar/21 ]

Author:

{'name': 'Samyukta Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-43847 Make ReplSetTest's stepUp function resilient to slow machines

(cherry picked from commit c5a53e4882bd316dcb37141ccfab56f5acaec8f4)
Branch: v4.0
https://github.com/mongodb/mongo/commit/bcd4dd0cf75497959c6a6c5980071413d22ea781

Comment by Githook User [ 20/Nov/20 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-43847 Make ReplSetTest's stepUp function resilient to slow machines.

(cherry picked from commit c5a53e4882bd316dcb37141ccfab56f5acaec8f4)

SERVER-49187 Make ReplSetTest.stepUp() robust to election failures.

(cherry picked from commit 311b7982f61009fd08bd7b76b1638d62cc8703de)
Branch: v4.2
https://github.com/mongodb/mongo/commit/db72156b34591a37f98f1eeae0e5d0c67ed555ff

Comment by Githook User [ 31/Jan/20 ]

Author:

{'name': 'Samyukta Lanka', 'username': 'lankas', 'email': 'samy.lanka@mongodb.com'}

Message: SERVER-43847 Make ReplSetTest's stepUp function resilient to slow machines
Branch: master
https://github.com/mongodb/mongo/commit/c5a53e4882bd316dcb37141ccfab56f5acaec8f4

Comment by A. Jesse Jiryu Davis [ 07/Oct/19 ]

Similar problem to SERVER-43226 (which deals with stepdown and freeze), perhaps we can take a similar approach.

Generated at Thu Feb 08 05:04:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.