[SERVER-49187]  Make ReplSetTest .stepUp() robust to election failures. Created: 30/Jun/20  Updated: 29/Oct/23  Resolved: 30/Jul/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.3, 4.2.12, 4.0.24

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
is related to SERVER-50049 assert.soonNoExcept() should not acce... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Repl 2020-07-13, Repl 2020-07-27, Repl 2020-08-10
Participants:
Linked BF Score: 100

 Description   

ReplSetTest.stepUp() retries if the replSetStepUp cmd fails. Each retry calls awaitReplication() again, which requires the replica set to have a primary. We can't guarantee that a primary will be present on every retry if a test runs with a high election timeout (24 hrs).

Consider a scenario where the primary stepped down during the first failed attempt of the replSetStepUp cmd. Because the test runs with a high election timeout, the replica set won't have a primary from then on. As a result, the awaitReplication() step in the retry gets stuck waiting for a primary.

Previously, when reconstruct_prepared_transactions_initial_sync.js had a similar issue, I fixed it by making the jstest use stepUpNoAwaitReplication instead of ReplSetTest.stepUp() (see SERVER-48778). Now retryable_commit_transaction_after_failover.js has also failed due to the issue described above. Revisiting the SERVER-48778 fix made me realize that steps such as awaitReplication() do not need to be retried when the replSetStepUp cmd fails. Since the replSetStepUp cmd is already wrapped in assert.soon(), we really don't need to call those steps on every retry.
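
The difference between the two retry shapes can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not the actual ReplSetTest code: FakeReplSet, retryUntilTrue, stepUpRetryingEverything and stepUpRetryingCommandOnly are hypothetical names, the retry loop is a simplified synchronous stand-in for assert.soon(), and the fake awaitReplication() throws where the real one would block waiting for a primary.

```javascript
// Toy model only: every class and function here is hypothetical and
// merely mimics the behavior described in this ticket.
class FakeReplSet {
    constructor() {
        this.primary = "node0";
        this.stepUpAttempts = 0;
        this.awaitReplicationCalls = 0;
    }

    // Stand-in for awaitReplication(): it needs a primary. The real call
    // would hang without one; the toy throws so the script terminates.
    awaitReplication() {
        this.awaitReplicationCalls++;
        if (this.primary === null) {
            throw new Error("awaitReplication would hang: no primary");
        }
    }

    // Stand-in for the replSetStepUp cmd: the first attempt fails and the
    // old primary steps down as a side effect; later attempts succeed.
    replSetStepUp(node) {
        this.stepUpAttempts++;
        if (this.stepUpAttempts === 1) {
            this.primary = null;
            return {ok: 0};
        }
        this.primary = node;
        return {ok: 1};
    }
}

// Simplified synchronous retry loop playing the role of assert.soon().
function retryUntilTrue(fn, maxAttempts = 5) {
    for (let i = 0; i < maxAttempts; i++) {
        if (fn()) {
            return;
        }
    }
    throw new Error("condition never became true");
}

// Problematic shape: the primary-dependent step is inside the retried body.
function stepUpRetryingEverything(rs, node) {
    retryUntilTrue(() => {
        rs.awaitReplication();
        return rs.replSetStepUp(node).ok === 1;
    });
}

// Proposed shape: the primary-dependent step runs once, and only the
// step-up command itself is retried.
function stepUpRetryingCommandOnly(rs, node) {
    rs.awaitReplication();
    retryUntilTrue(() => rs.replSetStepUp(node).ok === 1);
}

const before = new FakeReplSet();
let beforeError = null;
try {
    stepUpRetryingEverything(before, "node1");
} catch (e) {
    beforeError = e.message;
}

const after = new FakeReplSet();
stepUpRetryingCommandOnly(after, "node1");

console.log("retrying everything:", beforeError);
console.log("retrying command only: primary =", after.primary);
```

In the toy model, the first shape fails on its second pass because the primary is already gone when awaitReplication() runs again, while the second shape calls awaitReplication() exactly once and lets the retried command elect the new node.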



 Comments   
Comment by Githook User [ 29/Mar/21 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-49187 Make ReplSetTest .stepUp() robust to election failures.

(cherry picked from commit 311b7982f61009fd08bd7b76b1638d62cc8703de)
Branch: v4.0
https://github.com/mongodb/mongo/commit/9b56acc6c6c0ccf9bf882a0786c037e04bac753f

Comment by Githook User [ 20/Nov/20 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-43847 Make ReplSetTest's stepUp function resilient to slow machines.

(cherry picked from commit c5a53e4882bd316dcb37141ccfab56f5acaec8f4)

SERVER-49187 Make ReplSetTest.stepUp() robust to election failures.

(cherry picked from commit 311b7982f61009fd08bd7b76b1638d62cc8703de)
Branch: v4.2
https://github.com/mongodb/mongo/commit/db72156b34591a37f98f1eeae0e5d0c67ed555ff

Comment by Githook User [ 20/Nov/20 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-49187 Make ReplSetTest.stepUp() robust to election failures.

(cherry picked from commit 311b7982f61009fd08bd7b76b1638d62cc8703de)
Branch: v4.4
https://github.com/mongodb/mongo/commit/c5fc5b52da70c3af685d7fdc6254a5052b92c12e

Comment by Githook User [ 30/Jul/20 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-49187 Make ReplSetTest .stepUp() robust to election failures.
Branch: master
https://github.com/mongodb/mongo/commit/311b7982f61009fd08bd7b76b1638d62cc8703de

Generated at Thu Feb 08 05:19:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.