[SERVER-27810] Guarantee that replicaset is stable with node 0 as primary after ReplSetTest.initiate()
Created: 25/Jan/17  Updated: 05/Apr/17  Resolved: 23/Feb/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.3, 3.5.4

Type: Improvement
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Judah Schvimer
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-27839 Allow for step downs during reconfig ... Closed
related to SERVER-28376 ReplSetTest.initiate() should call aw... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v3.4
Sprint: Repl 2017-03-06
Participants:
Linked BF Score: 0

 Description   

A reconfig can occasionally cause the primary to step down. As a result, the guarantee we thought SERVER-20844 gave us, a stable replica set with node 0 as primary, does not always hold. To address this, we should call awaitNodesAgreeOnPrimary() and then check that node 0 is the primary. If it is not, we should step node 0 up to primary. In PV1 we can run replSetStepUp on node 0 to accomplish this. In PV0 we can repeatedly call replSetStepDown on the current primary with a high step down timeout until node 0 is elected. At that point we can call replSetFreeze on every node with a freeze timeout of 0, so that all of them are immediately allowed to run for election again if they want to.
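
The step-up procedure described above might look roughly like the following. This is a minimal sketch, not the code from the actual commit. It assumes awaitNodesAgreeOnPrimary() has already returned, and it assumes each entry of `nodes` is a hypothetical wrapper exposing `isPrimary()` and `adminCommand(cmd)`. The function name `stepUpNode0` and the options `maxAttempts` and `stepDownSecs` are invented for illustration. Only the command names replSetStepUp, replSetStepDown and replSetFreeze come from the description.

```javascript
// Sketch only. `nodes` is a hypothetical array of wrappers with:
//   isPrimary()       -> boolean
//   adminCommand(cmd) -> {ok: 1, ...}
// A real harness would also wait for elections to finish between steps.
function stepUpNode0(nodes, protocolVersion, opts) {
    const maxAttempts = (opts && opts.maxAttempts) || 10;    // hypothetical option
    const stepDownSecs = (opts && opts.stepDownSecs) || 600; // "high" step down timeout

    // Nothing to do if node 0 already holds the primary role.
    if (nodes[0].isPrimary()) {
        return {attempts: 0};
    }

    if (protocolVersion === 1) {
        // PV1: ask node 0 to run for election directly.
        const res = nodes[0].adminCommand({replSetStepUp: 1});
        if (!res.ok || !nodes[0].isPrimary()) {
            throw new Error("node 0 did not become primary after replSetStepUp");
        }
        return {attempts: 1};
    }

    // PV0: step down whichever node is primary until node 0 wins an election.
    let attempts = 0;
    while (!nodes[0].isPrimary()) {
        if (attempts >= maxAttempts) {
            throw new Error("node 0 was not elected after " + attempts + " attempts");
        }
        const primary = nodes.find((n) => n.isPrimary());
        if (primary) {
            primary.adminCommand({replSetStepDown: stepDownSecs});
        }
        attempts++;
    }

    // A freeze timeout of 0 lifts the election freeze on every node.
    for (const n of nodes) {
        n.adminCommand({replSetFreeze: 0});
    }
    return {attempts: attempts};
}
```

The attempt cap is a design choice of this sketch: the description only says to repeat replSetStepDown until node 0 is elected, and a bound keeps a test from hanging if that never happens.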



 Comments   
Comment by Githook User [ 31/Mar/17 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-28376 remove references to unsupported ReplSetTest.initiate() initiateTimeout option.

This option was removed in SERVER-27810.

(cherry picked from commit 2e189a57db00b291b171d5a2323700d6f57cd471)
Branch: v3.4
https://github.com/mongodb/mongo/commit/ede51fda3a16f7aa3de35579f3cafe886f138a4c

Comment by Githook User [ 29/Mar/17 ]

Author: Benety Goh (benety) <benety@mongodb.com>

Message: SERVER-28376 remove references to unsupported ReplSetTest.initiate() initiateTimeout option.

This option was removed in SERVER-27810.
Branch: master
https://github.com/mongodb/mongo/commit/2e189a57db00b291b171d5a2323700d6f57cd471

Comment by Githook User [ 06/Mar/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27810 Guarantee that all nodes agree node 0 is primary after ReplSetTest.initiate()

(cherry picked from commit 3823a20f0186d0e6b544212fb423f9f0ef786235)
Branch: v3.4
https://github.com/mongodb/mongo/commit/04eb46ca8ceb1862c82ea70745cf72b4cc6450e3

Comment by Githook User [ 23/Feb/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-27810 Guarantee that all nodes agree node 0 is primary after ReplSetTest.initiate()
Branch: master
https://github.com/mongodb/mongo/commit/3823a20f0186d0e6b544212fb423f9f0ef786235

Comment by Judah Schvimer [ 06/Feb/17 ]

I actually think these are separate issues. SERVER-27839 is about an error during the reconfig command. This one is about letting the replicaset stabilize a bit before it's used in a ShardingTest. Fixing SERVER-27839 in the minimal way will not address this. I'm re-opening this for now.

Comment by Crystal Horn [ 03/Feb/17 ]

We should fix this for all repl sets, not just the ones in the sharding test, by fixing SERVER-27839.
