[SERVER-28376] ReplSetTest.initiate() should call awaitReplication before requesting node 0 to step up as primary Created: 17/Mar/17  Updated: 15/Aug/17  Resolved: 29/Mar/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.4, 3.5.6

Type: Bug Priority: Major - P3
Reporter: Benety Goh Assignee: Benety Goh
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-28699 ReplSetTest should always awaitSecond... Closed
Duplicate
is duplicated by SERVER-28273 replsettest.js needs to authenticate ... Closed
Related
is related to SERVER-27785 Make external authorization tests not... Closed
is related to SERVER-27107 Update 'last-stable' MongoRunner alia... Closed
is related to SERVER-14017 Refactor ShardingTest and ReplSetTest... Backlog
is related to SERVER-21714 Increase replSetTest.initiate() timeo... Closed
is related to SERVER-27490 Stop consulting storage engine isDura... Closed
is related to SERVER-27810 Guarantee that replicaset is stable w... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl 2017-04-17
Participants:
Linked BF Score: 0

 Description   

During ReplSetTest initialization, when elections happen while adding nodes to the replica set, it is possible for node 0 (the intended primary node after initialization) to lag the other nodes before it is asked to step up.

https://github.com/mongodb/mongo/blob/231e760c744013fe68fe863a7d0315148c69047a/src/mongo/shell/replsettest.js#L771

replsettest.js

771
            this.stepUp(this.nodes[0]);
772
        }
773
 
774
        // Setup authentication if running test with authentication
775
        if ((jsTestOptions().keyFile) && cmdKey == 'replSetInitiate') {
776
            master = this.getPrimary();
777
            jsTest.authenticateNodes(this.nodes);
778
        }
779
    };
780
 
781
    this.stepUp = function(node) {
782
        this.awaitSecondaryNodes();
783
        this.awaitNodesAgreeOnPrimary();
784
        if (this.getPrimary() === node) {
785
            return;
786
        }

This may result in the step up phase of ReplSetTest,initiate() failing to elect node 0 with a "candidate's data is staler than mine" reason.

[js_test:replsetprio1] 2017-03-15T13:03:02.788+0000 d20010| 2017-03-15T13:03:02.787+0000 I REPL     [conn2] stepping down from primary, because a new term has begun: 2
[js_test:replsetprio1] 2017-03-15T13:03:02.788+0000 d20010| 2017-03-15T13:03:02.787+0000 I REPL     [replExecDBWorker-2] transition to SECONDARY
...
[js_test:replsetprio1] 2017-03-15T13:03:02.802+0000 d20010| 2017-03-15T13:03:02.799+0000 D REPL_HB  [conn2] Received heartbeat request from mci-osx1010-10.build.10gen.cc:20011, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20011", fromId: 1, term: 2 }
[js_test:replsetprio1] 2017-03-15T13:03:02.802+0000 d20010| 2017-03-15T13:03:02.799+0000 D REPL_HB  [conn2] Processed heartbeat from mci-osx1010-10.build.10gen.cc:20011 and generated response, { ok: 1.0, state: 2, v: 2, hbmsg: "", set: "testSet", term: 1, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 } }
 
...
[js_test:replsetprio1] 2017-03-15T13:03:03.315+0000 d20010| 2017-03-15T13:03:03.315+0000 D REPL_HB  [ReplicationExecutor] Sending heartbeat (requestId: 52) to mci-osx1010-10.build.10gen.cc:20012, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20010", fromId: 0, term: 1 }
[js_test:replsetprio1] 2017-03-15T13:03:03.316+0000 d20010| 2017-03-15T13:03:03.315+0000 D REPL_HB  [ReplicationExecutor] Sending heartbeat (requestId: 53) to mci-osx1010-10.build.10gen.cc:20011, { replSetHeartbeat: "testSet", configVersion: 2, from: "mci-osx1010-10.build.10gen.cc:20010", fromId: 0, term: 1 }
...
[js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB  [ReplicationExecutor] Received response to heartbeat (requestId: 52) from mci-osx1010-10.build.10gen.cc:20012, { ok: 1.0, state: 2, v: 2, hbmsg: "", set: "testSet", term: 2, primaryId: 0, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 }, operationTime: Timestamp 1489582982000|1 }
[js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 I REPL     [ReplicationExecutor] Member mci-osx1010-10.build.10gen.cc:20012 is now in state SECONDARY
[js_test:replsetprio1] 2017-03-15T13:03:03.318+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB  [ReplicationExecutor] Scheduling heartbeat to mci-osx1010-10.build.10gen.cc:20012 at 2017-03-15T13:03:08.316Z
[js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB  [ReplicationExecutor] Received response to heartbeat (requestId: 53) from mci-osx1010-10.build.10gen.cc:20011, { ok: 1.0, electionTime: new Date(6397710192368156673), state: 1, v: 2, hbmsg: "", set: "testSet", term: 2, primaryId: 1, durableOpTime: { ts: Timestamp 1489582961000|2, t: 1 }, opTime: { ts: Timestamp 1489582961000|2, t: 1 }, operationTime: Timestamp 1489582982000|1 }
[js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 I REPL     [ReplicationExecutor] Member mci-osx1010-10.build.10gen.cc:20011 is now in state PRIMARY
[js_test:replsetprio1] 2017-03-15T13:03:03.319+0000 d20010| 2017-03-15T13:03:03.316+0000 D REPL_HB  [ReplicationExecutor] Scheduling heartbeat to mci-osx1010-10.build.10gen.cc:20011 at 2017-03-15T13:03:08.316Z
...
[js_test:replsetprio1] 2017-03-15T13:03:07.781+0000 d20010| 2017-03-15T13:03:07.781+0000 I COMMAND  [conn11] Received replSetStepUp request
[js_test:replsetprio1] 2017-03-15T13:03:07.781+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL     [conn11] Starting an election due to step up request
[js_test:replsetprio1] 2017-03-15T13:03:07.782+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL     [conn11] conducting a dry run election to see if we could be elected
[js_test:replsetprio1] 2017-03-15T13:03:07.782+0000 d20010| 2017-03-15T13:03:07.781+0000 I REPL     [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from mci-osx1010-10.build.10gen.cc:20011 with reason "candidate's data is staler than mine"; response message: { term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0, operationTime: Timestamp 1489582984000|1 }
[js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I REPL     [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from mci-osx1010-10.build.10gen.cc:20012 with reason "candidate's data is staler than mine"; response message: { term: 2, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0, operationTime: Timestamp 1489582984000|1 }
[js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I REPL     [ReplicationExecutor] not running for primary, we received insufficient votes
[js_test:replsetprio1] 2017-03-15T13:03:07.783+0000 d20010| 2017-03-15T13:03:07.782+0000 I COMMAND  [conn11] replSetStepUp request failed  :: caused by :: CommandFailed: Election failed.



 Comments   
Comment by Benety Goh [ 01/Apr/17 ]

fixed build issues in backported commit

Comment by Githook User [ 01/Apr/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 ReplSetTest.stepUp() calls awaitReplication() before sending replSetStepUp command

(cherry picked from commit f1fb0600a328f0792fa84bea78e368831e4af669)

This backport also includes some changes to JS tests and replsettest.js from
commit cdc7af4c6d453b8c1ce2319d0cd3b50074609d87.
Branch: v3.4
https://github.com/mongodb/mongo/commit/ae657f64a63699a14df6b6e6950818393de75b3a

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: Revert "SERVER-28376 ReplSetTest.stepUp() calls awaitReplication() before sending replSetStepUp command"

This reverts commit 3f8cb1199fe9f7d83725c5f79113beb77758b574.
Branch: v3.4
https://github.com/mongodb/mongo/commit/aeae03f23b14e29d3c07a65762d57267d6bae0b9

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 ReplSetTest.stepUp() calls awaitReplication() before sending replSetStepUp command

(cherry picked from commit f1fb0600a328f0792fa84bea78e368831e4af669)
Branch: v3.4
https://github.com/mongodb/mongo/commit/3f8cb1199fe9f7d83725c5f79113beb77758b574

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 fixed jstests/ssl and jstests/sslSpecial tests to not require node 0 as primary

(cherry picked from commit 710cefca98db7c8f7736a79cebca855b99509235)
Branch: v3.4
https://github.com/mongodb/mongo/commit/2ddd396a509a46d07b2ba43724c549c20a366349

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 remove requirement for node 0 of every shard to be primary in ShardingTest initialization

(cherry picked from commit ff6116630ad77b2afd348bbf81c8af2eccfa8163)
Branch: v3.4
https://github.com/mongodb/mongo/commit/60b8363aafacd49d3d9cbaf46c68c4929665ef9e

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 Make authorization tests not require node 0 to be primary after ReplSetTest initialization

(cherry picked from commit 95c7a8365964ef62be3f8af7d4e5436046e0b3b3)
Branch: v3.4
https://github.com/10gen/mongo-enterprise-modules/commit/52e4fc7bf4d5d6d0e711ed6f3cd7a3c18806a16a

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 split ReplSetTest.initiate() into 3 functions.

ReplSetTest.initiateWithAnyNodeAsPrimary()
ReplSetTest.initiateWithNodeZeroAsPrimary()
ReplSetTest.initiate() - alias for initiateWithNodeZeroAsPrimary()

(cherry picked from commit cc9c461904392d5bf71d815853d13065f6b91629)
Branch: v3.4
https://github.com/mongodb/mongo/commit/451630c29ca4478097373c0d27c68e65a2089efb

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 fixed replsets/auth2.js to not clobber global 'rs' variable.

(cherry picked from commit 0501d9eedc19ee1d4bf35525fb25d0e5a50caa49)
Branch: v3.4
https://github.com/mongodb/mongo/commit/8ebb459615c85fda6b4bb077e0a8dc88436f7f83

Comment by Githook User [ 31/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 remove references to unsupported ReplSetTest.initiate() initiateTimeout option.

This option was removed in SERVER-27810.

(cherry picked from commit 2e189a57db00b291b171d5a2323700d6f57cd471)
Branch: v3.4
https://github.com/mongodb/mongo/commit/ede51fda3a16f7aa3de35579f3cafe886f138a4c

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 ReplSetTest.stepUp() calls awaitReplication() before sending replSetStepUp command
Branch: master
https://github.com/mongodb/mongo/commit/f1fb0600a328f0792fa84bea78e368831e4af669

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 fixed jstests/ssl and jstests/sslSpecial tests to not require node 0 as primary
Branch: master
https://github.com/mongodb/mongo/commit/710cefca98db7c8f7736a79cebca855b99509235

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 remove requirement for node 0 of every shard to be primary in ShardingTest initialization
Branch: master
https://github.com/mongodb/mongo/commit/ff6116630ad77b2afd348bbf81c8af2eccfa8163

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 Make authorization tests not require node 0 to be primary after ReplSetTest initialization
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/95c7a8365964ef62be3f8af7d4e5436046e0b3b3

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 split ReplSetTest.initiate() into 3 functions.

ReplSetTest.initiateWithAnyNodeAsPrimary()
ReplSetTest.initiateWithNodeZeroAsPrimary()
ReplSetTest.initiate() - alias for initiateWithNodeZeroAsPrimary()
Branch: master
https://github.com/mongodb/mongo/commit/cc9c461904392d5bf71d815853d13065f6b91629

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 fixed replsets/auth2.js to not clobber global 'rs' variable.
Branch: master
https://github.com/mongodb/mongo/commit/0501d9eedc19ee1d4bf35525fb25d0e5a50caa49

Comment by Githook User [ 29/Mar/17 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-28376 remove references to unsupported ReplSetTest.initiate() initiateTimeout option.

This option was removed in SERVER-27810.
Branch: master
https://github.com/mongodb/mongo/commit/2e189a57db00b291b171d5a2323700d6f57cd471

Generated at Thu Feb 08 04:17:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.