[SERVER-29153] Make sure replica set nodes agree on which node is primary before doing writes in ShardingTest initialization Created: 12/May/17  Updated: 30/Oct/23  Resolved: 03/Apr/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.6, 3.6.18, 4.4.0-rc0, 4.0.18, 4.7.0

Type: Improvement Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4, v4.2, v4.0, v3.6
Sprint: Sharding 2020-04-20
Participants:
Linked BF Score: 12

 Description   

This can occur during Replica Set primary election:

[js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.376+0000 c20512| 2017-03-29T17:21:06.375+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
[js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.381+0000 c20512| 2017-03-29T17:21:06.381+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 1
[js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.381+0000 c20512| 2017-03-29T17:21:06.381+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
[js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.381+0000 c20512| 2017-03-29T17:21:06.381+0000 D REPL_HB  [ReplicationExecutor] Cancelling all heartbeats.
[js_test:mapReduce_nonSharded] 2017-03-29T17:21:06.382+0000 c20512| 2017-03-29T17:21:06.381+0000 I REPL     [ReplicationExecutor] Could not access any nodes within timeout when checking for additional ops to apply before finishing transition to primary. Will move forward with becoming primary anyway.

Followed by

[js_test:mapReduce_nonSharded] 2017-03-29T17:21:18.428+0000 c20512| 2017-03-29T17:21:18.428+0000 I REPL     [ReplicationExecutor] can't see a majority of the set, relinquishing primary
[js_test:mapReduce_nonSharded] 2017-03-29T17:21:18.428+0000 c20512| 2017-03-29T17:21:18.428+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
[js_test:mapReduce_nonSharded] 2017-03-29T17:21:18.428+0000 c20512| 2017-03-29T17:21:18.428+0000 I REPL     [replExecDBWorker-0] transition to SECONDARY

roughly 10 seconds later.

ShardingTest initialization does writes here and here that can fail if the config server primary or the shard primary steps down, respectively.

awaitNodesAgreeOnPrimary() seems appropriate to call before doing necessary config server writes.

The shard write (in the link above) does not appear to actually be necessary. It looks like the write followed by awaitReplication() was just a hack to wait for the replica set to finish setting up (https://github.com/mongodb/mongo/commit/69207258a19b68fbbbc1377f61b00a9124d78c90#diff-bcff736df8cd6507ce081e800d9d1dd0R168). In that case, we can just remove the write, which is outdated and no longer useful, and replace awaitSecondaryNodes() with awaitNodesAgreeOnPrimary(), which makes the intent much more obvious and has the same effect.
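
To illustrate the idea of waiting for agreement before writing, here is a minimal, self-contained sketch in plain JavaScript. It is not the ReplSetTest implementation of awaitNodesAgreeOnPrimary(); the names makeNode, primaryView and awaitNodesAgreeOnPrimarySketch are hypothetical and exist only for this example, and the nodes are scripted stand-ins rather than real mongod processes. The sketch polls every node's view of the primary and returns only once all nodes name the same host, which is the condition the setup code would want to hold before issuing its writes.

```javascript
// Hypothetical stand-in for a replica set node: it only reports which host
// it currently believes is primary (null if it sees none).
function makeNode(host, views) {
    let i = 0;
    return {
        host,
        // Returns the next scripted view; repeats the last one once exhausted.
        primaryView() {
            const v = views[Math.min(i, views.length - 1)];
            i++;
            return v;
        },
    };
}

// Polls every node until all of them name the same non-null primary.
// Returns that host, or throws after maxAttempts polling rounds.
function awaitNodesAgreeOnPrimarySketch(nodes, maxAttempts = 10) {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        const views = nodes.map((n) => n.primaryView());
        const first = views[0];
        if (first != null && views.every((v) => v === first)) {
            return first;
        }
    }
    throw new Error("nodes did not agree on a primary within " + maxAttempts + " attempts");
}

// Node "a" considers itself primary right away; "b" and "c" catch up later.
const nodes = [
    makeNode("a", ["a", "a", "a"]),
    makeNode("b", [null, null, "a"]),
    makeNode("c", [null, "a", "a"]),
];
const primary = awaitNodesAgreeOnPrimarySketch(nodes);
console.log("agreed primary:", primary);
```

In this scripted run, node "a" believes it is primary from the first poll, much like the node in the log excerpt that won its election before it could reach the other members. A write issued at that point could fail if the node later steps down; waiting until every node reports the same primary avoids starting writes during that window.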



 Comments   
Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-29153 Wait for nodes to agree on primary before writes in ShardingTest setup

(cherry picked from commit 85a915d3b49c0cd0b106f40df55a68a2f6779de1)
Branch: v4.2
https://github.com/mongodb/mongo/commit/f9170b2a35d3ab9d1d6d7669d1bacf9da785a94d

Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-29153 Wait for nodes to agree on primary before writes in ShardingTest setup

(cherry picked from commit 85a915d3b49c0cd0b106f40df55a68a2f6779de1)
Branch: v4.4
https://github.com/mongodb/mongo/commit/09e84e1b02db665a48ab6eb04b30909e7ca88494

Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-29153 Wait for nodes to agree on primary before writes in ShardingTest setup

(cherry picked from commit 85a915d3b49c0cd0b106f40df55a68a2f6779de1)
Branch: v4.0
https://github.com/mongodb/mongo/commit/6883bdfb8b8cff32176b1fd176df04da9165fd67

Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-29153 Wait for nodes to agree on primary before writes in ShardingTest setup

(cherry picked from commit 85a915d3b49c0cd0b106f40df55a68a2f6779de1)
Branch: v3.6
https://github.com/mongodb/mongo/commit/c6980931e00dac592fdcb2d0c48b2ecc81d17a6e

Comment by Githook User [ 03/Apr/20 ]

Author:

{'name': 'Jack Mulrow', 'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow'}

Message: SERVER-29153 Wait for nodes to agree on primary before writes in ShardingTest setup
Branch: master
https://github.com/mongodb/mongo/commit/85a915d3b49c0cd0b106f40df55a68a2f6779de1
