[SERVER-48170] Multiversion tests assume primary stability when using upgradeCluster() with 2-node replica set shards Created: 12/May/20  Updated: 29/Oct/23  Resolved: 02/Oct/20

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.2

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
is related to SERVER-46300 Set high election timeout in map_redu... Closed
is related to SERVER-47899 Configure ShardingTest used in merge_... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-07-13, Sharding 2020-06-01, Sharding 2020-06-15, Sharding 2020-06-29, Sharding 2020-07-27, Sharding 2020-08-10, Sharding 2020-08-24, Sharding 2020-09-21
Participants:
Linked BF Score: 29

 Description   

Sharding.prototype.upgradeCluster() and related functions such as ReplSetTest.prototype.upgradeSet() assert that the node that was primary before the replica set upgrade began is still the current primary when they try to step it down. In a 2-node replica set where both members have a vote, a voting majority requires both members. So while the secondary is being restarted, the primary can stop receiving heartbeats from it, lose contact with a majority, and step down. The restarted node may then run for and win the election, at which point the original primary is no longer primary and the assertion fails. (Configuring a high election timeout or adding a third member to the replica sets are other ways to avoid this issue.)
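The quorum arithmetic behind this failure can be sketched in plain JavaScript. votingMajority() and primaryCanSurvive() are hypothetical helpers written for illustration only; they are not part of ReplSetTest and assume every member has one vote.

```javascript
// Hypothetical helper: number of votes that constitutes a majority,
// assuming every member of the set has exactly one vote.
function votingMajority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Hypothetical helper: can a primary keep its role while `down` voting
// members are unreachable (for example, restarting during an upgrade)?
function primaryCanSurvive(votingMembers, down) {
  return votingMembers - down >= votingMajority(votingMembers);
}

// 2-node set: majority is 2, so restarting one node leaves only 1 reachable vote.
console.log(votingMajority(2), primaryCanSurvive(2, 1)); // 2 false
// 3-node set: majority is 2, so the primary keeps a majority during one restart.
console.log(votingMajority(3), primaryCanSurvive(3, 1)); // 2 true
```

This matches the `this.nodes.length > 2` threshold used by upgradePrimary(): with three or more voting members, restarting one node never costs the primary its majority.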

The ReplSetTest.prototype.upgradePrimary() function already has a concept of "no downtime possible":

const noDowntimePossible = this.nodes.length > 2;

We should plumb noDowntimePossible into ReplSetTest.prototype.stepdown() so that it skips this assertion when noDowntimePossible === false.
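A minimal sketch of the proposed plumbing, under stated assumptions: FakeReplSet and upgradePrimarySketch() are hypothetical stand-ins, not the real ReplSetTest code, and the choice to step down whichever node is currently primary when the assertion is skipped is an assumption made for illustration.

```javascript
// Hypothetical stand-in for ReplSetTest, reduced to what the idea needs.
class FakeReplSet {
  constructor(nodeNames, primaryName) {
    this.nodes = nodeNames;
    this.primary = primaryName;
  }

  getPrimary() {
    return this.primary;
  }

  // Only insist that `expectedPrimary` is still primary when the topology
  // guarantees it could not have lost its role during the upgrade.
  stepdown(expectedPrimary, { noDowntimePossible = true } = {}) {
    const current = this.getPrimary();
    if (current !== expectedPrimary) {
      if (noDowntimePossible) {
        throw new Error(`expected ${expectedPrimary} to still be primary, found ${current}`);
      }
      // Assumption: in a 2-node set an election may legitimately have
      // happened, so step down whichever node is primary now.
      expectedPrimary = current;
    }
    this.primary = null; // simulate the step-down
    return expectedPrimary;
  }
}

// Mirrors the existing check in ReplSetTest.prototype.upgradePrimary().
function upgradePrimarySketch(rst, originalPrimary) {
  const noDowntimePossible = rst.nodes.length > 2;
  return rst.stepdown(originalPrimary, { noDowntimePossible });
}

// n1 won an election while n0 was expected to be primary: tolerated for 2 nodes.
console.log(upgradePrimarySketch(new FakeReplSet(["n0", "n1"], "n1"), "n0")); // n1
```

With three or more nodes the assertion is still enforced, so an unexpected primary change there continues to fail the test.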



 Comments   
Comment by Githook User [ 03/Oct/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (username: toto-dev)

Message: SERVER-48170 Multiversion tests assume primary stability when using upgradeCluster() with 2-node replica set shards

(cherry picked from commit 478bf480ebfad1ab6fee19cb9a199071115f42c8)
Branch: v4.4
https://github.com/mongodb/mongo/commit/c9cb7ecc844b1ea3b14db4afcb8edaedb5a337ce

Comment by Githook User [ 02/Oct/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (username: toto-dev)

Message: SERVER-48170 Multiversion tests assume primary stability when using upgradeCluster() with 2-node replica set shards
Branch: master
https://github.com/mongodb/mongo/commit/478bf480ebfad1ab6fee19cb9a199071115f42c8

Comment by Githook User [ 02/Oct/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (username: toto-dev)

Message: Revert "SERVER-48170 Multiversion tests assume primary stability when using upgradeCluster() with 2-node replica set shards"

This reverts commit 8be28ffc901d981b96fff2f378faed007de95e52.
Branch: master
https://github.com/mongodb/mongo/commit/49e617b554b2c6c144f29c1f3e97d49e4e1a6944

Comment by Tommaso Tocci [ 02/Oct/20 ]

Reverting it because it is causing several failures

Comment by Githook User [ 02/Oct/20 ]

Author: Tommaso Tocci <tommaso.tocci@mongodb.com> (username: toto-dev)

Message: SERVER-48170 Multiversion tests assume primary stability when using upgradeCluster() with 2-node replica set shards
Branch: master
https://github.com/mongodb/mongo/commit/8be28ffc901d981b96fff2f378faed007de95e52

Generated at Thu Feb 08 05:16:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.