[SERVER-32691] Create passthrough for w="majority" with 2-node replica set to address lost test coverage Created: 12/Jan/18  Updated: 30/Oct/23  Resolved: 12/Feb/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.20, 3.4.14, 3.6.3, 3.7.2

Type: Bug Priority: Critical - P2
Reporter: Kevin Albertson Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-30850 startParallelShell cannot be used in ... Closed
depends on SERVER-32522 set_read_and_write_concerns.js treats... Closed
Related
related to SERVER-34116 resmoke.py is silently not running no... Closed
related to SERVER-30642 Raise election timeouts as a way to p... Closed
related to SERVER-32774 Ensure change_streams_secondary_reads... Closed
is related to SERVER-31670 Change replica set fixture used by re... Closed
is related to SERVER-32794 Make timeouts unrelated to elections ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6, v3.4, v3.2
Sprint: TIG 2018-02-12, TIG 2018-02-26
Participants:

 Description   

The changes in SERVER-31670 made suites using the ReplicaSetFixture have non-voting secondaries by default unless voting_secondaries=true or all_nodes_electable=True. There are test suites for readConcern={level: "majority"} that were incidentally testing w="majority" writeConcern. However, none of these test suites actually verified that the secondary in a 2-node replica set had applied the write immediately after the client performed a w="majority" write.

We should add a new write_concern_majority_passthrough.yml test suite that uses both (1) the set_read_and_write_concerns.js override with w="majority" and readConcern={level: "level} and (2) the set_read_preference_secondary.js override. It should be backported to the 3.2, 3.4, and 3.6 branches and run anywhere we already run an of aggregation_read_concern_majority_passthrough.yml, read_concern_linearizable_passthrough.yml, and read_concern_majority_passthrough.yml test suites.


Original summary

Passthrough suites using majority writes should have voting secondaries

Original description

The changes in SERVER-31670 made suites using the ReplicaSetFixture have non-voting secondaries by default unless voting_secondaries: true or all_nodes_electable:True was specified in the fixture configuration.

The majority write passthrough suites which use majority writes by default should be updated to have voting secondaries. Otherwise the majority writes are only on the primary, so there isn't much of a purpose to the passthrough.

The following suites should probably just need voting_secondaries: true set on the ReplicaSetFixture:

  • read_concern_majority_passthrough
  • read_concern_linearizable_passthrough
  • aggregation_read_concern_majority_passthrough

I think the following suites which use majority writes by default should also be changed:

  • change_streams_secondary_reads


 Comments   
Comment by Githook User [ 24/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough_WT task in Evergreen.

Also changes the aggregation_read_concern_majority_passthrough.yml,
read_concern_majority_passthrough.yml, and
write_concern_majority_passthrough.yml test suites to skip any
JavaScript tests that use the "collMod" command because it only supports
a w=1 writeConcern.

(cherry picked from commit bb8ac01f052a7b4b5c042085334ce640a1ab8dd1)
Branch: v3.2
https://github.com/mongodb/mongo/commit/e59d00a7323cbe3c164ab41ef4c53a4638caba5d

Comment by Githook User [ 24/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough.yml test suite.

Reverts to emulating the write concern to work around how prior to
MongoDB 3.4, operations that did writes didn't necessarily accept a
writeConcern object.

Also limits the usage of replica set connection strings to only the
write_concern_majority_passthrough.yml test suite to work around the
lack of complete support of MongoURI parsing in versions of the mongo
shell prior to MongoDB 3.4.

(cherry picked from commit 264d971842cffdf8b4f80def1d90241f132345b7)
Branch: v3.2
https://github.com/mongodb/mongo/commit/e54239b3b99687ab79048f4ae0f20b2095910e18

Comment by Githook User [ 14/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough_WT task in Evergreen.

Modifies the usages of DBCommandCursor to work around how if the
CursorTracker is registered on the replica set connection (i.e. the
DBClientRS instance), then it'll trigger a verify() failure when garbage
collection occurs.

Also changes the aggregation_read_concern_majority_passthrough.yml,
read_concern_majority_passthrough.yml, and
write_concern_majority_passthrough.yml test suites to skip any
JavaScript tests that use the "collMod" command because it only supports
a w=1 writeConcern.

(cherry picked from commit bb8ac01f052a7b4b5c042085334ce640a1ab8dd1)
Branch: v3.4
https://github.com/mongodb/mongo/commit/6408164d14181b8717bdcb462456a90c16020a42

Comment by Githook User [ 14/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough.yml test suite.

Also adds support for using replica set connection strings in resmoke.py
without making all nodes electable.

(cherry picked from commit 264d971842cffdf8b4f80def1d90241f132345b7)
Branch: v3.4
https://github.com/mongodb/mongo/commit/7585ab8e5a5fd1b1c2f5926a98cf12387d717fa9

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough_WT task in Evergreen.

(cherry picked from commit bb8ac01f052a7b4b5c042085334ce640a1ab8dd1)
Branch: v3.6
https://github.com/mongodb/mongo/commit/5ccb16c2b8eb80d1eb3fe4e6ba36c092e03eb5d4

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough.yml test suite.

Also adds support for using replica set connection strings in resmoke.py
without making all nodes electable.

(cherry picked from commit 264d971842cffdf8b4f80def1d90241f132345b7)
Branch: v3.6
https://github.com/mongodb/mongo/commit/b2399471de27583b684a1b4ad67195cab7062865

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough task in Evergreen.
Branch: master
https://github.com/mongodb/mongo/commit/bb8ac01f052a7b4b5c042085334ce640a1ab8dd1

Comment by Max Hirschhorn [ 10/Feb/18 ]

I'm reopening this ticket because upon cherry-picking the changes from 264d971 to the 3.6 branch, I realized that the lack of merge conflicts in etc/evergreen.yml meant I never actually integrated the write_concern_majority_passthrough.yml test suite into Evergreen.

Comment by Githook User [ 31/Jan/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'name': 'Max Hirschhorn', 'username': 'visemet'}

Message: SERVER-32691 Add write_concern_majority_passthrough.yml test suite.

Also adds support for using replica set connection strings in resmoke.py
without making all nodes electable.
Branch: master
https://github.com/mongodb/mongo/commit/264d971842cffdf8b4f80def1d90241f132345b7

Comment by Andy Schwerin [ 19/Jan/18 ]

Talked to Judah. This SGTM. Thanks for thinking about this, max.hirschhorn and judah.schvimer.

Comment by Judah Schvimer [ 19/Jan/18 ]

I'm on board with the above. I filed SERVER-32794 to address the stated timeout problem.

Comment by Max Hirschhorn [ 19/Jan/18 ]

The w:majority writes aren't really testing this, the "awaitReplication" call is, which we do in all of our passthroughs, so these aren't really increasing coverage of that.

Precisely - I meant this in the sense that specifying w="majority" doesn't cause the primary to write different oplog entries or have those oplog entries be processed differently by a secondary. The dbhash check always needs to a w="all" to ensure that all nodes in the replica set have processed all write operations because it doesn't treat 2-node replica sets specially.

I don't really agree. If there were a bug where we waited for replication but replication didn't happen, we'd catch that if we had a majority greater than 1.

Your example of there being a bug with w="majority" hanging when the size of the majority in the replica set is >1 node has convinced me that there is value in having voting secondaries for a w="majority" passthrough.

Should we do secondary local reads after w:majority writes then?

I think we could do as you described for testing w="majority" with a 2-node replica set, but since we're talking about the *read_concern_majority_passthrough.yml test suites, we'd need to keep using readConcern={level: "majority"}.

schwerin, judah.schvimer, what do you think about keeping the *read_concern_majority_passthrough.yml test suites with voting_secondaries=false and introducing a new write_concern_majority_passthrough.yml test suite that performs reads on the secondary in a 2-node replica set? That feels more explicit to me about what we're actually trying to test with these different test suites.

My goal is to remove "an unexpected election caused a JavaScript test to fail" from the class of failures the Replication team needs to deal with because we tend to blame them on the machine being slow anyway. My push to maximize the number of test suites that using voting_secondaries=false is to make it so that failovers only happen in test suites that are prepared to handle (i.e. the stepdown ones). I would want to be able to have a write_concern_majority_passthrough.yml test suite run with an election timeout of infinity while still using a heartbeat interval of 2 seconds and maxTimeMS on oplog getMores of 5 seconds for consistency with the default values. My impression is that using "half of the election timeout" is pervasive throughout parts of the replication codebase and—from a Slack conversation with Judah—that SERVER-29946 didn't necessarily go far enough to address this behavior.

Comment by Judah Schvimer [ 18/Jan/18 ]

the size of "majority" for the passthrough testing wasn't a significant factor in exercising correctness properties we are already testing.

I don't really agree. If there were a bug where we waited for replication but replication didn't happen, we'd catch that if we had a majority greater than 1.

In particular, performing a w="majority" write doesn't guarantee that you can perform a readConcern={level: majority} read on a secondary and read your write because the logic to wait for the majority-committed snapshot to advance only occurs on the current primary. This means that the *read_concern_majority_passthrough.yml test suites cannot be used to test that w="majority" writes actually wait for replication.

Should we do secondary local reads after w:majority writes then?

(2) that the data is eventually replicated to the secondaries.

The w:majority writes aren't really testing this, the "awaitReplication" call is, which we do in all of our passthroughs, so these aren't really increasing coverage of that.

Comment by Max Hirschhorn [ 17/Jan/18 ]

I'm not enthusiastic about using voting but unelectable nodes as a way to ensure that the majority machinery runs without inducing elections. I worry that in a set where only 1 node is electable but many nodes can vote, some important consensus machinery may get excluded. After all, if only 1 node can win an election, and everyone agrees to that, there's no real point in the consensus machinery. The fact that we still use it in that case might be considered an implementation detail.

schwerin, I had discussed the impact on the readConcern={level: majority} test suites with siyuan.zhou on Jan 5th and we had reached a conclusion that the size of "majority" for the passthrough testing wasn't a significant factor in exercising correctness properties we are already testing. In particular, performing a w="majority" write doesn't guarantee that you can perform a readConcern={level: majority} read on a secondary and read your write because the logic to wait for the majority-committed snapshot to advance only occurs on the current primary. This means that the *read_concern_majority_passthrough.yml test suites cannot be used to test that w="majority" writes actually wait for replication. Given this limitation, my impression is that we are using the *read_concern_majority_passthrough.yml test suite to verify that (1) the majority-committed snapshot is advanced and (2) that the data is eventually replicated to the secondaries.

(If we were going to change the implementation of mongod such that readConcern={level: majority} invoked parts of the consensus machinery, then I would want to have voting secondaries.)

I think the following suites which use majority writes by default should also be changed:

  • change_streams_secondary_reads

kevin.albertson, I don't see how this test suite isn't failing if we are performing reads on the secondary but only ensure that writes have made it to the primary.

Comment by Andy Schwerin [ 16/Jan/18 ]

Yeah. I think the odds of us introducing a regression in the next week or two are pretty low; especially one not caught by the causal consistency passthroughs.

Comment by Cristopher Stauffer [ 16/Jan/18 ]

schwerin We have Max back on Wednesday - is that soon enough to review?

Comment by Andy Schwerin [ 13/Jan/18 ]

I'm not enthusiastic about using voting but unelectable nodes as a way to ensure that the majority machinery runs without inducing elections. I worry that in a set where only 1 node is electable but many nodes can vote, some important consensus machinery may get excluded. After all, if only 1 node can win an election, and everyone agrees to that, there's no real point in the consensus machinery. The fact that we still use it in that case might be considered an implementation detail.

Comment by William Schultz (Inactive) [ 12/Jan/18 ]

An alternative approach, so that we still test majority writes but avoid spurious elections, is to make the election timeouts in these suites very high, and set the priority of secondary nodes to zero. This would, hopefully, make primary's much more stable, even if they become slow or have communicating with other members of the set for some periods of time. Making the secondaries priority zero would make sure they don't trigger spurious elections if we want the primary to remain stable.

Generated at Thu Feb 08 04:31:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.