[SERVER-33498] ReplSetTest.checkReplicaSet fails when run against a config server replica set. Created: 26/Feb/18  Updated: 07/Mar/18  Resolved: 06/Mar/18

Status: Closed
Project: Core Server
Component/s: JavaScript, Replication, Sharding
Affects Version/s: 3.4.11
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: James O'Leary
Assignee: Max Hirschhorn
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-33068 run_check_repl_dbhash.js hook exits w... Closed
duplicates SERVER-30900 remove collMod writeConcern argument ... Closed
duplicates SERVER-21630 Expand resmoke's CheckReplDBHash supp... Closed
duplicates SERVER-31441 Make run_validate_collections.js vali... Closed
duplicates SERVER-24759 Run resmoke.py collection validation ... Closed
duplicates SERVER-28989 Avoid dropping dummy database in Repl... Closed
Related
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

checkReplicaSet fails with the following message:

w:'majority' is the only valid write concern when writing to config servers

This is failing as part of the validation checking at the end of the 3.4 testsuite.
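
For illustration only, the rejection can be modelled with a small, self-contained JavaScript sketch. This is a toy model and not the server's validation code: `validateWriteConcern` and `isConfigServer` are hypothetical names, and the use of `w: <number of nodes>` by the check is an assumption based on the discussion in the comments.

```javascript
// Toy model only: this is NOT the server's actual validation code.
const CONFIG_SERVER_WC_ERROR =
    "w:'majority' is the only valid write concern when writing to config servers";

// Hypothetical helper: returns {ok: 1} when the write concern is accepted,
// or {ok: 0, errmsg: ...} when it is rejected.
function validateWriteConcern(writeConcern, isConfigServer) {
    if (isConfigServer && writeConcern.w !== "majority") {
        return {ok: 0, errmsg: CONFIG_SERVER_WC_ERROR};
    }
    return {ok: 1};
}

// Assumption: the check waits for every member by using w: <number of nodes>.
const numNodes = 3;
const shardResult = validateWriteConcern({w: numNodes}, false);
const configResult = validateWriteConcern({w: numNodes}, true);

console.log(shardResult.ok);       // accepted on an ordinary shard replica set
console.log(configResult.errmsg);  // rejected on a config server replica set
```

In this model the same write concern that works against a shard replica set is refused once the target is a config server, which matches the symptom reported here.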



 Comments   
Comment by Max Hirschhorn [ 06/Mar/18 ]

Closing as a duplicate of the linked SERVER tickets which I've marked for backport to the 3.4 branch.

Comment by James O'Leary [ 02/Mar/18 ]

Thankfully the config server check runs last, so the rest of the replica set checks should still be running.

These post run checks are only ever flagged as failures if something else fails.

So it may cause some noise in our output, but the frequency of these tests is low, and in the worst case we could disable the post-run check for the config servers.

Comment by Max Hirschhorn [ 01/Mar/18 ]

It sounds like we'll want to backport SERVER-21630, SERVER-24759, and SERVER-33068 to the 3.4 branch in order to make it correct to run the run_validate_collections.js and run_check_repl_dbhash.js hooks there. jim.oleary, are you able to not run the data consistency checks against sharded clusters on the 3.4 branch until those backports can be scheduled?

Comment by James O'Leary [ 28/Feb/18 ]

Hi Max,

As part of PERF-1146, we added functionality to run the checks against each of the component replica sets (including the config servers).

This code was added in late December, and the failure wasn't reported until one of the other tests failed.

-Jim

Comment by Max Hirschhorn [ 27/Feb/18 ]

Hi jim.oleary, I'd like to better understand how the Performance team is using the jstests/hooks/run_check_repl_dbhash.js hook in the sys-perf-3.4 Evergreen project. Prior to SERVER-21630—which isn't currently backported to the 3.4 branch—it wasn't meaningful to run the hook against a sharded cluster as it was originally designed to only run against a single replica set.

Comment by Esha Maharishi (Inactive) [ 26/Feb/18 ]

max.hirschhorn, ok, I think backporting either set of changes is valid, and don't have much opinion on which one to choose.

Since the second set of changes you suggested involves only backports to test infrastructure, and 3.4 is fairly stable by now, I slightly prefer that.

Comment by Max Hirschhorn [ 26/Feb/18 ]

esha.maharishi, it isn't sufficient to run a w="majority" write because we're attempting to verify that all data has made it to the replica set members. We could instead address this issue purely in ReplSetTest#checkReplicaSet() by backporting the combination of the changes from 5c702fe and ac84a0d as part of SERVER-28989 and SERVER-30900, respectively, to the 3.4 branch.

The implementation of the ReplSetTest#checkReplicaSet() method on the master branch doesn't do a write to a dummy database or specify a writeConcern for the collMod commands, and instead only relies on ReplSetTest#awaitReplication() for synchronization.
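
As a rough illustration of why a majority acknowledgement is weaker than waiting for every member, consider the toy model below. It is not ReplSetTest code: the member objects, the `lastApplied` field, and both helper functions are hypothetical and exist only for this sketch.

```javascript
// Toy model: each member records the index of the last operation it applied.

// True when more than half of the members have applied the operation.
function isMajorityAcknowledged(members, opIndex) {
    const acked = members.filter((m) => m.lastApplied >= opIndex).length;
    return acked > Math.floor(members.length / 2);
}

// True only when every member has applied the operation.
function isFullyReplicated(members, opIndex) {
    return members.every((m) => m.lastApplied >= opIndex);
}

const members = [
    {name: "primary", lastApplied: 10},
    {name: "secondary0", lastApplied: 10},
    {name: "secondary1", lastApplied: 7},  // lagging member
];

const majorityOk = isMajorityAcknowledged(members, 10);
const allCaughtUp = isFullyReplicated(members, 10);

console.log(majorityOk);   // true: two of three members have the write
console.log(allCaughtUp);  // false: secondary1 is still behind
```

In this model a majority write succeeds while one secondary is still behind, so a data consistency check that compares every member could run too early. Waiting until all members have caught up closes that gap.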

Comment by Esha Maharishi (Inactive) [ 26/Feb/18 ]

max.hirschhorn, it can probably be backported, but if it's possible to just send the writes with majority writeConcern (there is no pressing reason why a different writeConcern is needed), that might be better.

I don't remember exactly why we took that requirement out - maybe precisely to allow things like w: "all" or w: <number>, like jim.oleary's script seems to have been doing before, or using tags? It doesn't seem like anything was blocked on it.

Comment by Max Hirschhorn [ 26/Feb/18 ]

esha.maharishi, kaloian.manassiev, it appears this requirement was removed in SERVER-28641. Are the changes from 2cee2f8 something which can be backported to the 3.4 branch?
