[SERVER-53343] Tests which write to ConfigServer collections are not safe to run in the sharding_csrs_continuous_config_stepdown suite Created: 14/Dec/20  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: cs-subteam1, sharding-csrs-stepdown-upkeep, sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-59891 Replace the coverage from sharding_co... Backlog
is related to SERVER-54796 Blacklist blanacer_shell_commands.js ... Closed
Assigned Teams:
Cluster Scalability
Operating System: ALL
Sprint: Sharding 2021-07-12, Sharding 2021-07-26, Sharding 2021-08-09
Participants:
Linked BF Score: 0

 Description   

The changes from SERVER-51070 removed special-cased logic on MongoS that automatically retried writes against the config server. Those writes now go through the regular write path instead.

The original logic parsed the internal requests and bubbled up the embedded errors, which were then automatically retried. This was only correct if the writes were actually idempotent (which is the case in our tests, but is not safe for general writes to the config server).

The regular write path uses the RetryPolicy::kNoRetry policy, which means that errors such as NotPrimary are passed directly back to the client. It should be the client (in this case, the tests) that performs the retries, rather than MongoS doing so unconditionally.
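
For illustration only, a client-side retry in a test could look roughly like the sketch below. The helper name retryOnNotPrimary, the attempt limit, and the exact set of error codes are assumptions made for this example and are not part of the existing test libraries. As with the old MongoS behaviour, retrying like this is only safe when the wrapped write is idempotent.

```javascript
// Hypothetical helper (not an existing jstests utility): retries a write when it
// fails with an error that indicates the config server primary stepped down.
// The code list below is an assumption for this sketch:
// 10107 = NotWritablePrimary, 13435 = NotPrimaryNoSecondaryOk,
// 189 = PrimarySteppedDown, 11602 = InterruptedDueToReplStateChange.
const RETRYABLE_NOT_PRIMARY_CODES = new Set([10107, 13435, 189, 11602]);

function retryOnNotPrimary(writeFn, maxAttempts = 5) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            // The caller must ensure writeFn is idempotent, since it may run
            // more than once.
            return writeFn(attempt);
        } catch (e) {
            if (!RETRYABLE_NOT_PRIMARY_CODES.has(e.code)) {
                // Any other failure is a real error and is surfaced immediately.
                throw e;
            }
            lastError = e;
        }
    }
    // Every attempt hit a stepdown-related error; give up and report the last one.
    throw lastError;
}
```

A test would wrap each write to a config collection in a function that throws an error carrying a numeric code property on failure, and pass that function to the helper.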



 Comments   
Comment by Max Hirschhorn [ 11/Sep/21 ]

Hoping to not do this ticket and to do SERVER-59891 instead.

Comment by Kaloian Manassiev [ 25/Feb/21 ]

tommaso.tocci, yes, blacklisting it is fine. This test is just for the shell commands, which are run manually anyway, so robustness there is not a priority.

Comment by Tommaso Tocci [ 22/Feb/21 ]

The test that fails most often due to this bug is balancer_shell_commands.js. I propose that we at least blacklist it from the sharding_csrs_continuous_config_stepdown suite until we implement a proper fix.

Comment by Max Hirschhorn [ 14/Dec/20 ]

One thought here would be to set retryWrites=true for operations on the config database. Session options are wired up globally for all databases, so this would require some work if we wanted to limit this configuration to only the config database. prepareCommandRequest() doesn't currently receive the name of the database the command is being run on, but if it did, then we could do something like the following:

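const session = db.getSession();
// Keep a reference to the original implementation so the override can delegate to it.
const originalAssignTransactionNumber = session._serverSession.assignTransactionNumber;
session._serverSession.assignTransactionNumber = function assignTransactionNumberIfConfig(dbName, cmdObj) {
    if (dbName !== "config") {
        // Skip injecting the txnNumber for operations not run on the config database.
        return cmdObj;
    }

    // Assumes the original would also accept (dbName, cmdObj) once prepareCommandRequest() passes dbName.
    return originalAssignTransactionNumber.apply(this, arguments);
};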
