[SERVER-53812] replsettest.awaitReplication does not work with keyfile authentication Created: 14/Jan/21  Updated: 29/Oct/23  Resolved: 18/Feb/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Mark Benvenuto Assignee: Xuerui Fa
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-53605 Ensure replsettest.asCluster works wi... Closed
Related
related to SERVER-56937 upgradeSet() in multi_rs.js may lose ... Closed
is related to SERVER-14017 Refactor ShardingTest and ReplSetTest... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

/**
 * @tags: [requires_persistence, requires_replication]
 */
 
 (function() {
    'use strict';
 
    const rst = new ReplSetTest({
        nodes: 3,
        waitForKeys: false,
        nodeOptions: {
            keyFile: "jstests/libs/key1",
        }
    });
    rst.startSet();
 
    rst.initiateWithAnyNodeAsPrimary(
        Object.extend(rst.getReplSetConfig(), {writeConcernMajorityJournalDefault: true}));
 
    rst.awaitReplication();
 
    let primary = rst.getPrimary();
 
    primary.getDB('admin').createUser({user: 'root', pwd: 'root', roles: ['root']}, {w: 3});
    primary.getDB("admin").auth("root", "root");
    assert.commandWorked(primary.getDB("admin").runCommand({hello: 1}));
    assert.commandWorked(primary.getDB('test').a.insert({a: 1, str: 'TESTTESTTEST'}));
 
    rst.awaitReplication();
 
    rst.stopSet();
})();

Sprint: Repl 2021-02-08, Repl 2021-02-22
Participants:
Linked BF Score: 20

Description

replsettest.awaitReplication() does not work when auth is enabled with keyfile authentication. It also does not work with clusterAuthMode=x509 (SERVER-53605), but that has never worked properly.

replsettest.stopSet() does not work either.

Example:

2021-01-14T15:51:16.331-0500 assert.retryNoExcept caught exception, exception: Error: command failed: {
2021-01-14T15:51:16.332-0500  "ok" : 0,
2021-01-14T15:51:16.332-0500  "errmsg" : "not authorized on admin to execute command { replSetGetConfig: 1.0, lsid: { id: UUID(\"0492ea4c-a83d-4651-9112-7e779ab576c0\") }, $clusterTime: { clusterTime: Timestamp(1610657473, 1), signature: { hash: BinData(0, 83B3AD543903CD26B9E0F7EE312DAECA5578BC05), keyId: 6917721137233264644 } }, $readPreference: { mode: \"secondaryPreferred\" }, $db: \"admin\" }",
2021-01-14T15:51:16.332-0500  "code" : 13,
2021-01-14T15:51:16.332-0500  "codeName" : "Unauthorized",
2021-01-14T15:51:16.332-0500  "$clusterTime" : {
2021-01-14T15:51:16.332-0500          "clusterTime" : Timestamp(1610657473, 1),
2021-01-14T15:51:16.332-0500          "signature" : {
2021-01-14T15:51:16.332-0500                  "hash" : BinData(0,"g7OtVDkDzSa54PfuMS2uylV4vAU="),
2021-01-14T15:51:16.332-0500                  "keyId" : NumberLong("6917721137233264644")
2021-01-14T15:51:16.332-0500          }
2021-01-14T15:51:16.333-0500  },
2021-01-14T15:51:16.333-0500  "operationTime" : Timestamp(1610657473, 1)
2021-01-14T15:51:16.333-0500 }
2021-01-14T15:51:16.333-0500 assert.retry failed on attempt 3 of 3
2021-01-14T15:51:17.333-0500 ReplSetTest awaitReplication: couldnt get repl set config. The hang analyzer is automatically called in assert.retry functions. If you are *expecting* assert.soon to possibly fail, call assert.retry with {runHangAnalyzer: false} as the fifth argument (you can fill unused arguments with `undefined`). Running hang analyzer from assert.retry.
2021-01-14T15:51:17.334-0500 Skipping runHangAnalyzer: not running in Evergreen
2021-01-14T15:51:17.341-0500 uncaught exception: Error: ReplSetTest awaitReplication: couldnt get repl set config. The hang analyzer is automatically called in assert.retry functions. If you are *expecting* assert.soon to possibly fail, call assert.retry with {runHangAnalyzer: false} as the fifth argument (you can fill unused arguments with `undefined`). :
2021-01-14T15:51:17.341-0500 doassert@src/mongo/shell/assert.js:20:14
2021-01-14T15:51:17.341-0500 assert.retry@src/mongo/shell/assert.js:450:9
2021-01-14T15:51:17.341-0500 assert.retryNoExcept@src/mongo/shell/assert.js:463:9
2021-01-14T15:51:17.341-0500 ReplSetTest/this.awaitReplication@src/mongo/shell/replsettest.js:1949:9
2021-01-14T15:51:17.341-0500 @cluster_keyfile_experiment.js:20:5
2021-01-14T15:51:17.342-0500 @cluster_keyfile_experiment.js:5:3
2021-01-14T15:51:17.342-0500 failed to load: cluster_keyfile_experiment.js
2021-01-14T15:51:17.342-0500 exiting with code -3



Comments
Comment by Githook User [ 18/Feb/21 ]

Author: XueruiFa <xuerui.fa@mongodb.com> (username: XueruiFa)

Message: SERVER-53812: Fix awaitReplication in ReplSetTest with keyfile auth enabled
Branch: master
https://github.com/mongodb/mongo/commit/cd333ff3ea110a8993c41d289f8e181f60e4abd6

Comment by Xuerui Fa [ 12/Feb/21 ]

For 1 and 2, I believe this issue has existed for a long time, and the failure appears to be consistent. I tried running Mark's repro on a compiled version of master from a few months ago, and it still failed. I believe Mark discovered this bug while working on another BF fix; before that, we probably didn't have any tests that exercised this exact scenario.

I think the minimum fix would be adding asCluster() to each command in awaitReplication(). After adding that, however, I found that this error appears if we authenticate first:

[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] {
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 	"ok" : 0,
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 	"errmsg" : "logical sessions can't have multiple authenticated users (for more details see: https://docs.mongodb.com/manual/core/authentication/#authentication-methods)",
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 	"code" : 13,
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 	"codeName" : "Unauthorized",
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 	"$clusterTime" : {
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 		"clusterTime" : Timestamp(1611949466, 7),
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 		"signature" : {
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.647+0000 [jsTest] 			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.648+0000 [jsTest] 			"keyId" : NumberLong(0)
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.648+0000 [jsTest] 		}
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.648+0000 [jsTest] 	},
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.648+0000 [jsTest] 	"operationTime" : Timestamp(1611949466, 7)
[js_test:keyfile_auth_test] 2021-01-29T19:44:27.648+0000 [jsTest] }

As a result, I also modified asCluster to first check whether the given connections are already authenticated. This seems to have resolved the issue; I'm currently running an Evergreen patch to verify this.
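
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the actual commit: FakeConn and runAsClusterIfNeeded are hypothetical names, the fake connection only mimics the authInfo.authenticatedUsers field of a connectionStatus reply, and it raises the "multiple authenticated users" error at login time as a simplification.

```javascript
// Hypothetical stand-in for a shell connection; not the real connection class.
class FakeConn {
    constructor() {
        this.users = [];
    }
    // Mimics only the authInfo.authenticatedUsers part of a connectionStatus reply.
    adminCommand(cmd) {
        if (cmd.connectionStatus) {
            return {ok: 1, authInfo: {authenticatedUsers: this.users.slice()}};
        }
        return {ok: 0, errmsg: "unsupported in this sketch"};
    }
    // Simplification: reject a second login on the same connection right away.
    login(user) {
        if (this.users.length > 0) {
            throw new Error("logical sessions can't have multiple authenticated users");
        }
        this.users.push({user: user, db: "admin"});
    }
    logout() {
        this.users = [];
    }
}

// Hypothetical helper: log in as the cluster user only when nobody is logged in.
function runAsClusterIfNeeded(conn, fn) {
    const status = conn.adminCommand({connectionStatus: 1});
    const alreadyAuthenticated = status.authInfo.authenticatedUsers.length > 0;
    if (alreadyAuthenticated) {
        // Reuse the existing login; a second login would trigger the error.
        return fn(conn);
    }
    conn.login("__system");
    try {
        return fn(conn);
    } finally {
        // Leave the connection unauthenticated, as it was on entry.
        conn.logout();
    }
}
```

With this shape, a test that has already called auth() on a connection keeps its own user for the duration of the control command, and an unauthenticated connection is logged in and out around it.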

Comment by Steven Vannelli [ 25/Jan/21 ]

Next steps:

  1. Figure out why this suddenly started happening
  2. Understand why this is happening intermittently
  3. What's the minimum fix for this and how long would that take?

Comment by Xuerui Fa [ 22/Jan/21 ]

In ReplSetTest, we only maintain one connection to each node. For auth tests, this connection has to be authenticated so that commands can be successfully received. We currently authenticate on a command-by-command basis through the asCluster function, which will sign us in as a user, run the command, then log out. This means that for auth tests, we would have to ensure that each command is correctly authenticated. This seems overly complicated, expensive, and difficult to maintain.
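
To make the command-by-command pattern concrete, here is a small sketch under stated assumptions: makeCountingConn and withClusterAuth are hypothetical names, and the fake connection simply returns an Unauthorized reply whenever it is not logged in. It is meant to show why every control command needs the wrapper and what each wrapped call costs, not how ReplSetTest implements asCluster.

```javascript
// Hypothetical stand-in for a node connection that counts auth round trips.
function makeCountingConn() {
    return {
        authenticated: false,
        authCalls: 0,
        login() {
            this.authenticated = true;
            this.authCalls++;
        },
        logout() {
            this.authenticated = false;
            this.authCalls++;
        },
        // Any command fails with code 13 unless the connection is logged in.
        runCommand(cmd) {
            return this.authenticated ? {ok: 1}
                                      : {ok: 0, code: 13, codeName: "Unauthorized"};
        },
    };
}

// Hypothetical wrapper in the spirit of asCluster: log in, run, log out.
function withClusterAuth(conn, fn) {
    conn.login();
    try {
        return fn(conn);
    } finally {
        conn.logout();
    }
}

const countingConn = makeCountingConn();
// A command that was not wrapped fails the way the log in this ticket does.
const bareReply = countingConn.runCommand({replSetGetConfig: 1});
// The wrapped command succeeds, at the cost of one login and one logout.
const wrappedReply = withClusterAuth(countingConn, c => c.runCommand({replSetGetConfig: 1}));
```

In this sketch, each wrapped command adds two extra auth operations, and any command that is missed fails with Unauthorized, which is the maintenance burden the comment describes.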

It seems like the idea proposed in SERVER-14017 would be a more worthwhile way to resolve this problem. We could maintain two connections: one that is authenticated for control operations like awaitReplication(), and another that handles test operations. Marking this as "Needs Scheduling" for further discussion in Triage.
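
A rough sketch of the two-connection idea, assuming a hypothetical fixture: makeFakeNode and TwoConnFixture are illustrative names only, and the fake node just tracks auth state per connection. Nothing here reflects an existing ReplSetTest API.

```javascript
// Hypothetical in-memory node: each connection carries its own auth state.
function makeFakeNode() {
    return {
        connect() {
            return {
                user: null,
                auth(user) {
                    this.user = user;
                },
                runCommand(cmd) {
                    return this.user ? {ok: 1, ranAs: this.user}
                                     : {ok: 0, code: 13, codeName: "Unauthorized"};
                },
            };
        },
    };
}

// Hypothetical fixture that keeps two connections to the same node.
class TwoConnFixture {
    constructor(node) {
        // Control connection: authenticated once, used only by the fixture.
        this.controlConn = node.connect();
        this.controlConn.auth("__system");
        // Test connection: handed to the test, which manages its own logins.
        this.testConn = node.connect();
    }
    // Control operations never touch the test connection's auth state.
    awaitReplication() {
        return this.controlConn.runCommand({replSetGetConfig: 1});
    }
}
```

The design point is that whatever the test does on testConn (staying unauthenticated, logging in as root, logging out) cannot affect control operations, so individual commands no longer need per-call wrapping.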

Comment by Steven Vannelli [ 21/Jan/21 ]

xuerui.fa assigning this to you for BF Friday. Please try to work closely with the Security team on this.

Generated at Thu Feb 08 05:31:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.