- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- ALL
- 🟩 Routing and Topology
Running the jstest code below, setClusterParameter just keeps retrying to set the parameter on a shard and never succeeds because of a failpoint. The only log message you get at default verbosity is a deeply internal one that doesn't help you understand the situation and could in fact mislead you into thinking the network is bad:

"ctx":"NetworkInterfaceTL-ConfigsvrCoordinatorServiceNetwork","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"ip-10-122-13-216:20045","error":"CallbackCanceled: Baton wait canceled"}

Maybe in practice shards always execute this command successfully, but we've seen a (rather unique) case where this can cause a lot of confusion. Here is the jstest code, with increased log verbosity for some relevant components that makes it clearer what's going on:
import {ShardingTest} from "jstests/libs/shardingtest.js";
import {configureFailPoint} from "jstests/libs/fail_point_util.js";

const st = new ShardingTest({shards: 3});

const csrsPrimary = st.configRS.getPrimary();
csrsPrimary.adminCommand({
    setParameter: 1,
    logComponentVerbosity: {network: {verbosity: 2, asio: {verbosity: 5}}},
});

const shard0Primary = st.rs0.getPrimary();
jsTest.log('enabling failCommand on shard0 primary (port ' + shard0Primary.port + ')');
let fp = configureFailPoint(shard0Primary, "failCommand", {
    errorCode: ErrorCodes.InternalError,
    failCommands: ["_shardsvrSetClusterParameter"],
    failInternalCommands: true,
});

jsTest.log('Running setClusterParameter');
assert.commandWorked(
    st.s.adminCommand({setClusterParameter: {defaultMaxTimeMS: {readOperations: 120000}}}));

st.stop();
The test will hang during setClusterParameter.
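For reference, a small variation of the repro (a sketch, untested) illustrates that the hang is just the ConfigsvrCoordinator retrying indefinitely against the failing shard: if the failCommand failpoint is configured to fire only a bounded number of times instead of "alwaysOn", the coordinator's retries should eventually get through and setClusterParameter should return. The {times: N} mode and the optional failPointMode argument of configureFailPoint are assumptions based on the standard jstest failpoint helpers.

import {ShardingTest} from "jstests/libs/shardingtest.js";
import {configureFailPoint} from "jstests/libs/fail_point_util.js";

const st = new ShardingTest({shards: 3});
const shard0Primary = st.rs0.getPrimary();

// Fail _shardsvrSetClusterParameter only for the first few attempts, not forever.
let fp = configureFailPoint(
    shard0Primary,
    "failCommand",
    {
        errorCode: ErrorCodes.InternalError,
        failCommands: ["_shardsvrSetClusterParameter"],
        failInternalCommands: true,
    },
    {times: 3});

// Expected to block while the failpoint is firing, then succeed once it stops.
assert.commandWorked(
    st.s.adminCommand({setClusterParameter: {defaultMaxTimeMS: {readOperations: 120000}}}));

fp.off();
st.stop();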