- Type: Bug
- Resolution: Fixed
- Priority: Minor - P4
- Affects Version/s: None
- Component/s: Testing Infrastructure
- Fully Compatible
- ALL
- v4.4
- Sharding 2020-07-13, Sharding 2020-07-27
- 35
The connect() function used by makeNewConnWithExistingSession() retries for up to 10 minutes, but getReplSetName() retries only up to 3 times on a network error. If the stepdown thread in resmoke.py is in the midst of killing one of the shards, those 3 retries can be exhausted before the shard is reachable again.
const getReplSetName = (conn) => {
    const res = assert.commandWorked(conn.getDB('admin').runCommand({isMaster: 1}));
    assert.eq('string', typeof res.setName, () => `not connected to a replica set: ${tojson(res)}`);
    return res.setName;
};
Wrapping conn.getDB('admin').runCommand({isMaster: 1}) in an assert.soon() would let the isMaster command that fetches the shard's replica set name keep retrying for as long as establishing a replica set connection to the shard is allowed to.
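A minimal sketch of what that retry could look like, not necessarily the change that was committed. It assumes the mongo shell globals assert, tojson, and isNetworkError are available. The constant kRetryTimeoutMs and its 10-minute value are hypothetical, chosen only to mirror the connect() retry window described above. Since assert.soon() does not catch exceptions thrown by its callback, the sketch catches network errors itself and reports "not yet" so the loop retries.

```javascript
// Hypothetical timeout, picked to match the 10 minutes connect() is said to retry for.
const kRetryTimeoutMs = 10 * 60 * 1000;

const getReplSetName = (conn) => {
    let res;
    assert.soon(() => {
        try {
            res = conn.getDB('admin').runCommand({isMaster: 1});
            return true;
        } catch (e) {
            // Only swallow network errors; anything else is a real failure.
            if (!isNetworkError(e)) {
                throw e;
            }
            return false;
        }
    }, 'timed out waiting for isMaster to succeed on the shard', kRetryTimeoutMs);

    // Same validation as the original function, applied to the final response.
    assert.commandWorked(res);
    assert.eq('string', typeof res.setName, () => `not connected to a replica set: ${tojson(res)}`);
    return res.setName;
};
```

Rethrowing non-network errors keeps unrelated failures visible immediately instead of hiding them behind a long timeout.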