Core Server / SERVER-48673

Worker thread may exhaust command retries when using passConnectionCache=true in concurrency stepdown suites

    • Fully Compatible
    • ALL
    • v4.4
    • Sharding 2020-07-13, Sharding 2020-07-27
    • 35

      Unlike the connect() function used by makeNewConnWithExistingSession(), which retries for up to 10 minutes, the getReplSetName() function retries only up to 3 times on a network error. If the stepdown thread in resmoke.py is in the midst of killing one of the shards, those retries can be exhausted.

      const getReplSetName = (conn) => {
          const res = assert.commandWorked(conn.getDB('admin').runCommand({isMaster: 1}));
          assert.eq('string',
                    typeof res.setName,
                    () => `not connected to a replica set: ${tojson(res)}`);
          return res.setName;
      };

      Wrapping conn.getDB('admin').runCommand({isMaster: 1}) in an assert.soon() would let the isMaster command that fetches the shard's replica set name keep retrying for as long as establishing a replica set connection to the shard is allowed to.
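A minimal sketch of what that retry might look like, not the committed fix. The name getReplSetNameWithRetry, the timeoutMs and shellAssert parameters, and the 10-minute default (chosen to mirror the connect() retry window described above) are all assumptions for illustration. The sketch also assumes that any exception thrown by runCommand() while the shard is being killed is transient and worth retrying.

```javascript
// Sketch only: retry isMaster inside assert.soon() rather than giving up after a
// few network errors. `shellAssert` defaults to the mongo shell's global `assert`;
// it is a parameter purely so the sketch can be exercised outside the shell.
const getReplSetNameWithRetry = (conn, timeoutMs = 10 * 60 * 1000, shellAssert = assert) => {
    let res;
    let lastError;

    shellAssert.soon(
        () => {
            try {
                res = conn.getDB('admin').runCommand({isMaster: 1});
                return true;
            } catch (e) {
                // Assumption: the failure is transient (for example, the shard is
                // being stepped down or killed), so remember it and try again.
                lastError = e;
                return false;
            }
        },
        () => `isMaster kept failing while fetching the replica set name: ${lastError}`,
        timeoutMs);

    // Same validation as the original helper once a response is available.
    shellAssert.commandWorked(res);
    shellAssert.eq('string',
                   typeof res.setName,
                   () => `not connected to a replica set: ${JSON.stringify(res)}`);
    return res.setName;
};
```

In the shell this would be called as getReplSetNameWithRetry(conn). Catching every exception keeps the sketch short; a stricter version might rethrow anything that is not a network error so that genuine bugs still fail fast.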

            janna.golden@mongodb.com Janna Golden
            max.hirschhorn@mongodb.com Max Hirschhorn