  Core Server / SERVER-48673

Worker thread may exhaust command retries when using passConnectionCache=true in concurrency stepdown suites

Details

    • Fully Compatible
    • ALL
    • v4.4
    • Sharding 2020-07-13, Sharding 2020-07-27
    • 35

    Description

      Unlike the connect() function used by makeNewConnWithExistingSession(), which retries for up to 10 minutes, the getReplSetName() function retries at most 3 times on a network error. If the stepdown thread in resmoke.py is in the midst of killing one of the shards, those retries can be exhausted before the shard is reachable again.

      const getReplSetName = (conn) => {
          const res = assert.commandWorked(conn.getDB('admin').runCommand({isMaster: 1}));
          assert.eq('string',
                    typeof res.setName,
                    () => `not connected to a replica set: ${tojson(res)}`);
          return res.setName;
      };
      

      Wrapping conn.getDB('admin').runCommand({isMaster: 1}) in an assert.soon() would let the isMaster command that fetches the shard's replica set name keep retrying for as long as establishing a replica set connection to the shard is allowed.
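
      A minimal sketch of that idea follows. It is not the committed fix. It assumes the mongo shell test environment, where assert.soon(), assert.commandWorked(), assert.eq() and tojson() are globals, and it assumes a network error surfaces as an exception thrown by runCommand(). The name getReplSetNameWithRetry and the 10 minute timeout (chosen to mirror the connect() retry window described above) are illustrative assumptions.

```javascript
// Hypothetical variant of getReplSetName() that keeps retrying isMaster.
// Assumes mongo shell globals: assert.soon, assert.commandWorked, assert.eq, tojson.
const getReplSetNameWithRetry = (conn) => {
    // Assumption: 10 minutes, to match the retry window of connect().
    const timeoutMs = 10 * 60 * 1000;
    let res;

    assert.soon(
        () => {
            try {
                res = conn.getDB('admin').runCommand({isMaster: 1});
                return true;
            } catch (e) {
                // Assumption: any exception here is treated as transient, e.g. a
                // network error while the stepdown thread is killing the shard.
                // Real code may want to retry only on network errors.
                return false;
            }
        },
        'timed out waiting for isMaster to succeed against the shard',
        timeoutMs);

    // Same validation as the original getReplSetName().
    assert.commandWorked(res);
    assert.eq('string',
              typeof res.setName,
              () => `not connected to a replica set: ${tojson(res)}`);
    return res.setName;
};
```

      A worker thread would call getReplSetNameWithRetry(conn) wherever it currently calls getReplSetName(conn). Command failures that do not throw are still reported by assert.commandWorked() rather than retried.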

          People

            janna.golden@mongodb.com Janna Golden
            max.hirschhorn@mongodb.com Max Hirschhorn