[SERVER-48673] Worker thread may exhaust command retries when using passConnectionCache=true in concurrency stepdown suites
Created: 09/Jun/20  Updated: 29/Oct/23  Resolved: 22/Jul/20

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.11

Type: Bug
Priority: Minor - P4
Reporter: Max Hirschhorn
Assignee: Janna Golden
Resolution: Fixed
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Sprint: Sharding 2020-07-13, Sharding 2020-07-27
Participants:
Linked BF Score: 35

 Description   

The connect() function used by makeNewConnWithExistingSession() retries for up to 10 minutes, but the getReplSetName() function retries only up to 3 times on a network error. If the stepdown thread in resmoke.py is in the midst of killing one of the shards, those 3 retries can be exhausted.

const getReplSetName = (conn) => {
    const res = assert.commandWorked(conn.getDB('admin').runCommand({isMaster: 1}));
    assert.eq('string',
              typeof res.setName,
              () => `not connected to a replica set: ${tojson(res)}`);
    return res.setName;
};

Wrapping conn.getDB('admin').runCommand({isMaster: 1}) in an assert.soon() would let the isMaster command that fetches the shard's replica set name keep retrying for as long as establishing a replica set connection to the shard is allowed to retry.
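The proposed retry can be sketched in plain Node.js as follows. This is a minimal illustration under stated assumptions, not the code from the actual commit: NetworkError, retryUntil (a stand-in for the shell's assert.soon()), sleep, and makeFlakyConn (a fake connection) are all hypothetical helpers, and the 10-minute default simply mirrors the window described for connect().

```javascript
// Hypothetical error type standing in for a network error raised while the
// stepdown thread is killing a shard. Not a mongo shell API.
class NetworkError extends Error {}

// Block the current thread for `ms` milliseconds (Node.js permits
// Atomics.wait on the main thread).
function sleep(ms) {
    Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical stand-in for assert.soon(): call `fn` until it returns true,
// sleeping `intervalMs` between attempts, and give up after `timeoutMs`.
function retryUntil(fn, timeoutMs, intervalMs) {
    const deadline = Date.now() + timeoutMs;
    while (true) {
        if (fn()) {
            return;
        }
        if (Date.now() >= deadline) {
            throw new Error(`condition not met within ${timeoutMs} ms`);
        }
        sleep(intervalMs);
    }
}

// Sketch of getReplSetName() with isMaster retried on network errors only.
// The 10-minute default is an assumption matching connect()'s window.
function getReplSetName(conn, timeoutMs = 10 * 60 * 1000, intervalMs = 200) {
    let res;
    retryUntil(() => {
        try {
            res = conn.runCommand({isMaster: 1});
            return true;
        } catch (e) {
            if (e instanceof NetworkError) {
                return false;  // Shard may be restarting; try again.
            }
            throw e;  // Any other error is a real failure.
        }
    }, timeoutMs, intervalMs);

    if (res.ok !== 1) {
        throw new Error(`isMaster failed: ${JSON.stringify(res)}`);
    }
    if (typeof res.setName !== 'string') {
        throw new Error(`not connected to a replica set: ${JSON.stringify(res)}`);
    }
    return res.setName;
}

// Hypothetical fake connection that throws `failures` network errors
// before answering isMaster.
function makeFlakyConn(failures, setName) {
    let calls = 0;
    return {
        runCommand(cmd) {
            calls += 1;
            if (calls <= failures) {
                throw new NetworkError(`connection reset (attempt ${calls})`);
            }
            return {ok: 1, ismaster: true, setName: setName};
        },
        get calls() {
            return calls;
        },
    };
}

// Five consecutive network errors would exhaust a 3-retry budget,
// but the time-bounded loop rides them out.
console.log(getReplSetName(makeFlakyConn(5, 'shard-rs0'), 1000, 1));  // prints "shard-rs0"
```

Bounding the loop by elapsed time rather than by attempt count means the retry budget tracks how long a shard can be down during a stepdown, which is the property the fixed 3-attempt limit lacks.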



 Comments   
Comment by Githook User [ 08/Nov/21 ]

Author: jannaerin <golden.janna@gmail.com>

Message: SERVER-48673 Allow retrying establishing a connection to a replica set in getReplSetName in the event of a network error

(cherry picked from commit 001e6e2d180eea5262dc524675ef1cfab8fb8f8f)
Branch: v4.4
https://github.com/mongodb/mongo/commit/00828e0ea9805486dfa7cf6d534b656a0d576de3

Comment by Githook User [ 20/Jul/20 ]

Author: jannaerin <golden.janna@gmail.com>

Message: SERVER-48673 Allow retrying establishing a connection to a replica set in getReplSetName in the event of a network error
Branch: master
https://github.com/mongodb/mongo/commit/001e6e2d180eea5262dc524675ef1cfab8fb8f8f

Generated at Thu Feb 08 05:17:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.