[SERVER-48673] Worker thread may exhaust command retries when using passConnectionCache=true in concurrency stepdown suites Created: 09/Jun/20 Updated: 29/Oct/23 Resolved: 22/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0, 4.4.11 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Max Hirschhorn | Assignee: | Janna Golden |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v4.4 | ||||||||
| Sprint: | Sharding 2020-07-13, Sharding 2020-07-27 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 35 | ||||||||
| Description |
|
The connect() function used by makeNewConnWithExistingSession() retries for up to 10 minutes, but the getReplSetName() function retries only up to 3 times on a network error. If the stepdown thread in resmoke.py is in the midst of killing one of the shards, those 3 retries can be exhausted before the shard is reachable again.
Wrapping conn.getDB('admin').runCommand({isMaster: 1}) in an assert.soon() would let the isMaster command that fetches the shard's replica set name keep retrying for as long as establishing a replica set connection to the shard is allowed. |
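The proposed retry could look roughly like the sketch below. It is not the committed fix: retryUntil() is a hypothetical stand-in for the mongo shell's assert.soon() so the snippet runs outside the shell, getReplSetNameWithRetry() is an illustrative name, and the sketch assumes that a network error surfaces as a thrown exception and that a successful isMaster reply carries ok: 1 and a setName field.

```javascript
// Hypothetical stand-in for the mongo shell's assert.soon(): calls func()
// repeatedly until it returns a truthy value or timeoutMs elapses.
function retryUntil(func, msg, timeoutMs = 10 * 60 * 1000, intervalMs = 200) {
    const deadline = Date.now() + timeoutMs;
    while (true) {
        if (func()) {
            return;
        }
        if (Date.now() >= deadline) {
            throw new Error("retryUntil timed out: " + msg);
        }
        // Busy-wait between attempts; in the mongo shell this would be sleep().
        const wakeAt = Date.now() + intervalMs;
        while (Date.now() < wakeAt) {
            // spin
        }
    }
}

// Illustrative helper (name is an assumption): fetches the replica set name,
// retrying isMaster until it succeeds or the time budget runs out.
function getReplSetNameWithRetry(conn, timeoutMs = 10 * 60 * 1000, intervalMs = 200) {
    let res;
    retryUntil(() => {
        try {
            res = conn.getDB('admin').runCommand({isMaster: 1});
            return res.ok === 1;
        } catch (e) {
            // Assumption: any thrown error here is a transient network error
            // caused by the stepdown thread killing the shard, so retry.
            return false;
        }
    }, "isMaster never succeeded against the shard", timeoutMs, intervalMs);
    return res.setName;
}
```

A real implementation would probably retry only on errors identified as network errors and rethrow anything else, so that unrelated failures are not hidden behind the 10-minute budget.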
| Comments |
| Comment by Githook User [ 08/Nov/21 ] |
|
Author: {'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'} Message: (cherry picked from commit 001e6e2d180eea5262dc524675ef1cfab8fb8f8f) |
| Comment by Githook User [ 20/Jul/20 ] |
|
Author: {'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'} Message: |