[SERVER-66084] set_step_params.js can induce deadlock by preventing targeter from discovering shard Created: 29/Apr/22 Updated: 06/Dec/22 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | George Wangensteen | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Service Arch |
||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
The test set_step_params.js (https://github.com/mongodb/mongo/blob/e3f9dca6ad9888fd696c99f1b1ae2a4c7fdd932b/jstests/noPassthrough/set_step_params.js#L1) attempts to verify that the 'maxConnecting' sharding-task-executor parameter caps the number of connections a connection pool can have in the currently-establishing state. It works as follows:
However, the following bad interleaving is possible, causing a deadlock for the operations:
In short, the test is stuck waiting for the RSM to update monitoring of the shard, but the RSM is blocked waiting for the test to release the waitInHello failpoint.
To fix this, we should probably have the test hang connection establishment from the mongos connection pool under test on the mongos side, rather than using the waitInHello failpoint to hang connection establishment on the shard side. This will allow other necessary connections to the shard, like this RSM monitoring, to go through. |
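
The circular wait described above can be illustrated with a small wait-for graph. This is only a simplified sketch, not code from the test or the server: the node names (`test`, `targeter`, `rsm`, `poolUnderTest`) and the `findCycle` helper are hypothetical labels chosen for illustration, and the edges are an assumption based on the description in this ticket. It shows that hanging hello on the shard side produces a cycle, while hanging only the pool under test on the mongos side does not.

```javascript
// Hypothetical model: waitsFor[a] lists the parties that `a` is blocked on.
// Returns the first cycle found as an array of node names, or null if none.
function findCycle(waitsFor) {
  const visiting = new Set(); // nodes on the current DFS path
  const done = new Set();     // nodes fully explored with no cycle found

  function visit(node, path) {
    if (visiting.has(node)) {
      // Back edge: the cycle is the path from the first occurrence of node.
      return path.slice(path.indexOf(node)).concat(node);
    }
    if (done.has(node)) {
      return null;
    }
    visiting.add(node);
    path.push(node);
    for (const next of waitsFor[node] || []) {
      const cycle = visit(next, path);
      if (cycle) {
        return cycle;
      }
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of Object.keys(waitsFor)) {
    const cycle = visit(node, []);
    if (cycle) {
      return cycle;
    }
  }
  return null;
}

// Assumed edges for the current test: waitInHello on the shard blocks every
// hello, including the one the RSM needs, and only the test releases it.
const shardSideHang = {
  test: ["targeter"], // test operations need the targeter to find the shard
  targeter: ["rsm"],  // targeter needs the RSM to refresh shard monitoring
  rsm: ["test"],      // RSM hello is held until the test releases waitInHello
};

// Assumed edges for the proposed fix: only the pool under test is held on
// the mongos side, so the RSM's connection to the shard is not blocked.
const mongosSideHang = {
  test: ["targeter"],
  targeter: ["rsm"],
  rsm: [],
  poolUnderTest: ["test"],
};

console.log(findCycle(shardSideHang).join(" -> ")); // test -> targeter -> rsm -> test
console.log(findCycle(mongosSideHang));             // null
```

In this model the fix works because it removes the edge from the RSM back to the test: the hang becomes scoped to the connections the test actually wants to count.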