[SERVER-67348] Fix race condition on set_cluster_parameter.js Created: 17/Jun/22 Updated: 29/Oct/23 Resolved: 08/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Marcos José Grillo Ramirez |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Sprint: | Sharding EMEA 2022-07-11 | ||||
| Participants: | |||||
| Linked BF Score: | 11 | ||||
| Description |
|
set_cluster_parameter.js tests that the following two actions are serialized:
To check that, the test uses the 'failCommand' failpoint with an 'errorCode' of PrimarySteppedDown, which the coordinator retries on. However, the coordinator only retries the _shardsvrSetClusterParameter command a fixed number of times. Once the attempts are exhausted, the coordinator releases the lock that was ensuring the serialization and retries again later. This is fine, but the test can fail because it can issue the addShard command at a moment when the coordinator is not holding the lock after having exhausted its retries. |
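The window described above can be illustrated with a toy model in plain JavaScript. This is only a sketch of the timing, not server or jstest code: `ToyCoordinator`, `runRound`, `tryAddShard`, and the attempt count of 3 are all hypothetical names and values chosen for illustration.

```javascript
// Toy model of the race; not MongoDB code. All names here are hypothetical.
class ToyCoordinator {
    constructor(maxAttempts) {
        this.maxAttempts = maxAttempts;
        this.holdsLock = false;
    }

    // One round: take the lock, try the command up to maxAttempts times,
    // and release the lock whether the command succeeded or retries ran out.
    runRound(commandFails, onAttempt) {
        this.holdsLock = true;
        for (let attempt = 1; attempt <= this.maxAttempts; attempt++) {
            onAttempt(attempt);  // Lets a caller observe state while the lock is held.
            if (!commandFails()) {
                this.holdsLock = false;
                return "done";
            }
        }
        this.holdsLock = false;  // Retries exhausted: lock is free until the next round.
        return "retry-later";
    }
}

// In this model, addShard is only held back while the coordinator has the lock.
function tryAddShard(coordinator) {
    return coordinator.holdsLock ? "blocked" : "succeeded";
}

const coordinator = new ToyCoordinator(3);
const duringRetries = [];

// The always-failing command stands in for failCommand returning PrimarySteppedDown.
const roundResult = coordinator.runRound(
    () => true,
    () => duringRetries.push(tryAddShard(coordinator)));

// Between rounds nothing holds the lock, so addShard slips through.
const betweenRounds = tryAddShard(coordinator);
console.log(duringRetries, roundResult, betweenRounds);
```

In this model, every addShard attempt made during the retries is blocked, but one made after the round ends succeeds, which corresponds to the gap the test can hit.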
| Comments |
| Comment by Githook User [ 08/Jul/22 ] |
|
Author: {'name': 'Marcos José Grillo Ramirez', 'email': 'marcos.grillo@mongodb.com', 'username': 'm4nti5'} Message: |
| Comment by Jordi Serra Torrens [ 17/Jun/22 ] |
|
Attached repro. A simple solution is to mimic what jstests/sharding/set_user_write_block_mode.js does: it uses a failpoint to block a command instead of relying on the 'failCommand' failpoint to make _shardsvrSetClusterParameter fail with a retriable error. |
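A rough sketch of why a blocking failpoint closes the window, again as a toy model in plain JavaScript rather than actual test code. `BlockingCoordinator`, `start`, `resume`, and `addShardOutcome` are hypothetical names, and the model assumes the coordinator keeps the lock for as long as the command is paused.

```javascript
// Toy model of a blocking failpoint; not MongoDB code. All names are hypothetical.
class BlockingCoordinator {
    constructor() {
        this.holdsLock = false;
        this.paused = false;
    }

    // Take the lock and stop at the failpoint before the command completes.
    start() {
        this.holdsLock = true;
        this.paused = true;
    }

    // Turn the failpoint off: the command completes and the lock is released.
    resume() {
        if (!this.paused) {
            throw new Error("coordinator is not paused at the failpoint");
        }
        this.paused = false;
        this.holdsLock = false;
    }
}

// In this model, addShard is only held back while the coordinator has the lock.
function addShardOutcome(coordinator) {
    return coordinator.holdsLock ? "blocked" : "succeeded";
}

const blocking = new BlockingCoordinator();
blocking.start();

// No retry budget is involved, so the lock stays held for as long as the test wants.
const whilePaused = addShardOutcome(blocking);

blocking.resume();
const afterResume = addShardOutcome(blocking);
console.log(whilePaused, afterResume);
```

Because the command is paused rather than failing, there is no retry budget to exhaust, so the test decides when the lock is released.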