[SERVER-61985] resharding_coordinator_recovers_abort_decision.js may report resharding operation as succeeding due to primary shard retrying _configsvrReshardCollection and running a second resharding operation Created: 10/Dec/21 Updated: 29/Oct/23 Resolved: 18/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.11, 6.0.2, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Abdul Qadeer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam1 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0, v5.0 |
| Sprint: | Sharding 2022-06-27, Sharding 2022-07-11 |
| Participants: |
| Linked BF Score: | 163 |
| Story Points: | 3 |
| Description |
|
The ReshardingTest fixture configures the reshardingPauseCoordinatorBeforeCompletion failpoint with {times: 1}, which means the failpoint automatically disables itself once a ReshardingCoordinator reaches it and therefore won't actually pause the ReshardingCoordinator. This is problematic for cases where the reshardCollection command is expected to fail (i.e. tests which use expectedErrorCode !== ErrorCodes.OK), because the primary shard can retry _configsvrReshardCollection, and by then the config server will have forgotten about the earlier aborted resharding operation. An entire second resharding operation can then run and, because it runs entirely after duringReshardingFn has finished executing, it won't abort like the first one did.

We should revert the changes to the ReshardingTest fixture from 38c6aff as part of We should also revert the test changes to resharding_nonblocking_coordinator_rebuild.js from |
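To make the race concrete, here is a minimal Python sketch. It is a toy model, not the server's failpoint or coordinator code: FailPoint, pause_while_set, run_scenario and MAX_CHECKS are hypothetical names. The "times" mode mimics a failpoint configured with {times: n} (active for the next n evaluations, then off), and "alwaysOn" is an assumed stand-in for a pause that stays in effect until the test releases it.

```python
MAX_CHECKS = 3  # hypothetical: stands in for "the test has not released the failpoint yet"


class FailPoint:
    """Hypothetical model of a failpoint's "off", "alwaysOn", and {times: n} modes."""

    def __init__(self, name):
        self.name = name
        self.mode = "off"
        self.remaining = 0

    def configure(self, mode, times=0):
        self.mode = mode
        self.remaining = times

    def should_fail(self):
        """One evaluation; in "times" mode each hit consumes one activation."""
        if self.mode == "alwaysOn":
            return True
        if self.mode == "times" and self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.mode = "off"  # disables itself once the budget is spent
            return True
        return False


def pause_while_set(fp):
    """Hypothetical pause loop: re-evaluates the failpoint until it reads as off.

    Returns the number of evaluations that kept the caller paused, capped at
    MAX_CHECKS in place of waiting for the test to turn the failpoint off.
    """
    checks = 0
    while checks < MAX_CHECKS and fp.should_fail():
        checks += 1
    return checks


def run_scenario(mode, times=0):
    """Outcomes of each resharding operation when duringReshardingFn aborts the first."""
    fp = FailPoint("reshardingPauseCoordinatorBeforeCompletion")
    fp.configure(mode, times)
    # The first operation is aborted and its coordinator reaches the failpoint.
    still_paused = pause_while_set(fp) == MAX_CHECKS
    if still_paused:
        # The coordinator still holds the aborted operation, so a retried
        # _configsvrReshardCollection observes the same abort.
        return ["aborted"]
    # The coordinator already completed and forgot the abort; a retried
    # _configsvrReshardCollection starts a second operation that runs after
    # duringReshardingFn has finished, so nothing aborts it.
    return ["aborted", "ok"]


if __name__ == "__main__":
    print(run_scenario("times", 1))  # the retry's second operation succeeds
    print(run_scenario("alwaysOn"))  # the retry observes the original abort
```

With {times: 1} the coordinator passes the failpoint after a single evaluation, so by the time the retry arrives the abort has been forgotten and the test observes a successful second operation.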
| Comments |
| Comment by Max Hirschhorn [ 01/Sep/22 ] |
|
Author: {'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'} Message: (cherry picked from commit 0d5fd57f9e55915550dd7d13340e2944c169c6e2) |
| Comment by Max Hirschhorn [ 27/Jul/22 ] |
|
Thank you, matthew.russotto@mongodb.com. I filed BF-25959 to track the Evergreen failure you observed. abdul.qadeer@mongodb.com, let's hold off on the 6.0 backport until we better understand why the _configsvrReshardCollection command is being issued twice despite the changes to the reshardingPauseCoordinatorBeforeCompletion failpoint behavior. |
| Comment by Matthew Russotto [ 27/Jul/22 ] |
|
This appears to have reintroduced the bug from (note: for some reason I cannot reopen this issue; I'm not sure whether it's a permissions problem, because a backport has been released, or something else). |
| Comment by Githook User [ 21/Jul/22 ] |
|
Author: {'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'} Message: |
| Comment by Githook User [ 18/Jul/22 ] |
|
Author: {'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'} Message: |
| Comment by Githook User [ 01/Jul/22 ] |
|
Author: {'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com'} Message: Revert " This reverts commit 88b5b28f901211cb63099b98e3c576826d82e68d. |
| Comment by Githook User [ 30/Jun/22 ] |
|
Author: {'name': 'Abdul Qadeer', 'email': 'abdul.qadeer@mongodb.com', 'username': 'zorro786'} Message: |