[SERVER-74647] Resharding state machine creation should be retried after interruption Created: 06/Mar/23 Updated: 29/Oct/23 Resolved: 29/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.0.15, 6.2.0-rc6, 6.0.5, 6.2.1, 6.3.0-rc2 |
| Fix Version/s: | 7.0.0-rc0, 6.0.6, 5.0.17 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tommaso Tocci | Assignee: | Brett Nawrocki |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Sharding NYC
|
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v6.3, v6.2, v6.0, v5.0
|
| Participants: | |||||||||
| Linked BF Score: | 105 | ||||||||
| Description |
|
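createReshardingStateMachine is the function in charge of:
1. inserting the resharding state machine document for the donor/recipient on disk;
2. installing and running the corresponding PrimaryOnlyService (POS) instance.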
Unfortunately, this function is not idempotent. If the opCtx is interrupted between (1.) and (2.) during the first execution, any subsequent execution will try to perform (1.) again, fail with a DuplicateKey error, and never reach (2.). In this scenario the state machine document is written on disk, but the POS instance for the recipient/donor is never installed or run, leaving the resharding operation hung.

createReshardingStateMachine is called as part of the shard version recovery procedure, and the operation context of that procedure is interrupted every time another thread enters the collection critical section, for instance as part of a chunk migration.

One possible solution is to attempt the creation of the POS instance even when the insertion hits a DuplicateKey error. |
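The failure mode and the proposed solution can be modeled with a small Python simulation. This is a hypothetical sketch, not the server code: FakeShard, insert_state_doc, check_for_interrupt, get_or_create_pos_instance and recover are illustrative names; the on-disk collection and the POS registry are modeled as in-memory sets; and the interruption is forced exactly once, between (1.) and (2.). The "fixed" variant follows the solution proposed above and is not necessarily what the committed fix does.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the server's DuplicateKey error."""


class Interrupted(Exception):
    """Hypothetical stand-in for an opCtx interruption."""


class FakeShard:
    def __init__(self, interrupt_once=False):
        self.state_docs = set()     # (1.) state machine documents "on disk"
        self.pos_instances = set()  # (2.) installed POS instances
        self._interrupt_once = interrupt_once

    def insert_state_doc(self, doc_id):
        # Inserting the same _id twice fails, like a unique-index insert.
        if doc_id in self.state_docs:
            raise DuplicateKeyError(doc_id)
        self.state_docs.add(doc_id)

    def check_for_interrupt(self):
        # Simulates another thread entering the critical section, once.
        if self._interrupt_once:
            self._interrupt_once = False
            raise Interrupted()

    def get_or_create_pos_instance(self, doc_id):
        # Get-or-create semantics: calling it twice is harmless.
        self.pos_instances.add(doc_id)


def create_state_machine_original(shard, doc_id):
    shard.insert_state_doc(doc_id)            # (1.)
    shard.check_for_interrupt()
    shard.get_or_create_pos_instance(doc_id)  # (2.) skipped if (1.) raises


def create_state_machine_fixed(shard, doc_id):
    try:
        shard.insert_state_doc(doc_id)        # (1.)
    except DuplicateKeyError:
        pass  # assumed: written by an earlier, interrupted attempt
    shard.check_for_interrupt()
    shard.get_or_create_pos_instance(doc_id)  # (2.) still runs on retry


def recover(create, shard, doc_id, attempts=2):
    """Re-run `create` after interruptions, as recovery would.

    Returns True if the POS instance ended up installed.
    """
    for _ in range(attempts):
        try:
            create(shard, doc_id)
            return True
        except Interrupted:
            continue  # recovery runs again later
        except DuplicateKeyError:
            return False  # (2.) was never reached: the operation hangs
    return False


broken = FakeShard(interrupt_once=True)
print(recover(create_state_machine_original, broken, "op1"),
      broken.state_docs, broken.pos_instances)  # False {'op1'} set()

fixed = FakeShard(interrupt_once=True)
print(recover(create_state_machine_fixed, fixed, "op1"),
      fixed.pos_instances)  # True {'op1'}
```

Swallowing DuplicateKey is only safe under the assumption that an existing document was written by an earlier attempt of the same resharding operation; the get-or-create semantics of step (2.) are what make re-running it harmless.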
| Comments |
| Comment by Githook User [ 30/Mar/23 ] |
|
Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (username: brettnawrocki)
Message: (cherry picked from commit c6fbd4ae07365389aa544f28e718eecf740604c7) |
| Comment by Githook User [ 30/Mar/23 ] |
|
Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (username: brettnawrocki)
Message: (cherry picked from commit c6fbd4ae07365389aa544f28e718eecf740604c7) |
| Comment by Githook User [ 29/Mar/23 ] |
|
Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (username: brettnawrocki)
Message: |