[SERVER-65371] MigrationSourceManager running on secondary node may trip invariant Created: 08/Apr/22 Updated: 29/Oct/23 Resolved: 02/Jun/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.3.0, 5.0.6 |
| Fix Version/s: | 5.0.10, 6.0.0-rc9, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Paolo Polato |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0, v5.3, v5.0 |
| Steps To Reproduce: | |
| Sprint: | Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13 |
| Participants: | |
| Linked BF Score: | 48 |
| Description |
|
The shardsvr's 'moveChunk' command is allowed on primary nodes only. However, this check is only best effort – the member state can change at any time afterwards. In that scenario the command keeps executing on a node that is no longer primary and instantiates a MigrationSourceManager: |
| Comments |
| Comment by Githook User [ 09/Jun/22 ] |
|
Author: {'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'} Message: |
| Comment by Githook User [ 08/Jun/22 ] |
|
Author: {'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'} Message: |
| Comment by Githook User [ 02/Jun/22 ] |
|
Author: {'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'} Message: |
| Comment by Githook User [ 06/May/22 ] |
|
Author: {'name': 'Sviatlana Zuiko', 'email': 'sviatlana.zuiko@mongodb.com', 'username': 'szuiko'} Message: Revert " This reverts commit 417cd065b9f437f01269be04941a183b096f9db5. |
| Comment by Githook User [ 05/May/22 ] |
|
Author: {'name': 'Paolo Polato', 'email': 'paolo.polato@mongodb.com', 'username': 'ppolato'} Message: |
| Comment by Jordi Serra Torrens [ 08/Apr/22 ] |
|
This could be fixed by checking that the node is primary after the opCtx has been marked as interruptible here. This guarantees that the MigrationSourceManager only runs on a primary node: should that node stop being primary (and thus possibly start replicating writes done by the new primary), the MSM will be interrupted first. Attaching patch with this proposal. |
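
The ordering proposed in the comment above can be illustrated with a small standalone model. This is only a sketch, not the server code or the attached patch: `MemberState`, `OperationContext`, `Node`, `beginMigration`, and the error strings are hypothetical stand-ins invented for illustration, and the model is single-threaded, so it ignores the locking the real server would need.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the node's replica set member state.
enum class MemberState { Primary, Secondary };

// Hypothetical stand-in for an operation context.
struct OperationContext {
    bool interruptOnStepDown = false;  // true once the op opted in to step-down interruption
    bool killed = false;               // set when a step-down interrupts the op

    // Throws if the operation has been interrupted.
    void checkForInterrupt() const {
        if (killed) {
            throw std::runtime_error("Interrupted");
        }
    }
};

// Hypothetical stand-in for the node running the command.
struct Node {
    MemberState state = MemberState::Primary;

    // Models a step-down: only operations that opted in are interrupted.
    void stepDown(OperationContext& opCtx) {
        state = MemberState::Secondary;
        if (opCtx.interruptOnStepDown) {
            opCtx.killed = true;
        }
    }
};

// The order is the point of the sketch: opt in to interruption first,
// then verify the member state. Any step-down after the check is then
// guaranteed to interrupt the operation.
void beginMigration(Node& node, OperationContext& opCtx) {
    opCtx.interruptOnStepDown = true;
    if (node.state != MemberState::Primary) {
        throw std::runtime_error("NotPrimary");
    }
    // The MigrationSourceManager would be instantiated at this point.
}
```

If the check ran before the opt-in, a step-down landing between the two steps would go unnoticed, which is the window described in the ticket.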