[SERVER-45752] opCtx interruption during migration critical section commit triggers fassert in FCV 4.2
Created: 24/Jan/20 | Updated: 29/Oct/23 | Resolved: 31/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Sharding 2020-02-10 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
As part of the critical section in a migration, the donor shard sends _configsvrCommitChunkMigration to the config server to complete the migration. If the command fails and the donor shard is in FCV 4.2, the donor attempts to recover the migration's outcome by doing a write on the config server to recover the latest configOpTime. If this recovery fails, the donor will fassert. The same operation context is used both to send the commit and for the recovery operations, so if it is interrupted (e.g. by a killOp command), the commit and the recovery will both fail, leading to a crash. Notably, in FCV >= 4.4 the donor shard instead recovers from a failed commit by repeatedly sending _configsvrEnsureChunkVersionIsGreaterThan, which also reuses the commit's operation context but checks for interrupt, so on interruption the migration aborts instead of crashing. |
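The difference between the two recovery paths can be sketched in a few lines of C++. This is a simplified model under stated assumptions, not the server code: OpCtx, Outcome, remoteCall, commitFcv42 and commitFcv44 are hypothetical names invented for illustration, a fassert is modeled as a Crashed return value, and every remote call is assumed to fail once the shared context is interrupted.

```cpp
#include <cassert>

// Hypothetical stand-in for the server's operation context; it only models interruption.
struct OpCtx {
    bool interrupted = false;  // set when e.g. killOp targets the operation
};

enum class Outcome { Committed, Recovered, Aborted, Crashed };

// Models any remote call to the config server: it fails if the network
// fails or if the operation context it runs under has been interrupted.
bool remoteCall(const OpCtx& opCtx, bool networkOk = true) {
    return networkOk && !opCtx.interrupted;
}

// FCV 4.2-style flow: a single recovery write on the same context; failure is fatal.
Outcome commitFcv42(const OpCtx& opCtx, bool commitNetworkOk) {
    if (remoteCall(opCtx, commitNetworkOk)) return Outcome::Committed;
    if (!remoteCall(opCtx)) return Outcome::Crashed;  // stands in for the fassert
    return Outcome::Recovered;
}

// FCV >= 4.4-style flow: retry recovery, but check for interrupt before each attempt.
Outcome commitFcv44(const OpCtx& opCtx, bool commitNetworkOk) {
    if (remoteCall(opCtx, commitNetworkOk)) return Outcome::Committed;
    for (;;) {
        if (opCtx.interrupted) return Outcome::Aborted;  // abort instead of crashing
        if (remoteCall(opCtx)) return Outcome::Recovered;
    }
}
```

In this model, an interrupted context makes both the commit and the recovery write fail in the 4.2-style flow, so it ends in Crashed, while the 4.4-style flow sees the interrupt and returns Aborted.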
| Comments |
| Comment by Githook User [ 31/Jan/20 ] |
|
Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>
Message: |