[SERVER-32554] Source shard stepdown while entering critical section can trigger cloner invariant Created: 05/Jan/18 Updated: 30/Oct/23 Resolved: 08/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.3, 3.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | | ||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v3.6 | ||||||||
| Sprint: | Sharding 2018-01-15 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
A stepdown during MigrationSourceManager::enterCriticalSection can trigger the cleanupOnError scope guard, which eventually calls MigrationSourceManager::_cleanup. This function std::moves the manager's clone driver into a local variable so that it is destructed when the function exits, but it calls two other functions before calling cancelClone on the clone driver (which puts it into state kDone). If either of them throws (which ShardServerCatalogCacheLoader::waitForCollectionFlush can do if the node's replication role changes), the clone driver is destroyed while still in state kCloning and the invariant in its destructor fails. I think the fix would be to either move the cancelClone call earlier in _cleanup, or put it in a scope guard declared after _cloneDriver is extracted into a local variable. |
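The scope-guard variant of the proposed fix can be sketched roughly as below. This is a minimal standalone illustration, not the server code: CloneDriver, ScopeGuard, waitForFlush, cleanupBuggy, cleanupGuarded and leavesDriverCloning are all hypothetical stand-ins, and the destructor records the bad state in a flag instead of firing an invariant so that the sketch can run to completion.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the clone driver; not the real MongoDB class.
enum class CloneState { kCloning, kDone };

struct CloneDriver {
    CloneState state = CloneState::kCloning;
    bool* destroyedWhileCloning;  // set where the real destructor would invariant

    explicit CloneDriver(bool* flag) : destroyedWhileCloning(flag) {}
    void cancelClone() { state = CloneState::kDone; }
    ~CloneDriver() {
        if (state != CloneState::kDone) *destroyedWhileCloning = true;
    }
};

// Minimal scope guard (assumption: the server has its own utility for this).
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> f) : _f(std::move(f)) {}
    ~ScopeGuard() { _f(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _f;
};

// Stand-in for a cleanup step that throws when the replication role changes.
void waitForFlush(bool stepdown) {
    if (stepdown) throw std::runtime_error("replication role changed");
}

// Ordering described in the ticket: a throwing step runs before cancelClone,
// so the local driver is destroyed while still in kCloning.
void cleanupBuggy(std::unique_ptr<CloneDriver>& member, bool stepdown) {
    auto cloneDriver = std::move(member);
    waitForFlush(stepdown);
    cloneDriver->cancelClone();
}

// Proposed ordering: the guard is declared after the local driver, so it is
// destroyed first (locals unwind in reverse order) and cancels the clone
// before the driver's destructor runs, even if a later step throws.
void cleanupGuarded(std::unique_ptr<CloneDriver>& member, bool stepdown) {
    auto cloneDriver = std::move(member);
    ScopeGuard guard([&] { if (cloneDriver) cloneDriver->cancelClone(); });
    waitForFlush(stepdown);
}

// Runs one variant and reports whether the driver died while still cloning.
bool leavesDriverCloning(bool guarded, bool stepdown) {
    bool flag = false;
    std::unique_ptr<CloneDriver> driver(new CloneDriver(&flag));
    try {
        if (guarded) cleanupGuarded(driver, stepdown);
        else cleanupBuggy(driver, stepdown);
    } catch (const std::runtime_error&) {
        // The caller of the cleanup is assumed to handle the stepdown error.
    }
    return flag;
}
```

The other option mentioned above, calling cancelClone before any step that can throw, would give the same result in this sketch; the guard only has the advantage of staying correct if more throwing steps are added later.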
| Comments |
| Comment by Githook User [ 16/Jan/18 ] |
|
Author: {'email': 'jack.mulrow@mongodb.com', 'name': 'Jack Mulrow', 'username': 'jsmulrow'} Message: (cherry picked from commit 9f36a4b903401c8d2f1f7e248cf96eaa64ff99ce) |
| Comment by Githook User [ 08/Jan/18 ] |
|
Author: {'name': 'Jack Mulrow', 'username': 'jsmulrow', 'email': 'jack.mulrow@mongodb.com'} Message: |