[SERVER-32592] Stepdown during migration cleanup can crash the source shard primary Created: 08/Jan/18  Updated: 30/Oct/23  Resolved: 30/Jan/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.2
Fix Version/s: 3.6.3, 3.7.2

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Sharding 2018-01-29, Sharding 2018-02-12
Participants:
Linked BF Score: 0

 Description   

If the source shard primary steps down during a migration, it can trigger one of the cleanupOnError scope guards in the MigrationSourceManager. The guard calls MigrationSourceManager::_cleanup, which can call ShardServerCatalogCacheLoader::waitForCollectionFlush, which will uassert if the node is no longer primary. Because the call is made inside a scope guard and the exception isn't caught, it escapes the guard, triggers std::terminate() and crashes the server.
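
The failure mode and the direction of the fix (ignoring stepdown errors inside cleanupOnError) can be illustrated with a minimal, standalone sketch. This is not the server code: ScopeGuard, NotPrimaryError, and the free functions waitForCollectionFlush, cleanupOnError, and runFailedMigration below are simplified, hypothetical stand-ins, and the isPrimary flag is an assumption used to simulate a stepdown.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the "not primary" error a uassert would raise.
struct NotPrimaryError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Minimal scope guard: runs the callback on destruction unless dismissed.
// Destructors are implicitly noexcept, so an exception escaping the
// callback here would call std::terminate().
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> fn) : _fn(std::move(fn)) {
        assert(static_cast<bool>(_fn));  // an empty callback is a usage error
    }
    ~ScopeGuard() {
        if (_active) _fn();
    }
    void dismiss() { _active = false; }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _fn;
    bool _active = true;
};

// Hypothetical stand-in for the flush wait: throws if the node stepped down.
void waitForCollectionFlush(bool isPrimary) {
    if (!isPrimary) throw NotPrimaryError("node is no longer primary");
}

// Cleanup that tolerates a stepdown: the not-primary error is logged and
// swallowed so that it cannot escape the scope guard.
// Returns true if cleanup completed, false if it was skipped.
bool cleanupOnError(bool isPrimary) {
    try {
        waitForCollectionFlush(isPrimary);
        return true;
    } catch (const NotPrimaryError& ex) {
        std::cerr << "Failed to clean up migration due to: " << ex.what() << '\n';
        return false;
    }
}

// Simulates a migration step that fails while a cleanup guard is armed.
// Returns whether the guard's cleanup completed.
bool runFailedMigration(bool isPrimary) {
    bool cleaned = false;
    try {
        ScopeGuard guard([&] { cleaned = cleanupOnError(isPrimary); });
        throw std::runtime_error("migration step failed");
    } catch (const std::runtime_error&) {
        // The guard has already run during stack unwinding.
    }
    return cleaned;
}
```

In this sketch, runFailedMigration(false) simulates the stepdown case: the guard fires, the not-primary error is caught inside cleanupOnError, and the process keeps running. Removing the try/catch from cleanupOnError would let the exception escape the guard's destructor and terminate the process, which mirrors the crash described above.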

Example failure: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_coverage_concurrency_sharded_with_stepdowns_and_balancer_patch_e4ba7722773f68d42a66af7439e585cc2136d003_5a4ceaabe3c3316388000020_18_01_03_14_41_42/0



 Comments   
Comment by Githook User [ 01/Feb/18 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-32592 Ignore stepdown errors in MigrationSourceManager::cleanupOnError

(cherry picked from commit 1a8a0e35dca5f3a2bd4fa40a5e80576f1c72e221)
Branch: v3.6
https://github.com/mongodb/mongo/commit/8111ba6001650c3df1d1cfe9e8bc9203a82e6586

Comment by Githook User [ 30/Jan/18 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-32592 Ignore stepdown errors in MigrationSourceManager::cleanupOnError
Branch: master
https://github.com/mongodb/mongo/commit/1a8a0e35dca5f3a2bd4fa40a5e80576f1c72e221

Generated at Thu Feb 08 04:30:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.