[SERVER-74585] Ensure shard Merge recipient aborts correctly on rollbacks and restarts.
Created: 03/Mar/23  Updated: 29/Oct/23  Resolved: 29/Aug/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0, 7.1.0-rc1

Type: Task
Priority: Major - P3
Reporter: Suganthi Mani
Assignee: Suganthi Mani
Resolution: Fixed
Votes: 0
Labels: shard-merge-milestone-3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-72215 Delete the donor WT files that are un... Closed
is depended on by SERVER-74614 Complete TODO listed in SERVER-63204 Closed
Duplicate
is duplicated by SERVER-61677 Abort migration on rollback etc. Closed
is duplicated by SERVER-63752 Make sure the shard merge never leave... Closed
is duplicated by SERVER-72202 Merge shouldn’t re-establish network ... Closed
is duplicated by SERVER-72203 Merge shouldn’t re-establish network ... Closed
is duplicated by SERVER-72205 Persist the backup cursorID info in t... Closed
is duplicated by SERVER-72206 Ensure the backup cursor is closed af... Closed
is duplicated by SERVER-72209 TenantFileImportService should do no ... Closed
is duplicated by SERVER-73900 Ensure no collection gets imported af... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.1
Sprint: Server Serverless 2023-03-20, Server Serverless 2023-04-03, Server Serverless 2023-04-17, Server Serverless 2023-05-01, Server Serverless 2023-05-15, Server Serverless 2023-05-29, Server Serverless 2023-07-24, Server Serverless 2023-08-07, Server Serverless 2023-08-21, Server Serverless 2023-09-04
Participants:

 Description   

Shard Merge is not robust to donor/recipient failovers, restarts, and rollbacks. This ticket should take care of the following items:
1) ShardMergeRecipientService is interrupted correctly on rollback, shutdown, and migration abort.
2) Any data (including the temporary WT directory, imported collections, and idents added to the mdb catalog and storage) or resources (e.g., the backup cursor) copied or allocated as part of a failed migration attempt are deleted or freed correctly on node rollback, node restart, and migration abort.
3) Restarts or rollbacks after the migration has committed do not cause data corruption.
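A minimal sketch of one way to satisfy item 2, assuming the recipient records each resource in a durable state document before allocating it, so that cleanup can run after a restart or rollback as well as on an explicit abort. All names here (RecipientStateDoc, cleanup, abort_migration, the "kind:id" resource strings) are hypothetical and are not ShardMergeRecipientService's actual API; the in-memory list stands in for a persisted field, and the resource kinds only mirror the examples listed in item 2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RecipientStateDoc:
    """Hypothetical durable state document for one migration attempt.

    Each resource is appended here *before* it is allocated, so a node that
    restarts or rolls back can still find everything it must free.
    """
    migration_id: str
    allocated: List[str] = field(default_factory=list)  # "kind:id" entries
    aborted: bool = False


def cleanup(doc: RecipientStateDoc,
            freers: Dict[str, Callable[[str], None]]) -> None:
    """Free recorded resources in reverse allocation order.

    An entry is dropped only after its freer returns, so rerunning cleanup
    after a crash part-way through frees just the remaining entries. Freers
    must therefore tolerate being called on an already-freed resource.
    """
    while doc.allocated:
        resource = doc.allocated[-1]
        kind = resource.split(":", 1)[0]
        freers[kind](resource)
        doc.allocated.pop()


def abort_migration(doc: RecipientStateDoc,
                    freers: Dict[str, Callable[[str], None]]) -> None:
    """Mark the attempt aborted, then free its resources (safe to repeat)."""
    doc.aborted = True
    cleanup(doc, freers)
```

Calling abort_migration a second time, as a restarted node might, is a no-op once the recorded list is empty; freeing in reverse order means later resources (such as imported idents) are released before the earlier ones they may depend on.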



 Comments   
Comment by Githook User [ 31/Aug/23 ]

Author: Suganthi Mani (suganthi.mani@mongodb.com, username: smani87)

Message: SERVER-74585 tenant_migration_donor_cmd_response.js fix.

(cherry picked from commit bd1e7e7d9c47163017696f0879bf7f3e363061c9)
Branch: v7.1
https://github.com/mongodb/mongo/commit/0567c351ebcfcd5d7a4e28e5b78ebc53cfd883a7

Comment by Githook User [ 31/Aug/23 ]

Author: Suganthi Mani (suganthi.mani@mongodb.com, username: smani87)

Message: SERVER-74585 Ensure shard Merge recipient aborts correctly on rollbacks and restarts.

(cherry picked from commit 5b43629c69b2bc67232936b05c107aa17ae5b8eb)
Branch: v7.1
https://github.com/mongodb/mongo/commit/5267a8ccd2f1975ee10b8b0ca06ef22b50e956c5

Comment by Githook User [ 29/Aug/23 ]

Author: Suganthi Mani (suganthi.mani@mongodb.com, username: smani87)

Message: SERVER-74585 tenant_migration_donor_cmd_response.js fix.
Branch: master
https://github.com/mongodb/mongo/commit/bd1e7e7d9c47163017696f0879bf7f3e363061c9

Comment by Githook User [ 29/Aug/23 ]

Author: Suganthi Mani (suganthi.mani@mongodb.com, username: smani87)

Message: SERVER-74585 Ensure shard Merge recipient aborts correctly on rollbacks and restarts.
Branch: master
https://github.com/mongodb/mongo/commit/5b43629c69b2bc67232936b05c107aa17ae5b8eb

Comment by Suganthi Mani [ 03/Mar/23 ]

Copy-paste from here.
Add the attached tenant_migration_shard_merge_invalid_tenants.js test as part of this PR. Following SERVER-71831, importing data for non-migrated tenants will make the file copy fail. However, we expect to rely on the timeout to catch the failure to import files, so this case is not tested yet.

Generated at Thu Feb 08 06:27:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.