[SERVER-72199]  Handle initial sync, resync and unclean restarts. Created: 16/Dec/22  Updated: 29/Oct/23  Resolved: 06/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: shard-merge-milestone-3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-73397 Create separate POS instance for shar... Closed
is depended on by SERVER-72215 Delete the donor WT files that are un... Closed
Duplicate
is duplicated by SERVER-72200 R nodes should create a local.import_... Closed
is duplicated by SERVER-72201 Handle cloud-provider snapshot resync. Closed
is duplicated by SERVER-72208 R nodes should take a checkpoint bef... Closed
Problem/Incident
causes SERVER-75831 Coverity analysis defect 137537: Wrap... Closed
Assigned Teams:
Serverless
Backwards Compatibility: Fully Compatible
Sprint: Server Serverless 2023-01-09, Server Serverless 2023-01-23, Server Serverless 2023-02-06, Server Serverless 2023-02-20, Server Serverless 2023-03-06, Server Serverless 2023-03-20, Server Serverless 2023-04-03, Server Serverless 2023-04-17
Participants:

 Description   

Fail the initial sync in the following cases to prevent potential data corruption:
A node tries to apply oplog writes to the recipient state document namespace during the initial sync oplog application phase.
The state of the shard merge recipient state document is < kConsistent at the end of initial sync recovery.
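
The second case can be sketched roughly as follows. This is only an illustration in Python, not the server's C++ implementation: the state names other than kConsistent, their numeric ordering, the document shape, and the function and exception names are all assumptions made for the example.

```python
from enum import IntEnum


class RecipientState(IntEnum):
    # Hypothetical states and ordering; the ticket only names kConsistent.
    kStarted = 1
    kLearnedFilenames = 2
    kConsistent = 3
    kCommitted = 4


class InitialSyncFailed(Exception):
    """Raised to fail initial sync instead of risking corrupt data."""


def check_state_docs_after_recovery(state_docs):
    """Fail if any recipient state document is still below kConsistent.

    state_docs is an iterable of dicts shaped like
    {"_id": ..., "state": RecipientState} (an assumed shape).
    """
    for doc in state_docs:
        if doc["state"] < RecipientState.kConsistent:
            raise InitialSyncFailed(
                f"state document {doc['_id']} is {doc['state'].name}, "
                "which is earlier than kConsistent"
            )


# A document that already reached kConsistent passes the check.
check_state_docs_after_recovery([{"_id": "m1", "state": RecipientState.kConsistent}])
```

Using an ordered enum keeps the "< kConsistent" comparison from the description literal in the code.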



 Comments   
Comment by Githook User [ 06/Apr/23 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-72199 tenant_migration_shard_merge_import_write_conflict_retry.js test fix.
Branch: master
https://github.com/mongodb/mongo/commit/95b79f722a489677c64e34bc6a4b4e85621c4340

Comment by Githook User [ 06/Apr/23 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-72199 Shard merge handles initial sync, resync and unclean restarts.
Branch: master
https://github.com/mongodb/mongo/commit/723595c779147f85d49ec0bc727bdc977ee5d98c

Comment by Suganthi Mani [ 10/Feb/23 ]

(Note for future readers:)
Running initial sync (or resync) in parallel with shard merge can cause silent data corruption. This ticket makes the initial syncing node fail when it detects a potential data corruption case.

So, if the initial sync oplog applier tries to apply any change to the shard merge recipient state document namespace, we crash the initial syncing node. Because the data is inconsistent during the oplog catchup phase of logical initial sync, it's not guaranteed that we hit the op observer for each oplog entry processed in that phase, so we can't observe these changes in the op observer. Instead, we should run those checks before we start to apply an oplog entry (i.e., at applyOperation_inlock()). But at that point, we have no way to tell whether the change is coming from the MTM or the shard merge protocol (without modifying how the state document oplog entry itself is generated). So, we decided to solve this problem cleanly by having a separate POS namespace, and thereby a separate shard merge recipient POS service (SERVER-73397).
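
A rough sketch of why the separate namespace makes the pre-apply check simple. This is illustrative Python, not the server code: the namespace strings, the entry shape, and the names apply_operation and InitialSyncFatalError are assumptions for the example. The point is only that, once shard merge has its own state document namespace, the namespace of the oplog entry is enough to decide at apply time, without going through the op observer.

```python
class InitialSyncFatalError(Exception):
    """Stands in for crashing the initial syncing node."""


# Assumed namespace names, used here only for illustration.
SHARD_MERGE_RECIPIENTS_NS = "config.shardMergeRecipients"
TENANT_MIGRATION_RECIPIENTS_NS = "config.tenantMigrationRecipients"


def apply_operation(entry, in_initial_sync, apply_fn):
    """Check the entry before applying it, then hand it to apply_fn.

    entry is a dict with at least an "ns" key (assumed shape).
    The check runs before any write, so it does not depend on the
    op observer being invoked for this entry.
    """
    if in_initial_sync and entry["ns"] == SHARD_MERGE_RECIPIENTS_NS:
        raise InitialSyncFatalError(
            "oplog write to the shard merge recipient state document "
            "namespace during initial sync"
        )
    return apply_fn(entry)


applied = []
# An MTM state document write is not affected by the shard merge check.
apply_operation({"op": "u", "ns": TENANT_MIGRATION_RECIPIENTS_NS}, True, applied.append)
# A regular user write is applied as usual.
apply_operation({"op": "i", "ns": "test.coll"}, True, applied.append)
```

With a shared namespace, the same check would need extra information in the oplog entry to tell the two protocols apart, which is the modification the comment above wanted to avoid.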

Generated at Thu Feb 08 06:21:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.