[SERVER-72199] Handle initial sync, resync and unclean restarts. Created: 16/Dec/22 Updated: 29/Oct/23 Resolved: 06/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Suganthi Mani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | shard-merge-milestone-3 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: | Serverless |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Server Serverless 2023-01-09, Server Serverless 2023-01-23, Server Serverless 2023-02-06, Server Serverless 2023-02-20, Server Serverless 2023-03-06, Server Serverless 2023-03-20, Server Serverless 2023-04-03, Server Serverless 2023-04-17 |
| Participants: |
| Description |
|
Fail the initial sync in the following cases to prevent potential data corruption |
| Comments |
| Comment by Githook User [ 06/Apr/23 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: |
| Comment by Githook User [ 06/Apr/23 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: |
| Comment by Suganthi Mani [ 10/Feb/23 ] |
|
(Note for future readers:) If the initial sync oplog applier tries to apply any change to the shard merge recipient state document namespace, we crash the initial syncing node. Because the data is inconsistent during the oplog catchup phase of logical initial sync, it's not guaranteed that we hit the op observer for each oplog entry processed in that phase, so we can't observe these changes in the op observer. Instead, we should execute those checks before we start to apply an oplog entry (i.e., at applyOperation_inlock()). But at that point, we have no way to tell whether the change comes from the MTM or the shard merge protocol (without modifying how the state document oplog entry itself is generated). So we decided to solve this problem cleanly by having a separate POS namespace, and thereby a separate shard merge recipient POS service (
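
A rough sketch of the check described in the note above, assuming a simplified oplog entry with only an op type and a namespace. This is an illustration, not the server code: the names `OplogEntry`, `check_before_apply`, `apply_oplog_entry`, and `InitialSyncFatalError` are hypothetical, the namespace string is an assumed placeholder, and raising an exception stands in for crashing the node. The point it shows is that, with a dedicated namespace for the shard merge recipient state documents, a namespace comparison done before the entry is applied is enough to detect the case, with no dependence on the op observer.

```python
from dataclasses import dataclass

# Assumed placeholder for the dedicated shard merge recipient state
# document namespace; the real name is not given in this ticket.
SHARD_MERGE_RECIPIENT_NS = "config.shardMergeRecipients"


class InitialSyncFatalError(Exception):
    """Stands in for crashing the initial-syncing node."""


@dataclass(frozen=True)
class OplogEntry:
    op: str  # operation type, e.g. "i", "u", "d"
    ns: str  # target namespace, "db.collection"


def check_before_apply(entry: OplogEntry, in_initial_sync: bool) -> None:
    # Runs before the entry is applied, so it does not rely on the op
    # observer firing during the oplog catchup phase.
    if in_initial_sync and entry.ns == SHARD_MERGE_RECIPIENT_NS:
        raise InitialSyncFatalError(
            f"oplog entry touches {entry.ns} during initial sync"
        )


def apply_oplog_entry(entry: OplogEntry, in_initial_sync: bool, applied: list) -> None:
    check_before_apply(entry, in_initial_sync)
    applied.append(entry)  # placeholder for the real apply step
```

Because the namespace is separate, the check needs no extra field on the oplog entry to tell the two protocols apart; entries for other namespaces pass through unchanged.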