[SERVER-72623] Add a server comment in tenant oplog applier for shard merge oplog application mode & isDataConsistent true value. Created: 09/Jan/23 Updated: 29/Oct/23 Resolved: 08/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Suganthi Mani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | shard-merge-milestone-3 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Serverless |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Server Serverless 2023-03-20 |
| Participants: | |
| Description |
|
When a recipient failover happens in the middle of tenant oplog application, we might resume applying donor oplog entries from a point that was already applied. This is because we first apply the donor oplog entries and only then write the no-op donor oplog entries for the given batch, and the resume point is calculated by finding the latest no-op oplog entry. With the stricter 'kSecondary' mode, re-applying those donor oplog entries can fail and cause the shard merge to fail unnecessarily. So, change the tenant oplog application mode to kInitialSync (lenient mode). |
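The difference between the strict and lenient modes can be illustrated with a standalone sketch (not MongoDB source; `Mode`, `applyDelete`, and `recipientDocs` are illustrative names): under the strict mode, replaying a delete whose document is already gone is treated as an error, while the lenient mode treats the replayed entry as an idempotent no-op, which is what a resumed batch needs.

```cpp
// Minimal standalone sketch (not MongoDB source): models why re-applying an
// already-applied donor oplog entry fails under the strict mode but is
// tolerated under the lenient one.
#include <iostream>
#include <set>
#include <stdexcept>
#include <string>

enum class Mode { kSecondary, kInitialSync };

// Documents currently present on the recipient, keyed by _id.
std::set<std::string> recipientDocs = {"doc2"};

// Apply a delete oplog entry for `id`. Under kSecondary, deleting a missing
// document is treated as corruption and throws; under kInitialSync the entry
// is assumed to have already been applied and is silently skipped.
void applyDelete(const std::string& id, Mode mode) {
    if (recipientDocs.erase(id) == 0 && mode == Mode::kSecondary) {
        throw std::runtime_error("delete failed: no document with _id " + id);
    }
}

int main() {
    // "doc1" was deleted before the failover, so the resumed batch replays it.
    try {
        applyDelete("doc1", Mode::kSecondary);  // strict: throws, merge fails
    } catch (const std::exception& e) {
        std::cout << "kSecondary: " << e.what() << '\n';
    }
    applyDelete("doc1", Mode::kInitialSync);    // lenient: idempotent no-op
    std::cout << "kInitialSync: replayed delete tolerated\n";
    return 0;
}
```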
| Comments |
| Comment by Githook User [ 08/Mar/23 ] |
|
Author: Suganthi Mani <suganthi.mani@mongodb.com> (smani87)
Message: |
| Comment by Suganthi Mani [ 03/Mar/23 ] |
|
Shard Merge is not robust to donor/recipient failovers, restarts, and rollbacks (see the design doc). No fix is required; to give clarity to future readers, add a comment here explaining why it's OK for isDataConsistent to be true. |
| Comment by Suganthi Mani [ 01/Feb/23 ] |
|
Note: As part of this ticket, `isDataConsistent` should be set to false, since data can be inconsistent after a recipient failover during the oplog catchup phase, where we apply oplog entries in parallel. The side-effect of setting `isDataConsistent` to true is that we might wrongly record pre/post images for change streams and rFAM (See |
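A hypothetical sketch of the side-effect described above (not MongoDB source; `applyUpdate`, `PreImage`, and the flag plumbing are illustrative only): if the flag asserts consistency while a replayed entry observes a transiently inconsistent document, the captured pre-image can be wrong.

```cpp
// Hypothetical sketch only: models how an isDataConsistent flag could gate
// pre-image recording during oplog application.
#include <iostream>
#include <string>
#include <vector>

struct OplogEntry {
    std::string id;
    std::string newValue;
};

struct PreImage {
    std::string id;
    std::string value;  // document state captured before the update
};

std::vector<PreImage> preImages;  // stand-in for a pre-images store

// Apply an update; only record a pre-image when the caller asserts the
// current document state is consistent.
void applyUpdate(std::string& doc, const OplogEntry& entry, bool isDataConsistent) {
    if (isDataConsistent) {
        preImages.push_back({entry.id, doc});  // may capture a wrong value
    }
    doc = entry.newValue;
}

int main() {
    std::string doc = "partially-applied-state";  // state seen after a failover
    applyUpdate(doc, {"doc1", "v2"}, /*isDataConsistent=*/true);
    std::cout << "recorded pre-image: " << preImages.back().value << '\n';
    return 0;
}
```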