[SERVER-79024] Avoid deleting pre-image/change collection entries before allDurable/lastApplied timestamps Created: 17/Jul/23 Updated: 06/Sep/23 Resolved: 06/Sep/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jordi Olivares Provencio | Assignee: | Jordi Olivares Provencio |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Storage Execution EMEA
|
||||
| Backport Requested: |
v7.1, v7.0, v6.0
|
||||
| Sprint: | Execution EMEA Team 2023-09-04, Execution EMEA Team 2023-09-18 | ||||
| Participants: | |||||
| Description |
|
Because change collections and preimages are implicitly replicated collections, only their deletes are propagated; the inserts are applied implicitly. This leads to a race in the oplog applier between a delete and the corresponding insert, since both operations could be applied at the same time. To avoid this issue, we should make sure that an entry is deleted only once the lastApplied/allDurable timestamps have advanced past it. That way we can ensure that the insert always occurs before the delete. Note that this only occurs with replicated deletes. Unreplicated truncates already solved this issue with |
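A minimal sketch of the proposed guard, for illustration only. It assumes timestamps are plain integers and that the expiry cutoff, lastApplied, and allDurable values are already known; `Entry` and `deletable_entries` are hypothetical names, not server code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    ts: int  # timestamp of the (implicitly replicated) insert


def deletable_entries(entries, expiry_cutoff_ts, last_applied_ts, all_durable_ts):
    """Return expired entries that are safe to delete.

    An entry qualifies only if it is expired (ts <= expiry_cutoff_ts) and
    both lastApplied and allDurable have already reached its timestamp,
    so its insert cannot race with the delete.
    """
    safe_ts = min(expiry_cutoff_ts, last_applied_ts, all_durable_ts)
    return [e for e in entries if e.ts <= safe_ts]


entries = [Entry(ts) for ts in (10, 20, 30, 40)]
result = deletable_entries(
    entries, expiry_cutoff_ts=35, last_applied_ts=25, all_durable_ts=30
)
print([e.ts for e in result])
```

In this sketch the entry at timestamp 30 is expired but is held back, because lastApplied (25) has not reached it yet.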
| Comments |
| Comment by Jordi Olivares Provencio [ 06/Sep/23 ] |
|
Closing this as Won't Do. This will be fixed in |
| Comment by Jordi Olivares Provencio [ 31/Aug/23 ] |
|
Requesting backports as far back as 6.0, since this affects preimage correctness. |
| Comment by Jordi Olivares Provencio [ 24/Jul/23 ] |
|
I've rewritten the ticket as we realised that this only kicked the can down the road. As |
| Comment by Josef Ahmad [ 18/Jul/23 ] |
|
This is only a problem when the TTL period of these internal tables is set to an unreasonably low value (less than a few seconds). A customer choosing to do so is more likely to experience more fundamental problems – e.g. change stream falling off 'the oplog' – than the inconsistency described here. |
| Comment by Josef Ahmad [ 18/Jul/23 ] |
|
This can also be a problem in the absence of lagging secondaries when using an unreasonably low expireAfterSeconds approximating an oplog hole's duration. With cleanup based on (replicated) multi-deletes and expireAfterSeconds=0, the secondary races the application of replicated deletes with the (implicit) application of inserts to these change stream tables: because the inserts are implicit, there's no way for the applier threads to serialise deletes with inserts on these tables. As a result, we've observed these tables become inconsistent. Jordi's proposal to set a reasonable floor (10s) to expireAfterSeconds eradicates this corner case. |
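The floor mentioned in the last comment could look roughly like the sketch below. The 10-second value comes from that comment; the function and constant names are hypothetical, and the real server may validate or reject low values rather than clamp them.

```python
MIN_EXPIRE_AFTER_SECONDS = 10  # floor proposed in the comment above


def effective_expire_after_seconds(requested: int) -> int:
    """Clamp a requested expireAfterSeconds to the proposed floor.

    Hypothetical helper: it only illustrates that values such as 0 would
    no longer let cleanup deletes race with the implicit inserts.
    """
    return max(requested, MIN_EXPIRE_AFTER_SECONDS)


print(effective_expire_after_seconds(0))
print(effective_expire_after_seconds(3600))
```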