[SERVER-74205] Fix wall time correctness bug in CollectionTruncateMarkers Created: 21/Feb/23 Updated: 27/Oct/23 Resolved: 10/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Olivares Provencio | Assignee: | Jordi Olivares Provencio |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Sprint: | Execution Team 2023-03-06, Execution Team 2023-03-20 |
| Participants: |
| Description |
|
With the creation of CollectionTruncateMarkers we discovered a correctness bug in the previous OplogStones class with regard to retention time. Right now, the wallTime and record id assigned to a given marker are those of the latest record inserted. This presumes that the following constraints hold, given records A and B with ids idA, idB and wall times wallA, wallB:
However, the following case may sometimes occur in entries:
In this case, time-based expiration systems would eagerly delete something before its supposed expiration time. To fix this we should ideally replace the simple marker creation with something that keeps track of the highest wallTime seen between markers. This would help prevent the correctness issue for time-based expiration systems by guaranteeing that no item covered by a marker is newer than the marker's wall time. |
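The proposed fix can be illustrated with a small toy model. This is only a sketch, not the server's implementation: `Marker`, `MarkerBuilder`, `expired`, and the fixed records-per-marker policy are hypothetical names and simplifications introduced for illustration. It assumes the problematic case is a record with a lower id carrying a later wall time than the last record in the marker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Marker:
    last_record_id: int  # highest record id covered by this marker
    wall_time_ms: int    # wall time used to decide whether the marker has expired


class MarkerBuilder:
    """Toy builder (hypothetical) that closes a marker every `records_per_marker` inserts."""

    def __init__(self, records_per_marker: int, track_highest: bool):
        self.records_per_marker = records_per_marker
        self.track_highest = track_highest
        self.markers: List[Marker] = []
        self._count = 0
        self._highest_wall_ms: Optional[int] = None

    def insert(self, record_id: int, wall_time_ms: int) -> None:
        self._count += 1
        # Remember the highest wall time seen since the previous marker.
        if self._highest_wall_ms is None or wall_time_ms > self._highest_wall_ms:
            self._highest_wall_ms = wall_time_ms
        if self._count == self.records_per_marker:
            # Simple approach: take the wall time of the latest record.
            # Proposed approach: take the highest wall time seen in this marker.
            wall = self._highest_wall_ms if self.track_highest else wall_time_ms
            self.markers.append(Marker(record_id, wall))
            self._count = 0
            self._highest_wall_ms = None


def expired(marker: Marker, now_ms: int, retention_ms: int) -> bool:
    """A marker expires once its wall time is at least `retention_ms` old."""
    return now_ms - marker.wall_time_ms >= retention_ms


# (record id, wall time in ms): record 2 has a later wall time than record 3.
records = [(1, 1_000), (2, 5_000), (3, 2_000)]

simple = MarkerBuilder(3, track_highest=False)
fixed = MarkerBuilder(3, track_highest=True)
for rid, wall in records:
    simple.insert(rid, wall)
    fixed.insert(rid, wall)
```

With a retention of 3000 ms and the current time at 6000 ms, the simple marker (wall time 2000) counts as expired even though record 2 is only 1000 ms old, while the marker that tracks the highest wall time (5000) is kept.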
| Comments |
| Comment by Jordi Olivares Provencio [ 10/Mar/23 ] |
|
Right now this is not a concern because the only users of truncate markers are:
The case described here can happen either because the clock changes in the middle of a truncate marker or because concurrent writes are not ordered.
For the first case, only pre-image and oplog truncation are susceptible, through operator misconfiguration of the system clock on premises. In that case, however, the oplog truncating after growing past its maximum size would also cause pre-images to expire, so the failure would eventually be resolved by the oplog rolling over. If a minimum retention time is configured for the oplog, then the oplog is already susceptible to this issue, so the failure is a known one and hasn't caused any issues.
For the latter case, the difference in wall time between entries is so small that it probably won't even manifest, since wall times have millisecond resolution. Truncation also happens at second resolution in pre-image/change collection maintenance, so it is quite improbable to hit this scenario in any meaningful way.
As such, we're closing this ticket as Works as Designed, since the failure modes are either too remote to be a valid concern or don't lead to a permanent failure (the first scenario, on premises). |
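The argument about resolution can be made concrete with a toy calculation. This sketch assumes that "second resolution" means wall times are floored to whole seconds before the expiry comparison; that model and the helper names `to_seconds`, `expired_at_second_resolution`, and `decisions_differ` are assumptions for illustration, not the server's actual maintenance logic.

```python
def to_seconds(wall_time_ms: int) -> int:
    """Floor a millisecond wall time to whole seconds (assumed model)."""
    return wall_time_ms // 1000


def expired_at_second_resolution(wall_time_ms: int, now_ms: int, retention_s: int) -> bool:
    """Expiry check that only looks at whole seconds."""
    return to_seconds(now_ms) - to_seconds(wall_time_ms) >= retention_s


def decisions_differ(marker_wall_ms: int, true_highest_wall_ms: int, retention_s: int) -> bool:
    """True if some 'now' exists where the marker's wall time and the true
    highest wall time in the marker lead to different expiry decisions."""
    start = min(marker_wall_ms, true_highest_wall_ms)
    end = max(marker_wall_ms, true_highest_wall_ms) + (retention_s + 2) * 1000
    return any(
        expired_at_second_resolution(marker_wall_ms, now, retention_s)
        != expired_at_second_resolution(true_highest_wall_ms, now, retention_s)
        for now in range(start, end)
    )


# A 3 ms discrepancy inside the same second never changes the decision.
same_second = decisions_differ(10_002, 10_005, retention_s=5)

# The same 3 ms discrepancy matters only if it straddles a second boundary.
straddles_boundary = decisions_differ(10_998, 11_001, retention_s=5)
```

Under this model, a few milliseconds of reordering can only change the outcome when the two wall times fall on opposite sides of a second boundary, which is consistent with the ticket's view that the scenario is improbable.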