[SERVER-50398] appliedThrough document write and background validation reads could use the same timestamp Created: 20/Aug/20  Updated: 10/Sep/20  Resolved: 10/Sep/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Lingzhi Deng Assignee: Eric Milkie
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by WT-4780 Enable assertion that commit timestam... Closed
Duplicate
duplicates SERVER-47681 Background validation uses the kNoOve... Closed
Related
related to SERVER-47681 Background validation uses the kNoOve... Closed
Operating System: ALL
Participants:

 Description   

Background validation reads at the all_durable timestamp. My understanding is that on secondaries the all_durable timestamp can advance before a batch application completes, i.e., it can be ahead of the lastApplied timestamp.

The oplog applier can therefore write (commit) the appliedThrough document at the timestamp of the last oplog entry in the batch while background validation is reading at all_durable, which could be that same timestamp.

Collection validation does take the PBWM lock, which should conflict with oplog application. Unfortunately, by the time we write appliedThrough we have already released the PBWM lock (it is held only in _applyOplogBatch). Background validation can therefore start reading at all_durable, which by then would be the timestamp of the last oplog entry in the batch, while the oplog applier thread is trying to write at that same timestamp. This violates WT's assertion that a commit timestamp is newer than all readers. That assertion is currently disabled.
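
The interleaving described above can be sketched with a toy model. This is only an illustration: ToyStorageEngine, begin_read, commit and the timestamp value 10 are hypothetical names and numbers, not the WiredTiger or server API. The commit check stands in for the disabled assertion that a commit timestamp must be newer than all readers.

```python
class ToyStorageEngine:
    """Toy stand-in for a timestamped storage engine (not the WiredTiger API)."""

    def __init__(self):
        self.active_read_timestamps = []

    def begin_read(self, read_ts):
        # A reader pins a snapshot at read_ts.
        self.active_read_timestamps.append(read_ts)

    def commit(self, commit_ts):
        # Models the assertion from the description: the commit timestamp
        # must be strictly newer than every active reader's timestamp.
        for read_ts in self.active_read_timestamps:
            if commit_ts <= read_ts:
                raise AssertionError(
                    f"commit at {commit_ts} is not newer than reader at {read_ts}"
                )


def simulate_race():
    engine = ToyStorageEngine()
    last_ts_in_batch = 10  # hypothetical timestamp of the last oplog entry

    # The batch has been applied and the PBWM lock released, so all_durable
    # has (by assumption) advanced to the last timestamp in the batch.
    all_durable = last_ts_in_batch

    # Background validation starts reading at all_durable.
    engine.begin_read(all_durable)

    # The oplog applier now writes appliedThrough at the same timestamp.
    try:
        engine.commit(last_ts_in_batch)
    except AssertionError as exc:
        return str(exc)
    return "ok"


result = simulate_race()
print(result)
```

In this model the commit fails because the reader and the writer share a timestamp, which is the condition the ticket is concerned about.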



 Comments   
Comment by Lingzhi Deng [ 20/Aug/20 ]

Cool, using kNoOverlap would work too I think. So this ticket could be a dup of SERVER-47681 then.
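
A rough sketch of why a kNoOverlap read might avoid the conflict. It rests on two assumptions that this ticket does not state: that kNoOverlap reads at the minimum of all_durable and lastApplied, and that lastApplied has not yet advanced past the previous batch when appliedThrough is written. The function names and timestamp values are hypothetical.

```python
def no_overlap_read_timestamp(all_durable, last_applied):
    # Assumption: kNoOverlap picks the smaller of the two timestamps.
    return min(all_durable, last_applied)


def commit_is_allowed(commit_ts, active_read_timestamps):
    # Toy version of the rule that a commit must be newer than all readers.
    return all(commit_ts > read_ts for read_ts in active_read_timestamps)


last_applied = 5       # hypothetical: end of the previous batch
last_ts_in_batch = 10  # hypothetical: timestamp used for appliedThrough
all_durable = 10       # already advanced to the end of the current batch

read_all_durable = all_durable
read_no_overlap = no_overlap_read_timestamp(all_durable, last_applied)

conflict_with_all_durable = not commit_is_allowed(last_ts_in_batch, [read_all_durable])
conflict_with_no_overlap = not commit_is_allowed(last_ts_in_batch, [read_no_overlap])

print(read_no_overlap, conflict_with_all_durable, conflict_with_no_overlap)
```

Under those assumptions the reader sits strictly behind the appliedThrough commit timestamp, so the toy check passes without relying on the PBWM lock.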

Comment by Daniel Gottlieb (Inactive) [ 20/Aug/20 ]

IIRC, the patch gregory.wlodarek had for taking out the PBWM also changed the "read source" from "all durable" to kNoOverlap.

Comment by Eric Milkie [ 20/Aug/20 ]

We are planning to take out the acquisition of the PBWM lock for "background" validation (right now, due to this lock acquisition, it's not actually background), so any solution here will need to take that into account.

Comment by Lingzhi Deng [ 20/Aug/20 ]

Maybe we can instead write the appliedThrough document inside _applyOplogBatch with the PBWM lock?
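
A minimal sketch of the ordering this suggestion changes, using a hypothetical step list rather than the real applier code. It only models a reader that must take the same lock before it can start, so it would not help if validation stops acquiring the PBWM lock, as the comment above anticipates.

```python
def reader_can_start_before_write(write_under_lock):
    """Return True if a lock-respecting reader could begin after the batch
    is applied but before appliedThrough is written (toy model)."""
    steps = ["acquire_lock", "apply_batch"]
    if write_under_lock:
        # Proposed ordering: write appliedThrough while still holding the lock.
        steps += ["write_applied_through", "release_lock"]
    else:
        # Ordering described in the ticket: lock released before the write.
        steps += ["release_lock", "write_applied_through"]

    lock_held = False
    batch_applied = False
    for step in steps:
        if step == "acquire_lock":
            lock_held = True
        elif step == "release_lock":
            lock_held = False
        elif step == "apply_batch":
            batch_applied = True
        elif step == "write_applied_through":
            # A reader needing the lock can only be active if the lock is free.
            return batch_applied and not lock_held
    return False


current_ordering_races = reader_can_start_before_write(write_under_lock=False)
proposed_ordering_races = reader_can_start_before_write(write_under_lock=True)
print(current_ordering_races, proposed_ordering_races)
```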
