[SERVER-50398] appliedThrough document write and background validation reads could use the same timestamp Created: 20/Aug/20 Updated: 10/Sep/20 Resolved: 10/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Lingzhi Deng | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Background validation reads at the all_durable timestamp. My understanding is that on secondaries, the all_durable timestamp can advance before a batch application completes, i.e. it can be ahead of the lastApplied timestamp. The oplog applier writes (commits) the appliedThrough document using the timestamp of the last oplog entry in the batch, so background validation may be reading at an all_durable value that equals the timestamp used in the appliedThrough write. Collection validations do take the PBWM lock and should therefore conflict with oplog application. Unfortunately, by the time we write the appliedThrough document, we have already released the PBWM lock (which is only held in _applyOplogBatch). So background validation can start reading from all_durable, which by then would be the timestamp of the last oplog entry in the batch, while the oplog applier thread is trying to write using that same timestamp. This violates WT's assertion that the commit timestamp is newer than all readers. That assertion is currently disabled. |
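
The interleaving described above can be illustrated with a toy model. This is only a sketch, not MongoDB or WiredTiger code: `ToyStore`, `begin_read`, `commit`, and `run_race` are hypothetical names, the timestamps are made-up integers, and the check in `commit` merely stands in for the WT assertion that a commit timestamp must be newer than all active readers.

```python
class ToyStore:
    """Toy stand-in for a timestamped storage engine (hypothetical, not WiredTiger)."""

    def __init__(self):
        self.active_read_timestamps = []

    def begin_read(self, read_ts):
        # A reader pins a snapshot at read_ts until it finishes.
        self.active_read_timestamps.append(read_ts)

    def end_read(self, read_ts):
        self.active_read_timestamps.remove(read_ts)

    def commit(self, commit_ts):
        # Models the assertion: the commit timestamp must be newer than all readers.
        newest_reader = max(self.active_read_timestamps, default=None)
        if newest_reader is not None and commit_ts <= newest_reader:
            raise AssertionError(
                f"commit_ts {commit_ts} is not newer than reader at {newest_reader}"
            )
        return commit_ts


def run_race():
    store = ToyStore()
    last_entry_ts = 10           # timestamp of the last oplog entry in the batch
    all_durable = last_entry_ts  # all_durable has already advanced to it

    # Background validation starts reading at all_durable
    # (the applier has already released the PBWM lock).
    store.begin_read(all_durable)
    try:
        # The oplog applier writes appliedThrough at the same timestamp.
        store.commit(last_entry_ts)
    except AssertionError:
        return "violation"
    return "ok"
```

In this model, `run_race()` reports a violation because the reader and the appliedThrough write share timestamp 10. A commit at any later timestamp would pass the check.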
| Comments |
| Comment by Lingzhi Deng [ 20/Aug/20 ] |
|
Cool, using kNoOverlap would work too I think. So this ticket could be a dup of |
| Comment by Daniel Gottlieb (Inactive) [ 20/Aug/20 ] |
|
IIRC, the patch gregory.wlodarek had for taking out the PBWM also changed the "read source" from "all durable" to kNoOverlap. |
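
A minimal sketch of why switching the read source could avoid the overlap, assuming kNoOverlap reads at the smaller of all_durable and lastApplied (an assumption about its semantics that this ticket does not spell out) and assuming lastApplied still points at the end of the previous batch while appliedThrough is being written. The function names and integer timestamps are hypothetical.

```python
def no_overlap_read_timestamp(all_durable, last_applied):
    # Assumed kNoOverlap semantics: never read past either timestamp.
    return min(all_durable, last_applied)


def commit_is_newer_than_reader(commit_ts, read_ts):
    # Stand-in for the storage engine check that a commit is newer than a reader.
    return commit_ts > read_ts


# Previous batch ended at 5; the current batch's last oplog entry is at 10.
last_applied = 5   # not yet advanced for the current batch
all_durable = 10   # already advanced to the last entry of the current batch
applied_through_commit_ts = 10

# Reading at all_durable collides with the appliedThrough write.
reads_at_all_durable_ok = commit_is_newer_than_reader(
    applied_through_commit_ts, all_durable
)

# Reading at min(all_durable, lastApplied) stays behind the write.
reads_at_no_overlap_ok = commit_is_newer_than_reader(
    applied_through_commit_ts,
    no_overlap_read_timestamp(all_durable, last_applied),
)
```

Under these assumptions the all_durable read fails the check while the kNoOverlap-style read passes it, without relying on the PBWM lock.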
| Comment by Eric Milkie [ 20/Aug/20 ] |
|
We are planning to take out the acquisition of the PBWM lock for "background" validation (right now, due to this lock acquisition, it's not actually background), so any solution here will need to take that into account. |
| Comment by Lingzhi Deng [ 20/Aug/20 ] |
|
Maybe we can instead write the appliedThrough document inside _applyOplogBatch, while still holding the PBWM lock?