[SERVER-34564] Potential oplog visibility bug replicating from a secondary Created: 19/Apr/18  Updated: 06/Dec/22  Resolved: 20/Apr/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Daniel Gottlieb (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Duplicate
duplicates SERVER-34565 oplog reads on secondaries should rea... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:
Linked BF Score: 50

 Description   

This only applies to development versions of 4.0 that allow chained replication, where secondaries run in a mode that allows readers while concurrently applying batches.

There are two mechanisms that publish a new oplog visibility timestamp on secondaries during steady-state replication. The first forcefully sets the read timestamp to the end of a batch after it is applied. The second is a background thread that queries WT for the "all committed" timestamp and publishes it as the new read visibility point.
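Roughly, the two publication paths look like this (a minimal sketch; the class and method names are illustrative, not the actual server code):

{code:cpp}
#include <atomic>
#include <cstdint>

using Timestamp = std::uint64_t;

class OplogVisibilityManager {
public:
    // Path 1: after a batch is fully applied, the applier forces the
    // visibility point to the batch's last optime.
    void onBatchApplied(Timestamp batchEnd) {
        _visibility.store(batchEnd, std::memory_order_release);
    }

    // Path 2: a background thread periodically publishes the storage
    // engine's "all committed" timestamp as the new visibility point.
    // In the real server that value comes from WiredTiger's
    // query_timestamp("get=all_committed"); here it is a parameter.
    void pollAllCommitted(Timestamp allCommitted) {
        _visibility.store(allCommitted, std::memory_order_release);
    }

    // Readers cap their view of the oplog at this timestamp.
    Timestamp getVisibility() const {
        return _visibility.load(std::memory_order_acquire);
    }

private:
    std::atomic<Timestamp> _visibility{0};
};
{code}

Note that in this shape both paths store into the same visibility point, and nothing forces the published value to be monotonic.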

Secondary oplog application works in two phases: the first writes a batch of documents to the oplog asynchronously, and the second applies those operations. Because the oplog writes commit out of timestamp order, the "all committed" time jumps around rather than only incrementing, which is what oplog visibility expects. Without a concurrent reader this is fine: a reader can only come in after those oplog holes have been plugged.
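To make the "jumping around" concrete, here is a toy model (mine, not server code) that approximates the "all committed" rule as: the minimum commit timestamp among in-flight transactions minus one, or the maximum committed timestamp when nothing is in flight:

{code:cpp}
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <set>

using Timestamp = std::uint64_t;

struct AllCommittedModel {
    std::multiset<Timestamp> inFlight;  // commit timestamps of open txns
    Timestamp maxCommitted = 0;

    void begin(Timestamp commitTs) { inFlight.insert(commitTs); }
    void commit(Timestamp commitTs) {
        inFlight.erase(inFlight.find(commitTs));
        maxCommitted = std::max(maxCommitted, commitTs);
    }
    Timestamp allCommitted() const {
        return inFlight.empty() ? maxCommitted : *inFlight.begin() - 1;
    }
};

int main() {
    AllCommittedModel wt;
    // Parallel oplog writers assign timestamps 11 and 12 within one batch.
    wt.begin(12);
    wt.commit(12);                           // ts 12 lands first...
    std::cout << wt.allCommitted() << "\n";  // 12 -- looks fully durable
    wt.begin(11);                            // ...then the ts 11 writer opens
    std::cout << wt.allCommitted() << "\n";  // 10 -- the value moved backward
    wt.commit(11);                           // the hole at ts 11 is plugged
    std::cout << wt.allCommitted() << "\n";  // 12 again
}
{code}

The printed sequence 12, 10, 12 shows the value retreating when the writer for the earlier timestamp opens late; a concurrent reader that sampled visibility at the first "12" would see past a hole.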

As a last detail, I heard secondary oplog readers were expected to use the "last applied" time to cap oplog visibility, which would be correct AFAIK. However, when _isOplogReader is set, the oplog visibility timestamp takes priority over last applied.
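In sketch form, the priority I'm describing (hypothetical names; the real decision lives in the storage engine's recovery unit):

{code:cpp}
#include <cstdint>

using Timestamp = std::uint64_t;

// If the reader is flagged as an oplog reader, the published visibility
// timestamp wins, even though capping at lastApplied would be the safe
// choice for a reader on a secondary mid-batch.
Timestamp chooseReadTimestamp(bool isOplogReader,
                              Timestamp oplogVisibility,
                              Timestamp lastApplied) {
    return isOplogReader ? oplogVisibility : lastApplied;
}
{code}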



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 19/Apr/18 ]

I forgot to mention: this is only hypothesized. Our testing likely doesn't exercise this scenario well. After talking with people, there doesn't seem to be a mechanism that prevents it. I'll (eventually) work on a repro unless someone beats me to it.
