[SERVER-42906] lastCommitted wall clock time can be greater than lastApplied wall clock time on the primary Created: 19/Aug/19  Updated: 27/Oct/23  Resolved: 21/Aug/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Maria van Keulen Assignee: Backlog - Replication Team
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2019-08-19 at 4.48.14 PM.png    
Issue Links:
Related
related to SERVER-42917 Flow Control should disregard faulty ... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

Description

Certain workloads trigger a situation in which the reported lastCommitted wall clock time is greater than the reported lastApplied wall clock time on the primary. No elections occurred during this time.



Comments
Comment by Maria van Keulen [ 21/Aug/19 ]

For those interested, Matthew and I discussed the bulk insert scenario further, and the following description clarified things for me:

So consider a bulk insert. We assign optimes 1-20 up front. Then we go through and log the inserts, assigning the wall times as we log them. Someone else does a single insert while this is happening. It gets an optime of 21, and a wall time somewhere in the middle.
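To make the bulk insert scenario concrete, here is a toy Python model (not MongoDB server code). It assumes a single integer clock that ticks once per logged entry and treats optimes as plain integers; `OplogEntry` and `bulk_insert_with_concurrent_write` are hypothetical names. It shows that if the majority has reached optime 20 while the primary has already applied optime 21, the committed entry carries a later wall time than the applied one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OplogEntry:
    optime: int     # oplog position; entries are ordered by this
    wall_time: int  # toy clock reading taken when the entry is logged


def bulk_insert_with_concurrent_write(bulk_size=20, interleave_after=10):
    """Toy model of the bulk-insert race (hypothetical, not server code)."""
    clock = 0
    entries = []
    next_free_optime = bulk_size + 1  # optimes 1..bulk_size are reserved up front
    for optime in range(1, bulk_size + 1):
        clock += 1
        # The reserved optime was fixed earlier; the wall time is taken now.
        entries.append(OplogEntry(optime, clock))
        if optime == interleave_after:
            # A concurrent single insert gets the next free optime and the
            # current wall time, which falls in the middle of the bulk insert.
            clock += 1
            entries.append(OplogEntry(next_free_optime, clock))
    return sorted(entries, key=lambda e: e.optime)


entries = bulk_insert_with_concurrent_write()
last_applied = entries[-1]    # highest optime on the primary: the single insert
last_committed = entries[-2]  # assume the majority has only reached optime 20
print(last_committed.wall_time > last_applied.wall_time)  # True
```

Here the single insert (optime 21) is logged at wall time 11, while optime 20 is logged at wall time 21, so the commit point's wall time is ahead of lastApplied's even though its optime is behind.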

Comment by Maria van Keulen [ 21/Aug/19 ]

judah.schvimer, yes, Flow Control handles this as part of SERVER-42917. It's fine to keep the SERVER-42917 patch if SERVER-42906 works as designed.
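The actual SERVER-42917 change is not described in this issue. As a purely hypothetical illustration of a consumer tolerating the inversion, the sketch below computes a lag sample from the two wall times and discards any sample in which the committed wall time is ahead of the applied one. `majority_lag_seconds` is an invented name, and the "discard" policy is an assumption.

```python
from typing import Optional


def majority_lag_seconds(last_applied_wall: float,
                         last_committed_wall: float) -> Optional[float]:
    """Hypothetical lag sample; returns None when the wall times are inverted.

    Wall times are not ordered by optime, so lastCommitted's wall time can be
    ahead of lastApplied's. A negative difference is not a real lag, so this
    sketch signals the caller to skip the sample instead of using it.
    """
    lag = last_applied_wall - last_committed_wall
    if lag < 0:
        return None
    return lag


samples = [(100.0, 97.5), (100.0, 103.0), (104.0, 104.0)]
usable = []
for applied, committed in samples:
    lag = majority_lag_seconds(applied, committed)
    if lag is not None:
        usable.append(lag)
print(usable)  # [2.5, 0.0]
```

Returning `None` rather than clamping to zero keeps an inverted sample from being mistaken for "no lag"; either choice is a design decision for the consumer.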

Comment by Judah Schvimer [ 21/Aug/19 ]

matthew.russotto, great point! This seems to be "Works As Designed" then. I think synchronizing wall clock time generation with oplog entry generation would be too much to add to a hot code path.
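For comparison, the synchronization Judah rules out could look roughly like the sketch below: a hypothetical allocator (not how the server assigns optimes) that hands out optimes and wall times together under one lock, including for multi-entry reservations. `OpTimeAndWallClock` and `reserve` are invented names, and it assumes a reserved block shares the wall time of its reservation.

```python
import threading
import time


class OpTimeAndWallClock:
    """Hypothetical allocator pairing each optime with a wall time atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_optime = 1
        self._last_wall = 0.0

    def reserve(self, count=1):
        """Reserve `count` consecutive optimes, all stamped with one wall time."""
        with self._lock:
            # Clamp so wall times never decrease, even if the system clock does.
            now = max(time.time(), self._last_wall)
            self._last_wall = now
            first = self._next_optime
            self._next_optime += count
            return [(first + i, now) for i in range(count)]


clock = OpTimeAndWallClock()
results = []
results_lock = threading.Lock()


def worker(n):
    pairs = clock.reserve(n)
    with results_lock:
        results.extend(pairs)


threads = [threading.Thread(target=worker, args=(n,)) for n in (20, 1, 5, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

results.sort()  # tuples sort by optime first
# In optime order, wall times are now non-decreasing.
print(all(a[1] <= b[1] for a, b in zip(results, results[1:])))  # True
```

The cost is visible in the sketch: every writer, including every single insert, contends on one lock, and a bulk insert's wall times reflect when its optimes were reserved rather than when each entry was logged.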

maria.vankeulen and milkie, is this something flow control can work around?

Comment by Eric Milkie [ 20/Aug/19 ]

Ah, but we're talking about the wall-clock times assigned to these oplog entries, and since the last committed calculation never looks at those values, it's entirely possible for them to be out of order, due to clock skew. I see what you're saying now.

Comment by Eric Milkie [ 20/Aug/19 ]

By "wall time commit point" do you mean last applied?
We have gone back and forth on whether to make last applied monotonically increasing or to let it jump around and have consumers deal with it. But even if it moves backward, I don't see how it could ever fall behind the last committed value: we calculate the last committed value based on oplog visibility (what's visible), and that calculation cannot go past uncommitted holes, even though the last applied time ignores holes.
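Eric's argument about optimes can be sketched with a toy model of holes (hypothetical helper names, optimes as integers). A "hole" is a reserved optime whose entry has not been written yet; the visibility point stops at the first hole, while a last applied value that ignores holes is simply the highest written optime.

```python
from __future__ import annotations


def no_holes_point(written: set[int]) -> int:
    """Largest optime t such that every optime 1..t has been written (0 if none)."""
    t = 0
    while t + 1 in written:
        t += 1
    return t


def last_applied_ignoring_holes(written: set[int]) -> int:
    """Highest written optime, regardless of gaps below it."""
    return max(written, default=0)


# The bulk insert has logged optimes 1-10 of its reserved 1-20;
# the concurrent single insert has already logged optime 21.
written = set(range(1, 11)) | {21}
print(no_holes_point(written), last_applied_ignoring_holes(written))  # 10 21
```

In optime terms the commit point (bounded by 10) cannot pass last applied (21). The wall times attached to those optimes carry no such guarantee, which is the point raised in the follow-up comment.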

Comment by Matthew Russotto [ 20/Aug/19 ]

I think this may actually be normal. The wall clock time is assigned when we log the oplog entry. The optime may be assigned then, or it may have been reserved earlier (e.g., for bulk inserts and collection creation). The entries will be ordered by optime, and the commit point advances by optime, so it is entirely possible for the wall time commit point to go backwards or to be ahead of the last committed optime, even on the primary.
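Matthew's point that the commit point advances by optime, so its wall time can go backwards, can be checked with a small sketch over hypothetical `(optime, wall_time)` pairs. `wall_time_regressions` is an invented helper that walks entries in optime order and reports each step where the wall time decreases.

```python
def wall_time_regressions(entries):
    """Return (prev_optime, optime) pairs where the wall time went backwards
    as the commit point advanced in optime order."""
    ordered = sorted(entries)  # (optime, wall_time) tuples sort by optime first
    regressions = []
    for (prev_op, prev_wall), (op, wall) in zip(ordered, ordered[1:]):
        if wall < prev_wall:
            regressions.append((prev_op, op))
    return regressions


# Optimes 1-3 were reserved for a bulk insert and logged at wall times
# 100, 102, 103; a concurrent insert took optime 4 and was logged at 101.
entries = [(1, 100), (2, 102), (3, 103), (4, 101)]
print(wall_time_regressions(entries))  # [(3, 4)]
```

When the commit point moves from optime 3 to optime 4, its wall time drops from 103 to 101, even though no clock moved backwards.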

Comment by Maria van Keulen [ 20/Aug/19 ]

I've filed SERVER-42917 to address this issue at the Flow Control level until the underlying cause is addressed. The work for SERVER-42906 should include undoing the fix from SERVER-42917.

Comment by Maria van Keulen [ 19/Aug/19 ]

Assigning this to Replication to investigate.

Generated at Thu Feb 08 05:01:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.