[SERVER-35786] lastDurable optime should be updated after batch application on non-durable storage engines Created: 25/Jun/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Sprint: | Repl 2018-07-30, Repl 2018-08-13, Repl 2018-08-27, Repl 2018-09-10 |
| Participants: | |
| Linked BF Score: | 17 |
| Description |
|
For secondary batch application, the ApplyBatchFinalizer is used to advance optimes after application of an oplog batch completes. We currently use the ApplyBatchFinalizerForJournal for durable storage engines and the ApplyBatchFinalizer for non-durable storage engines, which only updates the lastApplied optime. On primaries, for non-durable storage engines, the replication system keeps the lastDurable optime up to date with the lastApplied optime, since the lastDurable optime has no functional meaning on a non-durable storage engine. It seems we should keep this behavior consistent between primaries and secondaries, so we should also update the lastDurable optime after batch application on non-durable storage engines. |
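As an illustration of the proposal, here is a minimal, self-contained C++ sketch (toy types only, not the server's actual ApplyBatchFinalizer classes) of a batch finalizer that advances lastDurable in lockstep with lastApplied on a non-durable engine:

```cpp
#include <iostream>

// Toy optime: a stand-in for the server's {timestamp, term} pair.
struct OpTime {
    long long ts = 0;
};

// Toy replication state tracked by a secondary.
struct ReplState {
    OpTime lastApplied;
    OpTime lastDurable;
};

// Current behavior for non-durable engines: only lastApplied advances.
struct ApplyBatchFinalizer {
    virtual ~ApplyBatchFinalizer() = default;
    virtual void record(ReplState& state, OpTime batchEnd) {
        state.lastApplied = batchEnd;
    }
};

// Proposed behavior: with no journal, "durable" has no separate meaning,
// so keep lastDurable in lockstep with lastApplied, as primaries already do.
struct ApplyBatchFinalizerForNonDurable : ApplyBatchFinalizer {
    void record(ReplState& state, OpTime batchEnd) override {
        state.lastApplied = batchEnd;
        state.lastDurable = batchEnd;  // nothing to journal, so advance both
    }
};

int main() {
    ReplState state;
    ApplyBatchFinalizerForNonDurable finalizer;
    finalizer.record(state, OpTime{42});  // end of an applied oplog batch
    std::cout << "applied=" << state.lastApplied.ts
              << " durable=" << state.lastDurable.ts << "\n";  // both 42
}
```

The point of the sketch is only that, with no journal, there is nothing extra to wait for, so the two optimes can move together after each batch, exactly as they already do on a non-durable primary.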
| Comments |
| Comment by Spencer Brody (Inactive) [ 29/Aug/18 ] |
|
Spent some time investigating the current behavior here and it's a bit interesting. The following describes the behavior of a single insert with various configurations and writeConcerns specified.

Primary inMemory, Secondary WT, writeConcernMajorityJournalDefault: true
#2 here I believe can be explained by this line, which waits for durability and then unconditionally sets lastOpDurable to the lastApplied (a simplified sketch of that assignment follows this comment). This seems like problematic behavior, however: since j wasn't specified and writeConcernMajorityJournalDefault is true, I'd expect this to error in some way.
All 4 cases behave the same. It's a bit surprising that #4 still errors even though j:false is specified.
No surprises here
No surprises here. |
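The case #2 behavior described in the comment above can be illustrated with a small sketch. The names below are stand-ins, not the actual server code path; the assignment of interest is the unconditional lastDurable = lastApplied after the durability wait:

```cpp
#include <iostream>

struct OpTime { long long ts = 0; };

// Stand-in for the storage engine's durability interface. On a non-durable
// engine there is no journal, so "wait until durable" returns immediately.
struct NonDurableEngine {
    void waitUntilDurable() const {}
};

struct Primary {
    OpTime lastApplied;
    OpTime lastDurable;

    // Models the code path referenced above: wait for durability, then
    // unconditionally set the durable optime to the applied optime.
    void waitForJournalAndAdvanceDurable(const NonDurableEngine& engine) {
        engine.waitUntilDurable();
        lastDurable = lastApplied;  // unconditional, even with no journal
    }
};

int main() {
    Primary p;
    p.lastApplied = OpTime{7};  // the insert was just applied
    p.waitForJournalAndAdvanceDurable(NonDurableEngine{});
    // Reports durable=7 although nothing was ever journaled, which is why
    // case #2 above appears to satisfy a journaled write concern.
    std::cout << "durable=" << p.lastDurable.ts << "\n";
}
```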
| Comment by Tess Avitabile (Inactive) [ 03/Jul/18 ] |
|
We should investigate whether having an in-memory primary node keep lastDurable up to date with lastApplied causes us to incorrectly confirm majority (durable) writes. |
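To make the concern concrete, here is a toy version (hypothetical names, not the server's real write concern code) of a majority-durable check driven by members' lastDurable optimes; an in-memory node that reports lastDurable == lastApplied counts toward the durable majority even though it never journaled the write:

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct OpTime { long long ts = 0; };

struct Member {
    OpTime lastDurable;
    bool journals;  // false for an inMemory / ephemeralForTest node
};

// Toy check: a majority (durable) write is "confirmed" once a majority of
// members report a lastDurable optime at or past the write's optime.
bool majorityDurable(const std::vector<Member>& members, OpTime writeOpTime) {
    long long count = std::count_if(members.begin(), members.end(),
        [&](const Member& m) { return m.lastDurable.ts >= writeOpTime.ts; });
    return count > static_cast<long long>(members.size()) / 2;
}

int main() {
    // 3-node set: an in-memory primary reporting durable == applied, one
    // journaling secondary that has made the write durable, one that has not.
    std::vector<Member> members = {
        {OpTime{10}, /*journals=*/false},  // in-memory primary
        {OpTime{10}, true},                // WT secondary, journaled
        {OpTime{5},  true},                // WT secondary, lagging
    };
    // Prints true: the write is confirmed as majority-durable even though
    // only one node actually journaled it.
    std::cout << std::boolalpha << majorityDurable(members, OpTime{10}) << "\n";
}
```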
| Comment by William Schultz (Inactive) [ 25/Jun/18 ] |
|
milkie I came across this behavior when diagnosing a build failure that occurred specifically on the ephemeralForTest storage engine. What I observed was that the lastDurable optime on a secondary was not advancing during normal steady state replication, but an update to it was later triggered by another (internal) write happening in the system; in this case it was the writing of our "last vote" document to storage. This then seemed to cause the durable optime to advance, and because of this, we triggered an updatePosition request to our sync source, which ended up interfering with other commands in an unintended way. Maybe the existing behavior is acceptable, but I suppose we should at least decide what we want the behavior to be, since it certainly appears that we try to keep lastDurable optimes up to date with lastApplied optimes on the primary. |
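A rough model of the cascade described above (illustrative names only, not the server's actual code): batch application on the non-durable secondary moves only lastApplied, and lastDurable later jumps forward as a side effect of an unrelated internal write, which in turn triggers an updatePosition to the sync source:

```cpp
#include <iostream>

struct OpTime { long long ts = 0; };

struct Secondary {
    OpTime lastApplied;
    OpTime lastDurable;

    void applyBatch(OpTime batchEnd) {
        lastApplied = batchEnd;  // lastDurable intentionally left behind
    }

    void onDurableAdvanced(OpTime newDurable) {
        if (newDurable.ts > lastDurable.ts) {
            lastDurable = newDurable;
            std::cout << "sending updatePosition to sync source (durable="
                      << lastDurable.ts << ")\n";
        }
    }

    void writeLastVoteDocument() {
        // The internal write happens to flush state, dragging the durable
        // optime forward even though replication itself never asked for it.
        onDurableAdvanced(lastApplied);
    }
};

int main() {
    Secondary s;
    s.applyBatch(OpTime{20});   // steady-state replication: durable stays at 0
    s.writeLastVoteDocument();  // unrelated write now advances durable to 20
}
```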
| Comment by Eric Milkie [ 25/Jun/18 ] |
|
I'm not sure we should be making this change unless there's an advantage to making it. I suspect it won't be trivial to change this behavior, and it will change the use of the writeConcernMajorityJournalDefault parameter, since you would no longer need to change it when setting up a replica set with non-durable nodes. |