[SERVER-48149] Move callers of waitUntilDurable onto JournalFlusher::waitForJournalFlush
Created: 12/May/20  Updated: 29/Oct/23  Resolved: 28/Aug/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.5

Type: Bug
Priority: Major - P3
Reporter: Dianna Hohensee (Inactive)
Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-46826 Instantiate the JournalFlusher thread... Closed
is depended on by SERVER-47898 Advancing lastDurable irrespective of... Closed
Related
related to SERVER-79810 make JournalFlusher::waitForJournalFl... Closed
related to SERVER-79809 remove unused functions from StorageC... Closed
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4
Sprint: Execution Team 2020-06-15, Execution Team 2020-06-29, Execution Team 2020-09-07
Participants:
Linked BF Score: 7

 Description   

There have been issues where threads running concurrently with stepdown call waitUntilDurable independently.

Stepdown changes the behavior of waitUntilDurable so that it stops writing to the oplogTruncateAfterPoint document, and then clears the oplogTruncateAfterPoint timestamp. It does this by carefully interrupting the JournalFlusher thread, which makes asynchronous waitUntilDurable calls. However, operations running concurrently with stepdown sometimes require durability and call waitUntilDurable directly, and stepdown does not interrupt these operations before it clears the oplogTruncateAfterPoint timestamp. Consequently, the oplogTruncateAfterPoint can remain set after stepdown, when it should have been cleared.
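
The interleaving described above can be illustrated with a small single-threaded toy model. This is only a sketch under stated assumptions: ToyNode, InFlightFlush, beginFlush, stepDown and finishFlush are hypothetical names invented for illustration and are not the server's classes or functions. A flush is split into two steps so that stepdown can be placed between them deterministically.

```cpp
#include <cassert>
#include <optional>

// Hypothetical toy state; not the server's real data structures.
struct ToyNode {
    bool writesTruncatePoint = true;         // true while the node acts as primary
    std::optional<long> truncateAfterPoint;  // expected to be empty after stepdown
    bool flusherInterrupted = false;         // set when stepdown interrupts the flusher
};

// A flush that has started and already decided whether it will write the point.
struct InFlightFlush {
    bool willWrite;
    bool viaFlusher;  // true if routed through the (toy) flusher, false if called directly
};

InFlightFlush beginFlush(const ToyNode& node, bool viaFlusher) {
    return {node.writesTruncatePoint, viaFlusher};
}

// Toy stepdown: interrupts flusher-driven flushes only, stops further writes
// to the truncate-after point, then clears it.
void stepDown(ToyNode& node) {
    node.flusherInterrupted = true;
    node.writesTruncatePoint = false;
    node.truncateAfterPoint.reset();
    assert(!node.truncateAfterPoint);  // postcondition at the moment stepdown finishes
}

// Completes a flush. Returns false if the flush was interrupted.
bool finishFlush(ToyNode& node, const InFlightFlush& flush, long timestamp) {
    if (flush.viaFlusher && node.flusherInterrupted) {
        return false;  // routed flushes observe the interruption and do not write
    }
    if (flush.willWrite) {
        node.truncateAfterPoint = timestamp;  // a direct caller can still write here
    }
    return true;
}
```

In this model, a direct flush that began before stepDown leaves truncateAfterPoint set once it finishes, while a flush routed through the flusher is interrupted and leaves it cleared.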

--------------------

waitUntilUnjournaledWritesDurable and flushAllFiles are callers of waitUntilDurable, but cannot be moved onto the JournalFlusher thread because they provide parameter settings that the JournalFlusher does not. The interface for these two functions should state explicitly the risk of running concurrently with stepdown. Today, I do not believe there are any callers that can run concurrently with stepdown.

--------------------

This will be somewhat tricky: new JournalFlusher::waitForJournalFlush callers will probably need to be able to retry if interrupted by stepdown (or by whatever else interrupts the JournalFlusher thread).
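
A retry wrapper of the kind suggested above might look roughly like the following. This is a minimal sketch, not the approach taken in the actual commit: FlushInterrupted, waitForFlushWithRetry and FakeFlusher are hypothetical names, and the bounded attempt count is an assumption made so the example terminates.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the error a waiter sees when the flusher is interrupted.
struct FlushInterrupted : std::runtime_error {
    FlushInterrupted() : std::runtime_error("journal flush interrupted") {}
};

// Calls waitForFlush until it completes without interruption, up to maxAttempts.
// Returns the number of attempts used; rethrows if the last attempt is interrupted.
inline int waitForFlushWithRetry(const std::function<void()>& waitForFlush,
                                 int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1;; ++attempt) {
        try {
            waitForFlush();
            return attempt;
        } catch (const FlushInterrupted&) {
            if (attempt == maxAttempts) {
                throw;  // give up and surface the interruption to the caller
            }
        }
    }
}

// Toy flusher that is interrupted a fixed number of times before succeeding.
struct FakeFlusher {
    int interruptsRemaining;
    int flushes = 0;

    void waitForJournalFlush() {
        if (interruptsRemaining > 0) {
            --interruptsRemaining;
            throw FlushInterrupted();
        }
        ++flushes;
    }
};
```

A caller would wrap its wait in a lambda, for example waitForFlushWithRetry([&] { flusher.waitForJournalFlush(); }, 5). Whether a real caller should retry at all depends on whether it can still legitimately require durability after the replication state change.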



 Comments   
Comment by Githook User [ 23/Feb/21 ]

Author: Dianna Hohensee (dianna.hohensee@mongodb.com, username DiannaHohensee)

Message: SERVER-48149 Move callers of RecoveryUnit::waitUntilDurable onto JournalFlusher::waitForJournalFlush

Operations running concurrently with stepdown must call JournalFlusher::waitForJournalFlush so that
writes to the oplogTruncateAfterPoint are interrupted correctly during stepdown and callers waiting
for durability don't receive unexpected InterruptedDueToReplStateChange errors.

(cherry picked from commit 17457592a2a1b64ed4ac90c93b32aa47598d5c90)
Branch: v4.4
https://github.com/mongodb/mongo/commit/21cbcf04d8a0bd509658433f8f2d0f54d8e42f3b

Comment by Githook User [ 23/Feb/21 ]

Author: Dianna Hohensee (dianna.hohensee@mongodb.com, username DiannaHohensee)

Message: SERVER-48149 Move callers of RecoveryUnit::waitUntilDurable onto JournalFlusher::waitForJournalFlush

(cherry picked from commit e88e77476f27a529fa596dd72189e03a52962d2d)
Branch: v4.4
https://github.com/10gen/mongo-enterprise-modules/commit/252b3f0f5e246b83e00eeb9a6afa132e8001f967

Comment by Githook User [ 28/Aug/20 ]

Author: Dianna Hohensee (dianna.hohensee@mongodb.com, username DiannaHohensee)

Message: SERVER-48149 Move callers of RecoveryUnit::waitUntilDurable onto JournalFlusher::waitForJournalFlush

Operations running concurrently with stepdown must call JournalFlusher::waitForJournalFlush so that
writes to the oplogTruncateAfterPoint are interrupted correctly during stepdown and callers waiting
for durability don't receive unexpected InterruptedDueToReplStateChange errors.
Branch: master
https://github.com/mongodb/mongo/commit/17457592a2a1b64ed4ac90c93b32aa47598d5c90

Comment by Githook User [ 28/Aug/20 ]

Author: Dianna Hohensee (dianna.hohensee@mongodb.com, username DiannaHohensee)

Message: SERVER-48149 SERVER-48149 Move callers of RecoveryUnit::waitUntilDurable onto JournalFlusher::waitForJournalFlush
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/e88e77476f27a529fa596dd72189e03a52962d2d

Comment by Dianna Hohensee (Inactive) [ 28/Aug/20 ]

Code review url: https://mongodbcr.appspot.com/653490015/ (enterprise)
