[SERVER-48149] Move callers of waitUntilDurable onto JournalFlusher::waitForJournalFlush Created: 12/May/20 Updated: 29/Oct/23 Resolved: 28/Aug/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0, 4.4.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.4 |
| Sprint: | Execution Team 2020-06-15, Execution Team 2020-06-29, Execution Team 2020-09-07 |
| Participants: | |
| Linked BF Score: | 7 |
| Description |
|
There have been issues where threads running concurrently with stepdown call waitUntilDurable independently. Stepdown changes the behavior of waitUntilDurable so that it stops writing to the oplogTruncateAfterPoint document, and then clears the oplogTruncateAfterPoint timestamp. It does this by carefully interrupting the JournalFlusher thread, which makes the asynchronous waitUntilDurable calls. However, operations running concurrently with stepdown sometimes require durability and call waitUntilDurable directly; stepdown does not carefully interrupt these operations before it clears the oplogTruncateAfterPoint timestamp. Consequently, the oplogTruncateAfterPoint can remain set after stepdown, when it should have been cleared.

--------------------

waitUntilUnjournaledWritesDurable and flushAllFiles are callers of waitUntilDurable, but they cannot be moved onto the JournalFlusher thread because they provide parameter settings that the JournalFlusher does not. The interface for these two functions should state very explicitly the risk of running concurrently with stepdown. Today, I do not believe there are any callers that can run concurrently with stepdown.

--------------------

This will actually be a bit tricky, because we will probably have to make sure that new JournalFlusher::waitForJournalFlush callers can retry if interrupted by stepdown (or by whatever else interrupts the JournalFlusher thread). |
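The retry requirement described above can be illustrated with a minimal, self-contained sketch. It is not the server implementation: `MockJournalFlusher`, `InterruptedByStepdown`, and `waitForJournalFlushWithRetry` are hypothetical names invented for this example, and the mock only imitates a `waitForJournalFlush` call that throws while stepdown is interrupting the flusher thread. Whether a real caller should retry or propagate the interruption depends on the operation, which the ticket does not specify.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the error a caller sees when stepdown
// interrupts a pending journal flush wait.
struct InterruptedByStepdown : std::runtime_error {
    InterruptedByStepdown() : std::runtime_error("interrupted by stepdown") {}
};

// Hypothetical mock, NOT the real JournalFlusher: it throws for a fixed
// number of calls and then succeeds.
class MockJournalFlusher {
public:
    explicit MockJournalFlusher(int interruptionsBeforeSuccess)
        : _remainingInterruptions(interruptionsBeforeSuccess) {}

    // Throws until the configured interruptions are used up, then returns.
    void waitForJournalFlush() {
        ++_calls;
        if (_remainingInterruptions > 0) {
            --_remainingInterruptions;
            throw InterruptedByStepdown();
        }
    }

    int calls() const { return _calls; }

private:
    int _remainingInterruptions;
    int _calls = 0;
};

// Retries the wait up to maxAttempts times. Returns true once a wait
// completes, false if every attempt was interrupted.
inline bool waitForJournalFlushWithRetry(MockJournalFlusher& flusher, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        try {
            flusher.waitForJournalFlush();
            return true;
        } catch (const InterruptedByStepdown&) {
            // Interrupted mid-wait: loop and request a fresh flush.
        }
    }
    return false;
}
```

Bounding the number of attempts is a choice made for this sketch so that it always terminates; a caller that must not give up would instead need some other way to observe shutdown or its own interruption.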
| Comments |
| Comment by Githook User [ 23/Feb/21 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: Operations running concurrently with stepdown must call JournalFlusher::waitForJournalFlush so that (cherry picked from commit 17457592a2a1b64ed4ac90c93b32aa47598d5c90) |
| Comment by Githook User [ 23/Feb/21 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: (cherry picked from commit e88e77476f27a529fa596dd72189e03a52962d2d) |
| Comment by Githook User [ 28/Aug/20 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: Operations running concurrently with stepdown must call JournalFlusher::waitForJournalFlush so that
| Comment by Githook User [ 28/Aug/20 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message:
| Comment by Dianna Hohensee (Inactive) [ 28/Aug/20 ] |
|
Code review url: https://mongodbcr.appspot.com/653490015/ (enterprise) |