[SERVER-41613] FSyncLockThread::run() should not hold a lock while calling waitUntilDurable() Created: 10/Jun/19  Updated: 29/Oct/23  Resolved: 27/Sep/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed Votes: 0
Labels: execution_intern_2019
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-39591 RecoveryUnit::waitUntilDurable() shou... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2019-09-09, Execution Team 2019-09-23, Execution Team 2019-10-07
Participants:

 Description   

The GlobalRead lock is acquired in fsync.

However, we only call waitUntilDurable() if _allowFsyncFailure is set. _allowFsyncFailure is only set if getTestCommandsEnabled() and the "allowFsyncFailure" flag was set on the cmd request, which was added to make fsync+lock block writes against the server regardless of fsync success. That was added in this commit for the backup work.



 Comments   
Comment by Githook User [ 27/Sep/19 ]

Author:

{'name': 'Daniel Gottlieb', 'username': 'dgottlieb', 'email': 'daniel.gottlieb@mongodb.com'}

Message: SERVER-41613: Remove superfluous durability call in fsync with

{allowFsyncFailure: true}

.
Branch: master
https://github.com/mongodb/mongo/commit/f268c776da964de7cb15b09ff38f1b90bbce4c85

Comment by Dianna Hohensee (Inactive) [ 20/Sep/19 ]

This ticket was spun off of SERVER-39591, where we want to invariant against locks being held across waitUntilDurable() calls: I/O can take relatively a long time, and we do not want to block other operations for a long time, potentially blocking other operations. I would argue that waitUntilDurable() it much more common than flushAllFiles(), and adding the invariant would safeguard that code path.

Looking at the code, it seems like WiredTigerSessionCache has a shuttingDown() function wherein we will wait until there are no more callers, just spinning until that is the case before returning. waitUntilDurable() makes sure that WiredTigerSessionCache knows it's running (registers itself) with this check. Most callers of waitUntilDurable() don't hold locks today, I believe: there were only 3 places that SERVER-39591 ran into trouble, fsync being one of them.

Comment by Daniel Gottlieb (Inactive) [ 11/Sep/19 ]

Maybe I'm missing something. If code calls waitUntilDurable without locks, what's protecting the storage engine from being closed?

Additionally, there are two other spots in fsync that call WiredTigerSessionCache::waitUntilDurable: here and here. It just happens that those bypass RecoveryUnit::waitUntilDurable. Where RecoveryUnit::waitUntilDurable just forwards to the session cache implementation.

Comment by Eric Milkie [ 28/Jun/19 ]

After some investigation, I think it's ok to simply remove the waitUntilDurable() call. The only jstest that is using the allowFsyncFailure flag is only calling fsyncLock to get the lock behavior, not the fsync behavior, and we already called flushAllFiles() in that code path anyway.

Comment by Dianna Hohensee (Inactive) [ 14/Jun/19 ]

Done.

Comment by Eric Milkie [ 14/Jun/19 ]

We should link here what the code location is that is locking and calling waitUntilDurable().

Generated at Thu Feb 08 04:58:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.