[SERVER-39591] RecoveryUnit::waitUntilDurable() should invariant against callers holding locks Created: 14/Feb/19 Updated: 29/Oct/23 Resolved: 18/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Sprint: | Storage NYC 2019-05-20, Execution Team 2019-11-18 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
MMAP's implementation had an invariant, but the other storage engine implementations appear not to have any. Locks should not be held while waitUntilDurable is called for two reasons: 1) it can take a long time to make data durable, so a lock should not be held that long for I/O; 2) we could run out of tickets and the lock that the caller has could stop the rest of the system from freeing up enough for he durability task to get a ticket, thus a ticketing deadlock. This requires fixing the failpoint in index builds that calls waitUntilDurable() with locks, curtesy of |
| Comments |
| Comment by Githook User [ 18/Nov/19 ] |
|
Author: {'username': 'DiannaHohensee', 'email': 'dianna.hohensee@mongodb.com', 'name': 'Dianna Hohensee'}Message: |
| Comment by Evgeni Dobranov [ 09/Oct/19 ] |
|
We're closing outÂ
The full reasoning can be found in this comment under the original ticket. |
| Comment by Dianna Hohensee (Inactive) [ 20/May/19 ] |
|
So we cannot just delete the waitUntilDurable() calls from setInitialSyncFlag and clearInitialSyncFlag. If we end initial sync calling clearInitialSyncFlag, and do not wait for durability, then the server can restart and fall back into initial syncing. This would be bad because initial sync is not resumable: it will start over completely. We would have to do some work to explore/try elevating the waitUntilDurable() calls to a higher level without locks. The waitUntilDurable() calls under a lock do no harm where they're currently called, I'd imagine, because only initial sync is running on the server, so nothing else is blocked. But it does block us from invariant'ing against future misuse of the waitUntilDurable() function. |
| Comment by Eric Milkie [ 17/May/19 ] |
|
I wonder if either of those locations needs that call. Perhaps we can simply remove the waitUntilDurable() calls there. |
| Comment by Dianna Hohensee (Inactive) [ 17/May/19 ] |
|
ReplicationConsistencyMarkersImpl::setInitialSyncFlag is called during initial sync, immediately after doing a write to a document. This seems hard to eradicate. |
| Comment by Dianna Hohensee (Inactive) [ 17/May/19 ] |
|
repairDatabasesAndCheckVersion holds a Global X lock throughout. StorageRepairObserver::_invalidateReplConfigIfNeeded does a document write right before calling waitUntilDurable(). |
| Comment by Eric Milkie [ 17/May/19 ] |
|
repairDatabasesAndCheckVersion doesn't need to hold any locks while it calls waitUntilDurable. |
| Comment by Githook User [ 27/Feb/19 ] |
|
Author: {'name': 'Dianna Hohensee', 'username': 'DiannaHohensee', 'email': 'dianna.hohensee@10gen.com'}Message: |