[SERVER-13879] Yield Fsync Lock with tunable frequency Created: 08/May/14  Updated: 06/Dec/22  Resolved: 11/Jul/16

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kevin J. Rice Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Participants:

 Description   

This case covers changing the fsync lock from a single monolithic lock over an entire shard's data into a scheme where, say, 10% of the memory is synced to disk, then the next 10%, and so on. In between each 10% increment, the write lock could be yielded for some duration.

Background: MongoDB has a setting for how often dirty pages are synced to disk: the command line option --syncdelay, which defaults to 60 seconds. A typical write-heavy workload therefore shows a flat line of no disk writes for 60 seconds, then a write lock held for n seconds while the dirty pages are flushed to disk, then another flat period for the remaining (60 - n) seconds.
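
For reference, the flush interval is set at startup via the documented --syncdelay option; the dbpath below is purely illustrative and 60 is simply the default repeated for clarity:

    mongod --dbpath /data/db --syncdelay 60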

This behaviour is very choppy. Because the write lock is global, each flush blocks concurrent writes and degrades overall write performance.

SUGGESTION:

Mongod currently asks the VMM to flush an entire file at once. Instead, divide the file into pieces, for example halves: flush the first half, then yield the write lock for a configurable amount of time (or until the queued writes have drained, with a maximum of, say, a few seconds), and then repeat the process for the second half of the file.
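
To make the idea concrete, here is a minimal sketch (not MongoDB source) of a chunked flush with a per-chunk yield, assuming a page-aligned mmap'd data file and using a plain std::mutex to stand in for the global write lock; the chunk count and yield duration are illustrative parameters, and msync error handling is omitted for brevity:

    // Illustrative sketch only: flush a memory-mapped data file in
    // page-aligned chunks, holding the "write lock" only while each
    // chunk is flushed and yielding in between so queued writes can drain.
    #include <sys/mman.h>
    #include <unistd.h>
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <mutex>
    #include <thread>

    void chunkedFlush(char* base, std::size_t fileLen, std::mutex& writeLock,
                      std::size_t numChunks = 10,
                      std::chrono::milliseconds yieldFor = std::chrono::milliseconds(100)) {
        const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        std::size_t chunk = (fileLen + numChunks - 1) / numChunks;
        chunk = ((chunk + page - 1) / page) * page;        // msync needs page-aligned ranges
        for (std::size_t off = 0; off < fileLen; off += chunk) {
            const std::size_t len = std::min(chunk, fileLen - off);
            {
                std::lock_guard<std::mutex> lk(writeLock); // hold the lock only for this chunk
                msync(base + off, len, MS_SYNC);           // flush this slice of dirty pages
            }
            std::this_thread::sleep_for(yieldFor);         // yield so writers can make progress
        }
    }

In the real server the "write lock" and yield policy would be the existing lock manager primitives rather than a std::mutex; the sketch only shows the shape of the chunk-then-yield loop.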

This feature would be complicated by records relocating from one area of the file to another. On average this should not happen often, so it could simply be ignored and the relocated record would be synced on the next pass. The problematic case would be a record/document that kept moving from place to place in the file and therefore went unsynced for a long period.

If I am profoundly mistaken about how MongoDB works and interacts with the VMM, this ticket may be impossible. If it is feasible, however, it could significantly reduce lock percentages and deliver much more consistent performance.



 Comments   
Comment by Ian Whalen (Inactive) [ 11/Jul/16 ]

Thanks for the request, but this is no longer as relevant for the WiredTiger storage engine and we do not plan on doing this work for the MMAPv1 storage engine, so we are resolving as Won't Fix.

Comment by Kaloian Manassiev [ 15/May/14 ]

Hi Kevin,

We can certainly do better at optimizing the flush behaviour, and what you are suggesting is not impossible at all, but it will require significant changes to our storage engine, in particular to how journal truncation and memory re-mapping are implemented.

I am putting this ticket into a planning bucket pending design and resources.

Thanks for your input.

-Kal.
