Priority: Major - P3
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
This case covers changing the fsync lock from a single monolithic lock, held while an entire shard's data is flushed at once, to a scheme where, say, 10% of the memory is sync'd to disk, then the next 10%, and so on. Between each set of 10%, the write lock could yield for some duration.
Background: MongoDB has a setting for how often dirty pages are sync'd to disk. This is the command line setting --syncdelay, which defaults to 60 seconds. In a typical write-heavy use case, the disk-write profile is a flat line of no disk writes for 60 seconds, then a write lock that lasts n seconds while the dirty pages in memory are flushed to disk, then flat again for the next (60 - n) seconds.
This kind of performance is very choppy. Because the write lock is global, each flush blocks ordinary writes for its full duration and degrades write performance.
Mongod issues the fsync command to the VMM. This command says, "do an fsync on this entire file." Instead of doing this, divide the file in half and say, "fsync the first half of the file." Once that fsync is done, yield the write lock for a configurable amount of time, or perhaps until the queued writes are cleared, with a maximum of, say, several seconds, and then repeat this process with the second half of the file.
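To make the idea concrete, here is a minimal sketch in Python. It is not MongoDB's code. It assumes a memory-mapped file, a plain threading.Lock standing in for the global write lock, and a fixed sleep standing in for the yield. The names chunk_ranges, flush_in_chunks, num_chunks and yield_seconds are hypothetical. It relies on mmap.flush(offset, size), which requires the offset to be aligned to the page size or allocation granularity.

```python
import mmap
import threading
import time


def chunk_ranges(total_size, num_chunks, granularity=mmap.ALLOCATIONGRANULARITY):
    """Split [0, total_size) into at most num_chunks (offset, length) ranges.

    Every offset is a multiple of `granularity`, as mmap.flush requires.
    """
    if total_size <= 0:
        return []
    raw = -(-total_size // num_chunks)            # ceiling division
    chunk = -(-raw // granularity) * granularity  # round up to granularity
    ranges = []
    offset = 0
    while offset < total_size:
        ranges.append((offset, min(chunk, total_size - offset)))
        offset += chunk
    return ranges


def flush_in_chunks(mm, write_lock, num_chunks=2, yield_seconds=0.0):
    """Flush a mapped file one range at a time, releasing the lock in between.

    `write_lock` is a stand-in for the global write lock (an assumption of
    this sketch); queued writers can run while the lock is released.
    """
    ranges = chunk_ranges(len(mm), num_chunks)
    for i, (offset, length) in enumerate(ranges):
        with write_lock:
            mm.flush(offset, length)   # sync only this part of the file
        if i < len(ranges) - 1:
            time.sleep(yield_seconds)  # yield before taking the lock again
    return ranges
```

Calling flush_in_chunks(mm, lock, num_chunks=10, yield_seconds=0.5) would correspond to the "10% at a time" variant. Whether each lock hold actually shrinks in proportion depends on how the dirty pages are distributed across the file, which this sketch does not model.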
This feature would be complicated by records relocating from one area of memory to another. However, on average this wouldn't happen often, so it could simply be ignored: a relocated record would be sync'd the next time through. The only problematic case would be a record / document that kept bouncing from place to place in the file and so went unsync'd for a long time.
If I am profoundly incorrect about the way MongoDB works and interacts with the VMM, this ticket may be classified as impossible. However, if it is possible, it might significantly reduce lock percentages and give much more consistent performance.