[SERVER-12401] Improve the memory-mapped files flush on Windows Created: 17/Jan/14  Updated: 01/May/18  Resolved: 01/May/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Performance, Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alexander Komyagin Assignee: DO NOT USE - Backlog - Platform Team
Resolution: Won't Fix Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows, high-latency network storage


Issue Links:
Related
related to SERVER-13444 Long locked flush without inserts and... Closed
related to SERVER-14992 Query for Windows 7 File Allocation F... Closed
Sprint: Server 2.7.5, Server 2.7.6
Participants:

 Description   

On Windows, memory-mapped file flushes are synchronous operations. When the OS Virtual Memory Manager is asked to flush a memory-mapped file, it issues a synchronous write request to the OS cache manager. This causes large I/O stalls on Windows systems with high disk I/O latency, whereas on Linux the same writes are asynchronous.

The problem becomes critical on high-latency storage such as Azure persistent storage (~10ms per write). Because the flushes are issued one at a time, a 10ms write latency caps throughput at roughly 1 / 0.010s = 100 IOPS, which results in very long background flush times. On low-latency storage (local disks and AWS) the problem is far less visible.

In the code, after FlushViewOfFile returns we call FlushFileBuffers to ensure that the drive has actually written the change to disk. It is easy to see in Windows Performance Monitor that the disk queue never goes above 1 during the flush.
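For illustration, here is a minimal sketch of that synchronous flush path (not the server's actual code; the function and variable names are made up for this example). FlushViewOfFile blocks until the dirty pages have been handed off, and FlushFileBuffers then forces them to durable media, so only one I/O is ever in flight:

    // Sketch only: synchronous flush of a mapped view on Windows.
    #include <windows.h>
    #include <iostream>

    // 'view' is the base address returned by MapViewOfFile, 'len' the mapped
    // length, 'hFile' the underlying file handle (illustrative names).
    bool flushMappedView(void* view, size_t len, HANDLE hFile) {
        // Synchronous on Windows: the call does not return until the write
        // request has been serviced, so the disk queue stays at depth 1.
        if (!FlushViewOfFile(view, len)) {
            std::cerr << "FlushViewOfFile failed: " << GetLastError() << '\n';
            return false;
        }
        // Ensure the data actually reaches durable storage.
        if (!FlushFileBuffers(hFile)) {
            std::cerr << "FlushFileBuffers failed: " << GetLastError() << '\n';
            return false;
        }
        return true;
    }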

There are several possible improvements we can make on Windows:
1. Flush database files in parallel to drive deeper disk queue lengths. Since each file flush is synchronous, flushes of different files can be issued concurrently. If a database has N files, this means O(N) concurrent flushes.
2. Go beyond #1: subdivide each file into chunks and flush the chunks in parallel, both within and across files. If each file is broken into M chunks, this means O(NM) concurrent flushes (see the sketch below).

Option #2 delivers the best performance but requires Microsoft to deliver an OS patch that fixes the behavior of memory-mapped flushes on Windows. While the performance of a preliminary patch looks promising, the ETA for a publicly released patch is approximately calendar Q3 2014.
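The following is a rough sketch of the chunked, parallel approach in option #2, assuming the OS fix is in place. It is illustrative only (flushChunksInParallel and its parameters are invented for this example), not the proposed server implementation:

    // Sketch only: flush a mapped file in M chunks on worker threads so that
    // multiple synchronous FlushViewOfFile calls are in flight at once,
    // driving a deeper disk queue.
    #include <windows.h>
    #include <thread>
    #include <vector>

    void flushChunksInParallel(char* base, size_t fileLen, size_t chunkLen) {
        std::vector<std::thread> workers;
        for (size_t off = 0; off < fileLen; off += chunkLen) {
            size_t len = (off + chunkLen <= fileLen) ? chunkLen : fileLen - off;
            // Each thread issues its own synchronous flush; with M chunks per
            // file the disk can see up to M concurrent write requests.
            workers.emplace_back([base, off, len] {
                if (!FlushViewOfFile(base + off, len)) {
                    // Error handling omitted in this sketch.
                }
            });
        }
        for (auto& t : workers) {
            t.join();
        }
        // A single FlushFileBuffers on the file handle is still needed
        // afterwards to make the writes durable (omitted here).
    }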

Our proposal:

  1. Deliver a fix for option #1 in the 2.6.x timeframe (SERVER-12733).
  2. When Microsoft releases the OS fix, improve the behavior further (SERVER-13329).


 Comments   
Comment by Matt Lord (Inactive) [ 01/May/18 ]

This only affects the MMAPv1 storage engine, which has been deprecated as of MongoDB 3.7.

Comment by Mark Benvenuto [ 02/Sep/15 ]

This is not relevant to the WiredTiger storage engine, since WiredTiger does not use memory-mapped files. WiredTiger manages reads and writes to its page cache explicitly and does not rely on the OS memory-management layer to write changes to disk the way MMAPv1 does.

Comment by Christof Rudolph [ 02/Sep/15 ]

Is it possible that this issue is also relevant for WiredTiger setups?
