On Windows, memory-mapped file flushes are synchronous operations. When the OS Virtual Memory Manager is asked to flush a memory-mapped file, it makes a synchronous write request to the OS file cache manager. This causes large I/O stalls on Windows systems with high disk I/O latency, while on Linux the same writes are asynchronous.
The problem becomes critical on high-latency disk drives such as Azure persistent storage (~10 ms per write). Because the flushes are serial, only one write is outstanding at a time, so at 10 ms per write the disk is capped at roughly 1000/10 = 100 IOPS, which results in very long background flush times. On low-latency storage (local disks and AWS) the problem is much less visible.
In the code, after FlushViewOfFile returns we call FlushFileBuffers to ensure that the drive has actually committed the change to disk. It is easy to confirm in Windows Performance Monitor that the disk queue length never rises above 1 during the flush.
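The two-step flush sequence can be illustrated with a portable Python sketch (illustrative only, not the actual server code; on Windows, Python's mmap.flush issues FlushViewOfFile and os.fsync issues FlushFileBuffers, so the same synchronous stall occurs inside mm.flush):

```python
import mmap
import os
import tempfile

def flush_mapped_file(path: str, data: bytes) -> None:
    """Write data through a memory mapping, then force it to stable storage."""
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), len(data))
        try:
            mm[:] = data
            # Step 1: flush the dirty mapped pages (FlushViewOfFile on Windows).
            # This call is synchronous: the thread stalls for the full disk
            # round trip, which is the behavior described above.
            mm.flush()
        finally:
            mm.close()
        # Step 2: ensure the drive has committed the change
        # (FlushFileBuffers on Windows, fsync elsewhere).
        os.fsync(f.fileno())

# Usage: create a file, write through the mapping, and flush it.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)
flush_mapped_file(path, b"x" * 4096)
```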
There are several changes we can make to improve the situation on Windows:
1. Flush database files in parallel to drive deeper disk queue lengths. Since each file flush is synchronous, flushes of different files can proceed concurrently. If a database has N files, this means O(N) concurrent flushes.
2. Go beyond #1: subdivide each file into chunks and flush the chunks in parallel, both within and across files. If each file is broken into M chunks, this means O(NM) concurrent flushes.
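The chunked approach in option #2 can be sketched as follows (a portable illustration under assumed chunk alignment, not the actual fix; on Windows the per-chunk flushes only overlap in the kernel once the OS patch discussed below is in place):

```python
import mmap
import os
import tempfile
import threading

# Chunk offsets passed to mmap.flush must be aligned to the
# mapping granularity, so we use it as the chunk size here.
CHUNK = mmap.ALLOCATIONGRANULARITY

def flush_chunks_in_parallel(mm: mmap.mmap, size: int) -> None:
    """Issue one flush per chunk on its own thread so the synchronous
    flushes overlap, driving a deeper disk queue. Across N files of
    M chunks each this yields O(NM) outstanding flushes."""
    threads = []
    for offset in range(0, size, CHUNK):
        length = min(CHUNK, size - offset)
        t = threading.Thread(target=mm.flush, args=(offset, length))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

# Usage: map a small file, dirty every page, then flush chunk by chunk.
path = os.path.join(tempfile.mkdtemp(), "db.bin")
size = 4 * CHUNK
with open(path, "wb") as f:
    f.write(b"\0" * size)
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), size)
    mm[:] = b"y" * size
    flush_chunks_in_parallel(mm, size)
    mm.close()
```

The same thread-per-unit pattern, applied at whole-file granularity, implements option #1.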
Option #2 delivers the best performance but requires Microsoft to ship an OS patch fixing the behavior of memory-mapped flushes on Windows. While the performance of a preliminary patch looks promising, the ETA for a publicly released patch is approximately calendar Q3 2014.