Type: Improvement
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: MMAPv1, Performance, Storage
Labels:
None
Environment:
Windows, high-latency network storage

Sprint:
Server 2.7.5, Server 2.7.6
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

On Windows, flushes of memory-mapped files are synchronous operations. When the OS Virtual Memory Manager is asked to flush a memory-mapped file, it issues a synchronous write request to the OS file cache manager. This causes large I/O stalls on Windows systems with high disk I/O latency, whereas on Linux the same writes are asynchronous.

The problem becomes critical on high-latency storage such as Azure persistent storage (10 ms latency). This behavior results in very long background flush times and caps disk throughput at about 100 IOPS. On low-latency storage (local storage and AWS) the problem is much less visible.

In the code, after FlushViewOfFile returns, we call FlushFileBuffers to ensure that the drive has actually written the changes to disk. In Windows Performance Monitor it is easy to see that the disk queue length never goes above 1 during the flush.
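The 100 IOPS ceiling follows directly from the queue length of 1 and the 10 ms latency. A minimal sketch of that arithmetic, assuming every write takes the full latency and at most `queue_depth` writes are in flight (the function name `max_iops` is illustrative and not from the server code):

```python
def max_iops(queue_depth: int, latency_ms: float) -> float:
    """Upper bound on I/O operations per second when at most `queue_depth`
    requests are in flight and each one takes `latency_ms` to complete."""
    return queue_depth * 1000.0 / latency_ms


print(max_iops(1, 10))  # 100.0: one outstanding write at 10 ms each
print(max_iops(8, 10))  # 800.0: a deeper queue raises the ceiling
```

Under this model, the only ways to raise the ceiling are lower latency or more writes in flight at once, which is what the proposals below aim for.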

There are several possible ways to improve the situation on Windows:
1. Flush database files in parallel to drive deeper disk queue lengths. Since each file flush is synchronous, the flushes for different files can be issued concurrently. If a database has N files, this would mean O(N) flushes.
2. Go beyond #1 and subdivide each file into chunks, then flush the chunks in parallel, both within a file and across files. If we broke each file into M chunks, this would mean O(NM) flushes.
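The two options above can be sketched as a task decomposition. This is a hedged illustration in Python, not the server's implementation: `chunk_ranges` and `flush_parallel` are hypothetical names, and the thread pool stands in for whatever worker mechanism the server would use. Python's `mmap.flush(offset, size)` calls FlushViewOfFile on Windows and msync on Unix; it does not call FlushFileBuffers, so a real implementation would still need that step per file. With `chunks_per_file=1` the sketch corresponds to option #1, and with a larger value to option #2.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(length, chunks, granularity=mmap.ALLOCATIONGRANULARITY):
    """Split [0, length) into at most `chunks` (offset, size) ranges.
    Offsets are multiples of `granularity` so partial flushes stay aligned."""
    if length == 0:
        return []
    per_chunk = -(-length // chunks)                        # ceiling division
    per_chunk = -(-per_chunk // granularity) * granularity  # round up to alignment
    return [(off, min(per_chunk, length - off))
            for off in range(0, length, per_chunk)]


def flush_parallel(mapped_files, chunks_per_file, workers):
    """Flush every chunk of every mapped file through a thread pool.
    Returns the number of flush tasks issued (up to N * M)."""
    tasks = [(m, off, size)
             for m in mapped_files
             for off, size in chunk_ranges(len(m), chunks_per_file)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(m.flush, off, size) for m, off, size in tasks]
        for future in futures:
            future.result()  # re-raise any flush error
    return len(tasks)
```

Whether the flushes actually overlap on the device depends on the runtime and the OS, so this shows only how N files and M chunks turn into up to N*M independent flush requests.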

Option #2 delivers the best performance but requires Microsoft to deliver an OS patch that fixes the behavior of memory-mapped flushes on Windows. While the performance of a preliminary patch looks promising, the ETA for a publicly released patch is approximately Q3 of calendar year 2014.

Our proposal:

Deliver a fix for option #1 in the 2.6.x timeframe. (~~SERVER-12733~~)
When MS releases the fix, improve the behavior (~~SERVER-13329~~)

related to

SERVER-13444 Long locked flush without inserts and updates

Closed

SERVER-14992 Query for Windows 7 File Allocation Fix, and other hotfixes

Closed

Assignee: DO NOT USE - Backlog - Platform Team
Reporter: Alexander Komyagin (Inactive)
Participants: Alexander Komyagin, Christof Rudolph, DO NOT USE - Backlog - Platform Team, Mark Benvenuto, Matt Lord
Votes: 3
Watchers: 20

Created: Jan 17 2014 09:28:46 PM UTC
Updated: May 01 2018 03:14:57 PM UTC
Resolved: May 01 2018 02:37:41 PM UTC
