[SERVER-7378] Windows Azure logs errors and retries if WRITETODATAFILES and FlushViewOfFile run at the same time Created: 17/Oct/12  Updated: 11/Jul/16  Resolved: 23/Oct/12

Status: Closed
Project: Core Server
Component/s: Internal Code, Performance, Stability, Storage
Affects Version/s: None
Fix Version/s: 2.2.2, 2.3.1

Type: Bug Priority: Major - P3
Reporter: Tad Marshall Assignee: Tad Marshall
Resolution: Done Votes: 2
Labels: Windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows running on Windows Azure


Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: Windows

 Description   

Windows Azure Storage uses an MD5 hash to verify that it has written data to disk correctly. If mongod.exe modifies the buffer while it is being flushed to disk, the MD5 comparison fails and Azure Storage retries the operation. If it fails 4 times, Windows fails the call to FlushViewOfFile with error code 1 (ERROR_INVALID_FUNCTION). mongod.exe then issues Fatal Assertion 16387 and exits immediately.
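The failure mode can be modelled in a few lines. This is a simplified, hypothetical model of the verify-and-retry behaviour described above, not Azure Storage's actual implementation. `checksum` is a 64-bit FNV-1a hash standing in for MD5, `flushWithVerify` and the `concurrentWriter` callback are invented names, and the limit of 4 failed attempts and error code 1 (ERROR_INVALID_FUNCTION) are taken from this ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Stand-in for MD5 (assumption): 64-bit FNV-1a over the buffer contents.
std::uint64_t checksum(const Buffer& data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char b : data) {
        h ^= b;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

constexpr int kMaxAttempts = 4;                    // "If it fails 4 times"
constexpr unsigned long kErrorInvalidFunction = 1; // ERROR_INVALID_FUNCTION

// Hypothetical model of one verified flush: hash the page, transfer it
// (a concurrent writer may modify it meanwhile), then re-hash what was
// written. Returns 0 on success or 1 after kMaxAttempts mismatches.
unsigned long flushWithVerify(Buffer& page,
                              const std::function<void(Buffer&)>& concurrentWriter) {
    for (int attempt = 0; attempt < kMaxAttempts; ++attempt) {
        const std::uint64_t expected = checksum(page);  // hash taken up front
        concurrentWriter(page);                         // write racing the flush
        const Buffer written = page;                    // bytes that reached storage
        if (checksum(written) == expected) {
            return 0;                                   // verified, no retry needed
        }
        // Mismatch: storage retries the whole operation.
    }
    return kErrorInvalidFunction;
}
```

A writer that touches the page on every attempt exhausts all four attempts and produces error 1; a writer that races only once costs a single retry.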

Internally, nothing requires WRITETODATAFILES to be able to run while FlushViewOfFile is running, and it could be argued that allowing this is not the right thing to do. Preliminary testing shows that adding a mutex between these two threads prevents the Windows Azure Storage retries. This will probably improve worst-case performance (by avoiding the retries) and will prevent the Fatal Assertions caused by excessive retries.
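A minimal sketch of the proposed fix, assuming a single process-wide lock shared by the two code paths. `writeToDataFilesSketch`, `flushViewSketch`, `gFlushWriteMutex` and `snapshotsAreUntorn` are hypothetical names, the flush is modelled as copying the mapped view, and `std::mutex` stands in for whatever lock mongod actually uses.

```cpp
#include <cassert>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

using View = std::vector<unsigned char>;

// Hypothetical lock shared by both code paths (assumption, not mongod's name).
std::mutex gFlushWriteMutex;

// Stand-in for WRITETODATAFILES: overwrite the mapped view in memory.
void writeToDataFilesSketch(View& view, unsigned char value) {
    std::lock_guard<std::mutex> lock(gFlushWriteMutex);
    std::memset(view.data(), value, view.size());
}

// Stand-in for the FlushViewOfFile call: the "disk copy" is taken while the
// same lock is held, so no write can interleave with it.
View flushViewSketch(const View& view) {
    std::lock_guard<std::mutex> lock(gFlushWriteMutex);
    return View(view.begin(), view.end());
}

// Runs a writer thread against repeated flushes and reports whether every
// flushed copy was internally consistent (all bytes from a single write).
bool snapshotsAreUntorn(int rounds) {
    View view(4096, 0);
    std::thread writer([&view, rounds] {
        for (int i = 0; i < rounds; ++i) {
            writeToDataFilesSketch(view, static_cast<unsigned char>(i));
        }
    });
    bool ok = true;
    for (int i = 0; i < rounds; ++i) {
        const View onDisk = flushViewSketch(view);
        for (unsigned char b : onDisk) {
            if (b != onDisk[0]) ok = false;
        }
    }
    writer.join();
    return ok;
}
```

Because both routines take the same lock, each flushed copy contains either all of a write or none of it, which is the property the MD5 check relies on. The trade-off is that writes block for the duration of a flush.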



 Comments   
Comment by Githook User [ 29/Jul/14 ]

Author: Matt Kangas (kangas) <matt.kangas@mongodb.com>

Message: SERVER-13681 Revert "SERVER-7378 Prevent WRITETODATAFILES and FlushViewOfFile from running at the same time"

This reverts commit 5352345eda2ffa24a72af1d5bbaccaa32bcb1006
Branch: master
https://github.com/mongodb/mongo/commit/d7637a71f401f2457db3caca8c8a03d9b46d1ed0

Comment by Githook User [ 29/Jul/14 ]

Author: Matt Kangas (kangas) <matt.kangas@mongodb.com>

Message: Revert "SERVER-7378 Prevent WRITETODATAFILES and FlushViewOfFile from running at the same time"

This reverts commit 5352345eda2ffa24a72af1d5bbaccaa32bcb1006
Branch: v2.6
https://github.com/mongodb/mongo/commit/2ec547c158e1bd7e0339288e0b7ed33ba46e58f6

Comment by auto [ 06/Nov/12 ]

Author: Tad Marshall <tad@10gen.com> (2012-10-17T09:11:52Z)

Message: SERVER-7378 Prevent WRITETODATAFILES and FlushViewOfFile from running at the same time

Prevent errors on Azure Storage drives that occur when a memory-mapped
data file is modified in memory while it is being flushed to disk with
FlushViewOfFile. Use a SimpleMutex (Critical Section on Windows) to
prevent these two routines from running at the same time.
Branch: v2.2
https://github.com/mongodb/mongo/commit/f43972dd586475132045bb11a96baac168f5b8cd
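The commit message names a SimpleMutex that is a Critical Section on Windows. A hypothetical wrapper along those lines might look like the following. `SimpleMutexSketch`, its nested `scoped_lock`, and `countWithThreads` are illustrative names, not the MongoDB source, and the non-Windows branch falls back to `std::mutex` so the sketch compiles on any platform.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical SimpleMutex-style lock: a Critical Section on Windows,
// std::mutex elsewhere. Illustrative only, not the MongoDB source.
class SimpleMutexSketch {
public:
#ifdef _WIN32
    SimpleMutexSketch() { InitializeCriticalSection(&cs_); }
    ~SimpleMutexSketch() { DeleteCriticalSection(&cs_); }
    void lock() { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
#else
    SimpleMutexSketch() = default;
    void lock() { mutex_.lock(); }
    void unlock() { mutex_.unlock(); }
#endif
    SimpleMutexSketch(const SimpleMutexSketch&) = delete;
    SimpleMutexSketch& operator=(const SimpleMutexSketch&) = delete;

    // RAII guard: the lock is released even on early return or exception.
    class scoped_lock {
    public:
        explicit scoped_lock(SimpleMutexSketch& m) : m_(m) { m_.lock(); }
        ~scoped_lock() { m_.unlock(); }
        scoped_lock(const scoped_lock&) = delete;
        scoped_lock& operator=(const scoped_lock&) = delete;
    private:
        SimpleMutexSketch& m_;
    };

private:
#ifdef _WIN32
    CRITICAL_SECTION cs_;
#else
    std::mutex mutex_;
#endif
};

// Usage: several threads increment a shared counter under the lock.
long countWithThreads(int threads, int iterations) {
    SimpleMutexSketch mtx;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&mtx, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                SimpleMutexSketch::scoped_lock guard(mtx);
                ++counter;
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return counter;
}
```

A Critical Section is an in-process lock, which suits this case because both WRITETODATAFILES and the flush run as threads inside the same mongod.exe process.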

Comment by Daniel Pasette (Inactive) [ 05/Nov/12 ]

This change will likely be backported to v2.2.2

Comment by Tejas Vora [ 30/Oct/12 ]

We are not sure whether this change has been backported to version 2.2.1 yet. If it is not too difficult to backport and not too much effort on your side (in terms of testing and stability), we would greatly appreciate it if you could merge the changes into version 2.2.1 or 2.2.2.

The reason is that we are currently using the 2.1 development version in our Azure production deployment. We would like to move to the stable 2.2 production version on Azure, but 2.2 exhibits this issue when running on Azure, so it would be a little difficult for us to deploy it there.

Also, we cannot wait until version 2.3 comes out with these changes. Given the situation, we would like to pick up a 2.2.x version with this change. It would be really helpful.

Comment by auto [ 23/Oct/12 ]

Author: Tad Marshall <tad@10gen.com> (2012-10-17T02:11:52-07:00)

Message: SERVER-7378 Prevent WRITETODATAFILES and FlushViewOfFile from running at the same time

Prevent errors on Azure Storage drives that occur when a memory-mapped
data file is modified in memory while it is being flushed to disk with
FlushViewOfFile. Use a SimpleMutex (Critical Section on Windows) to
prevent these two routines from running at the same time.
Branch: master
https://github.com/mongodb/mongo/commit/5352345eda2ffa24a72af1d5bbaccaa32bcb1006

Comment by Eric Milkie [ 17/Oct/12 ]

The code change will affect the behavior of all Windows instances, not just Azure, but there is no easy way to avoid this and it sounds like it may be beneficial anyway.

Comment by Tad Marshall [ 17/Oct/12 ]

milkie: The MD5 test appears to be unique to Windows Azure, but the general question of whether it would be better to separate these functions in time remains open. The proposed code change is narrowly targeted at the issue on Azure. I may revisit the broader topic in another ticket after doing some research.

Comment by Eric Milkie [ 17/Oct/12 ]

Is this behavior specific to Azure, or does it happen on any Windows version as well?

Generated at Thu Feb 08 03:14:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.