[SERVER-7378] Windows Azure logs errors and retries if WRITETODATAFILES and FlushViewOfFile run at the same time
Created: 17/Oct/12 | Updated: 11/Jul/16 | Resolved: 23/Oct/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code, Performance, Stability, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.2, 2.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Tad Marshall | Assignee: | Tad Marshall |
| Resolution: | Done | Votes: | 2 |
| Labels: | Windows | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows running on Windows Azure |
||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | Windows | ||||
| Description |
|
Windows Azure Storage uses an MD5 hash to verify that it has correctly written data to disk. When mongod.exe modifies the buffer while it is being flushed to disk, the MD5 comparison fails and Azure Storage retries the operation. If the comparison fails 4 times, Windows fails the call to FlushViewOfFile with error code 1 (ERROR_INVALID_FUNCTION), causing mongod.exe to issue Fatal Assertion 16387 and exit immediately. There is no internal requirement for WRITETODATAFILES to be allowed to run while FlushViewOfFile is running, and it could be argued that allowing the two to overlap is not the right thing to do. Preliminary testing shows that adding a mutex between these two threads prevents the Windows Azure Storage retries. This will probably improve worst-case performance (by avoiding the retries) and will prevent Fatal Assertions caused by excessive retries. |
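The mutex idea described above can be illustrated with a small portable sketch. This is not the actual patch: `MappedView`, `writeToDataFiles`, `flushView`, and `runConcurrently` are hypothetical names, a `std::vector` stands in for the memory-mapped view, and a simple FNV-1a checksum stands in for the MD5 check that Azure Storage performs. It only shows that when the writer and the flusher take the same mutex, the buffer cannot change while a flush is in progress.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a memory-mapped data file view.
struct MappedView {
    std::vector<std::uint8_t> bytes;
    std::mutex flushMutex;  // shared by the writer and the flusher
};

// FNV-1a checksum; an assumption standing in for the storage layer's MD5 check.
inline std::uint64_t checksum(const std::vector<std::uint8_t>& data) {
    std::uint64_t h = 0xcbf29ce484222325ULL;
    for (std::uint8_t b : data) {
        h ^= b;
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Stand-in for WRITETODATAFILES: modifies the view only while holding the mutex.
inline void writeToDataFiles(MappedView& view, std::uint8_t value) {
    std::lock_guard<std::mutex> lk(view.flushMutex);
    for (auto& b : view.bytes) b = value;
}

// Stand-in for the FlushViewOfFile call: holds the mutex for the whole flush
// and reports whether the buffer stayed unchanged while it was being "written".
inline bool flushView(MappedView& view) {
    std::lock_guard<std::mutex> lk(view.flushMutex);
    const std::uint64_t before = checksum(view.bytes);
    std::this_thread::yield();  // give the writer a chance to interfere
    const std::uint64_t after = checksum(view.bytes);
    return before == after;
}

// Runs a writer thread and a flusher thread side by side and returns the
// number of flushes during which the buffer was seen to change.
inline int runConcurrently(int iterations) {
    MappedView view;
    view.bytes.assign(4096, 0);
    int mismatches = 0;  // written only by the flusher, read after join
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i)
            writeToDataFiles(view, static_cast<std::uint8_t>(i));
    });
    std::thread flusher([&] {
        for (int i = 0; i < iterations; ++i)
            if (!flushView(view)) ++mismatches;
    });
    writer.join();
    flusher.join();
    return mismatches;
}
```

Because both functions lock `flushMutex`, `runConcurrently` should report no mismatches however the threads are scheduled. The cost is that writes wait for an in-progress flush, which matches the trade-off discussed in this ticket.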
| Comments |
| Comment by Githook User [ 29/Jul/14 ] |
|
Author: Matt Kangas (kangas, matt.kangas@mongodb.com)
Message: This reverts commit 5352345eda2ffa24a72af1d5bbaccaa32bcb1006 |
| Comment by Githook User [ 29/Jul/14 ] |
|
Author: Matt Kangas (kangas, matt.kangas@mongodb.com)
Message: Revert " This reverts commit 5352345eda2ffa24a72af1d5bbaccaa32bcb1006 |
| Comment by auto [ 06/Nov/12 ] |
|
Author: Tad Marshall (tad@10gen.com), 2012-10-17T09:11:52Z
Message: Prevent errors on Azure Storage drives that occur when a memory-mapped |
| Comment by Daniel Pasette (Inactive) [ 05/Nov/12 ] |
|
This change will likely be backported to v2.2.2 |
| Comment by Tejas Vora [ 30/Oct/12 ] |
|
We are not sure whether this change has been backported to version 2.2.1 yet. If the backport is not too difficult and does not require too much effort on your side (in terms of testing and stability), we would greatly appreciate it if you could merge the change into 2.2.1 or 2.2.2. We are currently running the 2.1 development version in our Azure production deployment and would like to move to the stable 2.2 production version. However, 2.2 exhibits this issue when running in Azure, so it would be difficult for us to deploy it there. We also cannot wait until 2.3 is released with this change. Given the situation, we would like to pick up a 2.2.x release that includes this change. It would be really helpful. |
| Comment by auto [ 23/Oct/12 ] |
|
Author: Tad Marshall (tad@10gen.com), 2012-10-17T02:11:52-07:00
Message: Prevent errors on Azure Storage drives that occur when a memory-mapped |
| Comment by auto [ 23/Oct/12 ] |
|
Author: Tad Marshall (tad@10gen.com), 2012-10-17T02:11:52-07:00
Message: Prevent errors on Azure Storage drives that occur when a memory-mapped |
| Comment by auto [ 23/Oct/12 ] |
|
Author: Tad Marshall (tad@10gen.com), 2012-10-17T02:11:52-07:00
Message: Prevent errors on Azure Storage drives that occur when a memory-mapped |
| Comment by Eric Milkie [ 17/Oct/12 ] |
|
The code change will affect the behavior of all Windows instances, not just Azure, but there is no easy way to avoid this and it sounds like it may be beneficial anyway. |
| Comment by Tad Marshall [ 17/Oct/12 ] |
|
milkie The MD5 test appears to be unique to Windows Azure, but the general question of whether separating these functions in time would be beneficial remains open. The proposed code change is very specific to the issue on Azure. I may revisit the larger topic in another ticket after doing some research. |
| Comment by Eric Milkie [ 17/Oct/12 ] |
|
Is this behavior specific to Azure, or does it happen on any Windows version as well? |