[SERVER-23977] Failure of MapViewOfFileEx in MemoryMappedFile::remapPrivateView with "errno:487 Attempt to access invalid address" Created: 28/Apr/16  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
Related
is related to SERVER-20923 MemoryMappedFile::remapPrivateView fa... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Sprint: Platforms 14 (05/13/16), Platforms 15 (06/03/16)
Participants:
Case:
Linked BF Score: 15

 Description   
Issue Status as of Dec 22, 2016

ISSUE SUMMARY
MMAPv1 on Windows has a known race condition during flushing that is reported to the log file as:

[durability] MapViewOfFileEx for ... failed with error errno:487 Attempt to access invalid address. (file size is ...) in MemoryMappedFile::remapPrivateView
[durability] Fatal Assertion 16148

USER IMPACT
This race condition exists in the Windows version of MongoDB's MMAPv1 storage engine, and prevents MongoDB from properly managing memory-mapped files. MongoDB shuts down automatically when it detects this failure.

WORKAROUNDS
This race condition does not affect storage engines other than MMAPv1. Therefore, to resolve this issue, we recommend transitioning to WiredTiger. Alternatively, since this issue only affects Windows machines, using another operating system would resolve the issue.

TECHNICAL DETAILS
The MMAPv1 (Memory Mapped) Storage Engine relies on the operating system's memory management system to cache the database files in MongoDB's memory in 4 KB chunks (i.e. pages). On Linux, this is implemented with the mmap system call. The operating system automatically reads pages into memory when they are accessed, and evicts pages from memory when the system runs low on physical memory.

One of the features of the MMAPv1 storage engine is the journal. The journal is a sequentially written file used to provide write-ahead logging (WAL). It allows committed changes to be recovered efficiently after a crash.

In order to support recovery, for each commit made to the database, a description of the change is written to the journal first, next the journal is written to disk, then the database pages are written in memory, and finally the user receives a notification that their change is complete. Please note that the exact point at which the notification is received depends on the write concern.
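
The ordering described above can be sketched as follows. This is a minimal illustration of write-ahead ordering in general, not MongoDB's journal implementation: the record format, the function names (`journaled_write`, `replay`, `demo`), and the use of a plain data file in place of memory-mapped database pages are all assumptions made for the example.

```python
import os
import tempfile


def journaled_write(journal_path, data_path, offset, payload):
    """Apply one change using write-ahead ordering: journal first, data second."""
    # 1. Describe the change in the journal (hypothetical record format).
    record = f"{offset}:{payload.hex()}\n".encode()
    with open(journal_path, "ab") as journal:
        journal.write(record)
        journal.flush()
        os.fsync(journal.fileno())  # 2. Force the journal record to disk.

    # 3. Only now modify the data (a plain file stands in for database pages).
    with open(data_path, "r+b") as data:
        data.seek(offset)
        data.write(payload)
    # 4. The caller could be notified here, subject to write concern.


def replay(journal_path, data_path):
    """Re-apply every journaled change, as recovery would after a crash."""
    with open(journal_path, "rb") as journal, open(data_path, "r+b") as data:
        for line in journal:
            offset, hex_payload = line.decode().strip().split(":")
            data.seek(int(offset))
            data.write(bytes.fromhex(hex_payload))


def demo():
    """Write one change, simulate losing the data write, then recover it."""
    with tempfile.TemporaryDirectory() as tmp:
        journal_path = os.path.join(tmp, "journal")
        data_path = os.path.join(tmp, "data")
        open(journal_path, "wb").close()
        with open(data_path, "wb") as f:
            f.write(b"\0" * 16)

        journaled_write(journal_path, data_path, 4, b"abcd")

        # Simulate a crash in which the data-file change was lost.
        with open(data_path, "wb") as f:
            f.write(b"\0" * 16)

        replay(journal_path, data_path)
        with open(data_path, "rb") as f:
            return f.read()[4:8]
```

Because the journal record is synced before the data is touched, `replay` can always reconstruct a change whose data write was lost.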

In order to ensure the journal is written to disk before the database pages are written to disk, a private copy of each database file is made using the Copy-On-Write (COW) feature of the operating system's memory manager. Modifications are written to this private copy (which is never written to disk). This is necessary because the OS can write a memory-mapped file to disk at any time due to page eviction. Later, after the changes have been written to the journal, the changes to the private copy are copied to the file-backed copy.
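
The difference between the private copy and the file-backed copy can be seen with Python's standard `mmap` module, where `ACCESS_COPY` requests a copy-on-write view and `ACCESS_WRITE` a write-through view. This is only a sketch of the idea under that assumption; the function name `private_view_demo` is hypothetical, and MongoDB itself uses the native OS calls rather than anything shown here.

```python
import mmap
import os
import tempfile


def private_view_demo():
    """Show that writes to a copy-on-write view do not reach the file-backed view."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)

        # File-backed view: writes are carried through to the file.
        shared = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_WRITE)
        # Private view: copy-on-write, never written back to the file.
        private = mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_COPY)

        private[0:4] = b"BBBB"   # modify only the private copy
        before = shared[0:4]     # the file-backed view is unchanged

        # (The journal would be written and synced at this point.)
        shared[0:4] = private[0:4]  # copy the change to the file-backed view
        shared.flush()
        after = shared[0:4]

        private.close()
        shared.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `private_view_demo()` returns the file-backed contents before and after the copy step, which makes the ordering guarantee visible: nothing reaches the file until the change is explicitly copied over.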

After the pages are copied, the private copy and the file-backed copy have the same contents. At this point, MongoDB needs to tell the OS that the two copies are identical, so that the OS can safely discard the private (copy-on-write) copies it made of the file-backed pages.

To tell the OS these pages are the same, the storage engine calls x' = mmap(x) on the private copy, and requires that the mmap call return the same address for the private copy (i.e. x' == x). On Linux, the mmap function has this behavior when called with the MAP_FIXED flag.

On Windows, there is no direct equivalent of the mmap function with the required behavior. Instead, the remap is a two-step operation: UnmapViewOfFile followed by MapViewOfFileEx. Because it is a two-step process, there is a window in which other components in the mongod process can make calls into the Windows memory management system and claim the address, preventing MapViewOfFileEx from using it again. Examples of this are new thread creation, Windows heap allocation, TCMalloc allocation, and Mozilla JavaScript memory allocation. If MapViewOfFileEx fails to map the private copy at the requested address, MongoDB terminates with the following fassert:

[durability] MapViewOfFileEx for ... failed with error errno:487 Attempt to access invalid address. (file size is ...) in MemoryMappedFile::remapPrivateView
[durability] Fatal Assertion 16148
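
The window between the two steps can be illustrated with a toy model. Everything in this sketch is hypothetical: `ToyAddressSpace` is a dictionary standing in for the process's virtual address space, and it makes no real Windows or POSIX calls. It only shows why an unmap followed by a map at a fixed address can fail when something else claims the address in between, while a single-step replace cannot.

```python
class ToyAddressSpace:
    """Hypothetical stand-in for a process's virtual address space."""

    def __init__(self):
        self.owner = {}  # base address -> name of whoever holds it

    def map_at(self, address, who):
        """Map at exactly `address`; fail if the address is already taken."""
        if address in self.owner:
            return False  # analogous to MapViewOfFileEx failing with error 487
        self.owner[address] = who
        return True

    def unmap(self, address):
        del self.owner[address]

    def remap_in_place(self, address, who):
        """Single-step replace: there is no gap for anyone else to use."""
        self.owner[address] = who
        return True


def two_step_remap(space, address, interloper=None):
    """Unmap, then map again; `interloper` runs in the gap between the steps."""
    space.unmap(address)
    if interloper is not None:
        interloper(space, address)  # e.g. a heap allocation on another thread
    return space.map_at(address, "private view")


space = ToyAddressSpace()
space.map_at(0x1000, "private view")

ok_quiet = two_step_remap(space, 0x1000)  # nothing interferes
ok_raced = two_step_remap(
    space, 0x1000, lambda s, a: s.map_at(a, "heap allocation")
)  # another component claims the address in the gap
```

In this model `ok_quiet` is true and `ok_raced` is false, which corresponds to the case where mongod hits the fassert shown above.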

Unfortunately, there is no known solution to this issue other than the workarounds described above.

Original description

On Windows, in MemoryMappedFile::remapPrivateView, if the virtual address occupied by the private view is claimed by another thread between the time the private view is unmapped and the time that it is mapped again, the MapViewOfFileEx call will fail with an "invalid address" error, and mongod will terminate. The following messages are diagnostic of this problem:

I CONTROL  [durability] MapViewOfFileEx for ... failed with error errno:487 Attempt to access invalid address. (file size is ...) in MemoryMappedFile::remapPrivateView
I -        [durability] Fatal Assertion 16148
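
A log can be checked for this signature with a short script. The sketch below is an assumption-laden helper, not a MongoDB tool: the function name `has_remap_failure` and the regular expressions are made up for illustration and are based only on the two message lines quoted in this ticket.

```python
import re

# Patterns derived from the two diagnostic lines quoted in this ticket.
REMAP_FAILURE = re.compile(
    r"\[durability\] MapViewOfFileEx for .* failed with error errno:487 "
    r".*in MemoryMappedFile::remapPrivateView"
)
FASSERT = re.compile(r"\[durability\] Fatal Assertion 16148")


def has_remap_failure(log_lines):
    """Return True if the errno:487 message is later followed by Fatal Assertion 16148."""
    seen_failure = False
    for line in log_lines:
        if REMAP_FAILURE.search(line):
            seen_failure = True
        elif seen_failure and FASSERT.search(line):
            return True
    return False
```

For example, `has_remap_failure(open(path))` would scan a log file line by line, where `path` is whatever location your mongod log is written to.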



 Comments   
Comment by Eric Milkie [ 27/Apr/17 ]

Cleaning up memory is not an option because the problem resides in the memory region layout dictated by the operating system, not because there is an excess of memory allocated.
A primary already implicitly steps down if it shuts down due to this problem. Leaving the process up but not responding to read or write requests would not be an improvement over simply shutting the process down.

Comment by Paul Reed [ 27/Apr/17 ]

Does this issue need to effect a shutdown? Could it not just clean up memory and recover itself? At the very least, could it not step down (if primary) and allow replica sets to continue nicely?

Comment by Guillaume Guerra [X] [ 10/Apr/17 ]

Hi,

Same here, and it seems the frequency of the issue is increasing.
Any plan for a hotfix ? or at least a fix in a new Mongo release ?

Comment by Joe Enzminger [ 10/Oct/16 ]

This is affecting our production servers as well. Since it is a crash issue, what are the chances of getting a hotfix rather than having to wait/upgrade to a new version altogether?

Comment by Paul Reed [ 27/Sep/16 ]

I am getting this issue most days now, which costs me about a minute offline each time. Not an issue at the moment, but it will be.

Any news on a fix for this ? Can get dumpfiles any time you like.

Comment by Stephen JANNIN [ 26/Jul/16 ]

We had this issue 3 times in production. Is this specific to Windows 2012 R2? Servers on 2008 R2 do not seem to be affected.

2016-07-26T04:07:36.117+0200 I CONTROL [durability] MapViewOfFileEx for D:/a/b/c/d/e/f/g_PROD_24.75 failed with error errno:487 Attempt to access invalid address. (file size is 2146435072) in MemoryMappedFile::remapPrivateView
2016-07-26T04:07:36.852+0200 I - [durability] Fatal Assertion 16148

Comment by Ramon Fernandez Marina [ 08/Jul/16 ]

paul.reed, unfortunately we don't have an estimate for a fix to this issue. The "3.3 Desired" fixVersion indicates we'd like to address this ticket before the 3.4 release, scheduled for Q4 2016. We'll post any updates to this ticket.

Regards,
Ramón.

Comment by Paul Reed [ 08/Jul/16 ]

How far away is the fix for this? I am seeing this issue take down my servers a number of times a week now. It is only a matter of time until 2 servers get hit at the same time and cause issues for us!

Comment by Andrew Morrow (Inactive) [ 15/Jun/16 ]

paul.reed - I don't have any insight into SERVER-19043 beyond what is written in that ticket, unfortunately.

Comment by Paul Reed [ 15/Jun/16 ]

Just as soon as you can implement database meta reproduction / folder copy. I would gladly migrate.
As it stands - I cannot - any news on a fix ?

https://jira.mongodb.org/browse/SERVER-19043?filter=-2

Comment by Andrew Morrow (Inactive) [ 15/Jun/16 ]

paul.reed - Unfortunately there is not a mitigation for this issue with the MMAPv1 storage engine at this time. Copying files around will not have any effect, as the root cause is connected to details of the memory management subsystem on Windows. If it is possible for you to do so, migrating to the WiredTiger storage engine would be a permanent solution, as that storage engine is not affected by this issue.

Comment by Paul Reed [ 08/Jun/16 ]

Is there a workaround?
Say, copying the database into a new file structure, removing the old one, and renaming the new db back to the old name.

Would love a quick win here if possible.

Comment by Andrew Morrow (Inactive) [ 08/Jun/16 ]

Hi paul.reed - Thank you for letting us know that you have encountered this issue, and especially for offering to post a mini-dump. However, I don't think you need to. We do understand the root cause of the issue, and we are evaluating potential solutions.

Comment by Paul Reed [ 08/Jun/16 ]

I have this same issue. I have seen it a few times on our replica sets:


2016-06-08T14:33:54.603+0100 I CONTROL [durability] MapViewOfFileEx for database_path_and_file.1 failed with error errno:487 Attempt to access invalid address. (file size is 134217728) in MemoryMappedFile::remapPrivateView
2016-06-08T14:33:54.603+0100 I - [durability] Fatal Assertion 16148
2016-06-08T14:33:54.603+0100 I - [durability]

***aborting after fassert() failure

2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\util\stacktrace_windows.cpp(174) mongo::printStackTrace+0x43
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\util\signal_handlers_synchronous.cpp(182) mongo::`anonymous namespace'::printSignalAndBacktrace+0x73
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\util\signal_handlers_synchronous.cpp(238) mongo::`anonymous namespace'::abruptQuit+0x74
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe f:\dd\vctools\crt\crtw32\misc\winsig.c(587) raise+0x1e9
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe f:\dd\vctools\crt\crtw32\misc\abort.c(82) abort+0x18
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\util\assert_util.cpp(173) mongo::fassertFailed+0xde
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\db\storage\mmap_v1\mmap_windows.cpp(424) mongo::MemoryMappedFile::remapPrivateView+0x37c
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\db\storage\mmap_v1\durable_mapped_file.cpp(75) mongo::DurableMappedFile::remapThePrivateView+0x27
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\db\storage\mmap_v1\dur.cpp(387) mongo::dur::`anonymous namespace'::remapPrivateViewImpl+0x222
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\db\storage\mmap_v1\dur.cpp(633) mongo::dur::remapPrivateView+0x64
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe ...\src\mongo\db\storage\mmap_v1\dur.cpp(823) mongo::dur::durThread+0x8e4
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe c:\program files (x86)\microsoft visual studio 12.0\vc\include\thr\xthread(187) std::LaunchPad<std::_Bind<1,void,void (_cdecl*const)(void)> >::_Go+0x11
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe f:\dd\vctools\crt\crtw32\stdcpp\thr\threadcall.cpp(28) _Call_func+0x14
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe f:\dd\vctools\crt\crtw32\startup\threadex.c(376) _callthreadstartex+0x17
2016-06-08T14:33:56.299+0100 I CONTROL [durability] mongod.exe f:\dd\vctools\crt\crtw32\startup\threadex.c(354) _threadstartex+0x102
2016-06-08T14:33:56.299+0100 I CONTROL [durability] KERNEL32.DLL BaseThreadInitThunk+0x22
2016-06-08T14:33:56.299+0100 F - [durability] Got signal: 22 (SIGABRT).
2016-06-08T14:33:56.301+0100 I CONTROL [durability] writing minidump diagnostic file bin_path\mongod.2016-06-08T13-33-56.mdmp

Would you like the mini-dump, and if so, how can I send it to you?
