[SERVER-6477] jstests/dur/closeall.js -- WindowsFlushable::flush triggers fassert after FlushViewOfFile fails with code 487 (ERROR_INVALID_ADDRESS) Created: 17/Jul/12 Updated: 11/Jul/16 Resolved: 06/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 2.2.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aaron Staple | Assignee: | Tad Marshall |
| Resolution: | Done | Votes: | 0 |
| Labels: | buildbot |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Windows |
| Participants: |
| Description |
|
This occurs in the context of a test that repeatedly closes databases. <http://buildlogs.mongodb.org/build/500477d7d2a60f1426000cf5/test/500495eed2a60f6d76000a43/> There is a known issue where the test in question might trigger a double exception (see |
| Comments |
| Comment by auto [ 06/Aug/12 ] |
|
Author: Tad Marshall <tad@10gen.com> (2012-08-06T12:16:37-07:00) Message: In MongoFile::_flushAll(), don't pick a file to flush in a |
| Comment by Tad Marshall [ 06/Aug/12 ] |
|
In all the cases I looked at, the failing call to WindowsFlushable::flush comes from line 185 in src/mongo/util/mmap.cpp. At this point in the code, we have selected a file to flush while locked, and then we unlock before calling the flush routine. This unlocking allows another thread to close the database before we get to call flush(). In fact, we are probably holding a pointer to freed memory at that instant, so a variety of segfaults or memory corruptions are possible, but in the normal case the memory is intact and we are just trying to flush a file to disk when the file is no longer open. The difference between Linux and Windows is that the viewForFlushing() routine that sets up the address to use for flushing is called inside the lock on Windows and after the lock is released on Linux. This means that a thread switch triggered by releasing the lock is much more likely to trigger the race condition on Windows than on Linux, but there is an opportunity for a race on either OS. |
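The select-then-unlock window described above can be illustrated with a minimal sketch. It is not the code from src/mongo/util/mmap.cpp and not necessarily the fix that was committed: `MappedFile` and `FileRegistry` are hypothetical stand-ins, and `flush()` only counts calls instead of calling FlushViewOfFile. The sketch shows one way to close this kind of window, assuming it is acceptable to hold the registry lock for the duration of the flush: selection and flush happen under the same lock that `close()` takes, so a file cannot be closed between being picked and being flushed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <utility>

// Hypothetical stand-in for a memory-mapped file (not MongoDB's MongoFile).
class MappedFile {
public:
    explicit MappedFile(int id) : _id(id) {}

    // Stand-in for the real flush (FlushViewOfFile on Windows, msync on Linux).
    void flush() { ++_flushCount; }

    int id() const { return _id; }
    int flushCount() const { return _flushCount; }

private:
    int _id;
    int _flushCount = 0;
};

// Hypothetical registry of open files, guarded by a single mutex.
class FileRegistry {
public:
    void open(std::shared_ptr<MappedFile> f) {
        std::lock_guard<std::mutex> lk(_mutex);
        _files.insert(std::move(f));
    }

    void close(const std::shared_ptr<MappedFile>& f) {
        std::lock_guard<std::mutex> lk(_mutex);
        _files.erase(f);
    }

    // Flush every open file. The lock is held across both the selection
    // and the flush, so close() cannot run in between the two steps.
    // Returns the number of files flushed.
    std::size_t flushAll() {
        std::lock_guard<std::mutex> lk(_mutex);
        std::size_t flushed = 0;
        for (const auto& f : _files) {
            f->flush();
            ++flushed;
        }
        return flushed;
    }

private:
    std::mutex _mutex;
    std::set<std::shared_ptr<MappedFile>> _files;
};
```

The trade-off in this sketch is that a slow flush blocks `open()` and `close()` for its whole duration. The racy pattern would instead copy a raw pointer out of `_files`, release the lock, and only then call `flush()` on it.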
| Comment by Tad Marshall [ 23/Jul/12 ] |
|
Happened again: http://buildbot.mongodb.org/builders/Windows%2064-bit%202008%2B/builds/504
|
| Comment by Tad Marshall [ 23/Jul/12 ] |
|
The code already tests for _view being zero, so that is not what is happening. |
| Comment by Tad Marshall [ 20/Jul/12 ] |
|
It seems like we're trying to flush the local database, but we didn't manage to open it due to "can't open database in a read lock. if db was just closed, consider retrying the query. might otherwise indicate an internal error". DOS error 487 is ERROR_INVALID_ADDRESS ... we should print the address to check, but I suspect it will be zero, meaning that we never mapped it in the first place. The old code (prior to the addition of the fassert) would have just logged this and moved on. |
| Comment by Aaron Staple [ 20/Jul/12 ] |
|
Happened to see it again: <http://buildlogs.mongodb.org/build/5008ccf2d2a60f13f90009c6/test/5008ea98d2a60f5ddc000f39/> |
| Comment by Aaron Staple [ 17/Jul/12 ] |
|
This could potentially have been caused by the memory corruption in Here's another run of the same test reporting memory corruption. |