[SERVER-5244] core suite fails with "not enough storage" error - Windows 32 bit
Created: 07/Mar/12  Updated: 11/Jul/16  Resolved: 01/May/12

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 2.1.1

Type: Bug
Priority: Major - P3
Reporter: Ian Whalen (Inactive)
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Nightly Windows 32-bit

Issue Links:
Duplicate
is duplicated by SERVER-5295 MapViewOfFile failed - Not enough sto... Closed
is duplicated by SERVER-6044 "not enough storage is available" for... Closed
Related
is related to SERVER-5287 Errors in MapViewOfFile() might retur... Closed
Operating System: Windows
Participants:

 Description   

Wed Mar 07 13:36:41 [conn906] CMD: drop test.jstests_pushall
Wed Mar 07 13:36:41 [conn906] MapViewOfFile failed /data/db/sconsTests/test.5 errno:8 Not enough storage is available to process this command. (32 bit build)
Wed Mar 07 13:36:41 TypeError: t.findOne() has no properties C:\10gen\buildslaves\mongo\Windows_32bit_Nightly\mongo\jstests\pushall.js:20
failed to load: C:\10gen\buildslaves\mongo\Windows_32bit_Nightly\mongo\jstests\pushall.js
Wed Mar 07 13:36:41 [conn906] end connection 127.0.0.1:59526 (0 connections now open)
\mongo.exe', '--port', '27999', 'C:\\10gen\\buildslaves\\mongo\\Windows_32bit_Nightly\\mongo\\jstests\\push2.js', '--eval', 'TestData = new Object();TestData.testPath = "C:\\\\10gen\\\\buildslaves\\\\mongo\\\\Windows_32bit_Nightly\\\\mongo\\\\jstests\\\\push2.js";TestData.testFile = "push2.js";TestData.testName = "push2";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;']
                11213.9999866ms

http://buildbot.mongodb.org/builders/Nightly%20Windows%2032-bit/builds/791/steps/test_1/logs/stdio



 Comments   
Comment by auto [ 30/Apr/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 restart mongod periodically during tests

The 32-bit Windows builder runs out of virtual address space
before it reaches the end of the js test suite. This change will
help it to complete successfully. Note that smalloplog suite,
which tests how well replication works after all the js tests have run,
is unaffected by this. The buildbot config will be changed such that
32-bit machines no longer run the small oplog suite.
Branch: master
https://github.com/mongodb/mongo/commit/7c50f4320f11865483541c8c092a0b482f5c51fc
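
The periodic-restart idea in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the commit: `run_suite`, `start_server`, `stop_server`, `run_test`, and the `restart_every` value are all hypothetical names and parameters chosen for the example.

```python
from typing import Callable, Iterable, List


def run_suite(tests: Iterable[str],
              start_server: Callable[[], None],
              stop_server: Callable[[], None],
              run_test: Callable[[str], bool],
              restart_every: int = 20) -> List[str]:
    """Run tests in order, restarting the server every `restart_every` tests.

    Returns the names of the tests that failed.
    """
    failed = []
    start_server()
    try:
        for index, name in enumerate(tests):
            # A fresh server process starts with a clean virtual address
            # space, so restart before every Nth test.
            if index > 0 and index % restart_every == 0:
                stop_server()
                start_server()
            if not run_test(name):
                failed.append(name)
    finally:
        # Always shut the server down, even if a test raised.
        stop_server()
    return failed
```

A suite that depends on state accumulated across all tests, such as the small oplog suite mentioned in the message, would not fit this pattern, which is consistent with the commit excluding it.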

Comment by Ian Whalen (Inactive) [ 23/Apr/12 ]

Appears that this problem is back:

http://buildbot.mongodb.org/builders/Nightly%20Windows%2032-bit/builds/838/steps/test_1/logs/stdio
and
http://buildbot.mongodb.org/builders/Nightly%20Windows%2032-bit/builds/834/steps/test_1/logs/stdio

Comment by Eric Milkie [ 29/Mar/12 ]

core suite is now passing. I just made sure that the tests with larger datasets didn't leave anything behind.

Comment by auto [ 28/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 clean up tests that leave a lot of data behind
Branch: master
https://github.com/mongodb/mongo/commit/77e7786e7aae1d6446ad5f9357ced5973ed0a6b6

Comment by auto [ 26/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 attempt to save db space (and thus virtual address space) by dropping large collection
Branch: master
https://github.com/mongodb/mongo/commit/d4c663405f3ea1577a146549e73f428f55840aa2

Comment by Eric Milkie [ 26/Mar/12 ]

I watched a run of smokeJS with VMMap. It fails when it attempts to map test.5, the sixth data file for db "test":

03/26/2012  04:53 PM        16,777,216 test.0
03/26/2012  04:53 PM        33,554,432 test.1
03/26/2012  04:53 PM        67,108,864 test.2
03/26/2012  04:53 PM       268,435,456 test.3
03/26/2012  04:53 PM       268,435,456 test.4
03/26/2012  04:54 PM       536,608,768 test.5
03/26/2012  04:53 PM        16,777,216 test.ns
               7 File(s)  1,207,697,408 bytes

As can be seen above, test.5 alone consumes half a gig of virtual address space. Combined with the rest of the files, it's no wonder we are hitting this error on the 32-bit build. I don't know why this isn't problematic on Linux or OS X (perhaps we are getting close to running out?)

Should we drop the "test" database after each js script is run for smokeJS?
Also, why are test.3 and test.4 the same size? Something seems screwy.
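
The arithmetic behind the listing above can be checked with a short script. It is a rough sketch that assumes every file is mapped exactly once and in full, and it uses the 2 GB user-mode limit for 32-bit Windows processes described elsewhere in this ticket. The function name `address_space_budget` is made up for the example.

```python
# Sizes in bytes, copied from the directory listing above.
FILE_SIZES = {
    "test.0": 16_777_216,
    "test.1": 33_554_432,
    "test.2": 67_108_864,
    "test.3": 268_435_456,
    "test.4": 268_435_456,
    "test.5": 536_608_768,
    "test.ns": 16_777_216,
}

# User-mode half of a 32-bit Windows address space (2 GB).
USER_ADDRESS_SPACE = 2 * 1024**3


def address_space_budget(sizes, address_space=USER_ADDRESS_SPACE):
    """Return (total bytes mapped, fraction of address space, largest file).

    Assumes each file is mapped exactly once and in full.
    """
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get)
    return total, total / address_space, largest


total, fraction, largest = address_space_budget(FILE_SIZES)
print(f"total mapped: {total:,} bytes ({fraction:.0%} of 2 GB)")
print(f"largest single mapping: {largest} at {FILE_SIZES[largest]:,} bytes")
```

Under those assumptions the data files alone take a bit more than half of the user address space, before DLLs, heap, and thread stacks are counted.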

Comment by Ian Whalen (Inactive) [ 26/Mar/12 ]

Nightly 32-bit build still failing on the MapViewOfFile issue: http://buildbot.mongodb.org/builders/Nightly%20Windows%2032-bit/builds/807/steps/test_1/logs/stdio

Comment by Eric Milkie [ 12/Mar/12 ]

I think all the collections are being dropped when we're done with them, but I could be mistaken. I think we're just getting hit by memory fragmentation. I've now hacked up the tests enough to make all of the core tests pass again on 32-bit Windows.
I'll file another ticket to clean up error handling on Windows with MapViewOfFile().

Comment by auto [ 12/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 fix test for Windows (32 bit runs out of contiguous virtual address space)
Branch: master
https://github.com/mongodb/mongo/commit/e714bcaf6899bde1466c414c4494ceb18216c3a3

Comment by auto [ 10/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 even better construction of large string value
Branch: master
https://github.com/mongodb/mongo/commit/e7a2aa504e99c2a75fb28bc8e0c733cc23fdbfd5

Comment by Tad Marshall [ 09/Mar/12 ]

I like your idea of improving the handling of a MapViewOfFile() failure. It's hard to say exactly what the "best" improvement would be, but the current tactic of reporting the error and failing the immediate action is not good enough.

If an exception returned control to a place where things were cleaned up properly, we could keep running and a dropDatabase() on the database that triggered the exception might allow new database files to be mapped. I spotted three places where we use this API:

1) util/mmap_win.cpp, MemoryMappedFile::createReadOnlyMap(), line 71;
2) util/mmap_win.cpp, MemoryMappedFile::map(), line 144; – this is the one we're hitting (it says 32-bit build)
3) db/mongommf.cpp, MemoryMappedFile::createPrivateMap(), line 85

I think the third one is the one used for journaling.

We should definitely not be returning bad data ... it would be better to do a fatal shutdown than let that happen.

For the core suite, are we dropping collections when we are done with them, and is the space being reused by later tests? If collections are being dropped and the space is not being reused, is that another bug we need to look at? If they are not being dropped, maybe we should, otherwise we have inter-test dependencies and changing behavior when tests are added, removed, moved or renamed.

Comment by Eric Milkie [ 09/Mar/12 ]

We're further along now; the suite makes it to the u's before it runs out of storage.

Should we consider throwing an exception when MapViewOfFile() fails? Right now we just blindly continue after logging what happened. This makes me nervous. Note that the unit test that is now failing is actually failing due to unexpected data!! This could result in queries returning wrong answers.
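
The fail-fast behavior proposed above could look roughly like the sketch below. It is written in Python rather than the server's C++, and `MapFailure`, `map_or_raise`, `map_view`, and `fake_map_view` are hypothetical names. The only detail borrowed from Windows is that MapViewOfFile() returns NULL on failure, modeled here as `None`.

```python
class MapFailure(Exception):
    """Raised when a file view cannot be mapped (hypothetical name)."""


def map_or_raise(map_view, path, length):
    """Call map_view and raise instead of returning a missing view.

    map_view is a stand-in for the platform call; it is assumed to return
    None on failure, the way MapViewOfFile() returns NULL.
    """
    view = map_view(path, length)
    if view is None:
        # Stop here so no caller ever reads through an unmapped view.
        raise MapFailure(f"could not map {length} bytes of {path}")
    return view


def fake_map_view(path, length, limit=256 * 1024 * 1024):
    """Illustrative mapper: fails for requests larger than `limit` bytes."""
    return bytearray(16) if length <= limit else None


try:
    map_or_raise(fake_map_view, "test.5", 536_608_768)
except MapFailure as exc:
    print("mapping failed, aborting operation:", exc)
```

Raising at the point of failure means the current operation is abandoned with a clear error instead of continuing and possibly returning wrong answers.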

Comment by auto [ 09/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 push bigger and fewer, to avoid Windows failure
Branch: master
https://github.com/mongodb/mongo/commit/b1586264fdbcbaa9ccef9291fefe93c5bf853dd9

Comment by Tad Marshall [ 09/Mar/12 ]

Assigning to Eric since he is working on it. Reassign to me if you need to, thanks!

Comment by auto [ 09/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-5244 reduce memory fragmentation during object construction

This is in hopes of making all the core js tests pass on Windows 32-bit
Branch: master
https://github.com/mongodb/mongo/commit/0e06167978fe5e22f9b130f0e6483287369743a4

Comment by Tad Marshall [ 09/Mar/12 ]

Thinking about the problem some more, I suspect that what is happening is that we do not have enough contiguous virtual address space in the 32-bit mongod.exe process to map the next file in the test database set. The failure is on test.5, so we already have test.0 through test.4, but the push2.js test is trying to create a BSON object that is too large and so keeps creating bigger and bigger objects until it gets a failure. When this causes mongod.exe to need a new extent in a new file, it tries to create the file and map it and there isn't a block of contiguous address space big enough to hold the mapping.

The big issue with 32-bit processes is not simply "memory", but address space. Windows reserves the top half of the address space for the kernel so there is only 2 GB of address space available for user processes. But all of the DLLs used by a process and a whole bunch that may not even be used are mapped into the user's half of the address space, and they are not necessarily placed optimally. If you look at your address space with VMMap or vadump.exe you can see that they are scattered around, and all of our memory mapped files have to fit into whatever contiguous blocks of address space are available. Even with lots of unused address space, there may not be a single contiguous block of address space large enough to hold a new memory mapped file.

I think that the right thing to do now is to disable the push2.js test for 32-bit Windows. pushall.js is not the problem and it should pass on 32-bit Windows as-is if push2.js is not run.
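
The contiguous-address-space point above can be illustrated with a small model. The layout below is invented purely for the example (it is not a VMMap capture of mongod.exe), and `free_gaps` and `can_map` are hypothetical helper names. The sketch shows a request failing even though the total free space is several times larger than the request.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start address, size in bytes)
MB = 1024 * 1024


def free_gaps(space_size: int, used: List[Region]) -> List[Region]:
    """Return the free (start, size) gaps between the used regions."""
    gaps = []
    cursor = 0
    for start, size in sorted(used):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + size)
    if cursor < space_size:
        gaps.append((cursor, space_size - cursor))
    return gaps


def can_map(space_size: int, used: List[Region], request: int) -> bool:
    """A mapping needs one gap at least as large as the whole request."""
    return any(size >= request for _, size in free_gaps(space_size, used))


USER_SPACE = 2048 * MB  # 2 GB user half of a 32-bit Windows process

# Invented layout: an image, scattered DLLs, and already-mapped files.
used_regions = [
    (0, 64 * MB),           # executable image and heap
    (400 * MB, 16 * MB),    # a DLL
    (800 * MB, 384 * MB),   # data files mapped so far
    (1500 * MB, 32 * MB),   # another DLL
    (1900 * MB, 16 * MB),   # another DLL
]

gaps = free_gaps(USER_SPACE, used_regions)
total_free = sum(size for _, size in gaps)
largest_gap = max(size for _, size in gaps)
print(f"free: {total_free // MB} MB, largest gap: {largest_gap // MB} MB")
print("can map 512 MB:", can_map(USER_SPACE, used_regions, 512 * MB))
```

In this made-up layout most of the address space is free, yet no single gap can hold a 512 MB file, which is the situation described for test.5.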

Comment by auto [ 08/Mar/12 ]

Author: Tad Marshall (tadmarshall) <tad@10gen.com>

Message: SERVER-5244 Remove temporary diagnostics

Remove the diagnostic logging I added for trying to debug this issue.
Branch: master
https://github.com/mongodb/mongo/commit/1bf5aac8d6a6a649421f3482abe70f3c12730aa3

Comment by Tad Marshall [ 08/Mar/12 ]

We do not seem to be out of memory when the MapViewOfFile() API fails. Total mapped is 384 MB when we start push2.js and 768 MB after push2.js when we start pushall.js. We are barely touching the page file with 77 MB of it used.

It is somewhat possible that the test against BSON size of 16 MB isn't working but that seems unlikely given that there are far fewer slow updates logged before the failing case than we get past in the succeeding case. The failing case displays "info DFM::findAll(): extent 2:6787000 was empty, skipping ahead. ns:test.push2" 7 times before the failure, but the succeeding case prints it 8 times.

The push2.js test kicks virtual memory usage from 1148 MB up to 1533 MB and on a machine with 1738 MB RAM that's interesting, but in theory it shouldn't break Windows APIs. Next step may be to add diagnostics to the MapViewOfFile() failure and see what's going on there.

We may be getting stuck on a slow disk subsystem: watching the code run by Remote Desktop into the AWS instance, CPU usage spends most of its time in the single digits ... we are waiting for the disk almost all the time. If waiting for memory-mapped file I/O can give a MapViewOfFile() error, maybe sleeping and retrying would get us past the error. Not solved yet.

Comment by Tad Marshall [ 08/Mar/12 ]

It seems like something earlier in the tests must have put us in a bad state. I can log into the BuildBot machine and run pushall.js by hand and it works fine. Microsoft TechNet says that "Not enough storage is available to process this command" could be memory, page file, or disk space; Google suggests it could be lack of I/O request packet (IRP) stack space. I raised the page file to 5 GB and added diagnostics (db.hostInfo, db.serverStatus and db.stats) to see how bad the memory and test database size look on the next run.

We're hitting test.5 when the MapViewOfFile() fails while pushall.js is pushing almost nothing, so it's not the pushing itself, it's the size of the memory-mapped database that's killing us. Also, push2.js is failing with the same MapViewOfFile() error but the test doesn't distinguish between the desired BSON size error and a Windows API failing.

Comment by auto [ 08/Mar/12 ]

Author: Tad Marshall (tadmarshall) <tad@10gen.com>

Message: SERVER-5244 Added temporary diagnostics to 2 tests

The 32-bit Windows BuildBot is showing signs of being out of
memory while mapping a file that has been expanded to try to
get a BSON size error (on purpose). In successful tests, we
get the BSON size error, on failing tests we get a Windows API
failure instead. I added db.hostInfo(), db.serverStatus() and
db.stats() to push2.js and pushall.js temporarily to see if we
can learn if it's really an out-of-memory failure.
Branch: master
https://github.com/mongodb/mongo/commit/ef111a5b999fdb6871775a57fedf4fab07a10df4
