- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.4.0-rc1, 2.4.11, 2.6.4, 2.7.5
- Component/s: Storage
- Environment: Windows
- Sprint: Platform 8 08/28/15, Platform 7 08/10/15, Platform 9 (09/18/15), Platform A (10/09/15), Platform B (10/30/15), Platform C (11/20/15), Platform D (12/11/15)
On Windows, when the size of the data files is close to half the virtual address space limit, the files can initially be opened, mapped and used (collection created and extents allocated) just fine. However, when the server is simply stopped and restarted (with extents already allocated for a collection), it crashes because it is unable to map the files.
I've narrowed this issue down to a change between 2.4.0-rc0 and 2.4.0-rc1, though it still exists in 2.4.11, 2.6 and 2.7 (and presents slightly differently in 2.6 and 2.7 than it does in 2.4). It looks as though some of the data files may somehow be mapped multiple times; some of them are certainly unmapped several times. This may be related to SERVER-12567.
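For anyone who wants to check the multiple-mapping suspicion directly, here is a hypothetical diagnostic sketch (not MongoDB code, just plain Win32/psapi calls; the file name and usage are placeholders) that walks a process's address space and counts how many mapped views point at each file. A data file reported with more than one view would be mapped more than once.

```cpp
// Hypothetical diagnostic, not MongoDB code: walk a process's address space
// (e.g. a running mongod, PID passed on the command line) and count how many
// distinct mapped views point at each file.
// Build with: cl /EHsc mapcount.cpp psapi.lib
#include <windows.h>
#include <psapi.h>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

int main(int argc, char** argv) {
    DWORD pid = (argc > 1) ? static_cast<DWORD>(std::atoi(argv[1])) : GetCurrentProcessId();
    HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, pid);
    if (!proc) { std::printf("OpenProcess failed: %lu\n", GetLastError()); return 1; }

    std::map<std::string, int> viewsPerFile;
    MEMORY_BASIC_INFORMATION mbi;
    unsigned char* addr = nullptr;

    // VirtualQueryEx fails once we pass the highest user-mode address.
    while (VirtualQueryEx(proc, addr, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        // Count only the first region of each mapped view.
        if (mbi.Type == MEM_MAPPED && mbi.BaseAddress == mbi.AllocationBase) {
            char name[MAX_PATH];
            if (GetMappedFileNameA(proc, mbi.BaseAddress, name, sizeof(name)) > 0)
                ++viewsPerFile[name];  // name is reported in \Device\... form
        }
        addr = static_cast<unsigned char*>(mbi.BaseAddress) + mbi.RegionSize;
    }

    for (const auto& kv : viewsPerFile)
        std::printf("%d view(s): %s\n", kv.second, kv.first.c_str());

    CloseHandle(proc);
    return 0;
}
```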
On Windows 2008 R2, the virtual address space limit for 64-bit user processes is 8TB. I've done all of this testing without journalling to simplify things, but when I previously looked at it with journalling on, the situation was similar, just with an effective limit of 4TB instead. The results are the same whether the "2008plus" or "legacy" win32 x64 builds are used.
A workaround is to use Windows 2012 R2 instead of 2008 R2, where the limit is 128TB instead of 8TB. However, this problem will still affect Windows 2012 R2 for datasets around the 32TB mark (with journalling).
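For reference, the usable user-mode range (and the 64KB allocation granularity that becomes relevant for the 2.7.5 error further down) can be confirmed at runtime. A minimal sketch, not part of the attached tests:

```cpp
// Minimal sketch: print the user-mode virtual address range and the mapping
// granularity for this machine. On Windows Server 2008 R2 x64 the range is
// roughly 8TB; on Windows Server 2012 R2 it is roughly 128TB.
#include <windows.h>
#include <cstdio>

int main() {
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    unsigned long long lo = reinterpret_cast<unsigned long long>(si.lpMinimumApplicationAddress);
    unsigned long long hi = reinterpret_cast<unsigned long long>(si.lpMaximumApplicationAddress);
    std::printf("usable user address space: ~%llu GB\n", (hi - lo) >> 30);
    std::printf("allocation granularity:    %lu bytes\n",
                static_cast<unsigned long>(si.dwAllocationGranularity));
    return 0;
}
```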
By contrast, on Linux, if I use "ulimit -v 10485760" to limit the virtual address space to 10GB, then all of these versions behave as expected (see the mmap sketch below), i.e. they are able to
- create 9GB of data files
- restart and then open the data files.
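The sketch mentioned above is just the mmap call under that ulimit (the "test.0" path is a placeholder): once the process's mapped size would exceed the 10GB limit, mmap fails with errno 12, matching the Linux messages in the table further down.

```cpp
// Minimal sketch, not from the attached tests: map one data file read-write.
// Run with "ulimit -v 10485760" in effect; once the process's total virtual
// size would exceed 10GB, mmap returns MAP_FAILED with errno 12 (ENOMEM).
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    const char* path = (argc > 1) ? argv[1] : "test.0";  // placeholder data file name
    int fd = open(path, O_RDWR);
    if (fd < 0) { std::perror("open"); return 1; }

    off_t len = lseek(fd, 0, SEEK_END);  // map the whole file
    void* p = mmap(nullptr, static_cast<size_t>(len),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        // This is where "errno:12 Cannot allocate memory" shows up in the logs.
        std::printf("mmap failed: errno:%d %s\n", errno, std::strerror(errno));
    } else {
        std::printf("mapped %lld bytes at %p\n", static_cast<long long>(len), p);
        munmap(p, static_cast<size_t>(len));
    }
    close(fd);
    return 0;
}
```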
Very verbose logfiles are attached. They show the results for
- Windows 2008 R2 vs Linux (10GB vmem limit)
- MongoDB versions 2.4.0-rc0, 2.4.0-rc1, 2.6.4, and 2.7.5
- Creating capped collections of various sizes.
The Windows logfiles do not show any file allocation messages because the files were allocated with an external tool that used (the Windows equivalent of) fast_allocate. (Otherwise, allocating TBs of data files on Windows takes hours instead of seconds, even on SSDs; any fast-allocation bugs don't matter here, since this is only testing the ability to mmap the files.) You can tell when the dbpath has been cleared out by noting when local.ns gets allocated. A useful command to see the main timeline in each log is something like:
grep -E 'create collection test.|MongoDB starting|dbstats|assert|Map.*errno|allocating new datafile .*local.ns' mongod-2.4.0-rc1.log
Some of the smaller tests were done on an i2.8xlarge instance with 8x 800GB local SSDs in RAID0 (~6TB). The tests above this size used a hs1.8xlarge with 16x 2TB local disks in RAID0.
The results of the tests are:
OS | vmem | MongoDB | size | allocate | restart | db.stats | expected? |
---|---|---|---|---|---|---|---|
Windows | 8TB | 2.4.0-rc0 | 3.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc0 | 3.8TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc0 | 4.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc0 | 5.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc0 | 7.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc0 | 8.5TB | Fails | Fails | Fails | Expected |
OS | vmem | MongoDB | size | allocate | restart | db.stats | expected? |
---|---|---|---|---|---|---|---|
Windows | 8TB | 2.4.0-rc1 | 3.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc1 | 3.7TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc1 | 3.8TB | Works | Works | Works | Expected |
Windows | 8TB | 2.4.0-rc1 | 3.9TB | Works | Works | Fails | Unexpected |
Windows | 8TB | 2.4.0-rc1 | 4.5TB | Works | Works | Fails | Unexpected |
Windows | 8TB | 2.4.0-rc1 | 5.5TB | Works | Works | Fails | Unexpected |
Windows | 8TB | 2.4.0-rc1 | 7.5TB | Works | Works | Fails | Unexpected |
Windows | 8TB | 2.4.0-rc1 | 8.5TB | Fails | Fails | Fails | Expected |
OS | vmem | MongoDB | size | allocate | restart | db.stats | expected? |
---|---|---|---|---|---|---|---|
Windows | 8TB | 2.6.4 | 3.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.6.4 | 3.9TB | Works | Fails | | Unexpected |
Windows | 8TB | 2.6.4 | 5.5TB | Works | Fails | | Unexpected |
Windows | 8TB | 2.6.4 | 7.5TB | Works | Fails | | Unexpected |
Windows | 8TB | 2.6.4 | 8.5TB | Fails | Fails | | Expected |
OS | vmem | MongoDB | size | allocate | restart | db.stats | expected? |
---|---|---|---|---|---|---|---|
Windows | 8TB | 2.7.5 | 3.5TB | Works | Works | Works | Expected |
Windows | 8TB | 2.7.5 | 3.9TB | Works | Fails | | Unexpected |
Windows | 8TB | 2.7.5 | 5.5TB | Works | Fails | | Unexpected |
Windows | 8TB | 2.7.5 | 7.5TB | Works | Fails | | Unexpected |
Windows | 8TB | 2.7.5 | 8.5TB | Fails | Fails | | Expected |
OS | vmem | MongoDB | size | allocate | restart | db.stats | expected? |
---|---|---|---|---|---|---|---|
Linux | 10GB | 2.4.0-rc0 | 9GB | Works | Works | Works | Expected |
Linux | 10GB | 2.4.0-rc0 | 11GB | Fails | Works | Fails | Expected |
Linux | 10GB | 2.4.0-rc1 | 9GB | Works | Works | Works | Expected |
Linux | 10GB | 2.4.0-rc1 | 11GB | Fails | Works | Fails | Expected |
Linux | 10GB | 2.6.4 | 9GB | Works | Works | Works | Expected |
Linux | 10GB | 2.6.4 | 11GB | Fails | Fails | | Expected |
Linux | 10GB | 2.7.5 | 9GB | Works | Works | Works | Expected |
Linux | 10GB | 2.7.5 | 11GB | Fails | Fails | | Expected |
Where failures occur, the messages are as follows (a small repro of the 2.7.5 alignment error follows the table):
OS | MongoDB | Message |
---|---|---|
Windows | 2.4.0-rc0, 2.4.0-rc1, and 2.6.4 | errno:487 Attempt to access invalid address. |
Windows | 2.7.5 | errno:1132 The base address or the file offset specified does not have the proper alignment. |
Linux | 2.4.0-rc0, 2.4.0-rc1, 2.6.4, and 2.7.5 | errno:12 Cannot allocate memory |
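The 2.7.5 message is Windows error 1132 (ERROR_MAPPED_ALIGNMENT), which MapViewOfFileEx returns when the suggested base address is not a multiple of the 64KB allocation granularity; that seems consistent with SERVER-19805 below. A minimal sketch (not MongoDB code) that reproduces the same error code:

```cpp
// Minimal repro sketch, not MongoDB code: ask MapViewOfFileEx to place a view
// at an address that is page-aligned but not 64KB-aligned. It fails with
// GetLastError() == 1132 (ERROR_MAPPED_ALIGNMENT), the same code as the
// 2.7.5 log message above.
#include <windows.h>
#include <cstdio>

int main() {
    const DWORD size = 1 << 20;  // a 1MB pagefile-backed mapping is enough for the demo
    HANDLE mapping = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                        PAGE_READWRITE, 0, size, nullptr);
    if (!mapping) { std::printf("CreateFileMapping failed: %lu\n", GetLastError()); return 1; }

    // Find a free, granularity-aligned region, release it, then deliberately
    // offset the suggested base address by one 4KB page.
    void* probe = VirtualAlloc(nullptr, 2 * size, MEM_RESERVE, PAGE_NOACCESS);
    if (!probe) { std::printf("VirtualAlloc failed: %lu\n", GetLastError()); return 1; }
    VirtualFree(probe, 0, MEM_RELEASE);
    void* misaligned = static_cast<char*>(probe) + 4096;

    void* view = MapViewOfFileEx(mapping, FILE_MAP_ALL_ACCESS, 0, 0, size, misaligned);
    if (!view) {
        std::printf("MapViewOfFileEx failed: errno:%lu\n", GetLastError());  // prints 1132
    } else {
        UnmapViewOfFile(view);
    }
    CloseHandle(mapping);
    return 0;
}
```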
- related to SERVER-19805: MMap memory mapped file address allocation code cannot handle addresses non-aligned to memory mapped granularity size (Closed)