[SERVER-13888] Add --initialDataFileSize startup parameter for MMAPv1 Created: 09/May/14 Updated: 06/Dec/22 Resolved: 14/Sep/18
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MMAPv1, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Kenton Varda | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Participants: |
| Description |
|
With --smallfiles (and other options that reduce file sizes), Mongo will still allocate 16MB per database by default. For some use cases*, this is still far too big. By editing a few hardcoded constants in the source code, I was able to reduce the default database size to a few kilobytes with no apparent loss of functionality. See: https://github.com/kentonv/mongo/commit/14f391a000134e5d9d65bb14a6110e5a5b0be61d

Obviously, this patch is not suitable for merging; a real patch should gate the behavior behind a command-line flag. I would be happy to work on one, but would like direction from the Mongo team. Would a patch adding a new flag (--reallysmallfiles?) be accepted? Are there any glaring problems with the way I've approached this? How do you advise I move forward?
* Unsurprisingly, several Sandstorm apps use MongoDB. We ran into a problem where instances of these apps were taking up lots of disk space, even though each instance stored very little actual data. People would create a TODO list with ten items and it would end up being 16MB on disk, 10,000x what was actually needed. Obviously, this isn't the use case MongoDB was intended to cover; Mongo is for huMONGOus data sets. In practice, however, there are many reasons to choose Mongo other than scalability, and this can result in Mongo being used on very small data sets. E.g., any app that chooses to use Meteor (an excellent choice for a Sandstorm app) will probably use Mongo just because the two integrate well. The problem was corrected when we asked developers to substitute my patched mongod, but that's a fairly high-maintenance solution for us as more developers come online. |
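A quick way to reproduce the overhead described above, as a sketch; the dbpath and database name are illustrative, not from this ticket:

```sh
# Even with --smallfiles, a database holding ~10 tiny documents
# still occupies roughly 16MB on disk (first data file extent).
du -sh /var/lib/mongodb/todos.0 /var/lib/mongodb/todos.ns
```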
| Comments |
| Comment by Daniel Pasette (Inactive) [ 23/Aug/15 ] |
|
Hi Kenton, there is no way to run with journaling without a journal file allocated in WiredTiger. Please file a ticket in that project (https://jira.mongodb.org/browse/WT) if you'd like to request that feature, but I don't consider it a bug per se. Regarding the impact of running with file_max at non-default levels and with prealloc=false: these are undocumented flags, and we don't exercise them in the MongoDB integration tests at all, but they are tested by WiredTiger in isolation. The usual caveats apply. Inserting documents larger than file_max should not be an issue; the journaled data will grow the file to accommodate the space required. Depending on the durability requirements of your application, you can also choose to run without journaling. Unlike MMAPv1, WiredTiger does not require journaling to maintain consistency after an unclean shutdown. For many applications, using replication and relying on the durability provided by checkpoints is sufficient. This may be something to consider if your application allows it. |
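A minimal sketch of the replication-plus-checkpoints configuration mentioned above; the replica-set name and dbpath are illustrative:

```sh
# Run each replica-set member without a journal; durability then
# comes from WiredTiger checkpoints plus replication.
mongod --storageEngine wiredTiger --nojournal \
       --replSet rs0 --dbpath /var/lib/mongodb
```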
| Comment by Kenton Varda [ 22/Aug/15 ] |
|
Hi Dan, By "huge under the hood" I was just referring to the preallocation. The effect is that, for example, ls -l reports the file size as 128 bytes when in fact it's taking 100MB of underlying storage (which you can see by looking at the block count from stat(1), or by using du(1) to see actual disk usage). Using file_max to reduce the log file size seems to work. But I feel like it's a bug that prealloc=false turns off preallocation only for WiredTigerPreplog and not for WiredTigerLog; it seems like this flag should apply to both files. Also, is there any negative effect other than performance to setting file_max so low? For example, does it become impossible to insert a document that is larger than file_max? |
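The discrepancy described here can be observed with standard Unix tools; the journal path below is illustrative:

```sh
f=/var/lib/mongodb/journal/WiredTigerLog.0000000001
ls -l "$f"                           # apparent size: e.g. 128 bytes
stat -c '%s bytes, %b blocks' "$f"   # block count reveals the preallocation
du -h "$f"                           # actual disk usage: e.g. 100M
```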
| Comment by Daniel Pasette (Inactive) [ 22/Aug/15 ] |
|
Hi Kenton, I'm not sure what you mean by the files being "huge under the hood". Do you mean that they have a big impact on your application? WiredTiger requires that at least one journal file be available for operation. However, there are some things you can do to minimize the size needed. With the following startup parameter you can turn off preallocation and reduce the size of the journal files to 100KB: --wiredTigerEngineConfigString="log=(file_max=100k,prealloc=false)". This may have some impact on the performance of your application, but it sounds like space constraints are of much greater concern for you. |
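Put together as a full startup command, the suggestion above might look like this; the dbpath is illustrative:

```sh
mongod --storageEngine wiredTiger --dbpath /var/lib/mongodb \
       --wiredTigerEngineConfigString="log=(file_max=100k,prealloc=false)"
```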
| Comment by Kenton Varda [ 21/Aug/15 ] |
|
Hi Dan, Sorry for the delay in replying. We just got a chance to try this out and there's a problem. It appears the WiredTiger journal pre-allocates 200MB of space using fallocate(). The files look small but they are actually huge under the hood. Unfortunately, the --nopreallocj flag no longer seems to be honored by WiredTiger. We tried --wiredTigerEngineConfigString log=(prealloc=false), but this prevented preallocation for only one of the two journal files. Is there some other flag we could use to prevent all preallocation? |
| Comment by Daniel Pasette (Inactive) [ 25/Jun/15 ] |
|
Hi Kenton, I wanted to check back in with you regarding this feature request. The WiredTiger storage engine available in v3.0 should address the issues you raised here. There is no pre-allocation, and the storage is compressed with snappy by default (zlib can optionally be chosen). Creating an empty collection (currently) allocates 16KB for the collection itself and 16KB for the required _id index. I'll leave this feature request open, but will deprioritize it and change the description to clarify that it is only applicable to the MMAPv1 storage engine. Thanks |
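For reference, the block compressor can be selected at startup; a sketch using the standard mongod option (snappy is the default, zlib the alternative):

```sh
mongod --storageEngine wiredTiger \
       --wiredTigerCollectionBlockCompressor zlib
```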
| Comment by Kenton Varda [ 12/May/14 ] |
|
That would work in combination with some way to change the units of those flags. Currently, --nssize and similar flags take integer arguments measured in megabytes. For my use case, even one megabyte is wasteful. |
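To illustrate the units problem: --nssize takes whole megabytes, so this is the smallest namespace file mongod will accept today; the dbpath is illustrative:

```sh
mongod --smallfiles --nssize 1 --dbpath /var/lib/mongodb
```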
| Comment by Daniel Pasette (Inactive) [ 12/May/14 ] |
|
We've discussed this a bit internally. A more general solution would be to allow finer-grained control of the default data file size. Thus, in addition to --nssize, we could add an --initialDataFileSize flag to give full control. We need to work out some of the details of exactly what the behavior should be. |
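Had it shipped, the proposed flag might have been used like this; this is purely hypothetical, as the flag was never implemented (resolution: Won't Fix) and the value syntax is illustrative:

```sh
mongod --smallfiles --nssize 1 --initialDataFileSize 64KB
```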
| Comment by Nick Martin [ 10/May/14 ] |
|
+1. This would be really useful to me too. We have some clusters with many small databases. It would be great if we didn't have to pay to store big blocks of zeros on SSD. We'd happily trade performance when databases grow (rare for our use case) for the disk-space savings (which cost us real money). |