[SERVER-2958] Freelist algorithm causes storage fragmentation Created: 18/Apr/11 Updated: 06/Dec/22 Resolved: 14/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MMAPv1, Storage |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Chase Wolfinger | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Fix | Votes: | 19 |
| Labels: | compaction, extents, freelist | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
The freelist algorithm groups records of related sizes into "buckets" that are searched for a free entry. The search stops after 30 entries in a bucket and then moves to the next bucket; if all buckets are searched without a match, a new extent is allocated. In a high insert/delete environment where inserts occur and peak throughout the day while deletes peak at a separate time (for example, a session cache for a web site), this algorithm produces very large freelists in which the smallest items filter to the front of each bucket. The freelist fills with items that are never reused, and these block items that could be reused. One option is to allocate only at the bucket sizes (256, 512, 1024, etc.), which would guarantee that every item in the freelist is reusable. The following pull request illustrates how this could be fixed: https://github.com/mongodb/mongo/pull/37 |
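The bucket-scan behavior described above can be modeled with a short script. This is a simplified sketch, not MongoDB's actual freelist code; the bucket sizes, the 30-entry scan limit, and the extent counter follow the description in this ticket.

```python
# Simplified model of the freelist described above (not the real server
# code): deleted records land in size-class buckets, and allocation scans
# at most 30 entries per bucket before moving on.
BUCKET_SIZES = [256, 512, 1024, 2048, 4096]
SCAN_LIMIT = 30

def bucket_index(size):
    """Index of the smallest bucket that covers `size` bytes."""
    for i, b in enumerate(BUCKET_SIZES):
        if size <= b:
            return i
    return len(BUCKET_SIZES) - 1

class Freelist:
    def __init__(self):
        self.buckets = [[] for _ in BUCKET_SIZES]
        self.new_extents = 0

    def free(self, size):
        self.buckets[bucket_index(size)].append(size)

    def allocate(self, size):
        # Scan from the record's own bucket upward, looking at no more
        # than SCAN_LIMIT entries per bucket for a big-enough record.
        for i in range(bucket_index(size), len(BUCKET_SIZES)):
            bucket = self.buckets[i]
            for j, rec in enumerate(bucket[:SCAN_LIMIT]):
                if rec >= size:
                    return bucket.pop(j)
        self.new_extents += 1  # nothing reusable found: grow the file
        return size

fl = Freelist()
for _ in range(40):       # small deleted records clog the first bucket...
    fl.free(100)
for _ in range(10):       # ...so these allocations all grow the file,
    fl.allocate(200)      # even though the freelist holds 40 records
print(fl.new_extents)     # -> 10
```

This reproduces the failure mode in the description: the 100-byte records sit at the front of the 256-byte bucket, every 200-byte allocation exhausts its 30-entry scan without a match, and the file grows even though the freelist is far from empty.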
| Comments |
| Comment by James Blackburn [ 15/Jan/13 ] | |
|
I wonder if this is the root cause of: Note, for the example script given, usePowerOf2Sizes seems to make the space leak worse rather than better. | |
| Comment by René M. Andersen [ 21/Sep/12 ] | |
|
@Eliot - Ok I will try that. We have a different scenario related to this jira | |
| Comment by Eliot Horowitz (Inactive) [ 14/Sep/12 ] | |
|
@rene - you should try usePowerOf2Sizes, should do a lot better http://docs.mongodb.org/manual/reference/command/collMod/ | |
| Comment by René M. Andersen [ 14/Sep/12 ] | |
|
Any news on when we can expect this to be fixed? This causes problems for us because we have a collection with many inserts and deletes occurring all the time. In production, the database size on disk grows by 1GB a week even though the size of the data stays approximately the same. | |
| Comment by Dwight Merriman [ 29/May/12 ] | |
|
fwiw in 2.2 the compact command has some options for specifying padding that may or may not be helpful for you | |
| Comment by Eric Milkie [ 29/May/12 ] | |
|
In the current head of master branch, you can force a collection 'mycoll' to allocate only on the freelist bucket size with the collMod command:
This interface is still in flux and will probably change for the 2.2 production release. In the meantime, you can indeed pad your documents yourself such that all document sizes are an exact freelist bucket size. | |
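The usePowerOf2Sizes strategy discussed above quantizes record allocations to powers of two, so a freed record always matches a size class exactly and can be reused by any later insert in that class. A minimal sketch of the rounding rule (the 32-byte floor is an illustrative assumption, not the server's actual minimum):

```python
def power_of_2_size(doc_size):
    """Round a document size up to the next power of two.

    Sketch of the idea behind usePowerOf2Sizes; the 32-byte minimum
    is an assumption for illustration, not the server's actual floor.
    """
    size = 32
    while size < doc_size:
        size *= 2
    return size

print(power_of_2_size(100))   # -> 128
print(power_of_2_size(1000))  # -> 1024
```

Because every allocation lands exactly on a bucket size, the freelist never accumulates odd-sized records that no later request can reuse, which is why this setting mitigates the fragmentation described in this ticket.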
| Comment by Chris Angove [ 29/May/12 ] | |
|
I know this is old but 2 questions on this issue. 1) Is there something planned for 2.2 that will fix or help this issue? 2) It seems one (hacky) way of solving this would be for me to add padding to my documents. So for example, if my document is usually 100k-1MB, I could pad every document to exactly 1MB, and then the existing algorithm should work since each document is the same size. This uses more disk space up front but reduces fragmentation and lost file space. | |
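The manual-padding workaround suggested above can be sketched as follows. This is a byte-level illustration under an assumed fixed 1 MB target; in practice you would pad the document itself (e.g. with a filler field) before inserting it:

```python
TARGET_SIZE = 1024 * 1024  # 1 MB, per the example in the comment above

def pad_document(doc_bytes):
    """Pad a serialized document with zero bytes up to TARGET_SIZE."""
    if len(doc_bytes) > TARGET_SIZE:
        raise ValueError("document larger than padding target")
    return doc_bytes + b"\x00" * (TARGET_SIZE - len(doc_bytes))

padded = pad_document(b"x" * 100_000)  # a ~100k document
print(len(padded))  # -> 1048576
```

Since every record is now exactly the same size, every freed record is reusable by the next insert, trading up-front disk space for a stable on-disk footprint.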
| Comment by Eliot Horowitz (Inactive) [ 27/Feb/12 ] | |
|
This patch is not something we are likely to include as is. | |
| Comment by David Edwards [ 27/Feb/12 ] | |
|
Is this patch likely to get applied to 2.0.x, or will it even make 2.2? Currently, we are facing the unpleasant prospect of regular maintenance to keep our cluster running due to this issue. | |
| Comment by Eliot Horowitz (Inactive) [ 20/Feb/12 ] | |
|
@ross - dropping and creating new collections is not subject to the freelist algorithm. | |
| Comment by Scott Hernandez (Inactive) [ 20/Feb/12 ] | |
|
Capped collections don't add/remove extents. They are fixed size. | |
| Comment by Ross Dickey [ 20/Feb/12 ] | |
|
I wonder: would the problem described be an issue with TTL-based capped collections? In a steady state, you'd expect a TTL-capped collection to have about as many inserts as deletes (maybe somewhat more inserts, if your workload grows over time). I would expect this to cause massive bloat over time. Even our approach of creating a new collection for each day and then dropping old ones (essentially hacking TTL collections by rotating them) produces disk usage graphs nearly identical to those in the google groups link above (http://groups.google.com/group/mongodb-user/browse_thread/thread/69da5f4a13f1db7c) | |
| Comment by Eric Anderson [ 20/Feb/12 ] | |
|
@chase nice work! Gotta get this moving.. massive time and effort waste due to not having this in. | |
| Comment by Dwight Merriman [ 26/Jan/12 ] | |
|
@chase thanks will take a look | |
| Comment by Chase Wolfinger [ 24/Jan/12 ] | |
|
See the following link: https://github.com/cwolfinger/mongo/commit/4641dbcad5bd94b21cf11d1d37531552642fdc94 | |
| Comment by Dwight Merriman [ 24/Jan/12 ] | |
|
what is the commit #/tag? | |
| Comment by Chase Wolfinger [ 24/Jan/12 ] | |
|
I have fwd ported my original fix to the 2.0 branch. It is https://github.com/cwolfinger/mongo on github. You need to switch to the 2.0 branch. My team has tested several million insert deletes without any leaking memory and no need to compact. | |
| Comment by Dwight Merriman [ 24/Jan/12 ] | |
|
Until this is handled better: (1) try 2.0, which has some minor improvements, and (2) you may find the compact command helpful. | |
| Comment by Swen Thümmler [ 24/Jan/12 ] | |
|
I have the same problem with storing PHP sessions in MongoDB. We have to periodically repair the db to free the unused space (we are using MMS; I can provide the data on request). | |
| Comment by Jeff Behl [ 23/Jan/12 ] | |
|
I've encountered this as well while using GridFS. See: http://groups.google.com/group/mongodb-user/browse_thread/thread/69da5f4a13f1db7c It's going to complicate my intended use of MongoDB unfortunately... | |
| Comment by Chase Wolfinger [ 22/Apr/11 ] | |
|
Hi Dwight - this is a sample set of code that generates the fragmentation in Java:

```java
DBCollection test = _db.getCollection("test");
// insert a batch of documents, then delete them all, so the deleted
// records land on the freelist
for (int i = 0; i < 100000; i++) {
    test.insert(new BasicDBObject().append("key", i));
}
for (int i = 0; i < 100000; i++) {
    test.remove(new BasicDBObject().append("key", i));
}
CommandResult result = test.getStats();
```
| |
| Comment by Dwight Merriman [ 19/Apr/11 ] | |
|
like some of these ideas. first thing we need is a test script (.js) to see the level of fragmentation before/after with various changes. would just do a bunch of operations and then look at db.coll.stats() |