[SERVER-2958] Freelist algorithm causes storage fragmentation Created: 18/Apr/11  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chase Wolfinger Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 19
Labels: compaction, extents, freelist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-8078 Mongo databases fileSize grows withou... Closed
Related
related to SERVER-7159 Change DataRecord allocation strategy... Closed
is related to DOCS-499 Data Fragmentation and Storage Overview Closed
is related to DOCS-276 Add documentation for usePowerOf2Sizes Closed
is related to SERVER-5046 per collection option to pad to power... Closed
is related to SERVER-14081 Freelist improvements Closed
Assigned Teams:
Storage Execution
Operating System: ALL

 Description   

The freelist algorithm groups related record sizes into "buckets" that are searched for a free entry. The search examines up to 30 entries in a bucket before giving up and moving to the next bucket; if all buckets are exhausted, a new extent is allocated. In a high insert/delete environment where inserts occur throughout the day and deletes peak at a separate time (for example, a session cache for a web site), this algorithm produces very large freelists in which the smallest items filter to the top of each bucket. The freelist fills with entries that are never reused and that block entries that could be reused. One option is to allocate only on the bucket size (256, 512, 1024, etc.), which would guarantee that every item in the freelist is reusable. The following pull request illustrates how this could be fixed: https://github.com/mongodb/mongo/pull/37
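
A minimal mongo-shell sketch of the proposed round-up rule (the bucket ladder below is illustrative, not the server's exact values):

// Round every allocation up to the next bucket boundary so that each freed
// record is exactly one bucket size and can satisfy any later request that
// rounds to the same bucket.
var buckets = [256, 512, 1024, 2048, 4096, 8192];
function allocSize(recordSize) {
    for (var i = 0; i < buckets.length; i++) {
        if (recordSize <= buckets[i]) return buckets[i];
    }
    return recordSize; // oversize records fall outside the ladder
}
// allocSize(300) -> 512; allocSize(512) -> 512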



 Comments   
Comment by James Blackburn [ 15/Jan/13 ]

I wonder if this is the root cause of:
SERVER-8078

Note, for the example script given, usePowerOf2Sizes seems to make the space leak worse rather than better.

Comment by René M. Andersen [ 21/Sep/12 ]

@Eliot - Ok, I will try that. We have a different scenario, related to SERVER-2838, where we drop collections (a "fast delete" of aged data) and create new ones for new data. Will "usePowerOf2Sizes" also make reuse of the disk space more efficient for those collections?

Comment by Eliot Horowitz (Inactive) [ 14/Sep/12 ]

@rene - you should try usePowerOf2Sizes; it should do a lot better: http://docs.mongodb.org/manual/reference/command/collMod/

Comment by René M. Andersen [ 14/Sep/12 ]

Any news on when we can expect this to be fixed? This causes problems for us because we have a collection with many inserts and deletes occurring all the time. In production, the database size on disk grows by 1GB a week even though the size of the data stays approximately the same.

Comment by Dwight Merriman [ 29/May/12 ]

FWIW, in 2.2 the compact command has some options for specifying padding that may or may not be helpful for you.
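
For reference, a rough sketch of invoking those padding options from the shell (option names as I understand the 2.2 compact documentation; values illustrative):

// Rewrite 'mycoll' and leave extra room after each record.
// paddingFactor scales each record's allocation; paddingBytes adds a fixed amount.
db.runCommand({ compact: "mycoll", paddingFactor: 1.1 })
db.runCommand({ compact: "mycoll", paddingBytes: 512 })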

Comment by Eric Milkie [ 29/May/12 ]

In the current head of master branch, you can force a collection 'mycoll' to allocate only on the freelist bucket size with the collMod command:

db.mycoll.runCommand( "collMod", { "usePowerOf2Sizes" : true } )

This interface is still in flux and will probably change for the 2.2 production release.

In the meantime, you can indeed pad your documents yourself such that all document sizes are an exact freelist bucket size.
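
A rough shell sketch of that manual padding (the pad field name and the 1024-byte bucket are illustrative):

// Grow a document to (approximately) an exact bucket size before inserting.
// Object.bsonsize() reports a document's BSON size in the mongo shell; the
// pad field itself adds a few bytes of BSON overhead, so this is approximate.
function padToBucket(doc, bucketSize) {
    var missing = bucketSize - Object.bsonsize(doc);
    if (missing > 0) doc.pad = new Array(missing).join("x");
    return doc;
}
db.test.insert(padToBucket({ key: 1, payload: "abc" }, 1024));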

Comment by Chris Angove [ 29/May/12 ]

I know this is old but 2 questions on this issue.

1) Is there something planned for 2.2 that will fix or help this issue?

2) It seems one (hacky) way of solving this would be for me to add padding to my documents. For example, if my document is usually 100k-1MB, I could pad to make every document exactly 1MB, and then the existing algorithm should work since each document is the same size. This uses more disk space at the beginning but reduces fragmentation and lost file space.

Comment by Eliot Horowitz (Inactive) [ 27/Feb/12 ]

This patch is not something we are likely to include as-is.
We are working on some other options for 2.2.

Comment by David Edwards [ 27/Feb/12 ]

Is this patch likely to get applied to 2.0.x, or will it even make 2.2? Currently, we are facing the unpleasant prospect of regular maintenance to keep our cluster running due to this issue.

Comment by Eliot Horowitz (Inactive) [ 20/Feb/12 ]

@ross - dropping and creating new collections is not subject to the freelist algorithm.
If that's happening then there is something else going on.

Comment by Scott Hernandez (Inactive) [ 20/Feb/12 ]

Capped collections don't add/remove extents. They are fixed size.

Comment by Ross Dickey [ 20/Feb/12 ]

I wonder: would the problem described be an issue with TTL-based capped collections? In a steady state, you'd expect a TTL-capped collection to have about as many inserts as it does deletes (maybe somewhat more inserts, if your workload grows over time). I would expect this to cause massive bloat over time.

Even our approach of creating a new collection for each day, then dropping old ones (essentially hacking TTL collections by rotating them), produces disk usage graphs nearly identical to those in the Google Groups link above (http://groups.google.com/group/mongodb-user/browse_thread/thread/69da5f4a13f1db7c)

Comment by Eric Anderson [ 20/Feb/12 ]

@chase nice work!

Gotta get this moving... massive waste of time and effort due to not having this in.

Comment by Dwight Merriman [ 26/Jan/12 ]

@chase thanks will take a look

Comment by Chase Wolfinger [ 24/Jan/12 ]

See the following link: https://github.com/cwolfinger/mongo/commit/4641dbcad5bd94b21cf11d1d37531552642fdc94

Comment by Dwight Merriman [ 24/Jan/12 ]

what is the commit #/tag?

Comment by Chase Wolfinger [ 24/Jan/12 ]

I have forward-ported my original fix to the 2.0 branch. It is at https://github.com/cwolfinger/mongo on GitHub; you need to switch to the 2.0 branch. My team has tested several million insert/delete cycles with no space leak and no need to compact.

Comment by Dwight Merriman [ 24/Jan/12 ]

Until this is much improved: (1) try 2.0, which has some minor improvements, and (2) you may find the compact command helpful.

Comment by Swen Thümmler [ 24/Jan/12 ]

I have the same problem with storing PHP sessions in MongoDB. We have to periodically repair the db to free the unused space (we are using MMS; I can provide the data on request).

Comment by Jeff Behl [ 23/Jan/12 ]

I've encountered this as well while using GridFS. See:

http://groups.google.com/group/mongodb-user/browse_thread/thread/69da5f4a13f1db7c

It's going to complicate my intended use of MongoDB unfortunately...

Comment by Chase Wolfinger [ 22/Apr/11 ]

Hi Dwight - here is a sample set of code, in Java, that generates the fragmentation:

import com.mongodb.BasicDBObject;
import com.mongodb.CommandResult;
import com.mongodb.DBCollection;

DBCollection test = _db.getCollection("test");
test.ensureIndex(new BasicDBObject().append("key", 1), "pk", true);
int x = 0;
while (true)
{
    x++;
    // Insert 100,000 documents: an 800-byte fixed field plus a 0-3000 byte random payload.
    for (int i = 0; i < 100000; i++)
    {
        BasicDBObject medium = new BasicDBObject();
        medium.put("key", i);
        medium.put("abc", new byte[800]);
        medium.put("payload", new byte[(int) (Math.random() * 3000)]);
        medium.removeField("_id");
        test.insert(medium);
    }
    // Delete them all again, pushing every record onto the freelist.
    for (int i = 0; i < 100000; i++)
    {
        test.remove(new BasicDBObject().append("key", i));
    }
    // numExtents and storageSize keep growing even though the live data returns to zero.
    CommandResult result = test.getStats();
    System.out.println("Cycle complete " + x + ", extents=" + result.get("numExtents")
            + ", indexSize=" + result.get("totalIndexSize") + ", storageSize=" + result.get("storageSize"));
}

Comment by Dwight Merriman [ 19/Apr/11 ]

Like some of these ideas. The first thing we need is a test script (.js) to see the level of fragmentation before/after with various changes. It would just do a bunch of operations and then look at db.coll.stats().
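
A rough .js version of such a script, mirroring the Java repro above (collection name and cycle count are arbitrary):

// Churn inserts/deletes, then report the fragmentation indicators each cycle.
var coll = db.getSiblingDB("fragtest").test;
coll.drop();
for (var cycle = 1; cycle <= 10; cycle++) {
    for (var i = 0; i < 100000; i++) {
        coll.insert({ key: i, payload: new Array(Math.floor(Math.random() * 3000) + 1).join("x") });
    }
    for (var i = 0; i < 100000; i++) {
        coll.remove({ key: i });
    }
    var s = coll.stats();
    print("Cycle " + cycle + ": extents=" + s.numExtents + ", storageSize=" + s.storageSize);
}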
