[SERVER-8775] paddingFactor implementation causes 100% record size overhead for workloads where updates consistently grow documents more than 2x Created: 27/Feb/13  Updated: 17/Oct/14  Resolved: 30/Sep/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Aaron Staple Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-1810 ability to set minimum allocation siz... Closed
Participants:

 Description   

Mongo's paddingFactor algorithm attempts to allocate records with excess space (padding) for workloads where update operations grow documents to a larger size. The goal of paddingFactor is to reduce the frequency of document moves on disk under these workloads.

The paddingFactor value is capped at 2, so a record's total allocated size, padding included, does not exceed roughly 2x the BSON document size. When updates always grow documents by more than 2x, the padding factor settles at 2 and every record is allocated with 100% space overhead. Because the documents outgrow even this padding, each update still forces a move, so none of the padding is ever used.
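
The arithmetic behind this can be sketched with a deliberately simplified model. This is not the server's actual allocator: it assumes a record is allocated at document size times paddingFactor and that an update forces a move whenever the grown document no longer fits. The names PADDING_FACTOR_CAP, allocatedSize and simulateGrowth are hypothetical and exist only for illustration.

```javascript
// Simplified model of the behaviour described above; NOT the server's real allocator.
// All names here are hypothetical.
const PADDING_FACTOR_CAP = 2;

// Record size under the assumption: allocation = document size * capped padding factor.
function allocatedSize(docBytes, paddingFactor) {
  const factor = Math.min(paddingFactor, PADDING_FACTOR_CAP);
  return Math.ceil(docBytes * factor);
}

// Reports how much padding was allocated and whether the grown document still fits.
function simulateGrowth(docBytes, growthRatio, paddingFactor) {
  const recordBytes = allocatedSize(docBytes, paddingFactor);
  const grownBytes = Math.ceil(docBytes * growthRatio);
  return {
    recordBytes: recordBytes,
    paddingBytes: recordBytes - docBytes,
    mustMove: grownBytes > recordBytes
  };
}

// A 1000-byte document that grows 2.5x, with the padding factor at its cap.
const result = simulateGrowth(1000, 2.5, 2);
console.log(result);
```

In this model the 1000-byte document gets 1000 bytes of padding, yet the 2500-byte update still does not fit, so the move happens anyway and the padding is never used.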

If we think document growth of this magnitude could be a common use case we might look into handling it specifically.

Test

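// Start from an empty collection.
c = db.c;
c.drop();
 
// new Array( n ).toString() yields a string of n - 1 commas,
// so biggerString is roughly 2.5x the length of bigString.
bigString = new Array( 1e3 ).toString();
biggerString = new Array( 2.5e3 ).toString();
 
// Insert each document small, then immediately grow it by more than 2x.
for( i = 0; i < 1e5; ++i ) {
    c.save( { _id:i, b:bigString } );
    c.update( { _id:i }, { _id:i, b:biggerString } );
}
 
// Compare the size reported by stats() with the summed BSON size of the documents.
print( "record size: " + c.stats().size );
dataSize = 0;
c.find().forEach( function( doc ) {
        dataSize += Object.bsonsize( doc );
    } );
print( "data size: " + dataSize );
print( "padding factor: " + c.stats().paddingFactor );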

Output

record size: 508711584
data size: 252500000
padding factor: 1.998000000007429
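
As a quick check on these figures, the ratio of record size to data size can be computed directly. The two constants below are copied from the output above; the variable names are only illustrative.

```javascript
// Values copied from the test output above.
const recordSize = 508711584;
const dataSize = 252500000;

// Ratio of allocated record space to actual BSON data.
const ratio = recordSize / dataSize;
// Space overhead relative to the data itself, as a percentage.
const overheadPct = (ratio - 1) * 100;
// Average BSON size per document across the 1e5 documents in the test.
const avgDocBytes = dataSize / 1e5;

console.log("record/data ratio: " + ratio.toFixed(3));
console.log("overhead: " + overheadPct.toFixed(1) + "%");
console.log("average document size: " + avgDocBytes + " bytes");
```

The ratio comes out just over 2, which matches the claim that records carry about 100% overhead once the padding factor reaches its cap.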



 Comments   
Comment by Ramon Fernandez Marina [ 10/Sep/14 ]

It seems that SERVER-1810 is a superset of this ticket, so I'm marking this ticket as a duplicate.

Comment by Daniel Pasette (Inactive) [ 19/May/13 ]

As mentioned above, this only works with non-capped collections. Compression is not leveraged for storage, so that is not a factor. The doc page has an example of this approach: http://docs.mongodb.org/manual/faq/developers/#faq-developers-manual-padding

Comment by Paul Reinheimer [ 19/May/13 ]

Hi Dan,

I've seen that, we're leveraging $set quite extensively, so we're never overwriting the original data. I guess we'd have to add like 4,000 "a"s, then delete them once the document is created? So Create with extra data -> delete extra data -> let workers run as required.

Is any compression leveraged for storage? (Do I need to be concerned about writing an easily compressed string to make room for less compressible data later?)

Comment by Daniel Pasette (Inactive) [ 19/May/13 ]

While not a perfect solution, if you know the maximum final size of a document you can manually pad documents at insertion time.
EDIT: this does NOT work with capped collections because all padding will be removed on initial sync. Subsequent updates will break replication.
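
A minimal sketch of manual padding, assuming the maximum extra size is known up front. The helper withPadding, the field name _padding, the 4000-byte figure and the sample skeleton document are all hypothetical. The shell calls are shown only in comments so the block also runs as plain JavaScript. Per the EDIT above, this is not suitable for capped collections.

```javascript
// Hypothetical helper: returns a copy of doc with a throwaway filler field
// so the record is allocated with room for later growth.
function withPadding(doc, maxExtraBytes) {
  const padded = Object.assign({}, doc);
  padded._padding = "a".repeat(maxExtraBytes); // "_padding" is an assumed field name
  return padded;
}

// Update spec that removes the filler once the document has been inserted.
const REMOVE_PADDING = { $unset: { _padding: "" } };

// Example skeleton document; the fields are illustrative only.
const skeleton = { _id: 1, url: "http://example.com/", locations: ["x", "y"] };
const padded = withPadding(skeleton, 4000);

// In the mongo shell these would be used roughly as follows:
//   db.c.insert( padded );
//   db.c.update( { _id: padded._id }, REMOVE_PADDING );
console.log(Object.keys(padded), padded._padding.length);
```

The idea is that the record is sized for the padded document at insert time, and removing the filler leaves that space free for subsequent $set updates.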

Comment by Paul Reinheimer [ 18/May/13 ]

A related issue (for my use case) is that if the paddingFactor isn't high enough, you can't use a capped collection.

MongoDB is currently powering http://wheresitup.com/, the way we're using it is:

  • User submits URL and locations
  • MongoDB skeleton document is created to obtain the DocumentId. This skeleton contains the URL the user entered, the locations selected, some other junk.
  • Workers are fired up to service the request, each one updates the document to contain more information.

I might guess that the final document ranges somewhere from 5-30x the original size.

I'd love to start using capped collections on some of this, but unless I get an option to insert with a padding option ($pad: 30000?) I can't touch them.

Generated at Thu Feb 08 03:18:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.