[SERVER-8775] paddingFactor implementation causes 100% record size overhead for workloads where updates consistently grow documents more than 2x Created: 27/Feb/13 Updated: 17/Oct/14 Resolved: 30/Sep/14
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Aaron Staple | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Participants: | |
| Description |
|
Mongo's paddingFactor algorithm attempts to allocate records with excess space (padding) for workloads where update operations grow documents to a larger size. The goal of paddingFactor is to reduce the frequency of document moves on disk under these workloads. The paddingFactor value is capped at 2, which means the total record size, including the padding introduced by the padding factor, doesn't exceed around 2x the BSON document size. In cases where updates always grow documents by more than 2x, the padding factor sits at the cap of 2, so all records are allocated with 100% space overhead. But because the documents grow more than 2x, even with this padding every update still forces the document to move, so none of the padding is ever used. If we think document growth of this magnitude could be a common use case, we might look into handling it specifically.
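The pathology described above can be sketched with a toy allocation model. This is an illustration only, not the server's actual mmapv1 allocation code (which also quantizes record sizes and adapts the factor incrementally); the sizes and the 3x growth workload are assumed values:

```python
# Toy model of the paddingFactor allocation described in this ticket.
# Assumption: the factor has already been driven to its cap of 2.0 by a
# workload whose updates consistently grow documents more than 2x.

PADDING_FACTOR_CAP = 2.0

def alloc_size(doc_size, padding_factor):
    """Record bytes allocated for a document under a given padding factor."""
    return int(doc_size * min(padding_factor, PADDING_FACTOR_CAP))

initial = 1000                       # document size in bytes at insert time
record = alloc_size(initial, 2.0)    # 2000 bytes allocated (factor at cap)
grown = initial * 3                  # update grows the document 3x

overhead = record - initial          # 1000 wasted bytes: 100% overhead
moves = grown > record               # the update still forces a record move

print(overhead, moves)  # -> 1000 True
```

So the record carries 100% overhead on disk, yet the padding is never consumed because the grown document never fits in it.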
|
| Comments |
| Comment by Ramon Fernandez Marina [ 10/Sep/14 ] |
|
It seems that |
| Comment by Daniel Pasette (Inactive) [ 19/May/13 ] |
|
As mentioned above, this only works with non-capped collections. Compression is not leveraged for storage, so that is not a factor. The doc page has an example of this approach: http://docs.mongodb.org/manual/faq/developers/#faq-developers-manual-padding |
| Comment by Paul Reinheimer [ 19/May/13 ] |
|
Hi Dan, I've seen that; we're leveraging $set quite extensively, so we're never overwriting the original data. I guess we'd have to add something like 4,000 "a"s, then delete them once the document is created? So: create with extra data -> delete the extra data -> let workers run as required. Is any compression leveraged for storage? (Do I need to be concerned with writing an easily compressed string to try to make room for less compressible data later?) |
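The create-with-extra-data-then-delete workaround Paul describes can be sketched as follows. This is a hedged illustration, not server behavior: the helper name `make_padded_insert`, the field name `_filler`, and the 4,000-byte pad are all hypothetical, and per Dan's comment this approach only applies to non-capped collections:

```python
# Sketch of the manual-padding workaround: insert the document with a
# throw-away filler field, then $unset it so the record retains the extra
# on-disk space for later growth. Names and sizes here are illustrative.

def make_padded_insert(doc, pad_bytes, filler_field="_filler"):
    """Return a padded copy of `doc` to insert, plus the $unset update
    that trims the filler off once the record has been allocated."""
    insert_doc = dict(doc)                      # shallow copy; original untouched
    insert_doc[filler_field] = "x" * pad_bytes  # crude byte-count padding
    unset_update = {"$unset": {filler_field: ""}}
    return insert_doc, unset_update

job = {"_id": 1, "url": "http://example.com", "results": []}
insert_doc, trim = make_padded_insert(job, pad_bytes=4000)
# collection.insert_one(insert_doc)        # would run against a live server
# collection.update_one({"_id": 1}, trim)  # record keeps the spare bytes
```

Note the padding string here is highly compressible, which relates to Paul's compression question: since mmapv1 does not compress records on disk, the filler's compressibility doesn't matter, only its BSON-encoded length.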
| Comment by Daniel Pasette (Inactive) [ 19/May/13 ] |
|
|
| Comment by Paul Reinheimer [ 18/May/13 ] |
|
A related issue (for my use case) is that if the paddingFactor isn't high enough, you can't use a capped collection. MongoDB is currently powering http://wheresitup.com/; the way we're using it is:
I might guess that the final document ends up somewhere from 5-30x the original size. I'd love to start using capped collections for some of this, but unless I get an option to insert with a padding option ($pad: 30000?) I can't touch them. |