[SERVER-3564] new $archive update operator to reduce storage waste
Created: 09/Aug/11  Updated: 25/Jun/15  Resolved: 25/Jun/15

Status: Closed
Project: Core Server
Component/s: Storage, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Antoine Girbal
Assignee: Unassigned
Resolution: Won't Fix
Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-12224 Allow an explicit padding (or total s... Closed
Related
is related to SERVER-3752 padding factor overcompaction and rel... Closed
Participants:

 Description   

The db tries to estimate how much padding a document needs so that it doesn't have to move the document too many times.
When a document receives pushes it may grow significantly and drive the padding factor up.
Often the document eventually stops growing (no more commenting, or the dialog is closed) and is left with hefty padding that is wasted storage.
It would not be surprising if such a system wasted about 30% of its storage, which also means about 30% of RAM wasted due to page mapping.
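
To put a number on the waste, here is a minimal sketch. It assumes a record slot is allocated as document size times padding factor, which is a simplification; the helper names wastedFraction and paddingFactorFor are hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helpers, not server code.
// Assumption: slotSize = docSize * paddingFactor.

// Fraction of the slot that is padding once the document stops growing.
function wastedFraction(paddingFactor) {
  if (!(paddingFactor >= 1)) {
    throw new RangeError("paddingFactor must be >= 1");
  }
  return 1 - 1 / paddingFactor;
}

// Inverse: the padding factor that produces a given wasted fraction.
function paddingFactorFor(wasted) {
  if (!(wasted >= 0 && wasted < 1)) {
    throw new RangeError("wasted must be in [0, 1)");
  }
  return 1 / (1 - wasted);
}

console.log(wastedFraction(1.5).toFixed(2));    // prints "0.33"
console.log(paddingFactorFor(0.30).toFixed(2)); // prints "1.43"
```

Under that assumption, a padding factor of about 1.43 is enough to reach the 30% figure.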

A very easy update operator to add would be $archive, which could be used like:

db.col.update({_id: ..}, { $archive: N })

Where N is the acceptable percentage of wasted space.
Then:

  • If the wasted space is lower than N, the doc stays where it is.
  • If it is greater than N, the doc is moved to a new slot on disk with the minimum possible padding.
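
The rule above can be sketched as a small decision function. This is only an illustration, not server code: archiveDecision and its parameters are hypothetical, the wasted space is assumed to be measured as a percentage of the allocated slot, and a document sitting at exactly N is assumed to stay in place since the proposal does not cover that case.

```javascript
// Hypothetical sketch of the check $archive might perform.
// docBytes:    bytes the document actually uses
// slotBytes:   bytes allocated to its record slot (document plus padding)
// maxWastePct: the N passed to $archive
function archiveDecision(docBytes, slotBytes, maxWastePct) {
  if (!(docBytes > 0) || !(slotBytes >= docBytes)) {
    throw new RangeError("expected 0 < docBytes <= slotBytes");
  }
  // Assumption: waste is expressed relative to the slot size.
  const wastedPct = (100 * (slotBytes - docBytes)) / slotBytes;
  // Assumption: exactly N counts as acceptable, so the doc stays.
  return { wastedPct: wastedPct, move: wastedPct > maxWastePct };
}

// A 700-byte document in a 1000-byte slot wastes 30% of the slot.
console.log(archiveDecision(700, 1000, 25).move); // prints true
console.log(archiveDecision(900, 1000, 25).move); // prints false
```

Measuring against the slot size rather than the document size keeps N below 100, which matches the "percentage of acceptable wasted space" wording.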

The application knows exactly when a document is "done" and can call the operator on it.
This represents very little code change since we already have the migration code done.



 Comments   
Comment by Ian Whalen (Inactive) [ 25/Jun/15 ]

Given the introduction of the WiredTiger storage engine in 3.0.0 we've seen massive improvements to compression of data on disk, and this particular improvement is no longer necessary.

Comment by Derek MacDonald [ 20/Sep/11 ]

For my use case, what Dwight is proposing works perfectly; old documents will not be modified.

Comment by Dwight Merriman [ 03/Sep/11 ]

i wonder if we should do this (which requires the developer to do something explicit and learn a new feature), or if the system can be more intelligent and handle this. in theory the database could detect that a document has not changed in a long time, and then do this. we will likely do a background incremental compaction facility anyway; that facility could do that for example.

if the documents aren't tiny, the 30% will take up disk space but probably not RAM, since the pages containing the padding won't be referenced.
