[SERVER-35565] Change capped collection age-out to be based on collection storageSize, not dataSize
Created: 12/Jun/18  Updated: 27/Oct/23  Resolved: 15/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Spencer Brody (Inactive)
Assignee: Backlog - Storage Execution Team
Resolution: Works as Designed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-35431 rollback does not correct sizeStorer ... Backlog
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:

 Description   

Due to SERVER-35431, the dataSize reported for a collection can be incorrect in 4.0, even without unclean crashes. We should switch capped collection sizes to be based on the storageSize, not the dataSize, so that the calculations can be accurate. We also believe this is more likely to be what users actually want when they configure a capped collection anyway, as they are generally trying to limit the disk space consumed by the collection.



 Comments   
Comment by Bruce Lucas (Inactive) [ 13/Jun/18 ]

This would be a very significant behavior change for users, and it could result in substantially larger disk utilization when the data compresses well. If we do go this route, it seems like it should be opt-in, e.g. set using a different parameter.
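To put rough numbers on the compression concern above, here is a back-of-the-envelope sketch. It assumes a steady-state compression ratio defined as dataSize / storageSize; the function name `disk_footprint` and the 4:1 ratio are hypothetical and chosen only for illustration, not measured values.

```python
def disk_footprint(cap_bytes: int, compression_ratio: float) -> dict:
    """Approximate on-disk bytes of a full capped collection.

    compression_ratio is an assumed steady-state dataSize / storageSize.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return {
        # Cap counts uncompressed bytes, so disk use shrinks by the ratio.
        "cap_on_dataSize": cap_bytes / compression_ratio,
        # Cap counts on-disk bytes, so disk use is the cap itself.
        "cap_on_storageSize": float(cap_bytes),
    }


cap = 1024 ** 3  # a 1 GiB capped collection
result = disk_footprint(cap, 4.0)
growth = result["cap_on_storageSize"] / result["cap_on_dataSize"]
print(result, f"disk use grows {growth:.1f}x")
```

Under these assumptions, the same configured cap would occupy about as many times more disk as the compression ratio once age-out is keyed on storageSize, which is why an opt-in parameter is suggested.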

Comment by Michael Cahill (Inactive) [ 13/Jun/18 ]

There will be unexpected consequences if storageSize is used instead of dataSize to truncate collections. In particular, for busy collections the storageSize can vary by up to 50% during checkpoints, so the behavior will be very uneven. It would be relatively easy to create workloads where all inserted data is deleted immediately until a checkpoint runs and reduces the storageSize, allowing more documents to be inserted.

Even if we address that somehow by using some hybrid of storageSize, truncated size and an estimate of the compression ratio, this doesn't seem like a behavior that would be reasonable to change in a patch release.
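The workload described in this comment can be illustrated with a deliberately simplified toy model. It assumes that deleting a document frees no on-disk space until the next checkpoint and that a checkpoint reclaims all freed space at once; this is an illustration of the argument, not how the storage engine actually accounts for space. The function `simulate` and all of its parameters are hypothetical.

```python
from collections import deque


def simulate(cap_bytes, doc_bytes, inserts, checkpoint_every):
    """Toy model of a capped collection whose cap is enforced on storageSize."""
    live = deque()    # sizes of documents currently in the collection
    storage_size = 0  # on-disk bytes, including space not yet reclaimed
    history = []      # live document count after each insert

    for i in range(1, inserts + 1):
        live.append(doc_bytes)
        storage_size += doc_bytes

        # Age-out: delete oldest documents while over the cap. In this model
        # deletes do not shrink storage_size, so the loop drains everything.
        while storage_size > cap_bytes and live:
            live.popleft()

        # Assumed checkpoint behavior: all freed space is reclaimed.
        if i % checkpoint_every == 0:
            storage_size = sum(live)

        history.append(len(live))
    return history


print(simulate(cap_bytes=300, doc_bytes=100, inserts=10, checkpoint_every=5))
# prints [1, 2, 3, 0, 0, 1, 2, 3, 0, 0]
```

In this model the collection fills to the cap, then every new document is deleted as soon as it is inserted until the checkpoint runs, after which the cycle repeats. A cap on dataSize would instead hold steady at three documents in the same scenario.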
