[SERVER-35565] Change capped collection age-out to be based on collection storageSize, not dataSize Created: 12/Jun/18 Updated: 27/Oct/23 Resolved: 15/Jun/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Participants: | |
| Description |
|
Due to SERVER-35431, the dataSize reported for a collection can be incorrect in 4.0, even without unclean crashes. We should switch capped collection sizes to be based on the storageSize, not the dataSize, so that the calculations can be accurate. We also believe storageSize is closer to what users actually want when they configure a capped collection, since they are generally trying to limit the disk space the collection consumes. |
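To make the difference between the two metrics concrete, here is a toy back-of-the-envelope model (not MongoDB code; the 4:1 compression ratio, document size, and cap are all hypothetical) of how many documents each policy would retain for the same configured cap:

```python
# Toy illustration (not MongoDB internals): how the choice of metric changes
# what a capped collection retains. Assumptions (all hypothetical): documents
# compress 4:1 on disk, each document is 4 KiB uncompressed, the cap is 1 MiB.

CAP_BYTES = 1 << 20          # configured capped-collection size (1 MiB)
COMPRESSION_RATIO = 4        # assumed on-disk compression ratio
DOC_BYTES = 4096             # uncompressed size of each inserted document

def docs_retained(cap_on_storage_size: bool) -> int:
    """Documents kept before age-out starts truncating, under each policy."""
    # storageSize-based: each document costs its compressed on-disk footprint;
    # dataSize-based: each document costs its full uncompressed size.
    per_doc = DOC_BYTES // COMPRESSION_RATIO if cap_on_storage_size else DOC_BYTES
    return CAP_BYTES // per_doc

print(docs_retained(cap_on_storage_size=False))  # dataSize policy -> 256
print(docs_retained(cap_on_storage_size=True))   # storageSize policy -> 1024
```

Under these assumptions, capping on storageSize retains 4x as many documents for the same configured size, which is the "substantially larger disk utilization" concern raised in the comments below only in reverse: the on-disk footprint is honored, but the collection holds far more uncompressed data.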
| Comments |
| Comment by Bruce Lucas (Inactive) [ 13/Jun/18 ] |
|
This would be a very significant behavior change for users, one that could result in substantially larger disk utilization where compression is significant. If we do go this route, it seems like it should be opt-in, e.g. set using a different parameter. |
| Comment by Michael Cahill (Inactive) [ 13/Jun/18 ] |
|
There will be unexpected consequences if storageSize is used instead of dataSize to truncate collections. In particular, for busy collections the storageSize can vary by up to 50% during checkpoints, so the behavior will be very uneven. It would be relatively easy to create workloads where all inserted data is deleted immediately until a checkpoint runs and reduces the storageSize, allowing more documents to be inserted. Even if we address that somehow by using some hybrid of storageSize, truncated size and an estimate of the compression ratio, this doesn't seem like a behavior that would be reasonable to change in a patch release. |
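The checkpoint scenario described above can be sketched with a toy simulation (assumptions, not WiredTiger behavior: deletes free no disk space until a checkpoint runs, reclaim compresses live data 2:1, and the cap, interval length, and unit sizes are all hypothetical):

```python
# Toy model of storageSize-driven age-out. Disk space freed by deletes is only
# reclaimed at checkpoints, so between checkpoints every insert pushes the
# file over the cap and the oldest document is deleted immediately; after a
# checkpoint shrinks the file, inserts survive again. The result is the
# "very uneven" behavior described in the comment.

CAP = 100                    # capped size, in abstract disk units
INSERTS_PER_INTERVAL = 50    # inserts between two checkpoints (hypothetical)
COMPRESSION_RATIO = 2        # assumed compression applied at reclaim

live = CAP                   # live document count; file starts exactly at cap
storage_size = CAP           # on-disk size; deletes do NOT shrink it mid-interval
deleted_per_interval = []

for interval in range(3):
    deleted_now = 0
    for _ in range(INSERTS_PER_INTERVAL):
        live += 1
        storage_size += 1            # each insert grows the file
        if storage_size > CAP:       # over the cap: age out the oldest doc...
            live -= 1                # ...but its disk space is not reclaimed
            deleted_now += 1         # until the next checkpoint
    deleted_per_interval.append(deleted_now)
    storage_size = live // COMPRESSION_RATIO  # checkpoint: reclaim + compress

print(deleted_per_interval)  # [50, 0, 25]
```

In the first interval every insert is immediately truncated; after the checkpoint reclaims space, none are; then the pattern lands somewhere in between, while the live document count keeps growing. A real implementation would face the same oscillation unless it blended storageSize with an estimate of pending reclaim, as the comment suggests.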