[SERVER-49916] use storageSize instead of size when creating capped collections Created: 26/Jul/20  Updated: 06/Apr/23  Resolved: 18/Sep/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.6.17, 4.0.19, 4.2.8
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Kay Agahd Assignee: Michael Gargiulo
Resolution: Won't Do Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-49918 mongodump/restore does not respect order Closed
is related to SERVER-49917 allow to resize capped collections Closed
Sprint: Execution Team 2020-08-24, Execution Team 2020-09-07, Execution Team 2020-09-21
Participants:

 Description   

When creating a capped collection you must define its maximum size in bytes, which is the size of the uncompressed data.
However, since WiredTiger stores data compressed on disk, it is very difficult to guess the right "size" so as not to exceed a certain amount of disk space, because the compression factor (and thus the "storageSize") may vary.

Therefore, it would be much better to be able to define the maximum "storageSize" when creating a capped collection. For backwards compatibility, the "size" parameter could co-exist.

This feature is even more important as long as capped collections cannot be resized.
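A mongo shell sketch of the request (the `storageSize` option is the proposed, hypothetical addition; only `size` exists today):

```javascript
// Today: "size" caps the uncompressed data size, not on-disk usage.
db.createCollection("events", { capped: true, size: 100 * 1024 * 1024 });

// Proposed (hypothetical, does not exist): cap the compressed on-disk size.
db.createCollection("events", { capped: true, storageSize: 100 * 1024 * 1024 });
```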



 Comments   
Comment by Dissatisfied Former User [ 02/Dec/21 ]

I would agree with Kay here. Compression adds unfortunate variability to the storage requirements. I did not anticipate compression being a factor with capped collections. In one of my logging collection cases, I requested 1GB, but storage is consuming only 182MB. This collection could be storing ~5.2× as much historical information. (Currently dating back to 2021-08-11, it could date back to ~2020-04-21.)

My requirement for capped collections by size relates explicitly to on-disk storage limitations/allocation. Pathological compression cases (I've identified a few; zlib and Huffman compression schemes are funny this way) could result in on-disk storage consuming significantly more space than the input data, a potential DoS vector. (A bad example, but at least not a growth case: compressing a stream of zeros or monotonic integers. In zlib, this results in an ironically highly compressible and growing stream of `A` in the output.)

Finally, TTL is not a replacement for a ring buffer. At all. TTL sweeps happen once per minute, might not complete, and thus require explicit application-side validation even when used. A TTL sweep might be fine for "eventual consistency of storage reclamation", but it's not good for hard guarantees.
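For context, the TTL alternative discussed in this thread is an index with `expireAfterSeconds`; a background monitor (every 60 seconds by default) deletes expired documents, so it bounds document age rather than collection size:

```javascript
// Bounds document *age*, not collection *size*; deletion is best-effort,
// done by a background sweep rather than synchronously on insert.
db.log.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 });
```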

Comment by Kay Agahd [ 18/Sep/20 ]

Hi michael.gargiulo,
thank you for the information, even though I believe your team made the wrong decision in dropping future support for capped collections.

If you want to use 80% of your available disk space to store historical data, how would you do that with TTL indexing? It's just not possible! So please reconsider your decision and keep supporting capped collections by letting users define a maximum "storageSize", as I've written in the description.
Thanks!

Comment by Michael Gargiulo [ 18/Sep/20 ]

Hi kay.agahd@idealo.de 

Thank you for sharing this blog post, it was helpful to understand your use cases and how you are using capped collections.

I wanted to let you know that we have made a preliminary decision to stop supporting user-created capped collections in an undetermined future release of MongoDB, in favor of improving TTL indexes so they can meet the needs of users who currently employ capped collections. While we have no current timeline to discontinue this support, we will not be implementing new feature requests related to capped collections, so I will be closing this ticket and SERVER-49917.

Since you mentioned TTL indexes are not ideal for your scenario, as you must choose an index, I would love to hear more from you so we can keep your use cases in mind as we design improvements to the concept of user-created capped collections and to TTL indexing.
 

Comment by Kay Agahd [ 06/Aug/20 ]

Hi Dima,

I just wrote in our techblog how we worked around the issue that capped collections cannot be resized without downtime so far:
https://medium.com/idealo-tech-blog/mission-possible-resize-mongodb-capped-collections-without-downtime-ec8aada2223f

Comment by Dmitry Agranat [ 28/Jul/20 ]

Hi kay.agahd@idealo.de,

Thank you for clarifying your request is not related to Oplog but for other on-demand created capped collections. I am passing this request to one of our teams for review.

Thanks,
Dima

Comment by Kay Agahd [ 27/Jul/20 ]

Hi Dima,

you misunderstood me. I was not talking about the oplog. I'm happy with how the oplog has worked since MongoDB v3.6 (because we can resize it on the fly) and also since v4.0.
My feature request is for all other capped collections, not the oplog. I hoped the issues I've linked made this clear.
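(For reference, the on-the-fly oplog resize mentioned above is the `replSetResizeOplog` admin command, available since MongoDB 3.6; its size argument is in megabytes:)

```javascript
// Resize the oplog to ~16 GB without restarting the node.
db.adminCommand({ replSetResizeOplog: 1, size: 16000 });
```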

Comment by Dmitry Agranat [ 27/Jul/20 ]

Hi kay.agahd@idealo.de,

Let's assume we are talking about the Oplog (which is a capped collection) and not other types of capped collections. Starting in MongoDB 4.0, the Oplog can grow past its configured size limit to avoid deleting the majority commit point, so I am not sure how limiting the Oplog to one metric or the other would help you set the "right" size that you would never have to change. Additionally, the Oplog resize method can be used if your write workload changes and you need to increase it based on maintenance window requirements.

Thanks,
Dima

Generated at Thu Feb 08 05:21:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.