[SERVER-37807] startup warning if oplog maxSize exceeds disk size Created: 29/Oct/18  Updated: 25/Jun/19  Resolved: 25/Jun/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kevin Pulo Assignee: Zach Yam (Inactive)
Resolution: Won't Fix Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-37749 replSetResizeOplog command does not v... Closed
related to SERVER-37806 oplogSize config option should valida... Backlog
Sprint: Execution Team 2019-06-17, Execution Team 2019-07-01
Participants:

 Description   

It's possible to create an oplog (whether by --oplogSize, replSetResizeOplog, or manually) where the capped maxSize is larger than the total size of the storage volume that holds the oplog. In this case, the oplog is basically guaranteed to (eventually) fill that disk, i.e., without intervention the mongod can be expected to fail at some unknown time in the future.

mongod should check the maxSize of the local.oplog.rs collection, and issue a startup warning if it exceeds the total physical size of the storage volume on which that collection is stored. This will alert users to the situation so that it can be corrected before it becomes a problem. The check should be done after any new oplog has been created, as well as after the size has been changed with replSetResizeOplog (or perhaps the check could run during this command, which would then refuse to run or require force: true in that case).
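
The proposed check can be sketched as follows. This is only an illustration in Python, not server code: `oplog_exceeds_volume`, `startup_warning`, `max_size_bytes`, and `dbpath` are hypothetical names, and the sketch assumes the configured maximum has already been read from the `maxSize` reported for local.oplog.rs.

```python
import shutil
from typing import Optional


def oplog_exceeds_volume(max_size_bytes: int, dbpath: str) -> bool:
    """Return True if the configured oplog maximum is larger than the
    total size of the volume that holds `dbpath`."""
    total_bytes = shutil.disk_usage(dbpath).total
    return max_size_bytes > total_bytes


def startup_warning(max_size_bytes: int, dbpath: str) -> Optional[str]:
    """Build the warning text, or return None when no warning is needed."""
    if not oplog_exceeds_volume(max_size_bytes, dbpath):
        return None
    total_bytes = shutil.disk_usage(dbpath).total
    return (
        f"oplog maxSize ({max_size_bytes} bytes) exceeds the total size "
        f"of the storage volume ({total_bytes} bytes)"
    )
```

The same predicate could back both the startup warning and a guard in a resize path, which is why it is kept separate from the message formatting.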



 Comments   
Comment by Zach Yam (Inactive) [ 25/Jun/19 ]

After discussing with geert.boschmilkie, and brian.lane, we have decided to close this issue and leave it unresolved.

As Geert commented, a warning based on the oplog maxSize would not accurately reflect the amount of space left. Because the compressibility of the oplog entries is variable, the amount of data that can be stored on a volume may be several times the raw storage capacity of the disk.

At the same time, if the oplog volume is also used for data and journal files, there would be no warning even if the specified oplog size is many times larger than there is space on the volume.

Comment by Geert Bosch [ 14/Jun/19 ]

OK, here is a straw man proposal:

  • If the oplog has no data, or very little data (<10MB?), warn if the configured oplog size exceeds the available disk size. This will give false positives if the volume contains only the oplog (or local database), since the oplog will generally compress well. It will give false negatives if the data is not very compressible, and/or datafiles are also stored on the same volume.
  • If the oplog has data, check if (max oplog data size / current oplog data size) > (current oplog storage size + available disk space) / current oplog storage size. This will give a false negative if data is stored on the same volume and grows in storage size, and a false positive if compressibility of the oplog increases, data size decreases, or storage size in the volume is expanded on demand.
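
The two rules of the straw man can be written out as a small function. This is a sketch under stated assumptions, not an implementation from the server: the function and parameter names are hypothetical, all sizes are in bytes, "available disk size" is read as the free space on the volume, and 10 MiB stands in for the tentative "<10MB?" threshold.

```python
SMALL_OPLOG_BYTES = 10 * 1024 * 1024  # stand-in for the "<10MB?" threshold


def should_warn(max_data_size, data_size, storage_size, available_disk):
    """Apply the straw man heuristic.

    max_data_size:  configured oplog cap (uncompressed data size)
    data_size:      current uncompressed oplog data size
    storage_size:   current on-disk (compressed) oplog size
    available_disk: free space on the volume holding the oplog
    """
    if data_size < SMALL_OPLOG_BYTES or storage_size == 0:
        # Too little data to estimate compressibility:
        # compare the cap directly with the available space.
        return max_data_size > available_disk

    # Factor by which the data must still grow to reach the cap ...
    growth_needed = max_data_size / data_size
    # ... versus the factor by which the storage is able to grow.
    growth_possible = (storage_size + available_disk) / storage_size
    return growth_needed > growth_possible
```

For example, with a 100 GiB cap, 10 GiB of current data stored in 2 GiB on disk, and 10 GiB free, the data must grow 10x while storage can only grow 6x, so the function returns True. With 20 GiB free, storage can grow 11x and it returns False.
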
Comment by Geert Bosch [ 15/Nov/18 ]

Note that the capped max size relates to data size, while the disk size is compressed size. Depending on compressibility of the oplog entries, and the selected compression algorithm, the amount of data that can be stored on a volume may be several times the raw storage capacity of the disk. At the same time, if the oplog volume is also used for data and journal files, there would be no warning even if the specified oplog size is many times larger than there is space on the volume.

Would a warning still be helpful given these caveats, or would it cause confusion in cases that are actually fine and give a false sense of security in cases that aren't?
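
The two caveats can be made concrete with some arithmetic. The sketch below is illustrative only: the helper names, the compression ratios, and the volume sizes are all assumed values chosen to show one false positive and one false negative of a check that compares the cap with the raw volume size.

```python
GIB = 1024 ** 3


def naive_warning(max_data_size, volume_total):
    """The check from the description: oplog cap versus raw volume size."""
    return max_data_size > volume_total


def actually_fits(max_data_size, compression_ratio, volume_total, other_files=0):
    """Estimate whether a full oplog fits on the volume, given an assumed
    compression ratio (uncompressed bytes / on-disk bytes) and the space
    taken by other files on the same volume."""
    on_disk = max_data_size / compression_ratio
    return on_disk + other_files <= volume_total


# Compressible oplog (assumed 4x) on a dedicated 50 GiB volume:
# the warning fires, yet the full oplog needs only about 25 GiB.
false_positive = naive_warning(100 * GIB, 50 * GIB) and actually_fits(
    100 * GIB, 4.0, 50 * GIB
)

# Incompressible oplog sharing a 50 GiB volume with 30 GiB of other files:
# no warning is issued, yet 40 GiB + 30 GiB does not fit.
false_negative = (not naive_warning(40 * GIB, 50 * GIB)) and not actually_fits(
    40 * GIB, 1.0, 50 * GIB, other_files=30 * GIB
)
```

Both flags evaluate to True for these assumed inputs, matching the concern that the warning can confuse users in cases that are fine and stay silent in cases that are not.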

Generated at Thu Feb 08 04:47:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.