[DOCS-4748] Document the process for calculating oplog size Created: 30/Jan/15  Updated: 30/Oct/23  Resolved: 01/Nov/22

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: Server_Docs_20231030

Type: Task Priority: Major - P3
Reporter: Alexander Komyagin Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Probably on the oplog resize page or in the production notes


Participants:
Days since reply: 1 year, 14 weeks, 1 day ago
Epic Link: DOCSP-1769

 Description   

The replication oplog window should cover normal maintenance/downtime windows (to avoid a full resync), as well as the time needed to restore a replica set member: either the time needed for an initial sync, or the time to restore the last backup, so that the restored member does not become stale.

To estimate the size, you should look at the oplog rate (the oplog GB/h graph in MMS, or the output of db.printReplicationInfo()) at its maximum, i.e. during peak load.
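A minimal Python sketch of the two measurements. It assumes the used oplog size and the time span it covers are taken from the `usedMB` and `timeDiffHours` fields of `db.getReplicationInfo()` (the data behind `db.printReplicationInfo()`). Dividing one by the other gives only an average over the whole window, which understates the peak, so the sketch also takes the maximum of hourly readings such as those on the oplog GB/h graph. The function names and sample readings are hypothetical.

```python
def average_rate_gb_per_hour(used_mb: float, window_hours: float) -> float:
    """Average oplog write rate over the span the oplog currently covers.

    used_mb and window_hours are assumed to be the usedMB and
    timeDiffHours values reported by db.getReplicationInfo().
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return (used_mb / 1024) / window_hours


def peak_rate_gb_per_hour(hourly_gb: list[float]) -> float:
    """Peak rate from a series of hourly oplog volumes (GB written per hour)."""
    if not hourly_gb:
        raise ValueError("need at least one reading")
    return max(hourly_gb)


# Hypothetical readings: the average over the window understates the peak.
readings = [0.8, 1.1, 2.4, 3.0, 2.2, 0.9]
print(average_rate_gb_per_hour(used_mb=10_240, window_hours=8))  # 1.25
print(peak_rate_gb_per_hour(readings))  # 3.0
```

Sizing from the average rather than the peak would leave the oplog window short exactly when the system is busiest.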

The default oplog size should never be used without consideration.

For the sake of example, let's say my system has the following timings:

  • usual maintenance window: 24 hours
  • time to restore a backup is 3 hours
  • backups are taken every 24 hours
  • time needed for an initial sync: 6 hours
  • oplog rate is 3 GB/h at peak time

Here I would need to cover at least 24 hours for the usual maintenance window (it also covers the time needed for an initial sync, which is 6 hours). However, if I were to restore a backup that is 24 hours behind, and the restore process takes 3 hours, I would need at least 27 hours in the oplog to allow the restored replica set member to catch up.
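The reasoning above reduces to taking the maximum over the recovery paths, where the backup path needs the backup interval plus the restore time. A sketch with the example timings; the function name is hypothetical:

```python
def required_window_hours(maintenance_h: float, initial_sync_h: float,
                          backup_interval_h: float, restore_h: float) -> float:
    """Smallest oplog window (hours) that covers every recovery path."""
    # A member restored from the latest backup can be up to backup_interval_h
    # behind, plus however long the restore itself takes.
    backup_catch_up_h = backup_interval_h + restore_h
    return max(maintenance_h, initial_sync_h, backup_catch_up_h)


print(required_window_hours(maintenance_h=24, initial_sync_h=6,
                            backup_interval_h=24, restore_h=3))  # 27
```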

It's practical to round this up to 36 hours, which is 108 GB at the 3 GB/h rate. It also makes sense to allow for sudden load spikes and future growth of the oplog rate, so I would use 200 GB instead.
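The ticket does not state an exact rounding or headroom rule, so the parameters below (round the window up to a multiple of 12 hours, multiply by a headroom factor of 1.8, round up to a multiple of 50 GB) are hypothetical choices that happen to reproduce the 36 hours, 108 GB and 200 GB figures. Treat them as knobs to tune, not a recommendation.

```python
import math


def round_up(value: float, step: float) -> float:
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step


def size_oplog_gb(required_hours: float, peak_rate_gb_h: float,
                  round_hours_to: float = 12, headroom: float = 1.8,
                  round_gb_to: float = 50):
    # Hypothetical policy: round the window up, e.g. 27 h -> 36 h.
    window_h = round_up(required_hours, round_hours_to)
    # Size needed to hold that window at the peak rate: 36 h * 3 GB/h = 108 GB.
    base_gb = window_h * peak_rate_gb_h
    # Extra room for load spikes and growth, rounded up to a tidy size.
    final_gb = round_up(base_gb * headroom, round_gb_to)
    return window_h, base_gb, final_gb


print(size_oplog_gb(27, 3.0))  # (36, 108.0, 200)
```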

cc richard@10gen.com



 Comments   
Comment by Education Bot [ 01/Nov/22 ]

Hello! This ticket has been closed due to inactivity. If you believe this ticket is still important, please reopen it and leave a comment to explain why. Thank you!

Comment by Daniel Coupal [ 21/Oct/19 ]

As in many cases, it depends; however, you are right that 3-5 days would be the minimum to suggest to a customer.

Comment by Nic Cottrell [ 21/Oct/19 ]

daniel.coupal - when you ran NHTT, the rule of thumb was always 72 hours to allow for weekend downtime, right? I guess we could say

max(usual maintenance window, weekend): 64 hours
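A quick check of the 64-hour figure, assuming the weekend runs from Friday 17:00 to Monday 09:00 (the dates are arbitrary; 5 January 2024 is a Friday). At the 3 GB/h peak rate from the description, this window alone would need 192 GB before any headroom.

```python
from datetime import datetime

# Assumed weekend downtime: Friday 17:00 to Monday 09:00.
weekend_start = datetime(2024, 1, 5, 17, 0)  # Friday
weekend_end = datetime(2024, 1, 8, 9, 0)     # Monday
weekend_h = (weekend_end - weekend_start).total_seconds() / 3600

usual_maintenance_h = 24
window_h = max(usual_maintenance_h, weekend_h)
peak_rate_gb_h = 3.0

print(window_h)                   # 64.0
print(window_h * peak_rate_gb_h)  # 192.0
```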

Generated at Thu Feb 08 07:48:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.