Details
Bug
Resolution: Done
Major - P3
ALL
Description
We have a replica set consisting of a primary, a secondary, and an arbiter. Our database sizes are 20TB+. The first time our secondary got too far behind, our oplog was sized to cover about 16 hours. After repriming the secondary and extending the oplog to 54 hours, we went another 3 weeks before we experienced this issue again over the weekend.
What specific things should we look for as the culprit? From the documentation, it looks like disk I/O or network issues could cause this. I'm not seeing any indication of either so far, but I want to cover all of our bases before looking into that any further.
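One thing that could be tracked over time is how the secondary's lag compares to the oplog window. The following is only a minimal sketch, assuming a status document shaped like the output of the `replSetGetStatus` command (the `members`, `name`, `stateStr`, and `optimeDate` fields). The host names, the sample timestamps, the `at_risk` helper, and the 50% threshold are hypothetical and only for illustration; the 54-hour window is the oplog length mentioned above.

```python
from datetime import datetime, timedelta


def replication_lag(status):
    """Return {member name: lag} for each SECONDARY, measured against
    the PRIMARY's optimeDate. Arbiters hold no data and are skipped."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def at_risk(lag, oplog_window, threshold=0.5):
    """True if the lag has used up at least `threshold` of the oplog window.
    The 0.5 default is an arbitrary illustrative choice."""
    return lag >= oplog_window * threshold


# Hypothetical sample data. With pymongo, a real status document could be
# fetched with: MongoClient(...).admin.command("replSetGetStatus")
sample_status = {
    "members": [
        {"name": "db1.example:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2024, 1, 1, 12, 0, 0)},
        {"name": "db2.example:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2024, 1, 1, 6, 0, 0)},
        {"name": "arb.example:27017", "stateStr": "ARBITER"},
    ]
}

oplog_window = timedelta(hours=54)  # oplog length from the description
lags = replication_lag(sample_status)
for name, lag in lags.items():
    print(name, lag, "AT RISK" if at_risk(lag, oplog_window) else "ok")
```

Logging this at a regular interval would show whether the lag builds up gradually (pointing at sustained write load or slow apply on the secondary) or jumps suddenly (pointing at a one-off event such as a large batch job).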
Would sharding the database help with issues like this? Our database keeps growing, and we need to start evaluating whether sharding can solve this issue.
I can upload logs too if necessary. Thanks.