[SERVER-61168] Mongo Secondary Oplog Keeps Getting Too Far Behind Primary Created: 01/Nov/21 Updated: 04/Nov/21 Resolved: 04/Nov/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Neil Allen | Assignee: | Edwin Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
We have a replica set comprising a primary, a secondary, and an arbiter. Our database sizes are 20TB+, and we had sized our oplog so that it covered 16 hours the first time our secondary fell too far behind. After re-syncing the secondary and extending the oplog to 54 hours, we went another 3 weeks before we experienced this issue again over the weekend. What specific things should we look for as the culprit here? From the documentation, it looks like disk I/O or network issues could potentially cause this, but I'm not seeing any indication of that so far; I really just want to cover all of our bases before digging into that any further. Would sharding out the database help with issues like this? Our database growth keeps climbing and we definitely need to start looking into whether sharding can solve this issue. I can upload logs too if necessary. Thanks |
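For reference, a minimal sketch (not from the original report) of how the oplog window and secondary lag can be checked, and how the oplog can be resized without a restart, using standard mongosh helpers; the 100 GB size below is only an illustrative value:

```javascript
// Run on the primary: reports the configured oplog size and the
// "log length start to end", i.e. how many hours of writes the oplog
// currently holds. If a secondary stays behind for longer than this
// window, it becomes stale and needs a full resync.
rs.printReplicationInfo()

// Reports how far each secondary is currently behind the primary.
rs.printSecondaryReplicationInfo()

// Grow the oplog of the member you are connected to without restarting
// mongod. The size is in megabytes; 102400 MB (~100 GB) is just an example.
db.adminCommand({ replSetResizeOplog: 1, size: 102400 })
```

Tracking the oplog window and lag over time makes it easier to tell whether lag spikes line up with disk I/O or network saturation.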
| Comments |
| Comment by Edwin Zhou [ 04/Nov/21 ] | ||||
|
After inspecting the log files and FTDC, it appears that the secondary node goes into ROLLBACK but is unable to catch up because the primary oplog is overflowing.
You may be able to find guidance in our docs on how to avoid replica set rollbacks. The SERVER project is reserved for bug reports, so if you'd like to troubleshoot this further we encourage you to start by asking our community for help on the MongoDB Developer Community Forums. If the discussion there leads you to suspect a bug in the MongoDB server, then we'd want to investigate it as a possible bug here in the SERVER project. Best, | ||||
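As a general illustration (not part of the original comment), acknowledged writes can be protected from rollback by using a majority write concern; the collection name below is a placeholder:

```javascript
// Writes acknowledged with w: "majority" must be replicated to a majority
// of voting members before the driver returns, so they cannot be rolled back.
// Caveat for a primary-secondary-arbiter (PSA) set: the arbiter cannot
// acknowledge writes, so majority writes need both data-bearing members
// and will stall if the secondary is down or badly lagging.
db.orders.insertOne(
  { item: "example", qty: 1 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```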
| Comment by Neil Allen [ 02/Nov/21 ] | ||||
|
Logs have been uploaded to the support uploader as cspdb03ip-2021-10-30.log.bz2 (mongo log from the day of the crash) and the diagnostic data as cspdb03ip-diagnostic-data.tar.gz | ||||
| Comment by Edwin Zhou [ 02/Nov/21 ] | ||||
|
Thank you for your report. Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Best, |