[DOCS-10789] Docs for SERVER-19605: Oplog timeout should be configurable Created: 14/Sep/17  Updated: 29/Oct/23  Resolved: 14/Nov/17

Status: Closed
Project: Documentation
Component/s: None
Affects Version/s: None
Fix Version/s: 3.5.14, 3.6.0-rc0, 3.4.11

Type: Task Priority: Major - P3
Reporter: Kay Kim (Inactive) Assignee: Andrew Aldridge
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-19605 Oplog timeout should be configurable Closed
Participants:
Epic Link: DOCS: 3.6 Server
Story Points: 0.5

 Description   

Documentation Request Summary:

I added a new server parameter, `oplogInitialFindMaxSeconds`, which configures how long the initial `find` command on the oplog waits before it times out.
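The parameter is set like any other server parameter. The JavaScript sketch below builds the admin command documents for changing the value and reading it back. The helper names `buildOplogInitialFindCmd` and `buildOplogInitialFindQuery` are hypothetical, and 300 seconds is only an example value. The sketch assumes two things. First, the parameter can be changed at runtime on your version. Second, it must be set on the syncing member (the secondary), because that member issues the initial `find`. If your version accepts the parameter only at startup, pass `--setParameter oplogInitialFindMaxSeconds=300` to `mongod` instead.

```javascript
// Hypothetical helpers, not part of MongoDB: they only build command documents.
// Assumption: oplogInitialFindMaxSeconds can be changed at runtime on this version.

// Build the setParameter command. The range check is a client-side sanity
// check (an assumption, not documented server behavior): reject non-integers,
// zero, and negative values.
function buildOplogInitialFindCmd(seconds) {
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new RangeError("oplogInitialFindMaxSeconds must be a positive integer");
  }
  return { setParameter: 1, oplogInitialFindMaxSeconds: seconds };
}

// Build the getParameter command used to read the current value back.
function buildOplogInitialFindQuery() {
  return { getParameter: 1, oplogInitialFindMaxSeconds: 1 };
}

// Usage in the shell, connected to the secondary:
//   db.adminCommand(buildOplogInitialFindCmd(300));
//   db.adminCommand(buildOplogInitialFindQuery());
```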

Engineering Ticket Description:

Issue Status as of March 1, 2017

We intend to provide this functionality during the 3.5 development cycle and will evaluate the feasibility of backporting it to MongoDB 3.4.

Please be aware that this log message is typically symptomatic of an overloaded primary. Therefore, while increasing the oplog timeout may prevent these messages from being logged, it would likely not resolve the replication lag that is being observed. For MongoDB-related support discussion, I would recommend posting on the mongodb-user group or on Stack Overflow with the mongodb tag. A question about how to resolve replication lag involves more discussion and would be best posted on the mongodb-user group.

Original description

We just encountered a situation where all secondaries in two of our replica sets had ceased replication and were 1-2 days behind the primary. This appears to have been caused in part by the fact that the initial oplog query from SECONDARY->PRIMARY times out after 30 seconds, but the oplog query takes more than 5 minutes to run. Some searching led me to SERVER-6733, where the timeout was reduced from 10 minutes to 30 seconds.

As a workaround, we are building a custom binary with an increased oplog timeout so that the initial oplog query is allowed to complete and so our secondaries have a chance to catch up.

Ideally, this value would be configurable with a flag or configuration option to avoid the need to recompile, and to allow users to customize the timeout for their particular situation.

We have a fairly large oplog:

> db.printReplicationInfo()
configured oplog size:   143477.3826171875MB
log length start to end: 1620689secs (450.19hrs)
oplog first event time:  Wed Jul 08 2015 23:11:24 GMT+0000 (UTC)
oplog last event time:   Mon Jul 27 2015 17:22:53 GMT+0000 (UTC)
now:                     Mon Jul 27 2015 17:22:53 GMT+0000 (UTC)
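As an illustration of what these numbers mean for the lagging secondaries, the JavaScript sketch below (runnable in Node.js or mongosh) recomputes the oplog window from the first and last event times. It then checks whether a start point two days behind the newest entry still falls inside that window. The helper names are hypothetical, and the two-day lag is taken from the upper end of the lag described above. The check only asks whether the requested entries still exist in the oplog. It does not estimate how long the query to reach them will take.

```javascript
// Event times copied from the db.printReplicationInfo() output above.
const firstEvent = new Date("2015-07-08T23:11:24Z");
const lastEvent = new Date("2015-07-27T17:22:53Z");

// Hypothetical helper: oplog window length in seconds (newest minus oldest entry).
function oplogWindowSeconds(first, last) {
  return (last.getTime() - first.getTime()) / 1000;
}

// Hypothetical helper: a secondary can resume from `start` only if that point
// has not yet rolled off the capped oplog, i.e. it is no older than the oldest entry.
function startsInsideWindow(start, first) {
  return start.getTime() >= first.getTime();
}

const windowSecs = oplogWindowSeconds(firstEvent, lastEvent);
console.log(windowSecs);                      // 1620689, the "log length start to end"
console.log((windowSecs / 3600).toFixed(2));  // "450.19"

// A secondary two days behind the newest entry.
const twoDaysBehind = new Date(lastEvent.getTime() - 2 * 24 * 3600 * 1000);
console.log(startsInsideWindow(twoDaysBehind, firstEvent)); // true
```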

Here are some sample queries issued by the secondaries that are timing out:

Mon Jul 27 16:32:44.469 [conn5987144] query local.oplog.rs query: { ts: { $gte: Timestamp 1437813467000|94 } } cursorid:1368021807027379 ntoreturn:0 ntoskip:0 nscanned:4205713 nscannedObjects:4205713 keyUpdates:0 numYields:33130 locks(micros) r:38390680 nreturned:101 reslen:25310 1361497ms
Mon Jul 27 16:32:45.037 [conn5987146] query local.oplog.rs query: { ts: { $gte: Timestamp 1437813467000|94 } } cursorid:1368020207769978 ntoreturn:0 ntoskip:0 nscanned:4205713 nscannedObjects:4205713 keyUpdates:0 numYields:33131 locks(micros) r:38186447 nreturned:101 reslen:25310 1362020ms
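To make the mismatch concrete, the JavaScript sketch below (runnable in Node.js or mongosh) extracts three fields from the first log line above: the requested start timestamp, the scan count, and the total duration. It then compares the duration to the 30-second timeout mentioned above. The helper names and regular expressions are hypothetical and tailored to the legacy log format shown here. Reading the first `Timestamp` component as milliseconds since the epoch is an assumption based on its magnitude.

```javascript
// The first log line above, verbatim.
const line = "Mon Jul 27 16:32:44.469 [conn5987144] query local.oplog.rs query: { ts: { $gte: Timestamp 1437813467000|94 } } cursorid:1368021807027379 ntoreturn:0 ntoskip:0 nscanned:4205713 nscannedObjects:4205713 keyUpdates:0 numYields:33130 locks(micros) r:38390680 nreturned:101 reslen:25310 1361497ms";

// Hypothetical parser for this legacy slow-query format. Returns null if any field is missing.
function parseSlowOplogQuery(text) {
  const ts = text.match(/Timestamp (\d+)\|(\d+)/);
  const scanned = text.match(/\bnscanned:(\d+)/); // does not match "nscannedObjects:"
  const duration = text.match(/(\d+)ms\s*$/);     // trailing total duration
  if (!ts || !scanned || !duration) return null;
  return {
    // Assumption: the first Timestamp component is milliseconds since the epoch.
    startTs: new Date(Number(ts[1])),
    nscanned: Number(scanned[1]),
    durationMs: Number(duration[1]),
  };
}

// How many times longer the query ran than a timeout given in seconds.
function timeoutOverrun(durationMs, timeoutSeconds) {
  return durationMs / (timeoutSeconds * 1000);
}

const q = parseSlowOplogQuery(line);
console.log(q.startTs.toISOString());                     // "2015-07-25T08:37:47.000Z"
console.log(q.nscanned);                                  // 4205713
console.log((q.durationMs / 60000).toFixed(1));           // "22.7" (minutes)
console.log(timeoutOverrun(q.durationMs, 30).toFixed(1)); // "45.4"
```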



 Comments   
Comment by Githook User [ 14/Nov/17 ]

Author:

{'name': 'Andrew Aldridge', 'username': 'i80and', 'email': 'i80and@foxquill.com'}

Message: DOCS-10789: oplogInitialFindMaxSeconds
Branch: master
https://github.com/mongodb/docs/commit/31c86646c66eb5ca091a33b09da43c030296cc04

Generated at Thu Feb 08 08:01:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.