Documentation / DOCS-10789

Docs for SERVER-19605: Oplog timeout should be configurable

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 3.5.14, 3.6.0-rc0, 3.4.11
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Documentation Request Summary:

      I added a new server parameter, `oplogInitialFindMaxSeconds`, which configures how long the initial `find` command on the oplog waits before it times out.
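
As a rough illustration of how the parameter might be applied, the Python sketch below builds a `setParameter` command document and the equivalent `mongod` startup option. It assumes the parameter can be set both at startup and at runtime, which the summary above does not state; the helper names and the 120-second value are hypothetical and chosen only for illustration.

```python
PARAM = "oplogInitialFindMaxSeconds"


def set_parameter_command(seconds: int) -> dict:
    """Build a setParameter command document for the oplog find timeout."""
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("timeout must be a positive integer number of seconds")
    # The command name has to be the first key; dicts keep insertion order.
    return {"setParameter": 1, PARAM: seconds}


def startup_option(seconds: int) -> str:
    """Build the equivalent mongod command-line option (assumed form)."""
    set_parameter_command(seconds)  # reuse the validation above
    return f"--setParameter {PARAM}={seconds}"


if __name__ == "__main__":
    # 120 is an illustrative value, not a recommendation.
    print(set_parameter_command(120))
    print(startup_option(120))
```

With a driver such as PyMongo, the command document would typically be sent to the `admin` database of each member that should use the new timeout.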

      Engineering Ticket Description:

      Issue Status as of March 1, 2017

      We intend to provide this functionality during the 3.5 development cycle and will evaluate the feasibility of backporting to MongoDB 3.4.

      Please be aware that this log message is typically symptomatic of an overloaded primary. Therefore, while increasing the oplog timeout may prevent these messages from being logged, it would likely not resolve the replication lag that is being observed. For MongoDB-related support discussion, I would recommend posting on the mongodb-user group or Stack Overflow with the mongodb tag. Because a question about how to resolve replication lag involves more discussion, it would be best posted on the mongodb-user group.

      Original description

      We just encountered a situation where all secondaries in two of our replica sets had ceased replication, and were 1-2 days behind the primary. This appears to have been caused in part by the initial oplog query from SECONDARY->PRIMARY timing out after 30 seconds, while the oplog query itself takes > 5 minutes to run. Some searching led me to SERVER-6733, where the timeout was reduced from 10 minutes to 30 seconds.

      As a workaround, we are building a custom binary with an increased oplog timeout so that the initial oplog query is allowed to complete and so our secondaries have a chance to catch up.

      Ideally, this value would be configurable with a flag or configuration option to avoid the need to recompile, and to allow users to customize the timeout for their particular situation.

      We have a fairly large oplog:

      > db.printReplicationInfo()
      configured oplog size:   143477.3826171875MB
      log length start to end: 1620689secs (450.19hrs)
      oplog first event time:  Wed Jul 08 2015 23:11:24 GMT+0000 (UTC)
      oplog last event time:   Mon Jul 27 2015 17:22:53 GMT+0000 (UTC)
      now:                     Mon Jul 27 2015 17:22:53 GMT+0000 (UTC)
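
The figures in this output can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the timestamps and sizes are copied from the output above, the helper name is hypothetical, and the parsing assumes an English locale for the day and month abbreviations.

```python
from datetime import datetime

FMT = "%a %b %d %Y %H:%M:%S"


def parse_event_time(text: str) -> datetime:
    """Parse a shell timestamp, dropping the trailing 'GMT+0000 (UTC)'."""
    return datetime.strptime(text.split(" GMT")[0], FMT)


first = parse_event_time("Wed Jul 08 2015 23:11:24 GMT+0000 (UTC)")
last = parse_event_time("Mon Jul 27 2015 17:22:53 GMT+0000 (UTC)")

# Oplog window between the first and last events.
window_secs = int((last - first).total_seconds())
window_hours = round(window_secs / 3600, 2)

# Configured size converted from MB to GB (1024 MB per GB).
size_gb = round(143477.3826171875 / 1024, 1)

if __name__ == "__main__":
    print(window_secs, window_hours, size_gb)
```

The computed window matches the reported 1620689 seconds (450.19 hours), and the configured size works out to roughly 140 GB.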
      

      Here are some sample queries issued by the secondaries that are timing out:

      Mon Jul 27 16:32:44.469 [conn5987144] query local.oplog.rs query: { ts: { $gte: Timestamp 1437813467000|94 } } cursorid:1368021807027379 ntoreturn:0 ntoskip:0 nscanned:4205713 nscannedObjects:4205713 keyUpdates:0 numYields:33130 locks(micros) r:38390680 nreturned:101 reslen:25310 1361497ms
      Mon Jul 27 16:32:45.037 [conn5987146] query local.oplog.rs query: { ts: { $gte: Timestamp 1437813467000|94 } } cursorid:1368020207769978 ntoreturn:0 ntoskip:0 nscanned:4205713 nscannedObjects:4205713 keyUpdates:0 numYields:33131 locks(micros) r:38186447 nreturned:101 reslen:25310 1362020ms
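
To put these durations next to the 30-second timeout, the sketch below pulls `nscanned`, `nreturned`, and the trailing millisecond count out of a log line in the format shown above. The regular expression and the `summarize` helper are hypothetical and written only against these two sample lines; other log formats would need a different pattern.

```python
import re

# Matches the fields used here from a slow-query log line like the samples.
LINE_RE = re.compile(
    r"nscanned:(?P<nscanned>\d+)"
    r".*?nreturned:(?P<nreturned>\d+)"
    r".*?\s(?P<millis>\d+)ms$"
)


def summarize(line: str, timeout_secs: int = 30):
    """Return scan counts and duration, or None if the line does not match."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    seconds = int(m["millis"]) / 1000
    return {
        "nscanned": int(m["nscanned"]),
        "nreturned": int(m["nreturned"]),
        "seconds": seconds,
        # How many times longer than the timeout the query ran.
        "times_over_timeout": round(seconds / timeout_secs, 1),
    }


SAMPLE = (
    "Mon Jul 27 16:32:44.469 [conn5987144] query local.oplog.rs query: "
    "{ ts: { $gte: Timestamp 1437813467000|94 } } cursorid:1368021807027379 "
    "ntoreturn:0 ntoskip:0 nscanned:4205713 nscannedObjects:4205713 "
    "keyUpdates:0 numYields:33130 locks(micros) r:38390680 nreturned:101 "
    "reslen:25310 1361497ms"
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the first sample line this reports about 1361 seconds, roughly 45 times the 30-second timeout, after scanning 4205713 documents to return 101.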
      

            Assignee: Andrew Aldridge (andrew.aldridge@mongodb.com)
            Reporter: Kay Kim (Inactive) (kay.kim@mongodb.com)

              Created:
              Updated:
              Resolved:
              6 years, 23 weeks, 3 days ago