Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.2.13, 3.4.3, 3.5.4
Affects Version/s: None
Component/s: Replication
Labels:
- bkp

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v3.4, v3.2
Sprint:
Repl 2017-02-13, Repl 2017-03-06
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Currently, the initial find for the GTE query on the oplog has a maxTimeMS of 60 seconds, and the subsequent getMores have a maxTimeMS equal to half the election timeout. Both the find and the getMores, however, also have a timeout from the networking subsystem equal to the election timeout. Given that the default election timeout is 10 seconds, if the initial find takes more than 10 seconds to locate the common point in the oplog and return the first batch, it times out at the network layer rather than waiting the full 60 seconds allowed by maxTimeMS.

This can make it hard for nodes with high replication lag to catch up: if the common point in the oplog is far back, the find could consistently take more than 10 seconds, leaving the node unable to start replicating.
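
The interaction between the two timeouts can be sketched as follows. This is only an illustrative model of the arithmetic described above, not server code: the class and method names are hypothetical, and it assumes that whichever timeout is shorter is the one that takes effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OplogFetchTimeouts:
    """Hypothetical model of the timeouts described in this ticket."""

    election_timeout_ms: int = 10_000  # default election timeout (10 seconds)
    find_max_time_ms: int = 60_000     # maxTimeMS on the initial oplog find

    @property
    def get_more_max_time_ms(self) -> int:
        # maxTimeMS on each getMore is half the election timeout.
        return self.election_timeout_ms // 2

    @property
    def network_timeout_ms(self) -> int:
        # The networking subsystem timeout equals the election timeout.
        return self.election_timeout_ms

    def effective_find_timeout_ms(self) -> int:
        # Assumption: the shorter of the two limits is the one that fires.
        return min(self.find_max_time_ms, self.network_timeout_ms)

    def effective_get_more_timeout_ms(self) -> int:
        return min(self.get_more_max_time_ms, self.network_timeout_ms)


timeouts = OplogFetchTimeouts()
print(timeouts.effective_find_timeout_ms())      # 10000
print(timeouts.effective_get_more_timeout_ms())  # 5000
```

With the defaults, the initial find is effectively capped at 10 seconds, so its 60 second maxTimeMS never comes into play, while the getMore limit of 5 seconds is already below the network timeout and is unaffected.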

Is related to:
SERVER-19605 Oplog timeout should be configurable (Closed)

Assignee: Spencer Brody (Inactive)
Reporter: Spencer Brody (Inactive)
Participants: Githook User, Michael Brenden, Spencer Brody
Votes: 0
Watchers: 11

Created: Feb 14 2017 04:51:01 PM UTC
Updated: Aug 27 2018 03:35:03 PM UTC
Resolved: Feb 15 2017 04:32:51 PM UTC
