Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.4.10, 2.5.3
Affects Version/s: 2.2.2, 2.3.2
Component/s: Replication
Labels: None

Confidence Status: None
Work Order: 3
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Issue Status as of March 30, 2014

ISSUE SUMMARY
The replication code has logic to automatically detect clock skew between two replica set members. It prints a warning message in the log file ("replSet error possible failover clock skew issue?") but takes no further action. This can lead to a sync cycle, where two secondary nodes replicate from each other via the chaining mechanism, each assuming the other node is further ahead in the oplog.

USER IMPACT
A sync cycle (two replica set secondaries syncing from each other) can affect high availability: the nodes no longer receive writes from the primary node and will eventually contain stale data. This situation may not be detected immediately, leaving the replica set vulnerable to failure and, in the worst case, data loss.
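
To illustrate what a sync cycle looks like, the following is a minimal sketch that walks a map of member to sync source and reports any loop. The function name `find_sync_cycle`, the member names, and the shape of the map are hypothetical and chosen for illustration; they are not part of the server code or of any MongoDB API.

```python
def find_sync_cycle(sync_source):
    """Return the members that form a sync cycle, or [] if there is none.

    sync_source maps each member name to the member it syncs from.
    The primary has no sync source and maps to None (illustrative model).
    """
    for start in sync_source:
        path = []
        node = start
        # Follow the chain of sync sources until it ends or repeats.
        while node is not None and node not in path:
            path.append(node)
            node = sync_source.get(node)
        if node is not None:
            # The chain looped back to a member already visited.
            return path[path.index(node):]
    return []


# Healthy chain: C syncs from B, B syncs from the primary A.
healthy = {"A": None, "B": "A", "C": "B"}

# Sync cycle: B and C sync from each other; neither reaches A.
stuck = {"A": None, "B": "C", "C": "B"}
```

In the second map neither secondary has a path back to the primary, which is why both eventually hold stale data.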

SOLUTION
When a node detects clock skew between itself and its sync source, it now switches to the primary node as its sync source to avoid sync cycles.

WORKAROUNDS
Chaining can be globally disabled for a replica set, forcing all members to sync from the primary. See the chainingAllowed setting.
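
As a rough sketch of the workaround, the helper below takes a replica set configuration document as a plain dictionary and returns a copy with `settings.chainingAllowed` set to false. The helper name `disable_chaining` and the sample configuration are hypothetical. The version increment is an assumption about what a reconfiguration expects, and the sketch does not submit the result to a server.

```python
import copy


def disable_chaining(config):
    """Return a copy of a replica set config with chaining disabled.

    Hypothetical helper: it only edits the document in memory.
    """
    new_config = copy.deepcopy(config)
    # Create the settings sub-document if the config does not have one yet.
    new_config.setdefault("settings", {})["chainingAllowed"] = False
    # Assumption: the new config carries a higher version than the old one.
    new_config["version"] = config["version"] + 1
    return new_config


# Sample configuration, invented for illustration.
current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "a.example.net:27017"},
        {"_id": 1, "host": "b.example.net:27017"},
        {"_id": 2, "host": "c.example.net:27017"},
    ],
}

updated = disable_chaining(current)
```

The updated document would then be applied through a replica set reconfiguration on the primary. With chaining disabled, every secondary syncs from the primary, so a cycle between secondaries cannot form.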

AFFECTED VERSIONS
All recent production releases up to and including 2.4.9 are affected.

PATCHES
The fix is included in the 2.4.10 production release and the 2.5.3 development version, which will evolve into the 2.6.0 production release.

Original Description

When replication detects clock skew (the next applied op on a secondary is not strictly after the previous applied op), it logs an error and continues.

Instead, we should force syncing only from the primary, and not attempt to sync from any other secondary via chaining. This will avoid any situations where we might have created a chain cycle.
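
The proposed behavior can be sketched as follows, assuming a simplified model in which an oplog timestamp is a (seconds, increment) pair compared in that order. The names `OpTime`, `is_clock_skew`, and `choose_sync_source` are hypothetical and do not correspond to identifiers in the server source.

```python
from typing import NamedTuple


class OpTime(NamedTuple):
    """Simplified oplog timestamp, ordered by seconds and then increment."""
    seconds: int
    increment: int


def is_clock_skew(previous, next_op):
    """True when the next applied op is not strictly after the previous one."""
    return next_op <= previous


def choose_sync_source(current_source, primary, previous, next_op):
    """Fall back to the primary on detected skew; otherwise keep the source."""
    if is_clock_skew(previous, next_op):
        # Syncing straight from the primary cannot be part of a chain cycle.
        return primary
    return current_source


last_applied = OpTime(seconds=1359467858, increment=2)
in_order = OpTime(seconds=1359467858, increment=3)
out_of_order = OpTime(seconds=1359467850, increment=1)
```

Here `choose_sync_source("B", "A", last_applied, in_order)` keeps `"B"`, while the same call with `out_of_order` returns the primary `"A"`.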

Assignee: Matt Dannenberg (Inactive)
Reporter: Eric Milkie
Participants: auto, Eric Milkie, Githook User, Matt Dannenberg, Scott Hernandez
Votes: 0
Watchers: 7

Created: Jan 29 2013 01:57:38 PM UTC
Updated: Jul 11 2016 05:57:26 PM UTC
Resolved: Sep 10 2013 08:48:17 PM UTC
