The replication code has logic to automatically detect clock skew between two replica set members. When it detects skew, it prints a warning message in the log file ("replSet error possible failover clock skew issue?") but takes no further action. This can lead to a sync cycle, in which two secondary nodes replicate from each other via the chaining mechanism, each assuming the other node is further ahead in the oplog.
A sync cycle (two replica set secondaries syncing from each other) can affect high availability, because the nodes no longer receive writes from the primary node and will eventually contain stale data. This situation may not be detected immediately, leaving the replica set vulnerable to failure and, in the worst case, data loss.
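Because a sync cycle may go unnoticed, it can help to check for one explicitly. The following is a minimal Python sketch, not part of the server or of any driver. It assumes you have already obtained the members array from replica set status output (for example rs.status() in the mongo shell) and that each secondary reports its sync source in a syncingTo field. The function name find_sync_cycles and the host names are hypothetical.

```python
def find_sync_cycles(members):
    """Return sorted pairs of member names that list each other as sync source.

    `members` is assumed to be a list of dicts with a "name" key and an
    optional "syncingTo" key, as found in replica set status output.
    """
    # Map each member to the member it reports syncing from (None if absent).
    source = {m["name"]: m.get("syncingTo") for m in members}
    cycles = set()
    for name, target in source.items():
        # A two-node cycle exists when the target syncs from this member in turn.
        if target and target != name and source.get(target) == name:
            cycles.add(tuple(sorted((name, target))))
    return sorted(cycles)


# Hypothetical status data: b and c sync from each other, not from the primary.
status_members = [
    {"name": "a:27017", "stateStr": "PRIMARY"},
    {"name": "b:27017", "stateStr": "SECONDARY", "syncingTo": "c:27017"},
    {"name": "c:27017", "stateStr": "SECONDARY", "syncingTo": "b:27017"},
]

print(find_sync_cycles(status_members))  # [('b:27017', 'c:27017')]
```

An empty result means no pair of members reports the other as its sync source. The sketch only covers the two-node case described here, not longer chains.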
When a node detects clock skew between itself and its sync source, it now switches to the primary node as its sync source to avoid sync cycles.
Chaining can be globally disabled for a replica set, forcing all members to sync from the primary. See the chainingAllowed setting.
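As an illustration of this workaround, here is a minimal Python sketch that prepares a replica set configuration document with chaining disabled. It assumes the current configuration document is already available as a dict (the document rs.conf() returns in the mongo shell). The helper name disable_chaining and the sample configuration are hypothetical.

```python
import copy


def disable_chaining(config):
    """Return a copy of a replica set config with chaining turned off."""
    new_config = copy.deepcopy(config)
    # chainingAllowed lives in the "settings" subdocument, which may be absent.
    new_config.setdefault("settings", {})["chainingAllowed"] = False
    # A reconfiguration sent as a raw command needs a higher version number.
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical three-member configuration without a settings subdocument.
current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "a:27017"},
        {"_id": 1, "host": "b:27017"},
        {"_id": 2, "host": "c:27017"},
    ],
}

updated = disable_chaining(current)
print(updated["settings"], updated["version"])  # {'chainingAllowed': False} 4
```

The resulting document would then be applied on the primary with the replSetReconfig command, for instance through a driver's generic command interface. In the mongo shell, the equivalent is to set chainingAllowed to false in the settings subdocument of the object returned by rs.conf() and pass it to rs.reconfig(), which increments the version itself.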
All recent production release versions up to and including 2.4.9 are affected.
The fix is included in the 2.4.10 production release and the 2.5.3 development version, which will evolve into the 2.6.0 production release.