The maxAcceptableLogicalClockDrift startup-only parameter currently accepts any non-negative value. However, setting it to 0 causes operations to fail spuriously when setting the logical clock (the rate-limiting mechanism rejects them) whenever the wallclock time naturally rolls over to the next second. This happens because the wallclocks are on different hosts and are therefore never in perfect lockstep (i.e. there is no global clock, which is the whole reason for having a logical clock).
- Example failed user operation:
{ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0, "writeError" : { "code" : 83, "errmsg" : "write results unavailable from testcc-7.kevincm1.0633.mongodbdns.com:27000 :: caused by :: ClusterTimeFailsRateLimiter: New cluster time, 1496037921, is too far from this node's wall clock time, 1496037920." } }
- Example failed system operations (note the timestamps are all very close to the "next second"):
2017-05-29T05:50:09.986+0000 I REPL [replication-1] Restarting oplog query due to error: ClusterTimeFailsRateLimiter: error in fetcher batch callback: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.. Last fetched optime (with hash): { ts: Timestamp 1496036996000|4, t: 1 }[4820298219506527748]. Restarts remaining: 3
2017-05-29T05:50:09.987+0000 I REPL [replication-0] Restarting oplog query due to error: ClusterTimeFailsRateLimiter: error in fetcher batch callback: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.. Last fetched optime (with hash): { ts: Timestamp 1496036996000|4, t: 1 }[4820298219506527748]. Restarts remaining: 2
2017-05-29T05:50:09.989+0000 I REPL [replication-1] Restarting oplog query due to error: ClusterTimeFailsRateLimiter: error in fetcher batch callback: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.. Last fetched optime (with hash): { ts: Timestamp 1496036996000|4, t: 1 }[4820298219506527748]. Restarts remaining: 1
2017-05-29T05:50:09.991+0000 I REPL [replication-0] Error returned from oplog query (no more query restarts left): ClusterTimeFailsRateLimiter: error in fetcher batch callback: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.
2017-05-29T05:50:09.991+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: ClusterTimeFailsRateLimiter: error in fetcher batch callback: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.
2017-05-29T05:50:09.992+0000 I REPL_HB [ReplicationExecutor] Error in heartbeat (requestId: 148) to testcc-4.kevincm1.0633.mongodbdns.com:27000, response status: ClusterTimeFailsRateLimiter: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.
2017-05-29T05:50:09.994+0000 I REPL_HB [ReplicationExecutor] Error in heartbeat (requestId: 151) to testcc-4.kevincm1.0633.mongodbdns.com:27000, response status: ClusterTimeFailsRateLimiter: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.
2017-05-29T05:50:09.996+0000 I REPL_HB [ReplicationExecutor] Error in heartbeat (requestId: 155) to testcc-4.kevincm1.0633.mongodbdns.com:27000, response status: ClusterTimeFailsRateLimiter: New cluster time, 1496037010, is too far from this node's wall clock time, 1496037009.
When this happens, it appears to the rate limiter that the time difference between the nodes is 1 second (unacceptable, since the parameter is 0), but this is just an artifact of the per-second resolution. In fact, the nodes could be arbitrarily close together in wallclock time.
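The second-boundary artifact described above can be reproduced with a minimal sketch. The function name `passes_rate_limiter` and the exact form of the comparison are assumptions for illustration only, not the server's actual code. The timestamps are chosen to match the failed user operation, with the two hosts assumed to be only 20 ms apart.

```python
def passes_rate_limiter(new_cluster_time_secs: int,
                        wall_clock_secs: int,
                        max_drift_secs: int) -> bool:
    """Hypothetical stand-in for the rate-limiter check.

    Accepts the new cluster time only if it is no more than
    max_drift_secs ahead of the local wall clock (whole seconds).
    """
    return new_cluster_time_secs - wall_clock_secs <= max_drift_secs


# Wall clocks in milliseconds on two hosts, 20 ms apart,
# straddling a second boundary (values assumed for illustration).
sender_wall_ms = 1496037921005
receiver_wall_ms = 1496037920985

# Both sides truncate to whole seconds before comparing.
new_cluster_time = sender_wall_ms // 1000   # 1496037921
local_wall_clock = receiver_wall_ms // 1000  # 1496037920

# With a drift of 0, a 20 ms skew looks like a full second and is rejected.
print(passes_rate_limiter(new_cluster_time, local_wall_clock, 0))  # False
# With a drift of 1, the same pair is accepted.
print(passes_rate_limiter(new_cluster_time, local_wall_clock, 1))  # True
```

Under these assumptions, any nonzero skew between hosts can trigger a rejection at a second boundary when the drift is 0, regardless of how small the skew is.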
This does not occur when maxAcceptableLogicalClockDrift is 1:
$ zgrep -c ClusterTimeFailsRateLimiter output-maxAcceptableLogicalClockDrift-*
output-maxAcceptableLogicalClockDrift-0.gz:48
output-maxAcceptableLogicalClockDrift-1.gz:0
$ gzip -dc output-maxAcceptableLogicalClockDrift-0.gz | wc -l
57499
$ gzip -dc output-maxAcceptableLogicalClockDrift-1.gz | wc -l
60490
- simpleload-noisy.js
- output-maxAcceptableLogicalClockDrift-0.gz
- output-maxAcceptableLogicalClockDrift-1.gz
The maxAcceptableLogicalClockDrift parameter should instead be limited to positive values only, i.e. >= 1.
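As a rough illustration of the proposed restriction, a startup-time validator might look like the following. The function name `validate_max_drift` and the error message are hypothetical, and the real server parameter is validated in the server's own code rather than in Python.

```python
def validate_max_drift(value: int) -> int:
    """Hypothetical startup check: reject 0 and negative values.

    A drift of 0 is rejected because per-second truncation makes
    hosts with sub-second skew appear 1 second apart.
    """
    if value < 1:
        raise ValueError(
            "maxAcceptableLogicalClockDrift must be >= 1, got %d" % value
        )
    return value


print(validate_max_drift(1))  # 1
try:
    validate_max_drift(0)
except ValueError as exc:
    print("rejected:", exc)
```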