[SERVER-36159] Log whenever the gossiped config server opTime term changes Created: 17/Jul/18  Updated: 29/Oct/23  Resolved: 11/Jul/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.15, 4.2.0-rc0, 4.0.13

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kevin Pulo
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-42155 Indicate term mismatches when readCon... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0, v3.6, v3.4
Sprint: Sharding 2018-08-13, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-05-20, Sharding 2019-06-03, Sharding 2019-06-17, Sharding 2019-07-01, Sharding 2019-08-12
Participants:
Case:

 Description   

In MongoDB 3.4 and earlier, sharded cluster nodes gossip the config server's opTime to ensure that they always read the latest routing metadata. This opTime contains both a timestamp and a term, and because it is used only internally between cluster nodes, it is not signed or verified in any way.

As part of a customer support case, we observed the gossiped config server opTime term jump forward even though the term had not actually changed on the config server itself. Such a jump could happen, for example, if a DNS misconfiguration causes members of a sharded cluster to inadvertently talk to the wrong host. Because 3.4 performs no validation, a term jumping forward could have disastrous consequences for the entire cluster.
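The sketch below illustrates why a forward jump in the term is so damaging. It assumes that opTimes are ordered by term first and timestamp second, and that each node keeps the greatest config opTime it has heard of. The `OpTime` class and `gossip_merge` helper are hypothetical simplifications, not MongoDB's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class OpTime:
    # With order=True, dataclasses compare fields as a tuple in declaration
    # order, so term is compared before timestamp (an assumption of this sketch).
    term: int
    timestamp: int  # simplified: one integer instead of (seconds, increment)


def gossip_merge(known: OpTime, received: OpTime) -> OpTime:
    """Keep the greater of the locally known and the received config opTime."""
    return max(known, received)


real = OpTime(term=1, timestamp=1_000)
bogus = OpTime(term=6, timestamp=5)  # e.g. gossiped by a host reached via bad DNS

merged = gossip_merge(real, bogus)
# The bogus term wins despite its tiny timestamp, and genuine progress on the
# config server (still in term 1) never compares as newer than it.
later_real = OpTime(term=1, timestamp=2_000)
```

Under this ordering, once a bogus term is merged in, nodes may wait indefinitely for a config opTime that the real config server will never reach.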

To help diagnose such issues, shard nodes should log whenever the config server's opTime term changes. Ideally, the log message should also include the node from which the new term came, so that it can be traced back to the first node that introduced it.
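A minimal sketch of the proposed logging, assuming a node tracks the latest gossiped config opTime and knows which host sent each update. `ConfigOpTimeTracker`, `advance`, and the log text are hypothetical illustrations, not the code that was committed for this ticket.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("config_optime_tracker")


@dataclass(frozen=True, order=True)
class OpTime:
    term: int       # compared first (assumed term-first ordering)
    timestamp: int  # simplified to a single integer


class ConfigOpTimeTracker:
    """Hypothetical tracker for the gossiped config server opTime."""

    def __init__(self) -> None:
        self._latest: Optional[OpTime] = None

    @property
    def latest(self) -> Optional[OpTime]:
        return self._latest

    def advance(self, received: OpTime, source_host: str) -> bool:
        """Adopt `received` if it is newer; return True if the term changed."""
        if self._latest is not None and received <= self._latest:
            return False  # stale or equal opTime: nothing to record
        old = self._latest
        self._latest = received
        if old is not None and received.term != old.term:
            # Record the sender so a bogus term can be traced to its origin.
            logger.warning(
                "Config server opTime term changed from %d to %d (opTime %s), received from %s",
                old.term, received.term, received, source_host,
            )
            return True
        return False
```

Usage: calling `advance(OpTime(term=6, timestamp=5), "mongos-a:27017")` after the tracker has seen term 1 emits one warning naming `mongos-a:27017`; following that host's own log backwards should lead to the first node that introduced the new term.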



 Comments   
Comment by Githook User [ 13/Sep/19 ]

Author:

{'name': 'Kevin Pulo', 'username': 'devkev', 'email': 'kevin.pulo@mongodb.com'}

Message: SERVER-36159 Log whenever the gossiped config server opTime term changes

(cherry picked from commit f6bee9fab63e45bd7ef30e73aff6a21edca16aa2)
Branch: v3.6
https://github.com/mongodb/mongo/commit/392af48b71dfb690a6cc1d119fdc40b61925098b

Comment by Githook User [ 26/Aug/19 ]

Author:

{'username': 'devkev', 'email': 'kevin.pulo@mongodb.com', 'name': 'Kevin Pulo'}

Message: SERVER-36159 Log whenever the gossiped config server opTime term changes

(cherry picked from commit c2c6ed338f617e89600f4a221abc19045431c46e)
Branch: v4.0
https://github.com/mongodb/mongo/commit/f6bee9fab63e45bd7ef30e73aff6a21edca16aa2

Comment by Kevin Pulo [ 11/Jul/19 ]

Remaining work will be continued in SERVER-42155.

Comment by Githook User [ 30/May/19 ]

Author:

{'name': 'Kevin Pulo', 'email': 'kevin.pulo@mongodb.com', 'username': 'devkev'}

Message: SERVER-36159 Log whenever the gossiped config server opTime term changes
Branch: master
https://github.com/mongodb/mongo/commit/c2c6ed338f617e89600f4a221abc19045431c46e

Comment by Eric Sommer [ 18/Feb/19 ]

Additionally, we should extend the error message returned as part of the WriteConcernError (or the ExceededTimeLimit error) to include text such as "Current term is 1 but request is asking for 6". This would alert users that a mismatched or unsatisfiable term is causing the error.
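A sketch of the suggested message, assuming only a requested term greater than the current one is treated as unsatisfiable. `TermMismatchError` and `check_requested_term` are hypothetical names for illustration; the real change would attach this text to the existing WriteConcernError or ExceededTimeLimit error rather than raise a new exception type.

```python
class TermMismatchError(Exception):
    """Hypothetical error for a request that asks for a term not yet reached."""


def check_requested_term(current_term: int, requested_term: int) -> None:
    # A request waiting on a future term cannot be satisfied by this node,
    # so say so explicitly instead of only timing out.
    if requested_term > current_term:
        raise TermMismatchError(
            f"Current term is {current_term} but request is asking for {requested_term}"
        )
```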

Generated at Thu Feb 08 04:42:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.