[SERVER-57228] Config Server crashes when updating FCV using an inconsistent FCV document Created: 26/May/21  Updated: 29/Oct/23  Resolved: 28/May/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc1, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Antonio Fuschetto Assignee: Antonio Fuschetto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Steps To Reproduce:

To reproduce the problem on a sharded cluster, run the following from a Mongo Shell connected to the Mongo Router, with the admin database selected:

// This is an environment using FCV 5.0
mongos> db.system.version.find()
{ "_id" : "featureCompatibilityVersion", "version" : "5.0" }
 
// Explicitly amend the persisted document used by the FCV logic so that it holds inconsistent information
mongos> db.system.version.update({"_id" : "featureCompatibilityVersion"}, {$set: {"version" : "4.4", "targetVersion" : "5.0"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
 
// This crashes the Config Server
mongos> db.adminCommand({"setFeatureCompatibilityVersion" : "5.0"})

Sprint: Sharding EMEA 2021-05-31
Participants:
Linked BF Score: 160

 Description   

When receiving an FCV update request, the Config Server relies on the current document admin.system.version {"_id": "featureCompatibilityVersion"} to determine whether to recover from a previous interrupted run.

The Config Server could crash during an FCV upgrade that follows an explicit (and illegal) amendment of that document. The invalid persisted information leads the Config Server to execute the wrong logic, which requires information that is missing from the document (namely changeTimestamp), and then to hit an invariant.

In this scenario, the FCV document is syntactically correct but semantically incorrect.
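
To make the inconsistency concrete, here is a minimal JavaScript sketch of the kind of check involved. It is not the server's code: the function name isFcvDocumentConsistent is hypothetical, and it assumes that the presence of targetVersion marks an upgrade/downgrade in progress, during which changeTimestamp is expected to be present.

```javascript
// Hypothetical helper for illustration only; not part of the MongoDB code base.
// Assumption: a "targetVersion" field marks an in-progress upgrade/downgrade,
// and "changeTimestamp" must accompany it in that state.
function isFcvDocumentConsistent(doc) {
  const transitioning = "targetVersion" in doc;
  if (!transitioning) {
    return true; // Stable state: nothing else to verify in this sketch.
  }
  return "changeTimestamp" in doc;
}

// The stable document from the reproduction steps.
const stable = { _id: "featureCompatibilityVersion", version: "5.0" };

// The manually amended document: well-formed, but missing changeTimestamp.
const amended = {
  _id: "featureCompatibilityVersion",
  version: "4.4",
  targetVersion: "5.0",
};

console.log(isFcvDocumentConsistent(stable));
console.log(isFcvDocumentConsistent(amended));
```

The amended document passes any purely structural validation, which is why the problem only surfaces once the FCV logic tries to read the missing field.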



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 28/May/21 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-57228 Config Server crashes when updating FCV using an inconsistent FCV document
Branch: v5.0
https://github.com/mongodb/mongo/commit/ea20db1c220fb11f74a1ee2f460b07a2820b1e9c

Comment by Githook User [ 27/May/21 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-57228 Config Server crashes when updating FCV using an inconsistent FCV document
Branch: master
https://github.com/mongodb/mongo/commit/82369a1da427f67113324ae98f9413669173fdb1

Comment by Antonio Fuschetto [ 27/May/21 ]

We had a good range of solutions to the problem, such as implementing a consistency check when the FCV document is inserted or modified (that is, the changeTimestamp field must be present during the upgrade/downgrade operation). Nevertheless, considering the various use cases and the consequent possibility of not triggering these checks, I decided to simply remove the risk of a crash by replacing the invariant with a user assertion (uassert). See below the new user experience:

mongos> db.system.version.update({"_id" : "featureCompatibilityVersion"}, {$set: {"version" : "4.4", "targetVersion" : "5.0"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
 
mongos> db.adminCommand({"setFeatureCompatibilityVersion" : "5.0"})
{
	"ok" : 0,
	"errmsg" : "The 'changeTimestamp' field is missing in the FCV document persisted by the Config Server. This may indicate that this document has been explicitly amended causing an internal data inconsistency.",
	"code" : 5722800,
	"codeName" : "Location5722800",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1622118981, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1622118981, 1)
}
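
The difference between the two kinds of assertion can be modelled with a short JavaScript sketch. This is an illustration only, not the server implementation: the names UserAssertion, uassert (as written here) and handleSetFcvSketch are hypothetical stand-ins, and the check on targetVersion is an assumption about when changeTimestamp is required. The point is that a failed user assertion is turned into an error reply for the client, whereas a failed invariant terminates the process.

```javascript
// Illustrative model only; all names here are hypothetical.
class UserAssertion extends Error {
  constructor(code, message) {
    super(message);
    this.code = code;
  }
}

// Loosely mimics a user assertion: throws a recoverable error
// instead of aborting the process.
function uassert(code, message, condition) {
  if (!condition) {
    throw new UserAssertion(code, message);
  }
}

// Sketch of a command handler that validates the persisted FCV document.
function handleSetFcvSketch(fcvDoc) {
  try {
    // Assumption: changeTimestamp is required whenever targetVersion is set.
    if ("targetVersion" in fcvDoc) {
      uassert(
        5722800,
        "The 'changeTimestamp' field is missing in the FCV document persisted " +
          "by the Config Server. This may indicate that this document has been " +
          "explicitly amended causing an internal data inconsistency.",
        "changeTimestamp" in fcvDoc
      );
    }
    return { ok: 1 };
  } catch (e) {
    if (e instanceof UserAssertion) {
      // The failure is reported to the caller; the process keeps running.
      return { ok: 0, errmsg: e.message, code: e.code, codeName: "Location" + e.code };
    }
    throw e;
  }
}

const reply = handleSetFcvSketch({
  _id: "featureCompatibilityVersion",
  version: "4.4",
  targetVersion: "5.0",
});
console.log(reply.ok, reply.code, reply.codeName);
```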

Generated at Thu Feb 08 05:41:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.