[SERVER-64955] Examine workaround option for high replica set config version Created: 25/Mar/22  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Minor - P4
Reporter: Vinicius Grippa
Assignee: Backlog - Replication Team
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Sprint: Repl 2022-05-16
Participants:

 Description   

When running rs.status() or rs.conf(), we can observe the current config version of the replica set:

			"configVersion" : 88889

The same information can be found in the local.system.replset collection.

The problem is that every time the replica set is reconfigured, this number is incremented by hundreds. If the reconfig is done with force:true, the value can grow by thousands, and in some cases by more than 100K.

Since it is an ever-increasing value, this can eventually lead to the following error:

mongo> rs.reconfig(cfg, {"force": true} );
{
"ok" : 0,
"errmsg" : "version field value of 2147494497 is out of range",
"code" : 103,
"codeName" : "NewReplicaSetConfigurationIncompatible"
}
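
The value in the error message sits just past the signed 32-bit integer limit, which matches the "max int" ceiling discussed in this ticket. A minimal sketch of that arithmetic in plain JavaScript follows. The names INT32_MAX and fitsInInt32 are illustrative and are not MongoDB identifiers, and the assumption that the server rejects anything above this bound is inferred from the error message rather than from the server code.

```javascript
// Largest value of a signed 32-bit integer: 2^31 - 1.
const INT32_MAX = 2 ** 31 - 1; // 2147483647

// Version reported in the error message quoted in this ticket.
const reportedVersion = 2147494497;

// Hypothetical helper: true if the version still fits in a signed 32-bit integer.
// Assumption: the upper bound is the only limit that matters here.
function fitsInInt32(version) {
  return Number.isInteger(version) && version <= INT32_MAX;
}

// How far past the limit the reported version is.
const overshoot = reportedVersion - INT32_MAX;

console.log(fitsInInt32(reportedVersion)); // false
console.log(overshoot); // 10850
```

An overshoot of about ten thousand is what one would expect if the last step before the failure was a single large jump rather than an increment of one.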

The workaround is to reset the version in the local.system.replset collection with the command:

db.system.replset.update({"_id": "replset"}, {$set: {"version":1}})

Then restart the replica set. I wonder why the version is not simply incremented by one (version++). That way, the version would practically never overflow, since the maximum value is ~2 billion.
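
To put the "~2 billion" headroom in perspective, here is a rough sketch in plain JavaScript that compares how many reconfigs fit below the limit for two step sizes. The function name reconfigsUntilOverflow is hypothetical, the step of 100000 is only an illustrative stand-in for the "more than 100K" growth described above, and the one-reconfig-per-second rate is an arbitrary assumption.

```javascript
// Largest value of a signed 32-bit integer: 2^31 - 1.
const INT32_MAX = 2 ** 31 - 1;

// Hypothetical helper: how many reconfigs of a fixed step size fit
// between the current version and the 32-bit limit.
function reconfigsUntilOverflow(currentVersion, step) {
  return Math.floor((INT32_MAX - currentVersion) / step);
}

// Starting from version 1, incrementing by one per reconfig.
const byOne = reconfigsUntilOverflow(1, 1);

// Starting from version 1, with an illustrative step of 100000 per reconfig.
const byHundredK = reconfigsUntilOverflow(1, 100000);

// Whole years needed to exhaust the range at one +1 reconfig per second.
const SECONDS_PER_YEAR = 365 * 24 * 3600;
const yearsAtOnePerSecond = Math.floor(byOne / SECONDS_PER_YEAR);

console.log(byOne); // 2147483646
console.log(byHundredK); // 21474
console.log(yearsAtOnePerSecond); // 68
```

With increments of one the range is effectively inexhaustible, while steps of this illustrative size reduce it to tens of thousands of reconfigs.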



 Comments   
Comment by Eric Sedor [ 10/May/22 ]

Hi again vgrippa@gmail.com

I've been in touch with the replication team and it's our intention that {force:true} reconfigs be used rarely and only when truly necessary as part of disaster recovery. We document here that this option can have unexpected consequences, and raising the config version by more than one is among them.

We do want to make sure readers who find this ticket see that, in the unlikely event a replica set is forced into this state, our current recommendation is indeed to reinitialize the replica set rather than making edits to system collections on a live replica set.

With that in mind, I am repurposing this ticket to the task of identifying whether or not a more reasonable workaround is available.

Comment by Eric Sedor [ 04/May/22 ]

Thanks vgrippa@gmail.com,

I'm going to pass this ticket on to the Replication team to consider:

  • In the event that many replica set reconfigs with force:true push a replica set config version to the limit of max int, is there (or should there be) a more recommended workaround than reinitializing the replica set by:
    • first converting it to a single stand-alone node
    • dropping the local database on that node
    • converting the standalone node back into a new replica set?
  • Is it necessary for rs config version to be increased to long long instead of int, as has been done for replication term in SERVER-63421?
Comment by Vinicius Grippa [ 29/Mar/22 ]

Hi Eric,

Your assumption is correct. I was not able to reach the limit "naturally", only through artificial tests. Still, it raised a concern, hence I opened this ticket. I believe the main point to clarify, since I could not work it out by looking at the code, is why a forced reconfig advances the version by thousands. This is also not mentioned in the documentation, which is what caused the surprise.

Comment by Eric Sedor [ 29/Mar/22 ]

Apologies vgrippa@gmail.com, I conflated config version and replica set election term; so SERVER-63421 is not related.

I'll adjust course here a bit; I don't think we need FTDC, but I would like to understand if you have a replica set that somehow reached "version field value of 2147494497 is out of range" during normal or semi-normal operation, or if you contrived a script to reach the limit. Thanks!

Comment by Vinicius Grippa [ 29/Mar/22 ]

Hi Eric,

Thanks for checking. I re-ran the tests, and the counter only increases by thousands when doing a forced reconfig. A regular reconfig increments it by 1; a forced reconfig increments it by a seemingly random amount in the tens of thousands.

Please see:

rs0:PRIMARY> rs.conf().version; rs.reconfig(c); rs.conf().version;
102024
rs0:PRIMARY> rs.conf().version; rs.reconfig(c); rs.conf().version;
102025
rs0:PRIMARY> rs.conf().version; rs.reconfig(c); rs.conf().version;
102026
rs0:PRIMARY> rs.conf().version; rs.reconfig(c, {force:true}); rs.conf().version;
167403
rs0:PRIMARY> rs.conf().version; rs.reconfig(c, {force:true}); rs.conf().version;
258425

That is easily reproducible. Let me know if you still need the FTDC.
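
The size of each jump can be read off the output above with a short calculation. This is a sketch in plain JavaScript, assuming that each printed number is the config version after the reconfig on that line (the shell prints only the result of the last expression). The variable names are illustrative.

```javascript
// Versions printed after each reconfig in the session above.
// The last two entries follow reconfigs run with {force: true}.
const observedVersions = [102024, 102025, 102026, 167403, 258425];

// Difference between each version and the one printed before it.
const deltas = observedVersions
  .slice(1)
  .map((version, i) => version - observedVersions[i]);

console.log(deltas); // [ 1, 1, 65377, 91022 ]
```

The two regular reconfigs advance the version by 1 each, while the two forced reconfigs advance it by 65377 and 91022.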

Comment by Eric Sedor [ 29/Mar/22 ]

Hi vgrippa@gmail.com,

Initially, I want to recommend against updating local.system.replset manually. I'll follow up with our recommended workaround for this issue.

In addition, we recently opened SERVER-63421, which should allow for higher values (up to max int64) in 4.2 and 4.4.

We're interested in the $dbpath/diagnostic.data directory and mongod log files for all replica set nodes during both a reconfig and a forced reconfig, which show the kinds of term increases (100, 100k) you're reporting. If they're too large for this ticket, you can use this secure upload portal.

Thank you,
Eric

Generated at Thu Feb 08 06:01:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.