[SERVER-9711] make it impossible to have a wrong config server specification within a cluster Created: 16/May/13  Updated: 07/Mar/14  Resolved: 07/Mar/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Dwight Merriman Assignee: Unassigned
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-1658 make adding/changing config servers easy Closed
related to SERVER-8509 add startup parameter to mongod to sp... Closed
is related to SERVER-12781 mongos should enforce that config ser... Closed
Participants:

 Description   

I imagine that through human error, typos, etc., especially when replacing a server, it is currently possible to end up with a cluster where there is no agreement on which config servers are the "right" ones. If this is already impossible, let me know and we can close this ticket.

For example, suppose machines a,b,c are the config servers and we replace c with d. To do that, one might put a copy of the a/b/c data (any of them) on d, and then switch everything over to use --configdb a,b,d. However, I imagine there could be a window of time where some mongod or mongos processes think a,b,c is authoritative and some think a,b,d is. We should ensure that in this situation error messages are logged and no mutations land on a/b/c/d through an inconsistent triplet of config servers.
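The window described above can be illustrated with a small sketch. It is not MongoDB code: the function name `find_configdb_disagreement` and the member names are hypothetical, and it assumes each member's `--configdb` string can be collected somewhere for comparison.

```python
from collections import defaultdict


def find_configdb_disagreement(views):
    """Group cluster members by the config server triplet they believe in.

    `views` maps a member name to its --configdb string, e.g. "a,b,c".
    Returns a dict of triplet -> sorted member names when more than one
    triplet is in use, or an empty dict when every member agrees.
    """
    groups = defaultdict(list)
    for member, configdb in views.items():
        triplet = tuple(host.strip() for host in configdb.split(","))
        groups[triplet].append(member)
    if len(groups) <= 1:
        return {}
    return {triplet: sorted(members) for triplet, members in groups.items()}


# Hypothetical cluster caught mid-replacement of c with d.
views = {
    "mongos1": "a,b,c",
    "mongos2": "a,b,d",
    "shard1": "a,b,c",
}
conflict = find_configdb_disagreement(views)
for triplet, members in conflict.items():
    print("ERROR: members", members, "use config servers", ",".join(triplet))
```

In this toy cluster two triplets are in use, which is exactly the state in which errors should be logged and mutations refused.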

I suppose if the config servers were a replica set, it would be pretty hard to get the members out of sync. That is one approach, although making the config servers a replica set would need some new functionality there to get the right transactional semantics.

Here is another idea:

  • Each config server has an identity string for itself that is unique and persistent. As hostnames could be duplicated, maybe we put a mongo.sig file with a GUID in it in /etc or somewhere similar. We wouldn't want it in the data directory, as that will be backed up and restored elsewhere, and this is about the machine's identity.
  • Then each machine in the cluster has a concept of who the three config servers are, and can ask them for their signatures. So we have the set CFG = {S1,S2,S3} of the current config servers for the cluster. Operations on the config servers carry this "here is who I think the config servers are" set with them. The config server rejects the operation if the set isn't right: writes for sure, perhaps even reads.
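A minimal sketch of the signature check proposed above. Everything here is hypothetical: the `ConfigServer` class, its methods, and the use of an in-memory GUID in place of a machine-local mongo.sig file are assumptions made for illustration, not existing server behavior.

```python
import uuid


class ConfigServer:
    """Toy config server that checks the caller's view of CFG on every write."""

    def __init__(self, signature=None):
        # Assumption: stands in for the GUID that would be read from a
        # machine-local file such as mongo.sig.
        self.signature = signature or str(uuid.uuid4())
        self.cfg = frozenset()
        self.data = {}

    def set_cfg(self, signatures):
        # Record the agreed set of config server signatures.
        self.cfg = frozenset(signatures)

    def write(self, claimed_cfg, key, value):
        # Reject the write unless the caller's set matches ours exactly.
        if frozenset(claimed_cfg) != self.cfg:
            raise PermissionError(
                "config server set mismatch: caller sent %s, expected %s"
                % (sorted(claimed_cfg), sorted(self.cfg))
            )
        self.data[key] = value


a, b, c, d = (ConfigServer() for _ in range(4))
cfg = {a.signature, b.signature, c.signature}
for server in (a, b, c):
    server.set_cfg(cfg)

# A caller with the right view of CFG can write.
a.write(cfg, "chunk-1", "shard0")

# A caller that believes d replaced c is rejected.
stale = {a.signature, b.signature, d.signature}
try:
    a.write(stale, "chunk-1", "shard1")
    rejected = False
except PermissionError as exc:
    rejected = True
    print("rejected:", exc)
```

Comparing sets rather than strings means the check does not depend on the order in which the caller lists the servers; that is a choice of this sketch, not something the proposal specifies.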

Perhaps the config servers are the only ones who need to share this CFG signature set, if all config server mutations are done by the config servers themselves. Then the other members of the cluster just ask one of the config servers to do that operation. The other members need less intelligence on this then. They could in theory read from a phantom config server by mistake, but they couldn't do a write that isn't consistent among the three.
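One way to picture this variant is a sketch in which a config server coordinates the mutation and applies it only when all three servers agree on CFG. The `ConfigNode` class and `coordinated_write` function are hypothetical names, and the sketch ignores failures partway through the write.

```python
class ConfigNode:
    """Toy config server holding its own signature and its view of CFG."""

    def __init__(self, signature, cfg):
        self.signature = signature
        self.cfg = frozenset(cfg)
        self.data = {}


def coordinated_write(coordinator, peers, key, value):
    """Apply a write on all config servers only if they agree on CFG.

    The write is refused if any participant holds a different CFG set or
    if the participants' signatures are not exactly the members of CFG.
    """
    nodes = [coordinator] + list(peers)
    expected = coordinator.cfg
    signatures = {node.signature for node in nodes}
    if signatures != expected or any(node.cfg != expected for node in nodes):
        return False
    for node in nodes:
        node.data[key] = value
    return True


good = {"S1", "S2", "S3"}
s1 = ConfigNode("S1", good)
s2 = ConfigNode("S2", good)
s3 = ConfigNode("S3", good)
# Phantom server: a copy that believes it replaced S3.
s4 = ConfigNode("S4", {"S1", "S2", "S4"})

ok = coordinated_write(s1, [s2, s3], "chunk-1", "shard0")
bad = coordinated_write(s1, [s2, s4], "chunk-2", "shard1")
print(ok, bad)
```

The other cluster members only need to hand the operation to one config server; the consistency check lives entirely among the three.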

Partial detection would be a good start if something is easy and could go into 2.5.



 Comments   
Comment by Greg Studer [ 07/Mar/14 ]

As Scott mentioned, this is working as designed (WAD): shards will only accept requests from mongoses whose config string is exactly consistent.

There's no way to prevent shards from being contacted by a particular inconsistent mongos. We may also allow shards to be started with explicit config information, rather than relying on the first mongos to populate it, but this is tracked separately (it is related to auth info as well).
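The behavior described in this comment can be sketched roughly as follows. This is an illustration only, not the server's implementation: the `Shard` class and `accept` method are hypothetical, and the sketch assumes the comparison is a plain string equality check.

```python
class Shard:
    """Toy shard that remembers one config string and rejects any other."""

    def __init__(self, config_string=None):
        # None models a shard that has not yet been contacted by a mongos.
        # Passing a value models starting the shard with explicit config info.
        self.config_string = config_string

    def accept(self, mongos_config_string):
        if self.config_string is None:
            # The first mongos to make contact populates the config string.
            self.config_string = mongos_config_string
            return True
        # Afterwards the string must match exactly, including host order.
        return mongos_config_string == self.config_string


shard = Shard()
first = shard.accept("a,b,c")      # first mongos sets the string
same = shard.accept("a,b,c")       # consistent mongos is accepted
replaced = shard.accept("a,b,d")   # different server set is rejected
reordered = shard.accept("a,c,b")  # same servers, different order: rejected
print(first, same, replaced, reordered)
```

The sketch also shows why relying on the first mongos matters: whichever string arrives first becomes the one the shard enforces.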

Comment by Scott Hernandez (Inactive) [ 17/May/13 ]

Yes, as I stated, it is the shards which reject mongos requests with the incorrect configdb string.

Comment by Dwight Merriman [ 17/May/13 ]

so you seem to be right at least partially. i tried this:

  • spun up 4 config servers a,b,c,d
  • pointed a mongos at the first 3 a,b,c
  • tried starting a mongos with a,b,d. it wouldn't start.

however this did start:

  • spun up 4 config servers a,b,c, and a mongos.
  • shut down c. copy its data files to d. start config servers c and d.
  • so a,b,c,d cfg servers running.
  • and mongos --configdb a,b,c from above still running
  • then mongos --configdb a,b,d would start and did not complain. so that's bad. afterwards:

 
T:\243>mongo --port 27001
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27001/test
configsvr> ^C
bye
 
T:\243>mongo --port 27002
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27002/test
configsvr> ^C
bye
 
T:\243>mongo --port 27003
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27003/test
configsvr> ^C
bye
 
T:\243>mongo --port 28000
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:28000/test
mongos> 
mongos> use admin
switched to db admin
mongos> db.runCommand("isMaster")
{
        "ismaster" : true,
        "msg" : "isdbgrid",
        "maxBsonObjectSize" : 16777216,
        "maxMessageSizeBytes" : 48000000,
        "localTime" : ISODate("2013-05-17T15:28:54.520Z"),
        "ok" : 1
}
mongos> ^C
bye
 
T:\243>mongo --port 27017
MongoDB shell version: 2.4.3
connecting to: 127.0.0.1:27017/test
mongos> use admin
switched to db admin
mongos> db.runCommand("isMaster")
{
        "ismaster" : true,
        "msg" : "isdbgrid",
        "maxBsonObjectSize" : 16777216,
        "maxMessageSizeBytes" : 48000000,
        "localTime" : ISODate("2013-05-17T15:34:06.248Z"),
        "ok" : 1
}
mongos> ^C
bye

Comment by Scott Hernandez (Inactive) [ 16/May/13 ]

Dwight, the order/string of the config servers (the configdb param to mongos) cannot be changed once set in a running sharded cluster. If you do what you are suggesting, you will get an error when you try to start a mongos and it tries to connect to an existing shard(s).

In essence it is impossible to run with two different configdb strings.

Is there an example of user error or a use-case which you have seen where something like this happened without an error?

Generated at Thu Feb 08 03:21:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.