[SERVER-5257] When duplicate oids are found by the balancer, validate the replica set name Created: 08/Mar/12  Updated: 07/Mar/23  Resolved: 27/Feb/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.3, 2.1.0
Fix Version/s: None

Type: Improvement Priority: Trivial - P5
Reporter: Ben Becker Assignee: Sergi Mateo Bellido
Resolution: Won't Do Votes: 0
Labels: RachitaD, sharding-common-backlog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: All

Issue Links:
Related
Assigned Teams:
Sharding EMEA
Sprint: Sharding EMEA 2023-01-23, Sharding EMEA 2023-02-06, Sharding EMEA 2023-02-20, Sharding EMEA 2023-03-06
Participants:

Description

Currently, Balancer::_checkOIDs() produces this error message if duplicate oids are detected:

    log() << "error: 2 machines have " << x << " as oid machine piece " << s.toString() << " and " << oids[x].toString() << endl;

This error message can occur when the config server's shards collection contains entries whose 'host' strings are distinct but share the same replica set name. The following configuration error would cause this:

    mongos> db.shards.find()
    { "_id" : "sh1", "host" : "rs1/localhost:27017,localhost:27018,localhost:27019" }
    { "_id" : "sh2", "host" : "rs1/localhost:27027,localhost:27028,localhost:27029" }

The intended configuration is to use 'rs2' in the second entry of this example. Although rare, this error is very easy to overlook because the hostnames and shard names are correct. It would help to add logic that detects when multiple shards point at the same replica set name and prints a more specific error message naming the shards and the shared set name.

Note that this check could likely live in MoveChunkCommand and SplitChunkCommand.
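As a rough illustration of the proposed check, the sketch below groups shard entries by the replica set name prefix of their 'host' string and reports any name claimed by more than one shard. It is not taken from the server code: ShardEntry, replicaSetName() and findDuplicateSetNames() are hypothetical names, and the sketch assumes the host string has the form "setName/host:port,..." as in the example above, with a bare "host:port" for a standalone shard.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one document in the config server's shards
// collection; only the two fields shown in the example are modelled.
struct ShardEntry {
    std::string id;    // e.g. "sh1"
    std::string host;  // e.g. "rs1/localhost:27017,localhost:27018"
};

// Returns the replica set name prefix of a host string, or an empty
// string when there is no '/' (assumed here to mean a standalone shard).
std::string replicaSetName(const std::string& host) {
    const std::string::size_type slash = host.find('/');
    return slash == std::string::npos ? std::string() : host.substr(0, slash);
}

// Maps each replica set name used by two or more shards to the ids of
// those shards. Standalone shards are ignored.
std::map<std::string, std::vector<std::string>> findDuplicateSetNames(
    const std::vector<ShardEntry>& shards) {
    std::map<std::string, std::vector<std::string>> bySetName;
    for (const ShardEntry& shard : shards) {
        const std::string name = replicaSetName(shard.host);
        if (!name.empty()) {
            bySetName[name].push_back(shard.id);
        }
    }
    // Drop every set name that is used by exactly one shard.
    for (auto it = bySetName.begin(); it != bySetName.end();) {
        if (it->second.size() < 2) {
            it = bySetName.erase(it);
        } else {
            ++it;
        }
    }
    return bySetName;
}
```

For the two entries shown above, the result would contain a single key, "rs1", mapped to the shard ids "sh1" and "sh2", which is enough to build a message that names both shards and the shared set name instead of the generic duplicate oid error.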



Comments
Comment by Connie Chen [ 21/Dec/22 ]

Putting into investigation to see if this is still an issue we see, given how old this ticket is.
