[SERVER-44648] Sharded Cluster not working when one of the shard goes down Created: 15/Nov/19  Updated: 19/Nov/19  Resolved: 19/Nov/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.13
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Vivek Vishwakarma Assignee: Carl Champain (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I set up a testing environment to test the availability of a sharded MongoDB cluster. Below is my setup:

1. All the MongoDB components are running on my local machine on Windows
2. The cluster has 3 shards: A, B, and C. Each shard is a replica set of 3 mongod servers
3. There are 3 config servers running
4. 1 mongos running

After the above setup is done, a database and one of its collections are enabled for sharding. The sharded collection is then populated with some data. Checking the shard status confirms that the shards carry data from the sharded collection.
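For reference, the data-distribution check described above can be done from the mongos shell; this is a minimal sketch, assuming the countingwell.users namespace used in the setup commands later in this ticket:

```javascript
// On the mongos shell:
sh.status();  // lists each shard and the chunk ranges it owns

// Per-shard data size and document counts for the sharded collection:
db.getSiblingDB("countingwell").users.getShardDistribution();
```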

Then the following steps are taken for the sharded cluster availability test:
1. Perform a select and an insert operation on the sharded collection. Both operations are successful.
2. Shut down a secondary node of Shard A, then perform a select and an insert operation on the sharded collection through the mongos server. Again, both operations are successful.
3. Shut down one more secondary node of Shard A and perform a select and an insert operation on the sharded collection through the mongos server. Again, both operations are successful.

4. Shut down the primary node of Shard A. Selecting and inserting through the mongos server now fails with this error:

{
   "ok" : 0,
   "errmsg" : "Could not find host matching read preference { mode: \"secondarypreferred\", tags: [ {} ] } for set s0",
   "code" : 133,
   "codeName" : "FailedToSatisfyReadPreference",
   "operationTime" : Timestamp(1573819851, 2),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1573819851, 2),
      "signature" : {
         "hash" : BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
      }
   }
}
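For completeness, the select/insert in step 4 can be reproduced explicitly from the mongos shell; a sketch, assuming the countingwell.users namespace from the setup commands below (mongos forwards the read with the secondaryPreferred preference shown in the error):

```javascript
// Read through mongos; with Shard A fully down, this fails with
// FailedToSatisfyReadPreference because no member of set s0 is reachable:
db.getMongo().setReadPref("secondaryPreferred");
db.getSiblingDB("countingwell").users.find({}).toArray();

// Writes route to the primary of the owning shard, so an insert whose
// hashed _id falls on Shard A fails as well:
db.getSiblingDB("countingwell").users.insert({ name: "availability-test" });
```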

These are all the setup commands I used:

shard1

 mongod --replSet s0 --logpath "/data3/s0-r0.log" --dbpath /data3/shard0/rs0 --port 37017 --shardsvr --smallfiles
 mongod --replSet s0 --logpath "/data3/s0-r1.log" --dbpath /data3/shard0/rs1 --port 37018 --shardsvr --smallfiles
 mongod --replSet s0 --logpath "/data3/s0-r2.log" --dbpath /data3/shard0/rs2 --port 37019 --shardsvr --smallfiles
 mongo --port 37017
 config = { _id: "s0", members:[{ _id : 0, host : "localhost:37017" },
 { _id : 1, host : "localhost:37018" },
 { _id : 2, host : "localhost:37019" }]};
 rs.initiate(config);

shard2

 mongod --replSet s1 --logpath "/data3/s1-r0.log" --dbpath /data3/shard1/rs0 --port 47017 --shardsvr --smallfiles
 mongod --replSet s1 --logpath "/data3/s1-r1.log" --dbpath /data3/shard1/rs1 --port 47018 --shardsvr --smallfiles
 mongod --replSet s1 --logpath "/data3/s1-r2.log" --dbpath /data3/shard1/rs2 --port 47019 --shardsvr --smallfiles
 mongo --port 47017
 config = { _id: "s1", members:[{ _id : 0, host : "localhost:47017" },
 { _id : 1, host : "localhost:47018" },
 { _id : 2, host : "localhost:47019" }]};
 rs.initiate(config);

shard3

 mongod --replSet s2 --logpath "/data3/s2-r0.log" --dbpath /data3/shard2/rs0 --port 57017 --shardsvr --smallfiles
 mongod --replSet s2 --logpath "/data3/s2-r1.log" --dbpath /data3/shard2/rs1 --port 57018 --shardsvr --smallfiles
 mongod --replSet s2 --logpath "/data3/s2-r2.log" --dbpath /data3/shard2/rs2 --port 57019 --shardsvr --smallfiles
 mongo --port 57017
 config = { _id: "s2", members:[{ _id : 0, host : "localhost:57017" },
 { _id : 1, host : "localhost:57018" },
 { _id : 2, host : "localhost:57019" }]};
 rs.initiate(config);

config server

 mongod --replSet cs --logpath "/data3/cfg-a.log" --dbpath /data3/config/config-a --port 57040 --configsvr --smallfiles
 mongod --replSet cs --logpath "/data3/cfg-b.log" --dbpath /data3/config/config-b --port 57041 --configsvr --smallfiles
 mongod --replSet cs --logpath "/data3/cfg-c.log" --dbpath /data3/config/config-c --port 57042 --configsvr --smallfiles

mongo --port 57040

 config = { _id: "cs", members:[{ _id : 0, host : "localhost:57040" },
 { _id : 1, host : "localhost:57041" },
 { _id : 2, host : "localhost:57042" }]};
 rs.initiate(config);

mongos server

 mongos --logpath "/data3/mongos-1.log" --configdb cs/localhost:57040,localhost:57041,localhost:57042

mongo

 db.adminCommand( { addshard : "s0/"+"localhost:37017", name : "shard0" } );
 db.adminCommand( { addshard : "s1/"+"localhost:47017", name : "shard1" } );
 db.adminCommand( { addshard : "s2/"+"localhost:57017", name : "shard2" } );
 db.adminCommand( {enableSharding: "countingwell"} );
 db.adminCommand( {shardCollection: "countingwell.users", key: {_id:"hashed"} });
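The addshard/enableSharding/shardCollection adminCommand calls above can also be written with the sh.* shell helpers (a sketch; note that sh.addShard() does not take a custom shard name, so the adminCommand form above is needed to name the shards shard0/shard1/shard2):

```javascript
sh.addShard("s0/localhost:37017");
sh.addShard("s1/localhost:47017");
sh.addShard("s2/localhost:57017");
sh.enableSharding("countingwell");
sh.shardCollection("countingwell.users", { _id: "hashed" });
sh.status();  // verify all three shards and the sharded collection
```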

As per the MongoDB documentation, this is not the expected behavior. Can you point out whether there is anything wrong in my setup/steps above?



 Comments   
Comment by Carl Champain (Inactive) [ 19/Nov/19 ]

Hi vvishwakarma123@gmail.com,

I successfully tested your example, and the described behavior is expected. When you run a find query, mongos sends it to each shard in your cluster that can serve read operations (depending on your read preference settings). If one of the replica set shards is down, mongos cannot communicate with any of the members of that replica set, so it returns an error and aborts the operation, even though other replica set shards are still available in the cluster.
A solution is to add the allowPartialResults field to the find database command; it returns partial results when some shards are unavailable, instead of throwing an error.
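For example, from the mongos shell, the option can be passed on the find command directly (a sketch using the countingwell.users namespace from this ticket; drivers expose an equivalent allowPartialResults option on their find APIs):

```javascript
// Returns documents from the shards that are still reachable
// instead of erroring out:
db.getSiblingDB("countingwell").runCommand({
    find: "users",
    filter: {},
    allowPartialResults: true
});
```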

That said, the SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Kind regards,
Carl

Comment by Vivek Vishwakarma [ 15/Nov/19 ]

After stopping all the nodes of Shard A, I am trying to run queries on mongos, like show dbs or use commands, and I am getting that error. In that case I have not stopped and restarted the mongos.

Comment by Kaloian Manassiev [ 15/Nov/19 ]

So, just to be clear - after you have stopped all the nodes from Shard A, do you stop and restart mongos? Or are you running some query which needs to access/write data that falls on Shard A?

Can you please write the exact steps that you are doing and the exact queries/inserts that you are running?

Comment by Vivek Vishwakarma [ 15/Nov/19 ]

Kaloian Manassiev, here is what I am trying to do: I have 3 shards, each a replica set of 3 nodes (one primary and 2 secondaries), all running, and my sharded collection's data is distributed across the 3 shards. I have then shut down the mongod servers of shard one (the primary and both secondaries), and after that, when I try to query through the mongos server, it fails and gives me this error:

{
   "ok" : 0,
   "errmsg" : "Could not find host matching read preference { mode: \"secondarypreferred\", tags: [ {} ] } for set s0",
   "code" : 133,
   "codeName" : "FailedToSatisfyReadPreference",
   "operationTime" : Timestamp(1573819851, 2),
   "$clusterTime" : {
      "clusterTime" : Timestamp(1573819851, 2),
      "signature" : {
         "hash" : BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
         "keyId" : NumberLong(0)
      }
   }
}

Comment by Vivek Vishwakarma [ 15/Nov/19 ]

@Ryan Chipman. Yes, the question is about MongoDB itself, but I have not gotten a solution for where I went wrong in creating the sharded cluster. If one shard goes down, why does the whole sharded cluster go down? Can you correct me? It is important.

Comment by Kaloian Manassiev [ 15/Nov/19 ]

vvishwakarma123@gmail.com, what do you mean by "query for selecting and inserting"? If you are issuing a find operation that needs to return data located on the shard whose nodes you brought down, then the returned error is appropriate.

Can you please elaborate a little on the placement of data in that cluster, and on exactly which find and insert operations you are running?

-Kal.

Comment by Ryan Chipman [ 15/Nov/19 ]

Moving to the SERVER project, since this is a question about MongoDB itself, not the Tools

Generated at Thu Feb 08 05:06:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.