[SERVER-30517] Mongos are failing to Calculate Config Difference Created: 04/Aug/17  Updated: 20/Sep/17  Resolved: 11/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.7
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Mohamed Abada Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod.log.gz    
Participants:

 Description   

Greetings,

I hope you're doing well.

I am currently facing the error below, and I would appreciate it if someone could help or advise how I can get it fixed:

mongos> db.stats()
2017-07-24T09:12:10.064+0000 E QUERY    Error: error: {
	"$err" : "error loading initial database config information :: caused by :: could not calculate config difference for ns test.new on IP:27019,IP:27019,IP:27019 :: caused by :: can't find shard for: shard2",
	"code" : 13129
}
    at Error (<anonymous>)
    at DBQuery.next (src/mongo/shell/query.js:259:15)
    at DBCollection.findOne (src/mongo/shell/collection.js:189:22)
    at DB.runCommand (src/mongo/shell/db.js:58:41)
    at DB.stats (src/mongo/shell/db.js:30:17)
    at (shell):1:4 at src/mongo/shell/query.js:259

What I was trying to do is downscale the cluster and remove one shard. Draining completed successfully and all chunks were moved to the primary shard; however, after I issued the second removeShard command

 db.runCommand( { removeShard: "shard2" } ) 

I noticed that the shard got removed, but one chunk was moved back to shard2 before the shard was removed (I have no idea how or why).

mongos> db.chunks.find({shard:"shard2"})
{ "_id" : "test.new-_id_MinKey", "lastmod" : Timestamp(4, 0), "lastmodEpoch" : ObjectId("5641e15eab20f8848d81a6e2"), "ns" : "qs_place.places", "min" : { "_id" : { "$minKey" : 1 } }, "max" : { "_id" : "1446560142605673436" }, "shard" : "shard2" }
mongos> 
 
sh.status() output:
 
	{  "_id" : "test",  "partitioned" : true,  "primary" : "shard1" }
		test.new
			shard key: { "_id" : "hashed" }
			chunks:
				shard1	1
				shard2	1
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : "1446560142605673436" } on : shard2 Timestamp(4, 0) 
			{ "_id" : "1446560142605673436" } -->> { "_id" : { "$maxKey" : 1 } } on : shard1 Timestamp(4, 1) 

I tried db.adminCommand("flushRouterConfig") but it didn't help.
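
For context: the error means that config.chunks still references a shard that no longer exists in config.shards. A minimal sketch of how to confirm that mismatch, assuming the config database can be read through the mongos:

```
// Compare the shards the cluster knows about with the shards still
// referenced by chunk metadata. A name in the second list that is missing
// from the first produces "can't find shard for: <name>" on config load.
use config
db.shards.find({}, { _id: 1 })      // registered shards
db.chunks.distinct("shard")         // shards still owning chunks
```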

Any idea what caused this issue and how I can get it resolved?

Best Regards,
Mohamed Abada



 Comments   
Comment by Mohamed Abada [ 18/Aug/17 ]

Thanks a lot Thomas Schubert for your support on this issue.

I would like to confirm that adding the shard manually from the config server side WORKED successfully.

Below you can find exactly what I did, in case someone else faces the same issue:

1- At the 3 config servers I added the shard as follows:

```
use config;
db.shards.insert(
    { _id: "shard2", host: "shard2/IP-Address:27017,IP-Address:27017,IP-Address:27017", maxSize: 1 }
);
```

Note: maxSize: 1 was used so that the balancer would not distribute any chunks to shard2.
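
A quick sanity check after the insert (a sketch; the expected document simply mirrors what was inserted above):

```
use config
// The re-added shard document should show the host string and maxSize
// exactly as inserted in step 1.
db.shards.find({ _id: "shard2" })
```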

2- At the mongos side:
```
use admin
db.adminCommand({ flushRouterConfig: 1 });
```
3- Manually moved the problematic chunk that remained on shard2 from shard2 to shard1:
```
sh.moveChunk("qs_place.places", { ObjectId: "53187" }, "shard1")
```

4- Drained shard2:
```
db.runCommand({ removeShard: "shard2" })
```
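
Note that removeShard is asynchronous: the first call only starts draining, and repeated calls report progress. A small polling sketch (the 5-second interval is arbitrary):

```
use admin
// Re-issue removeShard until draining reports completion.
while (true) {
    var res = db.runCommand({ removeShard: "shard2" });
    printjson(res);                   // state: "started" -> "ongoing" -> "completed"
    if (res.state === "completed") break;
    sleep(5000);                      // wait 5 seconds between checks
}
```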

I will of course consider the upgrade to avoid such issues in the future.

Thanks a lot again.

Best Regards,
Mohamed Abada

Comment by Kelsey Schubert [ 11/Aug/17 ]

Hi muhamed.abada,

Thank you for providing the logs. To resolve this issue, I would recommend manually adding the shard and removing it again:

use config;
db.shards.insert({ _id: 'shard2', host: <rs or standalone connection string for shard2> });
db.adminCommand({ flushRouterConfig: 1 });
db.adminCommand({ removeShard: 'shard2' });

As I mentioned earlier, we have put significant effort into improving sharding behavior. Therefore, I would recommend upgrading to MongoDB 3.2 and upgrading to CSRS to take advantage of these improvements.

Kind regards,
Thomas

Comment by Mohamed Abada [ 11/Aug/17 ]

Dear Thomas Schubert,

Any update from your side about the logs?

Also, is it possible to drop this collection and restore it from a backup?

Best Regards,
Mohamed Abada

Comment by Mohamed Abada [ 08/Aug/17 ]

mongod.log.gz

Hi Thomas Schubert,

Thanks a lot for your feedback and for the time you're allocating to this issue.

Attached you can find the requested logs from the shard2 primary.

As a note, because the log file is quite big: draining finished on 21 July.

Let me know if any additional information is required.

Best Regards,
Mohamed Abada

Comment by Kelsey Schubert [ 07/Aug/17 ]

Hi muhamed.abada,

Thanks for clarifying the affected version. In MongoDB 3.2 and later we have introduced stricter checks which may prevent this type of situation from reoccurring (e.g. SERVER-21896). Therefore, I would recommend upgrading to take advantage of these improvements.

Since restarting the mongos did not resolve the issue, would you please provide the complete logs from the primary of shard2 so we can better understand what happened here?

Thank you,
Thomas

Comment by Mohamed Abada [ 07/Aug/17 ]

Hi Thomas Schubert,

Any update here?

Best Regards,
Mohamed Abada

Comment by Mohamed Abada [ 04/Aug/17 ]

Hi Thomas Schubert,

One note from my side, as I tried the following:

1- Backed up everything.
2- Stopped mongos and mongod instances.
3- Synced config server DB path across the 3 config servers.
4- Started all the instances.

And the issue persists. I am not sure how this chunk got moved back to shard2, or how the shard got removed/drained while it still contained one chunk.

Best Regards,
Mohamed Abada

Comment by Mohamed Abada [ 04/Aug/17 ]

Hi Thomas Schubert,

Version: v3.0.7

Best Regards,
Mohamed Abada

Comment by Kelsey Schubert [ 04/Aug/17 ]

Hi muhamed.abada,

Would you please clarify which version of MongoDB you are using?

Thank you,
Thomas
