[SERVER-57738] sharding cluster, clients cannot connect to mongos successfully
Created: 16/Jun/21  Updated: 22/Jun/21  Resolved: 22/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Xana Wang
Assignee: Eric Sedor
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-52654 new signing keys not generated by the... Closed
Operating System: ALL
Participants:

 Description   

My clients and the mongo shell occasionally cannot connect to mongos.

The mongo shell showed that it was connecting, but then timed out.

After I stepped down the config server primary, all the clients could connect immediately.

Is this a bug that has been fixed in 4.4.6, or is part of my configuration incorrect?

The mongos and config server logs do not appear to show anything wrong.



 Comments   
Comment by Eric Sedor [ 22/Jun/21 ]

Happy to help, xanawang@gmail.com!

Comment by Xana Wang [ 21/Jun/21 ]

It seems my problem was caused by SERVER-52654, because my 4 separate clusters hit mongos connection failures one after another within two days. Thank you for the information!

Comment by Eric Sedor [ 16/Jun/21 ]

Hi xanawang@gmail.com,

What you describe could be caused by SERVER-52654, fixed in 4.4.3. Can you take a look at the description of that ticket and see if the expiration of HMAC keys coincides with the occasional issues you have noticed?
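A rough way to do that comparison, sketched in Python. This assumes the signing keys are the documents in `admin.system.keys` on the config servers and that the seconds part of each key's `expiresAt` value has already been copied out by hand. The function name and the sample values below are made up for illustration; substitute your own.

```python
from datetime import datetime, timezone


def keys_expired_at(expires_at_seconds, when):
    """Return True if every key expiry is at or before `when`.

    expires_at_seconds: iterable of Unix timestamps (seconds), one per key.
    when: timezone-aware datetime of a failed connection attempt.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    latest = max(expires_at_seconds, default=None)
    if latest is None:
        return True  # no keys at all means no unexpired key
    return when.timestamp() >= latest


# Hypothetical values only: two key expiry times and one failed attempt.
expiries = [
    datetime(2021, 3, 15, tzinfo=timezone.utc).timestamp(),
    datetime(2021, 6, 13, tzinfo=timezone.utc).timestamp(),
]
failed_attempt = datetime(2021, 6, 16, 8, 0, tzinfo=timezone.utc)
print(keys_expired_at(expiries, failed_attempt))
```

If this returns True for the times at which connections failed, the timing is at least consistent with SERVER-52654; it does not prove it.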

If not, then we'll need more information:

I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

1) Select a specific mongos to connect to, and obtain the timeout errors and timestamps (with time zone) for failed connection attempts to that mongos
2) Initiate the CSRS failover to correct the issue
3) Connect successfully to the same mongos and provide that timestamp (with time zone)
4) Then, collect and upload information from the following nodes in your cluster:

  • The mongos that produced the timeout error
  • Each node of the config server replica set
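
For steps 1) and 3), the timestamps could be captured with a small wrapper like the following Python sketch. It is only an illustration: `record_attempt` is a hypothetical helper, and the `attempt` callable stands in for whatever your client actually does to connect to the chosen mongos.

```python
import time
from datetime import datetime


def record_attempt(label, attempt):
    """Run attempt() and return its start time (with UTC offset), duration, and outcome."""
    started = datetime.now().astimezone()  # local time, timezone-aware
    t0 = time.monotonic()
    try:
        attempt()
        outcome, error = "ok", None
    except Exception as exc:
        outcome, error = "failed", f"{type(exc).__name__}: {exc}"
    return {
        "label": label,
        "started": started.isoformat(),
        "elapsed_s": round(time.monotonic() - t0, 3),
        "outcome": outcome,
        "error": error,
    }


def fake_timeout():
    # Stand-in for a real connection attempt that times out.
    raise TimeoutError("connection attempt timed out")


print(record_attempt("before failover", fake_timeout))
print(record_attempt("after failover", lambda: None))
```

Each record carries an ISO 8601 timestamp with its UTC offset, so the failed attempt, the failover, and the successful attempt can be lined up against the server logs.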

For all four of these nodes spanning a time period that includes the unsuccessful connection, the failover, and the successful connection, would you please archive (tar or zip) and upload to the above link:

  • the mongod logs
  • the $dbpath/diagnostic.data directory (the contents are described here)
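
If it helps, the archiving can be scripted per node. A minimal Python sketch using the standard `tarfile` module; the function name, the archive layout, and the paths in the usage comment are assumptions for illustration, not a required format.

```python
import tarfile
from pathlib import Path


def archive_node_files(node_name, log_files, diagnostic_dir, out_dir="."):
    """Write <node_name>.tar.gz holding the given log files and the diagnostic.data directory."""
    out_path = Path(out_dir) / f"{node_name}.tar.gz"
    with tarfile.open(str(out_path), "w:gz") as tar:
        for log in log_files:
            log = Path(log)
            # Store logs under <node_name>/logs/ so archives from different nodes do not collide.
            tar.add(str(log), arcname=f"{node_name}/logs/{log.name}")
        # Directories are added recursively by default.
        tar.add(str(diagnostic_dir), arcname=f"{node_name}/diagnostic.data")
    return out_path


# Example call (hypothetical node name and paths, adjust to your deployment):
# archive_node_files("config1", ["/var/log/mongodb/mongod.log"],
#                    "/data/db/diagnostic.data", out_dir="/tmp")
```

One archive per node keeps it clear which files came from the mongos and which from each config server.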

Comment by Xana Wang [ 16/Jun/21 ]

In addition, mongos did not exit. It just could not be connected to, and it accepted connections again right after the stepDown of the config server primary.
