[SERVER-37540] Operation timed out between config servers and mongos after upgrading from 3.4.16 to 3.4.17 Created: 10/Oct/18 Updated: 26/Oct/18 Resolved: 26/Oct/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.17 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Roberto Rodriguez | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Upgrade the MongoDB cluster, including mongos, from 3.4.16 to 3.4.17; remote servers with a ping >99ms then stop connecting correctly to the config servers. |
| Participants: |
| Description |
|
Hi, A week ago we upgraded our sharded cluster from MongoDB 3.4.16 to 3.4.17. The first step was upgrading all mongos instances to 3.4.17, followed by the replica sets and config servers. Our infrastructure is:
Everything was working correctly before this upgrade, but after it the remote mongos servers (with a ping of 100ms) apparently aren't able to connect to the config servers. I'm attaching the mongos log with network verbosity set to 5 and the diagnostic.data. Basically, when we execute a query, we get the following error:
In the log we get the following:
This error does not happen in the datacenter where the cluster is deployed or in the nearest one. This is our config servers configuration:
This is our mongos configuration (the server has 32 cores and 74 threads; we tried other ShardingTaskExecutor values with no luck, and before upgrading it was working perfectly):
Thanks in advance |
| Comments |
| Comment by Danny Hatcher (Inactive) [ 26/Oct/18 ] |
|
Roberto, It is possible that the mongos processes got "locked onto" a "dead" config server during the upgrade process. Generally, we recommend upgrading the mongos processes last when bringing a sharded cluster to a newer version. This ensures that the mongos processes have an up-to-date view of the cluster post-upgrade. I'm going to close this ticket for now but if you do encounter this problem again, please let us know. Thank you, Danny |
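As a rough illustration of the ordering recommended in the comment above, the following sketch sorts a list of cluster nodes so that every mongos is upgraded last. Only the "mongos last" rule comes from this ticket; the host names, the role labels, and the choice to place config servers before shards are assumptions made for the example.

```python
# Sketch only: host names and role labels are hypothetical.
# The one rule taken from the ticket is that mongos processes go last;
# placing config servers before shards is an assumption.
UPGRADE_PRIORITY = {"config": 0, "shard": 1, "mongos": 2}


def upgrade_order(nodes):
    """Return (host, role) pairs sorted so that every mongos comes last.

    sorted() is stable, so nodes with the same role keep their input order.
    """
    return sorted(nodes, key=lambda node: UPGRADE_PRIORITY[node[1]])


cluster = [
    ("mongos-remote-1", "mongos"),
    ("cfg-1", "config"),
    ("shard-a-1", "shard"),
    ("mongos-local-1", "mongos"),
    ("cfg-2", "config"),
]

for host, role in upgrade_order(cluster):
    print(f"{role:>6}  {host}")
```

The sketch only plans the order; it does not contact any server or perform an upgrade.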
| Comment by Roberto Rodriguez [ 26/Oct/18 ] |
|
Hi, The logs rotated and I don't have these logs at the moment. I solved it by installing two config servers in the datacenter with problems. But it's strange that after upgrading the mongos version I didn't have these problems. Thanks |
| Comment by Danny Hatcher (Inactive) [ 24/Oct/18 ] |
|
Hello Roberto, I apologize for the delay in response. Thank you for your detail in the initial description; it is much appreciated. It looks like there was an issue at connection establishment time but the errors present are on the getmore, not the initial find. Do you happen to have logs from the Primary config server at the time of the logs above? If not, and the problem is still occurring, could you attach updated logs from the problem mongos and the Primary config server? Thanks, Danny |