[SERVER-10719] Warn if network RTT to any config server is (relatively) high Created: 09/Sep/13  Updated: 06/Dec/22  Resolved: 06/Sep/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alexander Komyagin Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done Votes: 0
Labels: upgrading
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding
Participants:

 Description   

It is fairly reasonable to place one of the config servers in a Disaster Recovery zone, which has high network latency. In that setup, the metadata upgrade procedure can take a very long time.

For a config database with a large number of chunks (100k), the upgrade process can take many hours (6+) when latency to one of the config servers is on the order of seconds. (We have seen it take 30-40 min with the same large number of chunks when all config servers are within milliseconds of ping time.)

We should be able to display a warning (esp. during the upgrade process) if we detect that ping time to the config servers, or any config server relative to the other(s), is too high.
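
A minimal sketch of the kind of check this could be, assuming RTT measurements (in milliseconds) have already been collected per config server. The function name rtt_warnings, the host names, and both thresholds (abs_limit_ms, rel_factor) are hypothetical and chosen only for illustration; this is not existing server behavior.

```python
def rtt_warnings(rtt_ms, abs_limit_ms=100.0, rel_factor=10.0):
    """Return warning strings for config servers whose RTT looks too high.

    rtt_ms maps a config server host to its measured round-trip time in ms.
    abs_limit_ms and rel_factor are illustrative thresholds, not real defaults.
    """
    warnings = []
    if not rtt_ms:
        return warnings

    # Use the fastest config server as the baseline for the relative check.
    baseline = min(rtt_ms.values())

    for host, rtt in sorted(rtt_ms.items()):
        if rtt > abs_limit_ms:
            # Absolute check: ping time is too high on its own.
            warnings.append(
                f"{host}: RTT {rtt:.1f} ms exceeds {abs_limit_ms:.1f} ms"
            )
        elif baseline > 0 and rtt > rel_factor * baseline:
            # Relative check: much slower than the fastest config server.
            warnings.append(
                f"{host}: RTT {rtt:.1f} ms is more than {rel_factor:g}x "
                f"the fastest config server ({baseline:.1f} ms)"
            )
    return warnings


if __name__ == "__main__":
    # Hypothetical measurements: two local config servers and one in a DR zone.
    for line in rtt_warnings({"cfg1": 2.0, "cfg2": 3.0, "cfg3": 2500.0}):
        print("WARNING:", line)
```

Keeping the absolute and relative checks separate lets the warning fire both when every config server is slow and when only one of them is an outlier.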



 Comments   
Comment by Kaloian Manassiev [ 06/Sep/18 ]

Starting in version 3.2 with the introduction of CSRS, one node with high latency is unlikely to impact metadata operations. And since it is a replica set, the replication lag metric now offers a good indication of this and is available in Cloud Manager.

For improving monitoring of replica sets, please consider opening a drivers ticket.

Generated at Thu Feb 08 03:23:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.