[SERVER-10719] Warn if network RTT to any config server is (relatively) high Created: 09/Sep/13 Updated: 06/Dec/22 Resolved: 06/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alexander Komyagin | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | upgrading | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Sharding |
||||
| Participants: | |||||
| Description |
|
Since it's somewhat reasonable to place one of the config servers in a Disaster Recovery zone (which has high latency), the metadata upgrade procedure can take a very long time. For a config database with a large number of chunks (100k), the upgrade process can take many hours (6+) when latency to one of the config servers is on the order of seconds. (We have seen it take 30-40 min with that number of chunks when all config servers are within milliseconds of ping time.) We should display a warning (esp. during the upgrade process) if we detect that the ping time to the config servers, or to any config server relative to the other(s), is too high. |
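For illustration only, the check requested above could look roughly like the following sketch. It is not server code: the function name `config_server_rtt_warnings`, the host names, and both thresholds (`absolute_limit_ms`, `relative_factor`) are assumptions made up for this example, and the RTT values are assumed to have been measured elsewhere.

```python
from statistics import median


def config_server_rtt_warnings(rtt_ms, absolute_limit_ms=500.0, relative_factor=10.0):
    """Return warning strings for config servers whose RTT looks too high.

    rtt_ms maps a host name to a measured round-trip time in milliseconds.
    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    warnings = []
    for host, rtt in sorted(rtt_ms.items()):
        # RTTs of every other config server, used for the relative check.
        others = [value for name, value in rtt_ms.items() if name != host]
        if rtt > absolute_limit_ms:
            # Absolute check: the RTT is high regardless of the other hosts.
            warnings.append(
                f"{host}: RTT {rtt:.0f} ms exceeds {absolute_limit_ms:.0f} ms"
            )
        elif others:
            # Relative check: compare against the median of the other hosts,
            # floored at 1 ms so sub-millisecond peers do not trigger noise.
            baseline = max(median(others), 1.0)
            if rtt > relative_factor * baseline:
                warnings.append(
                    f"{host}: RTT {rtt:.0f} ms is more than "
                    f"{relative_factor:.0f}x the other config servers"
                )
    return warnings


if __name__ == "__main__":
    # Hypothetical measurements: two local servers and one in a DR zone.
    for line in config_server_rtt_warnings({"cfg1": 2, "cfg2": 3, "cfg3": 2000}):
        print("WARNING:", line)
```

Combining an absolute limit with a relative one covers both cases named in the description: every config server being slow, and one server being slow compared with the other(s).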
| Comments |
| Comment by Kaloian Manassiev [ 06/Sep/18 ] |
|
Starting in version 3.2 with the introduction of CSRS, one node with high latency is unlikely to impact metadata operations. And since it is a replica set, the replication lag metric now offers a good indication of this and is available in Cloud Manager. For improved monitoring of replica sets, please consider opening a drivers ticket. |