[SERVER-79719] Make cluster_time_across_add_shard.js resilient to slow RSM updates Created: 04/Aug/23 Updated: 12/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Backlog - Cluster Scalability |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | cs-subteam1, sharding-nyc-subteam1 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Cluster Scalability |
||||
| Participants: | |||||
| Linked BF Score: | 20 | ||||
| Description |
|
With TestData.configShard == true, cluster_time_across_add_shard.js runs a transition command (https://github.com/mongodb/mongo/blob/7d3620f267bab213beccbd240a6bd3c0deb89892/jstests/sharding/cluster_time_across_add_shard.js#L172) that uses the config server's replica set monitor (RSM) to target itself (via the addShard command run internally). This can fail if the config server's RSM is stale. A stale RSM is much more likely with the older SDAM RSM protocol than with the newer streamable one, so when the test runs with the older protocol, it should either retry on those errors or use a test helper to wait until the config primary has updated its RSM before running the transition command. |
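The retry option could look roughly like the sketch below. It is only an illustration, not the test's actual code: `retryOnStaleRsm`, `fakeTransition`, and the `simulatedStaleRsm` flag are hypothetical names, and the caller supplies the predicate that recognizes a stale-RSM failure because the ticket does not name the specific error. The stand-in command fails twice and then succeeds, so the example runs without a cluster.

```javascript
// Hypothetical helper: call `fn` until it succeeds, retrying only on errors
// that `isRetryable` accepts, and giving up after `maxAttempts` attempts.
function retryOnStaleRsm(fn, isRetryable, maxAttempts = 5) {
    let lastError;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return fn();
        } catch (e) {
            if (!isRetryable(e)) {
                // Unrelated failures are surfaced immediately.
                throw e;
            }
            lastError = e;
        }
    }
    // Every attempt failed with a retryable error; rethrow the last one.
    throw lastError;
}

// Stand-in for the transition command (assumption, not a real server call):
// it fails twice as if the RSM were stale, then succeeds.
let calls = 0;
function fakeTransition() {
    calls++;
    if (calls < 3) {
        const err = new Error("simulated stale replica set monitor");
        err.simulatedStaleRsm = true;
        throw err;
    }
    return {ok: 1};
}

const result = retryOnStaleRsm(fakeTransition, (e) => e.simulatedStaleRsm === true);
```

In the real test, `fn` would wrap the transition command and the predicate would match whatever error the stale RSM produces. A bounded attempt count keeps a genuinely broken cluster from hanging the test.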