-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Workload Resilience
-
Fully Compatible
-
Workload Resilience 2025-12-22
-
200
In SPM-4003, we introduced a new passthrough suite that simulates the ingress request rate limiter rejecting a high percentage of requests, in order to test the server's load shedding retry behavior. One of these passthroughs is derived from sharding_jscore_passthrough_with_balancer, which causes random rebalancing to occur during tests. Sometimes the balancer kicks in and takes a long time to complete, because many of its requests are rejected. This can prevent a large batched write from making progress for several rounds, since it cannot refresh routing info while resharding is holding a lock. See SERVER-92228 and BF-34032 for a discussion of a similar issue encountered before. As a temporary workaround to resolve the attached BF, we should increase the number of no-progress rounds allowed before the batch write is aborted.
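For example, the following log lines show a round that made no progress because the routing info refresh did not complete: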
{"t":{"$date":"2025-12-19T19:59:53.739+00:00"},"s":"D4", "c":"SHARDING", "id":22907, "svc":"R", "ctx":"conn29","msg":"Write results received","attr":{"shardInfo":"localhost:20002","status":"ShardCannotRefreshDueToLocksHeld{ nss: \"test.system.resharding.76b85189-0545-4298-bbdf-a4466d32b73e\" }: Routing info refresh did not complete"}}
{"t":{"$date":"2025-12-19T19:59:53.745+00:00"},"s":"D4", "c":"SHARDING", "id":9986810, "svc":"R", "ctx":"conn29","msg":"Completed round","attr":{"rounds completed":2}}
{"t":{"$date":"2025-12-19T19:59:53.745+00:00"},"s":"D5", "c":"SHARDING", "id":9986809, "svc":"R", "ctx":"conn29","msg":"No progress made this round","attr":{"num rounds without progress":1}}
- is related to
SERVER-92228 Revisit the default value of max number of no progress before aborting batch write. (Closed)
- related to
SERVER-115873 Use default maxRoundsWithoutProgressParameter in rate limited sharding with balancer passthrough (Needs Scheduling)