- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Cluster Scalability
- ALL
To reproduce the failures, remove the incompatible_aubsan tag from jstests/core_sharding/resharding/reshard_collection_basic.js.
In the burn_in suites for sharded_retryable_writes_downgrade, reshard_collection_basic.js hits system failures and timeouts on two variants: [jstests_affected] * Shared Library {A,UB}SAN Enterprise RHEL 8 DEBUG Experimental (all feature flags) and [jstests_affected] * Shared Library {A,UB}SAN Enterprise RHEL 8 DEBUG Experimental. Many of the timeouts follow a similar pattern:
- During one of the success cases in reshard_collection_basic.js, resharding proceeds while the ContinuousStepdown hook steps down replica set primaries in the background.
- Resharding completes for that test case.
- The thread running the ContinuousStepdown hook then throws an exception after timing out while waiting for a primary to step up on one of the replica sets.
- After that, the remaining test cases in reshard_collection_basic.js keep running until they hit the Evergreen timeout. Resharding appears to complete normally.
It seems likely that this behavior is triggered by the slowness of the variant: it runs with {A,UB}SAN, and the same test and suite pass on the RHEL 8 variant without {A,UB}SAN. It is possible that resharding itself is working fine, and that whatever causes the ContinuousStepdown hook to time out takes long enough that the test runs out of time to complete.
Failing patch
Filtered timeout logs
- is related to: SERVER-99189 Test that reshardCollection command can be retried and joined across failovers in a multiversion cluster (Closed)