- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 8.1.0-rc0
- Component/s: None
- None
- Catalog and Routing
- Fully Compatible
- ALL
- CAR Team 2024-12-23, CAR Team 2025-01-06
- 0
Test Context
The move_primary_failpoint.js test expects transitionToDedicatedConfigServer to complete successfully once every document has been moved away from the config shard.
The test runs the following:
- Moves every chunk away from the config shard, running every moveChunk with _waitForDelete=true
- Checks that the next execution of transitionToDedicatedConfigServer returns the status "completed"
The problem
_waitForDelete in moveChunk guarantees that the command returns only after every orphan document has been deleted. It does not guarantee that the rangeDeletion task itself has been removed from storage; that removal happens shortly after the moveChunk command completes.
Meanwhile, transitionToDedicatedConfigServer reports completion only once no rangeDeletion tasks are found on disk, so a call issued immediately after the last moveChunk can still find a task that has not been removed yet.
Proposed solution
Since the rangeDeletion task is removed shortly after the moveChunk command returns, the test should retry transitionToDedicatedConfigServer until it reports completion, and fail only if completion does not occur within a timeout.
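A minimal sketch of the retry idea, not the actual test change. The helper names (waitForTransitionCompleted, makeFakeTransition), the timeout and interval values, the "ongoing" placeholder state, and the assumption that the command response exposes the status in a `state` field are all illustrative. The command itself is replaced by a stub so the snippet runs standalone; in the real jstest the shell's assert.soon helper would likely fill the role of the polling loop.

```javascript
// Hypothetical helper: calls `runTransition` repeatedly until it reports
// "completed", and throws only if that does not happen within `timeoutMs`.
// Returns the number of attempts that were needed.
function waitForTransitionCompleted(runTransition, timeoutMs = 10000, intervalMs = 100) {
    const deadline = Date.now() + timeoutMs;
    let attempts = 0;
    while (true) {
        attempts++;
        const res = runTransition();
        // Assumption: the response carries the status in a `state` field.
        if (res.state === "completed") {
            return attempts;
        }
        if (Date.now() >= deadline) {
            throw new Error(
                `transition not completed after ${attempts} attempts, last state: ${res.state}`);
        }
        // Simple blocking wait between attempts, to keep the sketch portable.
        const wakeUp = Date.now() + intervalMs;
        while (Date.now() < wakeUp) {
            // spin
        }
    }
}

// Stub standing in for transitionToDedicatedConfigServer: the first
// `pendingCalls` invocations behave as if a rangeDeletion task were still
// on disk, and later invocations report completion.
function makeFakeTransition(pendingCalls) {
    let calls = 0;
    return () => {
        calls++;
        return {ok: 1, state: calls > pendingCalls ? "completed" : "ongoing"};
    };
}

const attempts = waitForTransitionCompleted(makeFakeTransition(2), 5000, 10);
console.log(`transition completed after ${attempts} attempt(s)`);
```

Bounding the loop by a deadline rather than a fixed attempt count keeps the test tolerant of a slow rangeDeletion cleanup while still failing if the task is never removed.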