[SERVER-84384] Resharding test infrastructure must be resilient to intermittent errors. Created: 21/Dec/23 Updated: 07/Feb/24 |
|
| Status: | In Code Review |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nandini Bhartiya | Assignee: | Aitor Esteve Alvarado |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Catalog and Routing
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | CAR Team 2024-02-05, CAR Team 2024-02-19 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 161 | ||||||||
| Story Points: | 2 | ||||||||
| Description |
|
As seen in https://jira.mongodb.org/browse/BF-31177, resharding hit an intermittent error but was able to restart and move towards completion. However, the test interpreted the mongos error response as a resharding failure and proceeded to run the metadata consistency checks and end the test even though resharding was not yet complete. This test (and maybe the resharding test infrastructure) must be modified and made resilient to retryable errors. |
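The behaviour asked for above can be sketched as follows. This is a minimal illustration in Python, not the actual resharding test infrastructure: `RetryableError`, `FakeResharding`, `wait_for_completion`, and `run_test` are hypothetical names, and the in-memory operation only simulates a resharding run that reports an intermittent error while still progressing. The point is that the test treats a retryable error as "not finished yet" and runs the consistency checks only once the operation reports completion.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for an intermittent, retryable error."""


class FakeResharding:
    """Hypothetical in-memory model of a resharding operation.

    The first `failures` polls raise RetryableError even though the
    operation itself keeps running, mirroring the situation described above.
    """

    def __init__(self, failures, polls_until_done):
        self.failures = failures
        self.polls_left = polls_until_done

    def poll(self):
        if self.failures > 0:
            self.failures -= 1
            raise RetryableError("intermittent error")
        self.polls_left -= 1
        return "done" if self.polls_left <= 0 else "in progress"


def wait_for_completion(op, max_attempts=10, delay_secs=0.0):
    """Poll `op` until it reports "done", tolerating retryable errors."""
    for _ in range(max_attempts):
        try:
            if op.poll() == "done":
                return True
        except RetryableError:
            # Not a terminal failure: the operation may still complete.
            pass
        time.sleep(delay_secs)
    return False


def run_test(op, run_consistency_checks):
    """Run the checks only after the operation has actually completed."""
    if not wait_for_completion(op):
        raise AssertionError("resharding did not complete")
    run_consistency_checks()
```

Bounding the number of attempts keeps a genuinely stuck operation from hanging the test, while still separating "retryable error seen" from "resharding failed".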
| Comments |
| Comment by Githook User [ 07/Feb/24 ] |
|
Author: {'name': 'atesteve', 'email': 'aitor.esteve@mongodb.com', 'username': 'atesteve'} Message: SERVER-84384 Resharding test infrastructure must be resilient to intermittent errors (#18767) GitOrigin-RevId: 8e4947825046a1a734c822c264b3d5a2134a03a4 |
| Comment by Githook User [ 29/Jan/24 ] |
|
Author: {'name': 'Aitor Esteve Alvarado', 'email': 'aitor.esteve@mongodb.com', 'username': 'atesteve'} Message: Revert "SERVER-84384 Ignore system.resharding.* collections in checkHistoricalPlacementMetadataConsistency (#18410)" This reverts commit 16eccaef8a1066146ff72e49cf934afd1d725a7c. GitOrigin-RevId: f0478c6df5266423f5991db784d64a599eb748a0 |
| Comment by Githook User [ 29/Jan/24 ] |
|
Author: {'name': 'atesteve', 'email': 'aitor.esteve@mongodb.com', 'username': 'atesteve'} Message: SERVER-84384 Ignore system.resharding.* collections in checkHistoricalPlacementMetadataConsistency (#18410) GitOrigin-RevId: 16eccaef8a1066146ff72e49cf934afd1d725a7c |
| Comment by Max Hirschhorn [ 27/Dec/23 ] |
|
Antithesis is designed to ignore the errors from individual JavaScript tests because our tests were not authored to handle intermittent errors (e.g. network errors). Retrying within tests is not practical as a general solution because some operations can still cause individual assertion statements to throw an exception (e.g. the total number of documents updated not matching). Instead, the errors which Antithesis propagates relate to properties which must always hold true, such as the server not crashing and our data consistency checks passing. To address the CheckRoutingTableConsistency hook failure in BF-31177, the RoutingTableConsistencyChecker hook must either (a) wait for the resharding operation to complete or (b) ignore inconsistencies related to the system.resharding collection and config.placementHistory when running in Antithesis. Data consistency checks are generally expected to wait for the system to have quiesced. (For historical context, a special procedure involving a no-op collMod was used to drain any index builds still running as part of running the dbhash check.) A test failure is not expected to also lead to a hook failure. CC paolo.polato@mongodb.com |
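The two options in the comment above can be sketched roughly as follows. This is an illustrative Python sketch only, not the RoutingTableConsistencyChecker hook itself: `wait_until_quiesced`, `filter_inconsistencies`, the `has_active_resharding` callback, and the `{"namespace": ...}` record shape are all assumptions made for the example. Only the names `system.resharding` and `config.placementHistory` come from the comment.

```python
import time

# Assumed namespace markers, taken from the names mentioned in the comment.
IGNORED_MARKERS = ("system.resharding", "config.placementHistory")


def wait_until_quiesced(has_active_resharding, timeout_secs=5.0, interval_secs=0.01):
    """Option (a): block until no resharding operation is active, or time out.

    `has_active_resharding` is a hypothetical callback returning True while
    an operation is still running.
    """
    deadline = time.monotonic() + timeout_secs
    while has_active_resharding():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_secs)
    return True


def filter_inconsistencies(inconsistencies, in_antithesis):
    """Option (b): drop resharding-related findings when running in Antithesis.

    Each finding is assumed to be a dict with a "namespace" key.
    """
    if not in_antithesis:
        return list(inconsistencies)
    return [
        item
        for item in inconsistencies
        if not any(marker in item["namespace"] for marker in IGNORED_MARKERS)
    ]
```

Option (a) matches the expectation that consistency checks wait for the system to quiesce. Option (b) is narrower: it leaves the timing alone and suppresses only findings that could plausibly come from an in-flight resharding operation.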