[SERVER-63792] Improve coverage of blackholing network requests
| Created: | 17/Feb/22 | Updated: | 06/Dec/22 |
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Participants: |
| Description |
|
This is likely a gap in our test coverage that can lead to longer unavailability windows than we'd like. |
| Comments |
| Comment by Robert Guo (Inactive) [ 22/Feb/22 ] |
|
Hm, interesting question. We have the datacenter delay simulation mechanism in DSI, which uses tc and could trigger TCP retransmissions with a couple of small tweaks; the same invocation can be run in JS tests. But if we just want to test that this sequence of events does not cause additional delays, mongobridge's delayMessagesFrom and a timer might be sufficient. The latter may be more deterministic and quicker. Another way to systematically catch this type of issue, which we should be able to do if there's enough value/interest, is to add or extend a passthrough that delays commands at random with a fixed seed. If the resulting delay to running the next command is beyond some threshold, e.g. > 5x the delay amount of the original command, the test reports this increase in latency to the user. |
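To make the passthrough idea above concrete, here is a rough sketch in Python of the threshold check only. It is not tied to any existing passthrough or to mongobridge: `run_with_random_delays`, `LatencyReport`, and all parameter names are hypothetical, the "commands" are plain callables, and the 5x factor is just the example threshold from the comment.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class LatencyReport:
    delayed_index: int     # position of the command that was delayed
    injected_delay: float  # seconds of delay injected before that command
    next_latency: float    # observed latency of the following command, in seconds


def run_with_random_delays(
    commands: Sequence[Callable[[], None]],
    seed: int,
    delay_secs: float = 0.01,
    delay_probability: float = 0.3,
    threshold_factor: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[LatencyReport]:
    """Run commands in order, delaying some at random (hypothetical helper).

    A fixed seed makes the choice of delayed commands reproducible. A report
    is emitted whenever the command that follows a delayed one takes longer
    than threshold_factor times the injected delay.
    """
    rng = random.Random(seed)  # private RNG so the seed fully determines the run
    reports: List[LatencyReport] = []
    pending: Optional[Tuple[int, float]] = None  # (index, delay) of the previous command if it was delayed

    for i, command in enumerate(commands):
        delayed = rng.random() < delay_probability
        if delayed:
            sleep(delay_secs)

        start = clock()
        command()
        elapsed = clock() - start

        # Compare this command's latency with the delay injected into the previous one.
        if pending is not None and elapsed > threshold_factor * pending[1]:
            reports.append(LatencyReport(pending[0], pending[1], elapsed))

        pending = (i, delay_secs) if delayed else None

    return reports
```

The `clock` and `sleep` parameters are injectable so that the check itself can be exercised deterministically with a fake clock; a real passthrough would presumably inject the delay at the network layer rather than by sleeping in the test process.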
| Comment by Judah Schvimer [ 18/Feb/22 ] |
|
We should make sure to include testing of sharded clusters, with black holes between mongos and mongod as well as between shards. |