Type: Task
Resolution: Unresolved
Priority: Unknown
Affects Version/s: None
Component/s: Testing
Context
Failpoints currently aren't reliable on sharded clusters for multiple reasons, including:
- mtest.SetFailPoint sets a failpoint on only one mongoS node by default. If a subsequent operation selects a different mongoS, the failpoint isn't applied to that operation, so tests fail non-deterministically.
- mongoS doesn't block for the full duration specified by blockTimeMS. See SERVER-96344.
Definition of done
- Set the failpoint on every mongoS in a sharded cluster.
- Limit tests that use failpoints on sharded clusters to server versions where SERVER-96344 is fixed.
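The first item could be approached roughly as follows. This is a minimal sketch, not the mtest implementation: `AdminRunner`, `RunOnAll`, `BlockCommands`, `Disable`, and `fakeRunner` are hypothetical names, and the local `Doc` type only stands in for an ordered BSON document such as `bson.D`. The `configureFailPoint` / `failCommand` command shape is the server's; how each mongoS is reached (for example, one direct client per host) is left to the runner.

```go
package main

import (
	"errors"
	"fmt"
)

// E and Doc stand in for an ordered BSON document (like bson.D). Order matters
// because the server expects the command name as the first key.
type E struct {
	Key   string
	Value any
}
type Doc []E

// AdminRunner is a hypothetical hook that runs cmd against the admin database
// of one specific mongoS, e.g. through a client connected only to that host.
type AdminRunner func(host string, cmd Doc) error

// BlockCommands builds a failCommand failpoint that blocks the named commands
// for blockTimeMS milliseconds, for the next `times` matching commands.
func BlockCommands(times, blockTimeMS int, commands ...string) Doc {
	return Doc{
		{"configureFailPoint", "failCommand"},
		{"mode", Doc{{"times", times}}},
		{"data", Doc{
			{"failCommands", commands},
			{"blockConnection", true},
			{"blockTimeMS", blockTimeMS},
		}},
	}
}

// Disable builds the command that turns the failCommand failpoint off.
func Disable() Doc {
	return Doc{{"configureFailPoint", "failCommand"}, {"mode", "off"}}
}

// RunOnAll sends cmd to every host. It keeps going after a failure so that one
// unreachable mongoS does not leave the others unconfigured (or uncleared),
// and returns the hosts that succeeded plus a joined error for the rest.
func RunOnAll(hosts []string, cmd Doc, run AdminRunner) ([]string, error) {
	var applied []string
	var errs []error
	for _, h := range hosts {
		if err := run(h, cmd); err != nil {
			errs = append(errs, fmt.Errorf("configureFailPoint on %s: %w", h, err))
			continue
		}
		applied = append(applied, h)
	}
	return applied, errors.Join(errs...)
}

// fakeRunner is a stand-in for a real runner: it records which hosts were
// contacted and fails for any host listed in down.
func fakeRunner(down map[string]bool, calls *[]string) AdminRunner {
	return func(host string, cmd Doc) error {
		*calls = append(*calls, host)
		if down[host] {
			return fmt.Errorf("%s unreachable", host)
		}
		return nil
	}
}

func main() {
	hosts := []string{"mongos1:27017", "mongos2:27017"}
	var calls []string
	run := fakeRunner(map[string]bool{"mongos2:27017": true}, &calls)

	applied, err := RunOnAll(hosts, BlockCommands(1, 500, "insert"), run)
	fmt.Println("applied on:", applied, "error:", err)

	// Clear on every host, including ones the test never routed through.
	_, err = RunOnAll(hosts, Disable(), run)
	fmt.Println("cleanup error:", err)
}
```

Running the same `RunOnAll` helper with `Disable()` during teardown is one way to avoid leaving a `times`-mode failpoint armed on a mongoS that the test never selected.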
Pitfalls
- Setting a failpoint on every mongoS may lead to confusing behavior: after the expected failpoint triggers on one mongoS, the same failpoint can still be active on the others. Maybe we need to require that failpoints on sharded clusters can only be used by clients connected to a single mongoS?
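If the single-mongoS requirement were adopted, a test helper could reject connection strings that list more than one host before setting any failpoint. The sketch below is an assumption about how such a guard might look: `SeedHosts`, `RequireSingleMongos`, and `ErrMultipleMongos` are hypothetical names, it handles only the plain `mongodb://` scheme, and it assumes every listed host is a mongoS. It does not account for `mongodb+srv://` URIs or load balancers, which can hide several mongoS nodes behind one name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrMultipleMongos is a hypothetical sentinel error for this sketch.
var ErrMultipleMongos = errors.New("failpoint tests require a URI with exactly one mongoS host")

// SeedHosts extracts the comma-separated host list from a mongodb:// URI.
// It is a simplified parser for illustration, not a full URI implementation.
func SeedHosts(uri string) ([]string, error) {
	rest, ok := strings.CutPrefix(uri, "mongodb://")
	if !ok {
		return nil, fmt.Errorf("unsupported scheme in %q: only mongodb:// is handled", uri)
	}
	// Drop the database path and options: everything from the first "/" or "?".
	if i := strings.IndexAny(rest, "/?"); i >= 0 {
		rest = rest[:i]
	}
	// Drop credentials, if any ("user:password@").
	if i := strings.LastIndex(rest, "@"); i >= 0 {
		rest = rest[i+1:]
	}
	var hosts []string
	for _, h := range strings.Split(rest, ",") {
		if h = strings.TrimSpace(h); h != "" {
			hosts = append(hosts, h)
		}
	}
	if len(hosts) == 0 {
		return nil, fmt.Errorf("no hosts found in %q", uri)
	}
	return hosts, nil
}

// RequireSingleMongos returns an error unless the URI names exactly one host.
func RequireSingleMongos(uri string) error {
	hosts, err := SeedHosts(uri)
	if err != nil {
		return err
	}
	if len(hosts) != 1 {
		return fmt.Errorf("%w: got %d (%s)", ErrMultipleMongos, len(hosts), strings.Join(hosts, ", "))
	}
	return nil
}

func main() {
	for _, uri := range []string{
		"mongodb://localhost:27017/",
		"mongodb://localhost:27017,localhost:27018/",
	} {
		fmt.Println(uri, "->", RequireSingleMongos(uri))
	}
}
```

A guard like this trades coverage for determinism: tests no longer exercise mongoS selection, but a failpoint set through the single host is the one every subsequent operation hits.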
- is blocked by: SERVER-96344 mongos doesn't honor a failpoint's full blockTimeMS (Open)
- is related to: GODRIVER-3322 Uploading a GridFS Stream with client-level timeout cancels context (Closed)
- related to: GODRIVER-3638 Prohibit using failpoints on sharded topologies (Ready for Work)