Make failpoints reliable in sharded clusters


    • Type: Task
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: Testing
    • Go Drivers

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?

      Context

      Failpoints currently aren't reliable on sharded clusters for multiple reasons, including:

      • mtest.SetFailPoint sets a failpoint on only one mongoS node by default. If a subsequent operation selects a different mongoS, the failpoint is never triggered for that operation, so tests fail non-deterministically.
      • mongoS doesn't block for the full duration specified by blockTimeMS. See SERVER-96344.

      Definition of done

      • Set the failpoint on every mongoS in the sharded cluster.
      • Limit tests that use failpoints on sharded clusters to server versions where SERVER-96344 is fixed.

      Pitfalls

      • Setting a failpoint on every mongoS may lead to confusing behavior: after the expected failpoint is triggered on one mongoS, the same failpoint can still be active on the others. Maybe we need to require that failpoints on sharded clusters can only be used by clients connected to a single mongoS?
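
The single-mongoS requirement floated above could be enforced with a guard along these lines. This is only a sketch: `checkSingleMongos` and the host names are hypothetical, and how the test harness would obtain the client's host list is left open.

```go
package main

import "fmt"

// checkSingleMongos returns an error unless the client under test is
// configured with exactly one mongoS host. Hypothetical helper, not an
// existing mtest function.
func checkSingleMongos(clientHosts []string) error {
	if len(clientHosts) != 1 {
		return fmt.Errorf(
			"failpoints on sharded clusters need a client pinned to one mongoS, got %d hosts",
			len(clientHosts),
		)
	}
	return nil
}

func main() {
	// Hypothetical host lists for two differently configured clients.
	fmt.Println(checkSingleMongos([]string{"mongos-a:27017"}))
	fmt.Println(checkSingleMongos([]string{"mongos-a:27017", "mongos-b:27017"}))
}
```

With such a guard in place, a failpoint only needs to be set on the one mongoS the client talks to, which sidesteps the leftover-failpoint problem at the cost of not exercising multi-mongoS server selection in those tests.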

            Assignee: Unassigned
            Reporter: Preston Vasquez
            Votes: 0
            Watchers: 2

              Created:
              Updated: