  1. Core Server
  2. SERVER-43262

Devise general solution for BFs needing a higher stepdown interval in stepdown suites

    • Server Tooling & Methods
    • 15

      This ticket came out of a discussion on the replication team about a BF (linked) in which a retryable write ran out of retries because, under a heavy workload, nodes did not manage to catch up in the time available between stepdown cycles. While that specific BF could be fixed by blacklisting or downsizing the workload, we would like a general solution (or policy) for cases where the time between stepdown cycles turns out to be insufficient, so that we do not have to handle them on a BF-by-BF basis.

      The purpose of this ticket is to serve as a place to link such BFs and to house a discussion on how we can more broadly address this class of failures.

      A few ideas that came up during one of our BF meetings:

      • Increase the stepdown interval across the board.
      • Make the interval configurable per variant (so we can increase it for slow variants).
      • Wait longer between retries of operations in our tests and/or extend their deadlines.
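
To make the second and third ideas more concrete, here is a rough Python sketch. Everything in it is an assumption for illustration only: the names `stepdown_interval_ms`, `retry_with_backoff`, `VARIANT_INTERVAL_MULTIPLIER`, the variant name `slow-variant`, and all numeric values are hypothetical and are not taken from our test infrastructure.

```python
import time

# Hypothetical placeholder values; not taken from any real suite config.
DEFAULT_STEPDOWN_INTERVAL_MS = 8000
VARIANT_INTERVAL_MULTIPLIER = {"slow-variant": 3.0}  # hypothetical variant name


def stepdown_interval_ms(variant, base_ms=DEFAULT_STEPDOWN_INTERVAL_MS):
    """Return the stepdown interval for a variant, scaled up for slow ones."""
    return int(base_ms * VARIANT_INTERVAL_MULTIPLIER.get(variant, 1.0))


def retry_with_backoff(op, is_retryable, deadline_s,
                       initial_wait_s=0.1, max_wait_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call op() until it succeeds, doubling the wait between attempts.

    Re-raises the last error if it is not retryable or if the next wait
    would push the total elapsed time past deadline_s.
    """
    start = clock()
    wait = initial_wait_s
    while True:
        try:
            return op()
        except Exception as exc:
            out_of_time = (clock() - start) + wait > deadline_s
            if not is_retryable(exc) or out_of_time:
                raise
            sleep(wait)
            wait = min(wait * 2, max_wait_s)
```

A test could then bound its retries by a time deadline rather than a fixed attempt count, for example `retry_with_backoff(do_write, is_retryable_error, deadline_s=60)`, where `do_write` and `is_retryable_error` stand in for whatever the test already uses. A time-based deadline has the advantage of scaling naturally if the stepdown interval is raised for a slow variant.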

      Feel free to voice your opinions in the comments.

            Assignee:
            backlog-server-stm Backlog - Server Tooling and Methods (STM) (Inactive)
            Reporter:
            vesselina.ratcheva@mongodb.com Vesselina Ratcheva (Inactive)