Documentation / DOCS-15168

[SERVER] set TemporarilyUnavailableException RetryBackoff higher

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      SERVER-64050:

      Original Downstream Change Summary

      MongoDB may, in rare circumstances, return a new TemporarilyUnavailable error when a write operation is rolled back excessively in the storage engine due to cache pressure. The operation may be at fault for writing too much data, or it may be a victim; that information is not exposed. However, if this error is returned, it is very likely that the operation was the cause of the problem rather than a victim.

      Previously, this type of error was retried indefinitely inside MongoDB. Now, MongoDB retries an operation internally at most 10 times, starting with a 1-second backoff that increases linearly on each retry, for a total of up to 55 seconds. After this point, the error is returned to the user. This behavior can be tuned with the setParameters "temporarilyUnavailableMaxRetries" and "temporarilyUnavailableBackoffBaseMs".
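The 55-second total can be checked with a small sketch. It assumes the delay before retry n is simply the base backoff multiplied by n, which is consistent with the figures above but is not taken from the server source. The function name and its arguments are hypothetical and only mirror the two setParameters.

```python
def linear_backoff_schedule(base_ms: int = 1000, max_retries: int = 10) -> list[int]:
    """Return the assumed delay, in milliseconds, before each retry.

    Assumption: the delay before retry n is base_ms * n (linear backoff).
    The defaults mirror the 1-second base and 10 retries described above.
    """
    return [base_ms * attempt for attempt in range(1, max_retries + 1)]


schedule = linear_backoff_schedule()
total_seconds = sum(schedule) // 1000

print(schedule)       # delays of 1000 ms, 2000 ms, ... 10000 ms
print(total_seconds)  # 55
```

Under the same assumption, raising either value lengthens the worst-case wait before the error reaches the client, and lowering either value shortens it.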

      If an operation receives a TemporarilyUnavailable error internally, a "temporarilyUnavailableErrors" counter will be displayed in the slow query logs and in FTDC.

      This does not apply to multi-document transactions, which will continue to return a WriteConflict in this scenario without retrying internally.

      Description of Linked Ticket

      Currently, TemporarilyUnavailableException::kRetryBackoff is set to 100 milliseconds, which does not seem like enough delay to give a retried operation a good chance of succeeding. I recommend changing this value to at least 2 seconds and keeping the number of retries hardcoded at 3. With the delay increasing linearly on each retry, the delay before the last retry would then be 6 seconds, which does not seem unreasonable for an overloaded system. Controlling this delay server-side is important, because returning a retriable error immediately, without retrying internally, typically causes the drivers to retry at once with no delay.
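To make the comparison concrete, the sketch below lists the per-retry delays for the 100-millisecond value and for the proposed 2-second value, both with 3 retries. It assumes the delay before retry n is the base multiplied by n, as the ticket's 6-second figure implies. The helper name is hypothetical and is not part of the server code.

```python
def retry_delays_ms(base_ms: int, retries: int) -> list[int]:
    """Assumed linear backoff: the delay before retry n is base_ms * n."""
    return [base_ms * n for n in range(1, retries + 1)]


current = retry_delays_ms(100, 3)    # kRetryBackoff of 100 ms, 3 retries
proposed = retry_delays_ms(2000, 3)  # proposed 2 s base, 3 retries

print(current, sum(current))    # [100, 200, 300] 600
print(proposed, sum(proposed))  # [2000, 4000, 6000] 12000
```

Under this assumption the existing setting gives a struggling operation well under a second of total waiting, while the proposal gives it 12 seconds.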

            Assignee: dave.cuthbert@mongodb.com Dave Cuthbert (Inactive)
            Reporter: backlog-server-pm Backlog - Core Eng Program Management Team
            Votes: 0
            Watchers: 3

              Created:
              Updated:
              Resolved:
              1 year, 32 weeks, 6 days ago