Core Server / SERVER-104317

All resharding services should retry on WCEs

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • 3

      In general, resharding uses wtimeout: 0 when passing in the write concern options. However, catalog cache refreshes sometimes force a no-op write, which uses a wtimeout of 60 seconds. If a resharding service hits a write concern timeout, it treats the error as fatal. This was seen in AF-1904 and fixed in SERVER-102452 for the specific case hit in that ticket. This ticket covers the follow-up work to make all resharding services retry on write concern errors (WCEs) in general. We should be able to do this by making kRetryabilityPredicateIncludeWriteConcernTimeout, added in SERVER-102452, the default for resharding::withAutomaticRetry. As part of this, we should audit the idempotency of anything retried by resharding::withAutomaticRetry: SERVER-102452 found that we would reset at least some in-memory state (like metrics) when we really shouldn't.
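
      To make the proposal concrete, here is a minimal standalone sketch of the two ideas above: a retry helper whose default predicate also retries write concern timeouts, and a retried step that neither resets nor double-counts in-memory metrics. It is not the server code. ErrorKind, OpError, withRetry, Metrics, CopyStep, and the maxAttempts cap are all hypothetical stand-ins invented for illustration; the real resharding::withAutomaticRetry and kRetryabilityPredicateIncludeWriteConcernTimeout may look quite different.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical error categories; the real server uses its own error codes.
enum class ErrorKind { kNetwork, kWriteConcernTimeout, kFatal };

struct OpError : std::runtime_error {
    ErrorKind kind;
    OpError(ErrorKind k, const std::string& msg) : std::runtime_error(msg), kind(k) {}
};

using RetryPredicate = std::function<bool(const OpError&)>;

// Stand-in for the current default: a write concern timeout is fatal.
inline bool retryTransientOnly(const OpError& e) {
    return e.kind == ErrorKind::kNetwork;
}

// Stand-in for the proposed default: write concern timeouts are retried too.
inline bool retryIncludingWriteConcernTimeout(const OpError& e) {
    return e.kind == ErrorKind::kNetwork || e.kind == ErrorKind::kWriteConcernTimeout;
}

// Runs op until it succeeds. Rethrows when the predicate rejects the error
// or when maxAttempts is exhausted (the cap is a simplification for this sketch).
template <typename Op>
auto withRetry(Op op,
               RetryPredicate shouldRetry = retryIncludingWriteConcernTimeout,
               int maxAttempts = 5) -> decltype(op()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const OpError& e) {
            if (!shouldRetry(e) || attempt >= maxAttempts) {
                throw;
            }
        }
    }
}

// In-memory state that must survive retries.
struct Metrics {
    long documentsCopied = 0;
};

// A retried step written to be idempotent: work that already happened is
// skipped on later attempts, and the metrics are never reset inside the body.
struct CopyStep {
    Metrics& metrics;
    int wcTimeoutsLeft;  // how many times the simulated write concern wait fails
    bool batchCopied;

    CopyStep(Metrics& m, int timeouts)
        : metrics(m), wcTimeoutsLeft(timeouts), batchCopied(false) {}

    long operator()() {
        if (!batchCopied) {
            metrics.documentsCopied += 10;  // recorded exactly once
            batchCopied = true;
        }
        if (wcTimeoutsLeft > 0) {
            --wcTimeoutsLeft;
            throw OpError(ErrorKind::kWriteConcernTimeout,
                          "simulated timeout waiting for write concern");
        }
        return metrics.documentsCopied;
    }
};

// With the proposed default, two write concern timeouts are absorbed.
long runWithNewDefault() {
    Metrics metrics;
    return withRetry(CopyStep(metrics, 2));
}

// With the old-style predicate, the first write concern timeout escapes.
bool oldDefaultTreatsTimeoutAsFatal() {
    Metrics metrics;
    try {
        withRetry(CopyStep(metrics, 1), retryTransientOnly);
    } catch (const OpError& e) {
        return e.kind == ErrorKind::kWriteConcernTimeout;
    }
    return false;
}
```

      In this sketch, runWithNewDefault() completes after the third attempt with the metric counted once, while oldDefaultTreatsTimeoutAsFatal() shows the timeout propagating out of the helper. The audit described above amounts to checking that each real retried body behaves like CopyStep here, rather than re-initializing state at the top of every attempt.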

            Assignee:
            Unassigned
            Reporter:
            janna.golden@mongodb.com Janna Golden
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: