  1. Core Server
  2. SERVER-63586

Retry to recover the sharding state until it succeeds

    • Fully Compatible
    • Sharding EMEA 2022-03-07, Sharding EMEA 2022-03-21
    • 32

      When a shard starts, if the sharding state recovery document indicates that there were metadata change operations in flight, it contacts the primary config server in order to retrieve the most recent opTime.

      This procedure should retry until it succeeds, but there is a corner case that causes the shard process to crash: when the returned command status is NamespaceExists (a perfectly expected scenario), the logic also checks the write concern status and possibly raises an error. If the primary config server stepped down, the write concern status would be InterruptedDueToReplStateChange; the caller converts the error to an exception and the process crashes.
       
      A possible solution would be to retry the command against the primary config server when the write concern status is not ok and the command status is part of a specific list of errors (one that includes NamespaceExists).
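
      The proposed retry rule can be sketched roughly as follows. This is a minimal Python model, not the server's C++ code: the Response type, the send_command callback, the RETRYABLE_COMMAND_STATUSES set and the backoff parameter are all hypothetical names introduced for illustration. Only NamespaceExists and InterruptedDueToReplStateChange come from this ticket, and the handling of any other status combination is an assumption.

```python
import time
from dataclasses import dataclass

# Status names modelled as plain strings. Only NamespaceExists and
# InterruptedDueToReplStateChange are taken from the ticket.
OK = "OK"
NAMESPACE_EXISTS = "NamespaceExists"
INTERRUPTED = "InterruptedDueToReplStateChange"

# Hypothetical list of command statuses for which a failed write concern
# should trigger a retry. The ticket only says it includes NamespaceExists.
RETRYABLE_COMMAND_STATUSES = {NAMESPACE_EXISTS}


@dataclass
class Response:
    """Hypothetical shape of the config server's reply."""
    command_status: str
    write_concern_status: str
    op_time: int = 0


def should_retry(resp: Response) -> bool:
    """Retry when the write concern failed and the command status is in the list."""
    return (resp.write_concern_status != OK
            and resp.command_status in RETRYABLE_COMMAND_STATUSES)


def recover_op_time(send_command, backoff_secs: float = 0.0) -> int:
    """Call send_command() until a usable reply arrives, then return its opTime."""
    while True:
        resp = send_command()
        if should_retry(resp):
            # For example, the config primary stepped down: try again
            # instead of letting the error reach the caller.
            time.sleep(backoff_secs)
            continue
        command_ok = resp.command_status in (OK, NAMESPACE_EXISTS)
        if command_ok and resp.write_concern_status == OK:
            return resp.op_time
        # Assumption: anything else is still surfaced as an error here.
        raise RuntimeError(
            f"unexpected reply: {resp.command_status}/{resp.write_concern_status}")
```

      With this rule, a reply of NamespaceExists combined with InterruptedDueToReplStateChange leads to another attempt rather than an exception, which is the behaviour the ticket asks for.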

            Assignee:
            allison.easton@mongodb.com Allison Easton
            Reporter:
            antonio.fuschetto@mongodb.com Antonio Fuschetto
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: