  1. Core Server
  2. SERVER-46425

Consider increasing wtimeout for cloneCatalogData or no timeout

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL

      Description

      The current setting is majority write concern with a 60 sec wtimeout. However, the clone can generate lots of writes and index builds, which can cause it to time out waiting for replication. In the current master, _movePrimary will retry because writeConcern errors are treated as retryable, but since the collections were already cloned by the earlier attempt, the retry gets a namespace already exists error, which is not retryable and causes the entire _movePrimary command to fail. This can leave the cloned data behind as orphans and eventually cause the issue described in SERVER-32142.
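
      The failure sequence described above can be sketched with a toy model. This is not the server's implementation: the class and function names (ToyRecipient, clone_catalog_data, move_primary) and the single "replication time" figure are hypothetical, chosen only to show how a retry after a write concern timeout can hit a non-retryable namespace already exists error, and how a longer or absent wtimeout avoids it.

```python
class WriteConcernTimeout(Exception):
    """Stands in for a write concern error; treated as retryable here."""


class NamespaceExists(Exception):
    """Stands in for a 'namespace already exists' error; not retryable."""


class ToyRecipient:
    """Hypothetical stand-in for the shard that receives the cloned catalog data."""

    def __init__(self, replication_secs, wtimeout_secs):
        self.collections = set()
        self.replication_secs = replication_secs  # assumed time for majority replication
        self.wtimeout_secs = wtimeout_secs        # None models "no timeout"

    def clone_catalog_data(self, names):
        for name in names:
            if name in self.collections:
                raise NamespaceExists(name)
            self.collections.add(name)  # the local write succeeds immediately
        # Wait for majority replication; give up once wtimeout is exceeded.
        if self.wtimeout_secs is not None and self.replication_secs > self.wtimeout_secs:
            raise WriteConcernTimeout("timed out waiting for replication")


def move_primary(recipient, names, max_attempts=3):
    """Retry the clone on retryable errors only, as the description outlines."""
    for _ in range(max_attempts):
        try:
            recipient.clone_catalog_data(names)
            return "ok"
        except WriteConcernTimeout:
            continue  # retryable: try the clone again
        except NamespaceExists as exc:
            return f"failed: namespace already exists: {exc}"
    return "failed: retries exhausted"


if __name__ == "__main__":
    names = ["db.a", "db.b"]
    for wtimeout in (60, 120, None):
        recipient = ToyRecipient(replication_secs=90, wtimeout_secs=wtimeout)
        print(wtimeout, "->", move_primary(recipient, names))
```

      In this model, the 60 sec case fails on the second attempt and leaves the cloned collections on the recipient, which corresponds to the orphaned data mentioned above. A real fix might instead make the clone step safe to retry; the sketch only illustrates the timeout option named in the title.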

              People

              Assignee:
              backlog-server-sharding Backlog - Sharding Team
              Reporter:
              renctan Randolph Tan
