  Core Server / SERVER-84135

Chunk Migration Failure in Shard: "error":"OperationFailed: Data transfer error: migrate failed: WriteConcernFailed: waiting for replication timed out"


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • None
    • None
    • None
    • ALL

    Description

      Hi Team,

      I would like to bring to your attention an issue we have been encountering during chunk migrations in one of our sharded environments. It first appeared after we upgraded MongoDB from v4.4.25 to v5.0.21.

      Here is a summary of the error logs we have observed:

        {"t":{"$date":"2023-10-30T19:12:11.717+05:30"},"s":"I","c":"SHARDING","id":21872,"ctx":"Balancer","msg":"Migration failed","attr":{"migrateInfo":"DB.Coll: [{ ID: MinKey }, { ID: -92188298389644630XX }), from Shard3, to Shard6","error":"CommandFailed: commit clone failed :: caused by :: startCommit timed out waiting for the catch up completion. Sender's session is Shaed3_Shard6_653fb0640cb752fb246bd6b6. Current session is Shaed3_Shard6_653fb0640cb752fb246bd6b6"}}

        {"t":{"$date":"2023-10-30T19:22:58.782+05:30"},"s":"I","c":"SHARDING","id":21872,"ctx":"Balancer","msg":"Migration failed","attr":{"migrateInfo":"DB.Coll: [{ ID: MinKey }, { ID: -92188298389644630XX }), from Shard3, to Shard4","error":"OperationFailed: Data transfer error: migrate failed: WriteConcernFailed: waiting for replication timed out"}}
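      The two entries above are MongoDB structured (JSON) log lines, so when a log contains many migration attempts they can be filtered programmatically instead of by eye. The following is a minimal sketch for Node.js, assuming one JSON document per line and using log id 21872 ("Migration failed", as in the entries above) as the filter. The function name `summarizeMigrationFailures` and the fields it returns are hypothetical, and the regular expression assumes `migrateInfo` ends in `from <donor>, to <recipient>` exactly as in these two entries.

```javascript
// Hypothetical helper: collect balancer "Migration failed" entries (id 21872)
// from structured JSON log text. Assumes one JSON document per line.
function summarizeMigrationFailures(logText) {
  const failures = [];
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // skip blank or non-JSON lines
    let entry;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that look like JSON but do not parse
    }
    if (entry.id !== 21872 || !entry.attr) continue;
    const info = entry.attr.migrateInfo || "";
    // Assumption: migrateInfo ends with "from <donor>, to <recipient>"
    const m = /from (\S+), to (\S+)$/.exec(info);
    const error = entry.attr.error || "";
    failures.push({
      time: entry.t && entry.t.$date,
      from: m ? m[1] : null,
      to: m ? m[2] : null,
      // Outermost error code: the text before the first ':'
      errorCode: error.split(":")[0],
      error,
    });
  }
  return failures;
}

// Usage with the second log entry from this report
const sample = '{"t":{"$date":"2023-10-30T19:22:58.782+05:30"},"s":"I","c":"SHARDING","id":21872,"ctx":"Balancer","msg":"Migration failed","attr":{"migrateInfo":"DB.Coll: [{ ID: MinKey }, { ID: -92188298389644630XX }), from Shard3, to Shard4","error":"OperationFailed: Data transfer error: migrate failed: WriteConcernFailed: waiting for replication timed out"}}';
console.log(summarizeMigrationFailures(sample));
```

      Grouping the result by `errorCode` and by recipient shard would show whether the failures cluster on particular recipients (in the two entries above, Shard6 and Shard4).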

      FYI:

      • We have stopped the balancer as a temporary workaround.
      • The default write concern is set to {w:1}, and the balancer's _secondaryThrottle is also set to {w:1}.
      • These errors occur persistently during balancer-initiated chunk migrations. Interestingly, when we migrate the same chunk manually, the migration completes without errors.

        mongos> db.settings.find()
        { "_id" : "balancer", "mode" : "off", "stopped" : true, "_secondaryThrottle" : { "w" : 1 } }
        { "_id" : "autosplit", "enabled" : false }
        { "_id" : "ReadWriteConcernDefaults", "defaultWriteConcern" : { "w" : 1, "wtimeout" : 0 }, "updateOpTime" : Timestamp(1698308057, 1884), "updateWallClockTime" : ISODate("2023-10-26T08:14:17.727Z") }
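      The config.settings output above can also be checked programmatically, which helps when comparing settings across environments. This is a minimal sketch, assuming the documents have already been fetched (for example with `db.getSiblingDB("config").settings.find().toArray()` in mongosh) and reduced to plain objects. `summarizeBalancerSettings` is a hypothetical helper, and its rules are simplifications: `mode: "off"` or `stopped: true` is treated as a disabled balancer, and a missing document is treated as the enabled default.

```javascript
// Hypothetical helper: summarize balancer-related documents from config.settings.
// Input: an array of plain objects (Timestamp/ISODate fields are ignored).
function summarizeBalancerSettings(docs) {
  // Index the settings documents by their _id ("balancer", "autosplit", ...)
  const byId = Object.fromEntries(docs.map((d) => [d._id, d]));
  const balancer = byId.balancer || {};
  const autosplit = byId.autosplit || {};
  const rwDefaults = byId.ReadWriteConcernDefaults || {};
  return {
    // Simplification: mode "off" or stopped: true means disabled;
    // a missing balancer document is treated as enabled.
    balancerRunning: balancer.mode !== "off" && balancer.stopped !== true,
    // _secondaryThrottle as stored on the balancer document, if any
    secondaryThrottle: balancer._secondaryThrottle ?? null,
    // Assumption: a missing autosplit document is treated as enabled
    autosplitEnabled: autosplit.enabled !== false,
    // Cluster-wide default write concern, if one has been set
    defaultWriteConcern: rwDefaults.defaultWriteConcern ?? null,
  };
}

// Values copied from the db.settings.find() output in this report
const docs = [
  { _id: "balancer", mode: "off", stopped: true, _secondaryThrottle: { w: 1 } },
  { _id: "autosplit", enabled: false },
  { _id: "ReadWriteConcernDefaults", defaultWriteConcern: { w: 1, wtimeout: 0 } },
];
console.log(summarizeBalancerSettings(docs));
```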

       

      If anyone has encountered a similar error or has suggestions for mitigating it, please share them. We are actively looking for a resolution.

      Thank you for your attention and support.

          People

            Assignee: Unassigned
            Reporter: Madhu Sai Vavilala (madhu.s@mydbops.com)
            Votes: 0
            Watchers: 3
