[SERVER-84135] Chunk Migration Failure in Shard “error”:”OperationFailed: Data transfer error: migrate failed: WriteConcernFailed: waiting for replication timed out” Created: 13/Dec/23 Updated: 13/Dec/23 Resolved: 13/Dec/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Madhu Sai Vavilala | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
Hi Team, I would like to bring to your attention an issue we have been encountering in one of our sharded environments during chunk migration. The issue first appeared after we upgraded MongoDB from v4.4.25 to v5.0.21.
Here is a summary of the error logs we’ve observed:
{"t": {"$date":"2023-10-30T19:12:11.717+05:30"},"s":"I", "c":"SHARDING", "id":21872, "ctx":"Balancer","msg":"Migration failed","attr": {"migrateInfo":"DB.Coll: [{ ID: MinKey }, { ID: -92188298389644630XX }), from Shard3, to Shard6","error":"CommandFailed: commit clone failed :: caused by :: startCommit timed out waiting for the catch up completion. Sender's session is Shaed3_Shard6_653fb0640cb752fb246bd6b6. Current session is Shaed3_Shard6_653fb0640cb752fb246bd6b6"}}
{"t": {"$date":"2023-10-30T19:22:58.782+05:30"},"s":"I", "c": "SHARDING", "id":21872, "ctx":"Balancer","msg":"Migration failed","attr": {"migrateInfo":"DB.Coll: [{ ID: MinKey }, { ID: -92188298389644630XX }), from Shard3, to Shard4","error":"OperationFailed: Data transfer error: migrate failed: WriteConcernFailed: waiting for replication timed out"}}
FYI:
mongos> db.settings.find()
{ "_id" : "autosplit", "enabled" : false }
{ "_id" : "ReadWriteConcernDefaults", "defaultWriteConcern" : { "w" : 1, "wtimeout" : 0 }, "updateOpTime" : Timestamp(1698308057, 1884), "updateWallClockTime" : ISODate("2023-10-26T08:14:17.727Z") }
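For triage, the "Migration failed" entries quoted above can be pulled out of a structured (JSON) log file and grouped by error type. The following is only a minimal sketch: the helper names `failed_migrations` and `error_kind` are hypothetical, and it assumes one JSON document per log line with the `c`, `msg`, and `attr` fields seen in the entries above. The sample entry is the second log line from this report.

```python
import json


def failed_migrations(lines):
    """Yield (timestamp, migrateInfo, error) for each 'Migration failed' entry.

    Assumes one JSON document per line; lines that are not valid JSON
    are skipped rather than treated as errors.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("c") == "SHARDING" and entry.get("msg") == "Migration failed":
            attr = entry.get("attr", {})
            timestamp = entry.get("t", {}).get("$date")
            yield timestamp, attr.get("migrateInfo"), attr.get("error")


def error_kind(error):
    """Return the leading error name, i.e. the text before the first colon."""
    return error.split(":", 1)[0] if error else None


# Sample input: the second log entry from this report, plus a non-JSON line.
SAMPLE_LOG = [
    json.dumps({
        "t": {"$date": "2023-10-30T19:22:58.782+05:30"},
        "s": "I", "c": "SHARDING", "id": 21872, "ctx": "Balancer",
        "msg": "Migration failed",
        "attr": {
            "migrateInfo": "DB.Coll: [{ ID: MinKey }, { ID: -92188298389644630XX }), "
                           "from Shard3, to Shard4",
            "error": "OperationFailed: Data transfer error: migrate failed: "
                     "WriteConcernFailed: waiting for replication timed out",
        },
    }),
    "this line is not JSON and is ignored",
]

if __name__ == "__main__":
    for timestamp, info, error in failed_migrations(SAMPLE_LOG):
        print(timestamp, error_kind(error), info)
```

To run this against a real log, pass an open file handle instead of `SAMPLE_LOG`; the path to the log file depends on the deployment and is not given in this report.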
If anyone has encountered a similar error or has suggestions on how to mitigate this issue, please share your insights. We are actively seeking a resolution to this matter. Thank you for your attention and support. |