[SERVER-62282] Migration recovery should be retried until success Created: 28/Dec/21  Updated: 29/Oct/23  Resolved: 18/Jan/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.3.0

Type: Task Priority: Major - P3
Reporter: Jordi Serra Torrens Assignee: Antonio Fuschetto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24
Participants:

 Description   

Currently, when certain errors occur during a migration (or a migration recovery), the donor shard clears its filtering metadata so that the migration is recovered the next time a query attempts to use that collection. Some code paths trigger a best-effort recovery, while others don't, and even a best-effort attempt can fail. This behavior is correct, but with the new migration protocol (where the recipient takes the critical section) it can leave the recipient for long periods holding both the critical section (making the collection unavailable) and the ActiveMigrationRegistry (preventing the recipient shard from donating or receiving chunks of any other collection).

This ticket is to evaluate ensuring that migration recovery is retried until it succeeds.
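
To illustrate the general idea of retrying until success, here is a minimal sketch in Python. It is not the server's implementation (the actual change is in the linked commit); every name here (`retry_until_success`, `recover`, `make_flaky_recovery`, the delay parameters) is hypothetical, and the capped exponential backoff is an assumption rather than something the ticket specifies.

```python
import time
from typing import Callable


def retry_until_success(
    recover: Callable[[], None],
    initial_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `recover` until it returns without raising.

    Returns the number of attempts made. `recover` stands in for
    whatever routine performs the migration recovery (hypothetical).
    """
    delay = initial_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            recover()
            return attempts
        except Exception:
            # Assumption: every failure is treated as transient.
            # Wait, then double the delay up to the configured cap.
            sleep(delay)
            delay = min(delay * 2, max_delay_s)


def make_flaky_recovery(failures: int) -> Callable[[], None]:
    """Build a stand-in recovery routine that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def recover() -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("transient recovery error")

    return recover


if __name__ == "__main__":
    attempts = retry_until_success(make_flaky_recovery(2), initial_delay_s=0.01)
    print(f"recovered after {attempts} attempts")
```

The point of the loop is that nothing depends on a later query happening to touch the collection: the recovery is driven to completion by the code that noticed the failure. A production version would presumably also need a way to stop retrying when the node can no longer make progress; that is left out of this sketch.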



 Comments   
Comment by Githook User [ 18/Jan/22 ]

Author: Antonio Fuschetto (afuschetto) <antonio.fuschetto@mongodb.com>

Message: SERVER-62282 Migration recovery should be retried until success
Branch: master
https://github.com/mongodb/mongo/commit/e7cf35278036ed43468af6bec42225bb7c988946

Generated at Thu Feb 08 05:54:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.