[SERVER-24854] Add retry logic to MigrationManager Created: 30/Jun/16  Updated: 05/Apr/17  Resolved: 09/Aug/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done after SERVER-24853 Refactor Balancer code to use Migrati... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 17 (07/15/16), Sharding 18 (08/05/16), Sharding 2016-08-29
Participants:

 Description   

If a moveChunk command fails, for whatever reason, the migration is abandoned. We need to evaluate the failure cases and decide which kinds of errors should cause the migration to be retried. There must also be a migration retry counter so that we don't reschedule a migration endlessly and never return from MigrationManager::scheduleMigrations.
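
The retry counter idea can be sketched roughly as below. This is an illustrative Python sketch, not the server's C++ code: `MigrationResult`, `run_with_retry_limit`, and the cap of 3 attempts are all hypothetical names and values that do not come from this ticket.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cap; the ticket does not specify a value.
MAX_MIGRATION_ATTEMPTS = 3


@dataclass
class MigrationResult:
    """Outcome of one moveChunk attempt (hypothetical type for this sketch)."""
    ok: bool
    retriable: bool = False  # whether the failure is worth another attempt
    attempts: int = 0        # filled in by run_with_retry_limit


def run_with_retry_limit(
    move_chunk: Callable[[], MigrationResult],
    max_attempts: int = MAX_MIGRATION_ATTEMPTS,
) -> MigrationResult:
    """Call move_chunk until it succeeds, fails permanently, or the cap is hit."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = MigrationResult(ok=False)
    for attempt in range(1, max_attempts + 1):
        result = move_chunk()
        result.attempts = attempt
        if result.ok or not result.retriable:
            break  # success or non-retriable error: stop immediately
    return result
```

Because the loop is bounded by `max_attempts`, the caller always gets a result back, even when every attempt fails with a retriable error.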

A few specific things to think through:

moveChunk command errors in MigrationManager::_checkMigrationCallback

  • should retry on network errors
  • should retry on conflicting migration errors. We should maintain a map of shards performing migrations initiated by the balancer. That would tell us whether a conflict comes from a migration we already scheduled with the shard or from an external cause, and it would let us avoid scheduling a migration in the first place when we know it won't work.
  • LockBusy errors when we already know the shard is an old 3.2 shard (that is, a second LockBusy error on moveChunk): do we even want to reschedule?
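
One possible shape for the points above, as an illustrative Python sketch rather than server code. The error categories, `BalancerMigrationTracker`, and `should_retry` are hypothetical names. The LockBusy rule (retry the first time only) is an assumption: the ticket leaves that question open.

```python
from enum import Enum, auto
from typing import Set


class MoveChunkError(Enum):
    # Illustrative categories only; these are not the server's error codes.
    NETWORK = auto()
    CONFLICTING_MIGRATION = auto()
    LOCK_BUSY = auto()
    OTHER = auto()


class BalancerMigrationTracker:
    """Tracks shards that have a balancer-initiated migration in flight."""

    def __init__(self) -> None:
        self._busy: Set[str] = set()

    def can_schedule(self, source: str, dest: str) -> bool:
        # Skip scheduling when either shard is already migrating for us.
        return source not in self._busy and dest not in self._busy

    def start(self, source: str, dest: str) -> None:
        self._busy.update((source, dest))

    def finish(self, source: str, dest: str) -> None:
        self._busy.difference_update((source, dest))


def should_retry(error: MoveChunkError, lock_busy_seen_before: bool) -> bool:
    """Decide whether a failed moveChunk is worth rescheduling."""
    if error in (MoveChunkError.NETWORK, MoveChunkError.CONFLICTING_MIGRATION):
        return True
    if error is MoveChunkError.LOCK_BUSY:
        # Assumption: a second LockBusy suggests an old 3.2 shard, so give up.
        return not lock_busy_seen_before
    return False
```

If `can_schedule` returned true before the migration started and the shard still reports a conflicting migration, the conflict plausibly has an external cause, since the balancer itself had nothing else in flight on those shards.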

MigrationManager::_executeMigrations

  • scheduleRemoteCommand errors (callbackhandle check)


 Comments   
Comment by Dianna Hohensee (Inactive) [ 09/Aug/16 ]

We've been making changes regarding retries as we complete other tickets in the parallel balancing project. This is no longer necessary.
