Type: Task
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Sprint:
Sharding 17 (07/15/16), Sharding 18 (08/05/16), Sharding 2016-08-29
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

If a moveChunk command fails, for whatever reason, the migration is abandoned. We need to evaluate which kinds of errors should cause the migration to be retried. There must also be retry-counter logic so that a migration is not rescheduled endlessly, which would prevent MigrationManager::scheduleMigrations from ever returning.
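A minimal sketch of the retry-counter idea, assuming a hypothetical `MigrationRequest` type that carries a per-migration attempt count and a hypothetical cap `kMaxMigrationAttempts`. None of these names come from the MongoDB source; the point is only that a requeued migration bumps its counter, so the scheduling loop is guaranteed to terminate.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical cap on attempts per migration (an assumption, not a MongoDB constant).
constexpr int kMaxMigrationAttempts = 3;

// Hypothetical outcome of a single moveChunk attempt.
enum class MoveOutcome { kSuccess, kRetriableError, kFatalError };

// Hypothetical description of one chunk migration.
struct MigrationRequest {
    std::string chunkId;
    int attempts = 0;  // how many times moveChunk has been issued for this chunk
};

// Runs every request through `attemptMove`, requeueing retriable failures until
// they succeed or reach kMaxMigrationAttempts. Returns the ids of abandoned chunks.
// Each requeue increments `attempts`, so the loop always ends.
std::vector<std::string> scheduleMigrationsWithRetry(
    std::vector<MigrationRequest> requests,
    const std::function<MoveOutcome(const MigrationRequest&)>& attemptMove) {
    std::deque<MigrationRequest> pending(requests.begin(), requests.end());
    std::vector<std::string> abandoned;
    while (!pending.empty()) {
        MigrationRequest req = pending.front();
        pending.pop_front();
        ++req.attempts;
        const MoveOutcome outcome = attemptMove(req);
        if (outcome == MoveOutcome::kSuccess) {
            continue;
        }
        if (outcome == MoveOutcome::kRetriableError && req.attempts < kMaxMigrationAttempts) {
            pending.push_back(req);  // reschedule; the counter is already bumped
        } else {
            abandoned.push_back(req.chunkId);  // fatal error, or out of attempts
        }
    }
    return abandoned;
}
```

Requeueing at the back of the queue, rather than retrying immediately, lets other migrations proceed while a conflicting or temporarily unreachable shard settles.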

A few specific things to think through:

moveChunk command errors in MigrationManager::_checkMigrationCallback

Should retry on network errors.
Should retry on conflicting-migration errors. Ideally we would maintain a map of the shards currently performing balancer-initiated migrations, so that we can tell whether a conflict arises because we already scheduled a migration involving that shard or because of an external cause, and so that we do not schedule a migration in the first place when we know it will fail.
LockBusy errors when we already know the shard is an old 3.2 shard (i.e., a second LockBusy error on moveChunk): do we want to reschedule at all?
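The retry rules listed above could be encoded as a small classification function plus a record of which shards already have balancer-initiated migrations in flight. This is a sketch under assumptions: `MigrationError` is a hypothetical stand-in for MongoDB's real error codes, `BalancerShardTracker` is a hypothetical helper, and "do not retry a repeated LockBusy" is one possible answer to the open 3.2-shard question, not a decision recorded in this ticket.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for the error categories discussed in the ticket.
enum class MigrationError { kNetworkError, kConflictingMigration, kLockBusy, kOther };

// Hypothetical record of shards that currently have a balancer-initiated
// migration in flight, as either donor or recipient.
class BalancerShardTracker {
public:
    // Returns false (and records nothing) if either shard is already busy, so the
    // caller can skip a migration that is known to conflict with one of ours.
    bool tryReserve(const std::string& donor, const std::string& recipient) {
        if (busy_.count(donor) > 0 || busy_.count(recipient) > 0) {
            return false;
        }
        busy_.insert(donor);
        busy_.insert(recipient);
        return true;
    }

    // Called when a migration completes, successfully or not.
    void release(const std::string& donor, const std::string& recipient) {
        busy_.erase(donor);
        busy_.erase(recipient);
    }

    bool isBusy(const std::string& shard) const { return busy_.count(shard) > 0; }

private:
    std::set<std::string> busy_;
};

// Decides whether a failed moveChunk should be rescheduled. `lockBusyIsRepeat`
// marks a second LockBusy on the same migration, which the ticket treats as a
// sign of an old 3.2 shard (assumed policy here: give up).
bool shouldRetryMigration(MigrationError err, bool lockBusyIsRepeat) {
    switch (err) {
        case MigrationError::kNetworkError:
            return true;
        case MigrationError::kConflictingMigration:
            return true;
        case MigrationError::kLockBusy:
            return !lockBusyIsRepeat;
        case MigrationError::kOther:
            return false;
    }
    return false;
}
```

Reserving both donor and recipient before issuing moveChunk is what lets the balancer avoid scheduling a migration it already knows would conflict with its own work; conflicts reported for shards that are not in the tracker must then come from an external cause.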

MigrationManager::_executeMigrations

scheduleRemoteCommand errors (check whether a valid callback handle was returned)
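The concern with scheduling errors is that a command which was never scheduled will never invoke its completion callback, so the migration must be marked as failed right away rather than waited on. A minimal sketch, assuming a hypothetical `ScheduleResult` type in place of the executor's real "status or callback handle" return value, and a caller-supplied `scheduleCommand` function standing in for the executor call:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a "status or callback handle" result.
struct ScheduleResult {
    std::optional<int> handle;  // set when the command was scheduled
    std::string error;          // describes the failure when it was not
    bool ok() const { return handle.has_value(); }
};

// Hypothetical bookkeeping for one batch of migrations.
struct MigrationBatch {
    std::vector<int> inFlightHandles;       // callbacks we will wait on
    std::vector<std::string> failedChunks;  // chunks whose command never ran
};

// Issues one command per chunk. If scheduling fails, no callback will ever fire
// for that chunk, so it is recorded as failed immediately instead of being awaited.
MigrationBatch executeMigrations(
    const std::vector<std::string>& chunkIds,
    const std::function<ScheduleResult(const std::string&)>& scheduleCommand) {
    MigrationBatch batch;
    for (const auto& chunkId : chunkIds) {
        const ScheduleResult result = scheduleCommand(chunkId);
        if (!result.ok()) {
            batch.failedChunks.push_back(chunkId);  // candidate input for retry logic
            continue;
        }
        batch.inFlightHandles.push_back(*result.handle);
    }
    return batch;
}
```

Keeping failed-to-schedule chunks in their own list lets them flow into the same retry-counter path as moveChunk failures reported through the callback.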

Issue Links:
Has to be done after SERVER-24853, "Refactor Balancer code to use MigrationManager in order to move chunks in parallel" (Closed)

Assignee: Dianna Hohensee (Inactive)
Reporter: Dianna Hohensee (Inactive)
Participants: Dianna Hohensee
Votes: 0
Watchers: 1

Created: Jun 30 2016 06:53:04 PM UTC
Updated: Apr 05 2017 04:44:03 PM UTC
Resolved: Aug 09 2016 04:19:13 PM UTC
