Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0
Affects Version/s: None
Component/s: Replication
Labels:
- pm-1791_milestone-D

Backwards Compatibility:
Fully Compatible
Sprint:
Repl 2020-12-14
Linked BF Score:
15
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

I think we agree that:

If a donorStartMigration encounters a conflicting migration that is not yet marked as garbage collectable, the donorStartMigration should fail.

The question is what to do for:

If the donorStartMigration encounters a conflicting migration that is marked as garbage collectable.

I think the options are:

1. The donorStartMigration should fail (current behavior).
2. The donorStartMigration should immediately garbage collect the old migration and start the new one.
   - If the new migration has a different migrationId but is for the same tenant:
     - A delayed donorStartMigration from the first migration will get ConflictingOperationInProgress, which should be harmless, since Cloud shouldn't care about the response anymore.
     - A delayed donorForgetMigration from the first migration will get NoSuchTenantMigration, which should also be harmless, since Cloud shouldn't care about the response anymore.
   - If the new migration has the same migrationId but is for a different tenant:
     - This is not a legal thing for Cloud to do, so we can say the behavior is undefined.
3. Allow donorForgetMigration to take a "garbageCollectImmediately: true" flag that Cloud should use if they want to retry a migration quickly.
   - This is only best-effort. It is possible for donorForgetMigration to garbage collect the state, and then for a delayed retry of the first donorStartMigration to restart the first migration. When Cloud then tries to start the second migration, it still fails because there is a conflicting active migration.
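To make the second option concrete, here is a toy Python model of the behavior described above for a new migration with a different migrationId on the same tenant. It is only a sketch, not server code: `ToyDonor`, `StateDoc`, the return strings, and the one-state-doc-per-tenant layout are assumptions made for illustration. Only the two error names come from this ticket.

```python
from dataclasses import dataclass, field
from typing import Dict


class ConflictingOperationInProgress(Exception):
    """Stand-in for the error code of the same name mentioned in this ticket."""


class NoSuchTenantMigration(Exception):
    """Stand-in for the error code of the same name mentioned in this ticket."""


@dataclass
class StateDoc:
    # Hypothetical, heavily simplified donor state document.
    migration_id: str
    tenant_id: str
    garbage_collectable: bool = False


@dataclass
class ToyDonor:
    # Assumption for the sketch: at most one state doc per tenant.
    docs_by_tenant: Dict[str, StateDoc] = field(default_factory=dict)

    def donor_start_migration(self, migration_id: str, tenant_id: str) -> str:
        existing = self.docs_by_tenant.get(tenant_id)
        if existing is not None:
            if existing.migration_id == migration_id:
                # Retry of the same migration: nothing new to do.
                return "already started"
            if not existing.garbage_collectable:
                # The agreed-upon case: the conflicting migration is not yet
                # marked garbage collectable, so the new command fails.
                raise ConflictingOperationInProgress(tenant_id)
            # Option 2: the conflicting migration is marked garbage
            # collectable, so drop it right away and fall through.
        self.docs_by_tenant[tenant_id] = StateDoc(migration_id, tenant_id)
        return "started"

    def donor_forget_migration(self, migration_id: str) -> str:
        for doc in self.docs_by_tenant.values():
            if doc.migration_id == migration_id:
                doc.garbage_collectable = True
                return "marked garbage collectable"
        raise NoSuchTenantMigration(migration_id)
```

In this model, once "m2" has replaced a forgotten "m1" for the same tenant, a delayed `donor_start_migration("m1", ...)` raises ConflictingOperationInProgress and a delayed `donor_forget_migration("m1")` raises NoSuchTenantMigration, matching the two harmless outcomes listed under the second option. The model has no notion of failover.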

I think the second option is most practical, since it's the least amount of work for Cloud and has harmless side-effects.

EDIT:
In the end, we decided to remove the TenantMigrationAccessBlocker (mtab) entry when we mark an aborted migration document as garbage collectable. The second option would be a problem if the donor fails over between garbage collecting the old state doc and inserting the new one: we could lose the old state doc without having inserted the new one. If a delayed donorStartMigration from the first migration then comes in, we could mistakenly start a migration. So the property we are maintaining instead is "aborted garbage collectable documents do not have a TenantMigrationAccessBlocker entry". To maintain this property, we need to:
1. Remove the mtab entry when we mark an aborted document as garbage collectable.
2. Avoid creating the mtab entry for aborted garbage collectable documents when recovering mtabs on startup/rollback.
3. Remove the op observer onDelete code that deletes the mtab entry for an aborted state doc, since this entry will have already been deleted.
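The three steps above can be sketched as a small Python model that checks the property after each operation. This is an illustration under stated assumptions, not the server implementation: `DonorStateDoc`, `ToyBlockerRegistry`, the state strings, and all function names are hypothetical, and the model assumes one state doc per tenant. The behavior of `on_delete` for non-aborted documents is also an assumption, since the ticket only discusses aborted ones.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DonorStateDoc:
    # Hypothetical, simplified donor state document.
    tenant_id: str
    state: str  # e.g. "aborted" or "committed" (illustrative values)
    garbage_collectable: bool = False


class ToyBlockerRegistry:
    """Stand-in for the per-tenant TenantMigrationAccessBlocker (mtab) entries."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def add(self, tenant_id: str) -> None:
        self.entries[tenant_id] = "blocker"

    def remove(self, tenant_id: str) -> None:
        self.entries.pop(tenant_id, None)


def mark_garbage_collectable(doc: DonorStateDoc, registry: ToyBlockerRegistry) -> None:
    # Step 1: marking an aborted doc also removes its mtab entry.
    doc.garbage_collectable = True
    if doc.state == "aborted":
        registry.remove(doc.tenant_id)


def recover_blockers(docs: Iterable[DonorStateDoc], registry: ToyBlockerRegistry) -> None:
    # Step 2: on startup/rollback, skip aborted garbage collectable docs.
    registry.entries.clear()
    for doc in docs:
        if doc.state == "aborted" and doc.garbage_collectable:
            continue
        registry.add(doc.tenant_id)


def on_delete(doc: DonorStateDoc, registry: ToyBlockerRegistry) -> None:
    # Step 3: nothing to do for aborted docs; step 1 already removed the entry.
    if doc.state == "aborted":
        return
    # Assumption: other docs still drop their entry when deleted.
    registry.remove(doc.tenant_id)


def property_holds(docs: Iterable[DonorStateDoc], registry: ToyBlockerRegistry) -> bool:
    # "Aborted garbage collectable documents do not have an mtab entry."
    return all(
        not (d.state == "aborted" and d.garbage_collectable and d.tenant_id in registry.entries)
        for d in docs
    )
```

A quick way to exercise it: register a blocker for an aborted doc, call `mark_garbage_collectable`, then call `recover_blockers` to simulate a restart. `property_holds` should return True after each of those calls.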

related to

SERVER-53220 Not recover the TenantMigrationAccessBlocker if the donor state doc has been marked as garbage collectable

Closed

Assignee:: Vishnu Kaushik
Reporter:: Judah Schvimer
Participants:: Githook User, Judah Schvimer, Lingzhi Deng, Vishnu Kaushik
Votes:: 0
Watchers:: 4

Created:: Nov 24 2020 07:15:17 PM UTC
Updated:: Oct 29 2023 10:00:02 PM UTC
Resolved:: Dec 11 2020 03:39:11 PM UTC
Confidence Status Last Update:: 01/Dec/20 9:16 PM
