Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.1.1, 6.2.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v6.1, v6.0
Sprint:
Server Serverless 2022-10-17, Server Serverless 2022-10-31
Linked BF Score:
6
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

In general, a tenant migration's state data is marked as garbage collectable after donorForgetMigration. It is then either deleted automatically by the TTLMonitor once it expires, or deleted by the next tenant migration request for the same tenant.

However, there is a corner case in which the TTLMonitor thread and the TenantMigrationRecipientService thread try to delete the same tenant migration data concurrently. For example:
Time 1: [TenantMigrationRecipientService thread] checks for and gets the existing mtab of the migration.
Time 2: [TTLMonitor thread] deletes the mtab because it has expired.
Time 3: [TenantMigrationRecipientService thread] tries to delete the mtab. Today, a ConflictingOperationInProgress error is returned. This is not expected. Here is the code to be investigated.
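The interleaving above can be reproduced in a small single-threaded model. This is only an illustrative sketch, not the server code: `StateDocStore`, `delete_if_garbage_collectable`, `start_migration_current`, and the `between_check_and_delete` hook are hypothetical names, and an in-memory dict stands in for the collection holding the migration data.

```python
class ConflictingOperationInProgress(Exception):
    """Stand-in for the server error of the same name."""


class StateDocStore:
    """Hypothetical in-memory stand-in for the migration state collection."""

    def __init__(self):
        self.docs = {}

    def find(self, tenant_id):
        return self.docs.get(tenant_id)

    def delete_if_garbage_collectable(self, tenant_id):
        """Delete only entries that carry "expireAt"; return nDeleted."""
        doc = self.docs.get(tenant_id)
        if doc is not None and "expireAt" in doc:
            del self.docs[tenant_id]
            return 1
        return 0


def start_migration_current(store, tenant_id, between_check_and_delete=None):
    """Models the behavior described in the ticket (an assumption, not real code)."""
    existing = store.find(tenant_id)  # Time 1: check for existing data
    if existing is None:
        return "started"
    if between_check_and_delete is not None:
        between_check_and_delete()  # Time 2: e.g. the TTLMonitor runs here
    n_deleted = store.delete_if_garbage_collectable(tenant_id)  # Time 3
    if n_deleted == 0:
        raise ConflictingOperationInProgress(tenant_id)
    return "started"


store = StateDocStore()
store.docs["tenantA"] = {"expireAt": 0}  # already marked garbage collectable


def ttl_monitor_pass():
    store.delete_if_garbage_collectable("tenantA")


try:
    outcome = start_migration_current(store, "tenantA", ttl_monitor_pass)
except ConflictingOperationInProgress:
    outcome = "conflict"
```

In this model `outcome` ends up as `"conflict"` even though no data for the tenant remains, which is the unexpected result the ticket describes.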

When checking whether there is another ongoing migration, we try to delete the existing state document only if it was marked for garbage collection, so that the new migration can proceed. However, the code has to handle the following two situations:

If the state document doesn't have the "expireAt" field, we return false since nDeleted will be 0.
If the state document is deleted by someone else while we try to delete it, nDeleted will be 0 as well. Treating this as a failure is wrong: if the state document is being deleted at the same time as we try to delete it, we should proceed with the migration, because the outcome is the same as if we had deleted it ourselves.

We do not have a way to differentiate the second case from the first one, and therefore we throw "ConflictingOperationInProgress" in both. This ticket is to improve the code to handle that potential race condition.
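One possible way to tell the two cases apart, sketched under stated assumptions: when nDeleted is 0, look at the state document again. If it is gone, someone else deleted it (second case) and the migration can proceed. If it is still present, it is not marked for garbage collection (first case). This is not necessarily the fix that was committed for this ticket; `delete_if_marked`, `may_proceed`, and the `concurrent_step` hook are hypothetical names, and a dict stands in for the state document collection.

```python
def delete_if_marked(docs, tenant_id):
    """Delete the state document only if it has "expireAt"; return nDeleted."""
    if "expireAt" in docs.get(tenant_id, {}):
        del docs[tenant_id]
        return 1
    return 0


def may_proceed(docs, tenant_id, concurrent_step=None):
    """Return True if a new migration for tenant_id may proceed."""
    if tenant_id not in docs:
        return True  # nothing to clean up
    if concurrent_step is not None:
        concurrent_step()  # simulates another thread running before our delete
    if delete_if_marked(docs, tenant_id) == 1:
        return True  # we deleted the garbage-collectable document ourselves
    # nDeleted == 0 is ambiguous, so check again: a document that is now gone
    # was deleted concurrently (case 2); one still present is unmarked (case 1).
    return tenant_id not in docs


# Case 2: the document is deleted concurrently, so the migration may proceed.
raced = {"tenantA": {"expireAt": 0}}
raced_result = may_proceed(raced, "tenantA", lambda: raced.pop("tenantA"))

# Case 1: the document has no "expireAt", so this is a real conflict.
active = {"tenantA": {}}
active_result = may_proceed(active, "tenantA")
```

A real implementation would also need the second look to be safe against a new state document being inserted in between, which this single-threaded sketch does not model.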

related to

SERVER-70773 Skip rebuilding instance on stepup in tenant migration recipient test

Closed

Assignee: Mathis Bessa (Inactive)
Reporter: Sophia Tan (Inactive)
Participants: Githook User, Liubov Molchanova, Mathis Bessa, Sophia Tan
Votes: 0
Watchers: 4

Created: Jul 08 2022 10:42:06 PM UTC
Updated: Oct 29 2023 09:35:47 PM UTC
Resolved: Oct 20 2022 03:34:29 PM UTC
Confidence Status Last Update: 23/Sep/22 3:07 PM
