[SERVER-67926] Delete non-existing garbage collectable tenant migration data should not cause a ConflictingInProgress error Created: 08/Jul/22  Updated: 29/Oct/23  Resolved: 20/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.1, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Sophia Tan Assignee: Mathis Bessa
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-70773 Skip rebuilding instance on stepup in... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.1, v6.0
Sprint: Server Serverless 2022-10-17, Server Serverless 2022-10-31
Participants:
Linked BF Score: 6

 Description   

In general, a tenant migration's state data is marked as garbage collectable after donorForgetMigration. It is then either deleted automatically by the TTLMonitor once it expires, or deleted by the next tenant migration request for the same tenant.

However, there is a corner case in which the TTLMonitor thread and the TenantMigrationRecipientService thread try to delete the same tenant migration data concurrently. For example:
Time 1: [TenantMigrationRecipientService thread] checks for and gets the existing mtab of the migration.
Time 2: [TTLMonitor thread] deletes the mtab because it has expired.
Time 3: [TenantMigrationRecipientService thread] tries to delete the mtab. Today, a ConflictingOperationInProgress error is returned. This is not expected. Here is the code to be investigated.
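
The interleaving above can be replayed with a small single-threaded model. This is only a sketch: the in-memory `state_docs` dict, the tenant id, and the helper names are hypothetical stand-ins and are not taken from the server code.

```python
class ConflictingOperationInProgress(Exception):
    """Stand-in for the server error of the same name."""


# Hypothetical in-memory stand-in for the migration state, keyed by tenant id.
# The presence of "expireAt" means the entry is garbage collectable.
state_docs = {"tenantA": {"expireAt": 100}}


def delete_if_garbage_collectable(tenant_id):
    """Delete the entry only if it carries "expireAt"; return nDeleted (0 or 1)."""
    doc = state_docs.get(tenant_id)
    if doc is not None and "expireAt" in doc:
        del state_docs[tenant_id]
        return 1
    return 0


def ttl_monitor_pass(now):
    """Remove every entry whose "expireAt" has passed."""
    expired = [t for t, d in state_docs.items()
               if "expireAt" in d and d["expireAt"] <= now]
    for tenant_id in expired:
        del state_docs[tenant_id]


def replay_race():
    # Time 1: the recipient service finds existing data for the tenant.
    assert state_docs.get("tenantA") is not None
    # Time 2: the TTL monitor deletes it because it has expired.
    ttl_monitor_pass(now=200)
    # Time 3: the recipient service tries to delete it and gets nDeleted == 0,
    # which the current logic reports as a conflict.
    if delete_if_garbage_collectable("tenantA") == 0:
        raise ConflictingOperationInProgress("tenantA")
```

Calling `replay_race()` raises the error even though nothing for the tenant is left to conflict with, which is the unexpected behavior this ticket describes.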

When checking whether there is another ongoing migration, we try to delete the existing state document only if it was marked for garbage collection, so that the new migration can proceed. However, the code logic handles the following situations:

  • If the state document doesn't have the "expireAt" field, we return false since nDeleted will be 0.
  • If the state document is deleted by another thread while we try to delete it, nDeleted will be 0 as well. This is wrong: if the state document is being deleted at the same time as we try to delete it, we should proceed with the migration, since the outcome is the same as if we had deleted it ourselves.

We do not have a way to differentiate the second case from the first one, and therefore we throw "ConflictingOperationInProgress". This ticket is to improve the code to handle that potential race condition.
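
One possible way to tell the two cases apart is to look the document up again whenever nDeleted is 0. The sketch below assumes a hypothetical dict-backed store and hypothetical helper names; it illustrates the idea only and is not necessarily how the committed fix works.

```python
class ConflictingOperationInProgress(Exception):
    """Stand-in for the server error of the same name."""


def delete_if_garbage_collectable(state_docs, tenant_id):
    """Delete the document only if it carries "expireAt"; return nDeleted."""
    doc = state_docs.get(tenant_id)
    if doc is not None and "expireAt" in doc:
        del state_docs[tenant_id]
        return 1
    return 0


def clear_previous_migration(state_docs, tenant_id):
    """Return if a new migration may proceed; raise if one is still active."""
    if delete_if_garbage_collectable(state_docs, tenant_id) == 1:
        return  # we removed the garbage collectable document ourselves
    if tenant_id not in state_docs:
        return  # nDeleted == 0 because the document is already gone
    # nDeleted == 0 and the document still exists without "expireAt".
    raise ConflictingOperationInProgress(tenant_id)
```

With this shape, a document that disappeared concurrently is treated the same as one we deleted ourselves, and only a document that still exists without "expireAt" produces the conflict error.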



 Comments   
Comment by Githook User [ 01/Nov/22 ]

Author:

{'name': 'mathisbessamdb', 'email': 'mathis.bessa@mongodb.com', 'username': 'mathisbessamdb'}

Message: SERVER-67926 Delete non-existing garbage collectable tenant migration data should not cause a ConflictingInProgress error
Branch: v6.1
https://github.com/mongodb/mongo/commit/21c50cca8176fa618aa45878eca007e6fb50b31f

Comment by Liubov Molchanova [ 29/Oct/22 ]

Requesting a backport for v6.1 as the issue reproduced in BFG-1569471

Comment by Githook User [ 20/Oct/22 ]

Author:

{'name': 'mathisbessamdb', 'email': 'mathis.bessa@mongodb.com', 'username': 'mathisbessamdb'}

Message: SERVER-67926 Delete non-existing garbage collectable tenant migration data should not cause a ConflictingInProgress error
Branch: master
https://github.com/mongodb/mongo/commit/8f524446f17abc2c042aa6ef5f81b70ba6513438

Generated at Thu Feb 08 06:09:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.