[SERVER-65315] Prevent split concurrent with tenant migration or merge Created: 06/Apr/22  Updated: 29/Oct/23  Resolved: 29/Sep/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Didier Nadeau
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Backwards Compatibility: Fully Compatible
Sprint: Server Serverless 2022-06-13, Server Serverless 2022-06-27, Server Serverless 2022-08-08, Server Serverless 2022-09-05, Server Serverless 2022-10-03
Participants:
Linked BF Score: 135

 Description   

Each of tenant migration, split, and merge already disallows concurrent instances of its own type.

Additionally, SERVER-59786 disallowed a tenant migration concurrent with merge.

This ticket is to disallow split concurrent with tenant migration or merge, after which we'll have fully implemented the concurrency restrictions.



 Comments   
Comment by Githook User [ 29/Sep/22 ]

Author:

{'name': 'Didier Nadeau', 'email': 'didier.nadeau@mongodb.com', 'username': 'nadeaudi'}

Message: SERVER-65315 Enfore mutual exclusion between serverless operations
Branch: master
https://github.com/mongodb/mongo/commit/be197bff5315f3990a598dff20a89d32a24b1e5e

Comment by Githook User [ 29/Sep/22 ]

Author:

{'name': 'Didier Nadeau', 'email': 'didier.nadeau@mongodb.com', 'username': 'nadeaudi'}

Message: SERVER-65315 Use serverless mutual exclusion during file copy initial sync
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/77e9bb43bdf71a1cbb3278c5c4243e63f9989eea

Comment by Githook User [ 16/Sep/22 ]

Author:

{'name': 'Sviatlana Zuiko', 'email': 'sviatlana.zuiko@mongodb.com', 'username': 'szuiko'}

Message: Revert "SERVER-65315 Enforce mutual exclusion between serverless operations"

This reverts commit 1a13031f7cdfb6cffdcff212edef0790fe084df2.
Branch: master
https://github.com/mongodb/mongo/commit/3fe9deb8d8e082b66c65ba4b11e55a395e515ff5

Comment by Githook User [ 15/Sep/22 ]

Author:

{'name': 'Matt Broadstone', 'email': 'mbroadst@mongodb.com', 'username': 'mbroadst'}

Message: SERVER-65315 Enforce mutual exclusion between serverless operations
Branch: master
https://github.com/mongodb/mongo/commit/1a13031f7cdfb6cffdcff212edef0790fe084df2

Comment by Didier Nadeau [ 26/Jul/22 ]

esha.maharishi@mongodb.com I'm transferring this ticket to you as discussed.

Comment by Didier Nadeau [ 17/Jun/22 ]

Follow-up on design :

The lock document would be inserted after the state document (to ensure the POS will resume it when stepping up) but before the access blockers are created (they would be created in the AbortingIndexBuild phase). If the lock document cannot be inserted, we need to abort the instance.

This change also means we need to be careful about access blocker removal when deleting the state document, as the instance being deleted might not be the owner of the blockers. We could start a split which inserts blockers, then start another split which fails right away because it can't insert the lock document. Removing the second split's state document could remove the blockers from the first split (if it's still ongoing). The idea to address that issue would be to add and remove blockers by migrationId (which is already done for shard merge).
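
To illustrate the ownership problem described above, here is a minimal sketch of blockers keyed by migrationId. It is not the server's actual access blocker registry: `BlockerRegistry`, `add`, `removeAllFor`, and `isBlocked` are hypothetical names, tenants and migration ids are plain strings, and the class is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for an access blocker registry; all names are illustrative.
class BlockerRegistry {
public:
    // Registers a blocker for `tenantId` owned by `migrationId`.
    // Returns false if another operation already owns a blocker for this tenant.
    bool add(const std::string& tenantId, const std::string& migrationId) {
        return _ownerByTenant.emplace(tenantId, migrationId).second;
    }

    // Removes only the blockers owned by `migrationId` and leaves the others in place.
    // Returns the number of blockers removed.
    std::size_t removeAllFor(const std::string& migrationId) {
        std::size_t removed = 0;
        for (auto it = _ownerByTenant.begin(); it != _ownerByTenant.end();) {
            if (it->second == migrationId) {
                it = _ownerByTenant.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    bool isBlocked(const std::string& tenantId) const {
        return _ownerByTenant.count(tenantId) > 0;
    }

private:
    std::map<std::string, std::string> _ownerByTenant;  // tenantId -> owning migrationId
};
```

With this shape, cleaning up a second split that failed before inserting any blockers removes nothing, so the first split's blockers stay in place.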

Comment by Didier Nadeau [ 17/Jun/22 ]

Stopping working on this in favor of SERVER-67247 (to fix a BF issue)

Comment by Didier Nadeau [ 19/May/22 ]

Hi esha.maharishi@mongodb.com / matt.broadstone@mongodb.com , I wanted to let you know what I had in mind. There are a few points to take into consideration with this ticket:

  • TenantMigration/ShardSplit are different instances of PrimaryOnlyService (with different mutexes)
  • We need to ensure mutual exclusion when calling getOrCreate (check whether a conflicting instance exists, then create and insert one while holding the lock)
  • My first thought was calling one service from the other's `getOrCreate` to check for a conflicting instance
    • Difficult to ensure valid mutual exclusion for the whole operation (holding both services' locks from checking for conflict until insertion of the new instance)
    • It opens a possibility of deadlock (TMDS takes its own lock, then acquires split's lock, while split takes its own lock, then acquires TMDS's lock)
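
For reference, the lock-ordering deadlock in the last bullet can be avoided in standard C++17 by taking both mutexes in one `std::scoped_lock`, which uses a deadlock-avoidance algorithm. The sketch below is only an illustration of that technique, not what the server does: `ServiceState` and `createIfNoConflict` are hypothetical names, and instance ids are plain strings.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for a service with its own mutex and instance list.
struct ServiceState {
    std::mutex mutex;
    std::set<std::string> activeIds;
};

// Creates `id` in `target` only if `other` has no active instance.
// std::scoped_lock locks both mutexes with a deadlock-avoidance algorithm,
// so the argument order at each call site does not matter.
// `target` and `other` must be distinct objects.
bool createIfNoConflict(ServiceState& target, ServiceState& other, const std::string& id) {
    std::scoped_lock lock(target.mutex, other.mutex);
    if (!other.activeIds.empty()) {
        return false;
    }
    target.activeIds.insert(id);
    return true;
}
```

This keeps the check and the insertion under both locks, but it still couples the two services to each other's internals.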

One option I thought about is to have another class (let's call it InstanceHolder for now) that holds the list of instances and creates them. Each PrimaryOnlyService would have one such class and delegate the creation of instances to it. ShardSplit and TenantMigration could share an InstanceHolder, which could ensure no conflicting instances are created.

We'd have 

std::shared_ptr<InstanceHolder::Instance> PrimaryOnlyService::getOrCreate(BSONObj id) {
    return _holder->getOrCreate(id);
}
 
class PrimaryOnlyService {
...
    std::shared_ptr<InstanceHolder> _holder; // Multiple POS instances can share the same InstanceHolder
};
 
// It'd require each PrimaryOnlyService::Instance to inherit from the same base type InstanceHolder::Instance
 
class InstanceHolder {
public:
    class Instance {
    public:
        virtual ~Instance() = default;
        virtual bool isActive() = 0; // Each instance will override this. For ShardSplit it would be (_state != kCommitted && _state != kAborted).
    };
 
    std::shared_ptr<Instance> getOrCreate(BSONObj id) {
        // Acquire lock, check there's no other active instance, then create a new one with `id`
    }
};
    

I wanted your thoughts on this to see if you had another idea in mind. I also know there's an effort to refactor the PrimaryOnlyService about to get started, so it might come into play.
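
A self-contained sketch of the InstanceHolder idea above, under several assumptions: `std::string` replaces `BSONObj`, the caller passes a factory to build the instance, a conflict is reported by returning `nullptr`, and `SimpleInstance` is a hypothetical instance type used only for the example. It shows one way a shared holder could enforce "no other active instance" under a single mutex, not how the server implements it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

class InstanceHolder {
public:
    class Instance {
    public:
        virtual ~Instance() = default;
        virtual bool isActive() const = 0;
    };

    // Returns the existing instance for `id`, or creates one via `factory` if no
    // other instance is active. Returns nullptr when a conflicting instance is active.
    template <typename Factory>
    std::shared_ptr<Instance> getOrCreate(const std::string& id, Factory factory) {
        std::lock_guard<std::mutex> lock(_mutex);
        auto found = _instances.find(id);
        if (found != _instances.end()) {
            return found->second;
        }
        for (const auto& entry : _instances) {
            if (entry.second->isActive()) {
                return nullptr;
            }
        }
        std::shared_ptr<Instance> created = factory();
        _instances.emplace(id, created);
        return created;
    }

private:
    std::mutex _mutex;  // Single lock covering both the conflict check and the insertion
    std::map<std::string, std::shared_ptr<Instance>> _instances;
};

// Hypothetical instance whose activity flag is toggled directly for the example.
struct SimpleInstance : InstanceHolder::Instance {
    bool active = true;
    bool isActive() const override { return active; }
};
```

Because the check and the insertion happen under one mutex owned by the holder, services sharing the holder never need to take each other's locks.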

 

 
