[SERVER-23667] Allow deterministic lockSessionID assignment for distlocks, so that after CSRS failover processes on the config primary can reacquire locks immediately. Created: 12/Apr/16  Updated: 26/Apr/16  Resolved: 20/Apr/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.3.5

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sharding 13 (04/22/16)
Participants:

 Description   

In order to enable new config primaries to immediately gain access to the locks held by the old config primary, rather than waiting around for locks to expire, the lockSessionID must be specifiable and not always random so that the new primary knows what it is.

Current signature:

StatusWith<DistLockManager::ScopedDistLock> ReplSetDistLockManager::lock(
    OperationContext* txn,
    StringData name,
    StringData whyMessage,
    milliseconds waitFor,
    milliseconds lockTryInterval);

Add new function:

StatusWith<DistLockManager::ScopedDistLock> ReplSetDistLockManager::lockWithSessionID(
    OperationContext* txn,
    StringData name,
    StringData whyMessage,
    milliseconds waitFor,
    milliseconds lockTryInterval,
    OID lockSessionID);

ReplSetDistLockManager::lock will generate an OID and call ReplSetDistLockManager::lockWithSessionID, which will contain the current implementation with two changes: it no longer generates the lockSessionID, since it now receives one as an argument, and it compares that lockSessionID to the existing lock's session ID, allowing the lock to be overtaken on a match.
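
The delegation described above can be sketched roughly as follows. This is a simplified in-memory model, not the actual implementation: ToyDistLockManager is a hypothetical name, SessionId is a std::string standing in for OID, and the txn, whyMessage, waitFor and lockTryInterval parameters, as well as lock expiry, are left out. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <random>
#include <string>

// Stand-in for OID (assumption); a plain string keeps the sketch self-contained.
using SessionId = std::string;

// Hypothetical in-memory model. The real manager stores lock documents on the
// config servers and also handles expiry, retries and status reporting.
class ToyDistLockManager {
public:
    // Mirrors lock(): generate a random session ID, then delegate.
    std::optional<SessionId> lock(const std::string& name) {
        return lockWithSessionID(name, generateSessionId());
    }

    // Mirrors lockWithSessionID(): the caller supplies the session ID.
    // The lock is granted if it is free, or if it is already held under the
    // same session ID (the "overtake on a match" case).
    std::optional<SessionId> lockWithSessionID(const std::string& name,
                                               const SessionId& lockSessionID) {
        auto it = _locks.find(name);
        if (it != _locks.end() && it->second != lockSessionID) {
            return std::nullopt;  // held by a different session
        }
        _locks[name] = lockSessionID;
        return lockSessionID;
    }

private:
    // Random ID generation, standing in for generating a fresh OID.
    static SessionId generateSessionId() {
        static std::mt19937_64 rng{std::random_device{}()};
        return std::to_string(rng());
    }

    std::map<std::string, SessionId> _locks;  // lock name -> owning session ID
};
```

In this model, a second lockWithSessionID call with the same ID on a held lock succeeds, while lock(), which always brings a fresh random ID, is refused until the lock is released or, in the real system, expires.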



 Comments   
Comment by Githook User [ 20/Apr/16 ]

Author:

Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-23667 can overtake locks by using the same lockSessionID as the lock owner, via DistLockManager::lockWithSessionID
Branch: master
https://github.com/mongodb/mongo/commit/22082d01a15a589398f3db6f9357dedd1a4c73fe

Comment by Dianna Hohensee (Inactive) [ 13/Apr/16 ]

Yep, anyone who wants to specify the lockSessionID will have to deliberately call ReplSetDistLockManager::lockWithSessionID, and all existing callers will still go through ReplSetDistLockManager::lock, which generates a random lock session ID as usual.

Updated the title and the end of the description, hopefully that covers it. Technically any process that comes back to life after a failover, config or shard, will be able to reacquire the distlock if it's still held by that lockSessionID. I think it's just as safe from a shard process as a config process. Users will have to NOT do silly things like giving multiple threads the same lockSessionID, though. We currently only intend to use it on the config server, but who knows what'll happen later.

Comment by Spencer Brody (Inactive) [ 13/Apr/16 ]

Right, I guess the other main piece, besides having a mechanism to specify the lock session ID, is deterministic assignment of lock session IDs. We have to be careful though, as we can only deterministically assign lock session IDs when taking distlocks on the config servers.

Perhaps the title of this ticket should be changed to "Deterministically assign distlock session IDs for locks taken by config servers, to allow takeover after CSRS failover" - what do you think?

Comment by Dianna Hohensee (Inactive) [ 13/Apr/16 ]

This should allow a process on the new primary to take over the lock. Perhaps I should update the description to include the fact that the change must also include a comparison between the lockSessionID passed into the function and the lock's session ID: if the two are equal, then the lock will be overtaken. Currently the only way overtaking the lock is allowed is when the lock expires. Now it will be allowed when the lock expires or when the lockSessionID matches the lock's.

An example would be the balancer. It will have a default value for lockSessionID that is always the same in every instance, so that when it restarts on a new primary it will pass the lockSessionID and immediately overtake the lock because it matches – rather than waiting for the lock to expire.
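
A minimal sketch of the overtake rule and the balancer case, under simplifying assumptions: HeldLock, canOvertake and kBalancerSessionId are hypothetical names, SessionId is a std::string standing in for OID, and expiry is modeled as a plain deadline rather than however the server actually detects an expired lock.

```cpp
#include <cassert>
#include <chrono>
#include <string>

using SessionId = std::string;  // stand-in for OID (assumption)
using TimePoint = std::chrono::steady_clock::time_point;

// Hypothetical view of a currently held lock.
struct HeldLock {
    SessionId owner;      // session ID the lock was taken with
    TimePoint expiresAt;  // simplified expiry deadline (assumption)
};

// Hypothetical fixed ID that every balancer instance would pass.
const SessionId kBalancerSessionId = "balancer-fixed-session";

// Overtaking is allowed when the lock has expired (the existing rule)
// or when the requester's session ID matches the owner's (the new rule).
bool canOvertake(const HeldLock& held, const SessionId& requester, TimePoint now) {
    return now >= held.expiresAt || held.owner == requester;
}
```

With a lock held under kBalancerSessionId and not yet expired, a balancer restarted on the new primary passes the same ID and canOvertake returns true at once, while a requester with any other ID gets false until the deadline passes.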

Comment by Spencer Brody (Inactive) [ 13/Apr/16 ]

This isn't sufficient for the new config server primary to take over the distributed locks. Is there a ticket for that in the more general sense? I didn't realize that was a goal for 3.4.
