[SERVER-23667] Allow deterministic lockSessionID assignment for distlocks, so that after CSRS failover processes on the config primary can reacquire locks immediately. Created: 12/Apr/16 Updated: 26/Apr/16 Resolved: 20/Apr/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.5 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 13 (04/22/16) |
| Participants: |
| Description |
|
In order to enable new config primaries to immediately gain access to the locks held by the old config primary, rather than waiting around for locks to expire, the lockSessionID must be specifiable and not always random so that the new primary knows what it is. Current signature:
Add new function:
ReplSetDistLockManager::lock will generate an OID and call ReplSetDistLockManager::lockWithSessionID. The new function will contain the entire current implementation with two changes: it no longer generates the lockSessionID, which it now receives as an argument, and it compares that lockSessionID to the lock's session ID, allowing the lock to be overtaken on a match. |
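The delegation described above might look roughly like the following sketch. It is not the actual ReplSetDistLockManager code: the class name `SketchDistLockManager`, the integer `SessionId` standing in for an OID, the in-memory map, and the `bool` return values are all assumptions made for illustration, and lock expiry is left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <string>

// Hypothetical stand-in for an OID; the real type is not shown in this ticket.
using SessionId = std::uint64_t;

// Illustrative in-memory model only; not the real ReplSetDistLockManager.
class SketchDistLockManager {
public:
    // Existing entry point: generates a random session ID, then delegates.
    bool lock(const std::string& name) {
        const SessionId id = _rng();
        return lockWithSessionID(name, id);
    }

    // New entry point: the caller supplies the session ID.
    bool lockWithSessionID(const std::string& name, SessionId id) {
        auto it = _locks.find(name);
        if (it == _locks.end()) {
            // Lock is free: record the holder's session ID.
            _locks.emplace(name, id);
            return true;
        }
        // Lock is held: it can be overtaken only if the session IDs match.
        return it->second == id;
    }

private:
    std::map<std::string, SessionId> _locks;
    std::mt19937_64 _rng{std::random_device{}()};
};
```

In this model, a caller that passes the same session ID twice gets the lock both times, while a caller going through `lock` receives a fresh random ID on every call and so cannot reacquire a lock it already holds.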
| Comments |
| Comment by Githook User [ 20/Apr/16 ] |
|
Author: Dianna Hohensee (DiannaHohensee, dianna.hohensee@10gen.com) Message: |
| Comment by Dianna Hohensee (Inactive) [ 13/Apr/16 ] |
|
Yep, anyone who wants to specify the lockSessionID will have to deliberately call ReplSetDistLockManager::lockWithSessionID, and all the current implementation will still go through ReplSetDistLockManager::lock, which generates a random lock session ID as per usual. Updated the title and the end of the description, hopefully that covers it. Technically any process that comes back to life after a failover, config or shard, will be able to reacquire the distlock if it's still held by that lockSessionID. I think it's just as safe from a shard process as a config process. Users will have to NOT do silly things like giving multiple threads the same lockSessionID, though. We currently only intend to use this on the config server, but who knows what'll happen later. |
| Comment by Spencer Brody (Inactive) [ 13/Apr/16 ] |
|
Right, I guess the other main piece, besides having a mechanism to specify the lock session ID, is deterministic assignment of lock session IDs. We have to be careful though, as we can only deterministically assign lock session IDs when taking distlocks on the config servers. Perhaps the title of this ticket should be changed to "Deterministically assign distlock session IDs for locks taken by config servers, to allow takeover after CSRS failover" - what do you think? |
| Comment by Dianna Hohensee (Inactive) [ 13/Apr/16 ] |
|
This should allow a process on the new primary to take over the lock. Perhaps I should update the description to include the fact that the change must also include a comparison between the lockSessionID passed into the function and the lock's session ID: if the two are equal, then the lock will be overtaken. Currently, overtaking the lock is only allowed when the lock expires. Now it will be allowed when the lock expires or when the lockSessionID matches the lock's. An example would be the balancer. It will have a default value for lockSessionID that is always the same in every instance, so that when it restarts on a new primary it will pass the lockSessionID and immediately overtake the lock because it matches, rather than waiting for the lock to expire. |
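The overtake rule in this comment (expired, or session ID matches) can be written as a single predicate. This is a minimal sketch under stated assumptions: `HeldLock`, `canOvertake`, the integer session IDs, and the constant `kBalancerSessionId` are hypothetical names chosen for illustration, and the real lock documents and expiry bookkeeping are not shown in this ticket.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Hypothetical record of a currently held lock.
struct HeldLock {
    std::uint64_t sessionId;      // session ID of the current holder
    Clock::time_point expiresAt;  // time after which anyone may overtake
};

// Overtaking is allowed when the lock has expired OR the requester
// presents the same session ID that already holds the lock.
inline bool canOvertake(const HeldLock& held,
                        std::uint64_t requesterId,
                        Clock::time_point now) {
    return now >= held.expiresAt || held.sessionId == requesterId;
}

// Hypothetical fixed ID that every balancer instance would pass.
constexpr std::uint64_t kBalancerSessionId = 1;
```

With a lock held under `kBalancerSessionId`, a balancer restarting on a new primary passes the same ID and `canOvertake` returns true immediately, whereas a requester with any other ID has to wait until `expiresAt` has passed.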
| Comment by Spencer Brody (Inactive) [ 13/Apr/16 ] |
|
This isn't sufficient for the new config server primary to take over the distributed locks. Is there a ticket for that in the more general sense? I didn't realize that was a goal for 3.4. |