[SERVER-22658] Implement support for legacy and new migration protocol -- shard versus balancer holding the distributed lock Created: 16/Feb/16 Updated: 14/Jul/16 Resolved: 13/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.9 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Sprint: | Sharding 15 (06/03/16), Sharding 16 (06/24/16) | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
This task is for the changes which need to be made to the donor shard's migration manager so that it supports both the legacy migration protocol (shard acquires the distlock) and the new protocol where the CSRS primary holds the distributed lock. Implementation (high level): Config balancer: Shard: Add testing if any new code does not get checked by existing testing. Probably need to make a multi-version test. |
| Comments |
| Comment by Githook User [ 13/Jun/16 ] | ||||||||||||||||||||||||||
|
Author: {u'username': u'DiannaHohensee', u'name': u'Dianna Hohensee', u'email': u'dianna.hohensee@10gen.com'}Message: | ||||||||||||||||||||||||||
| Comment by Dianna Hohensee (Inactive) [ 08/Jun/16 ] | ||||||||||||||||||||||||||
|
Sounds good. I made | ||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 08/Jun/16 ] | ||||||||||||||||||||||||||
|
Regarding the first question. I think that it would be sufficient for now to only re-check that the distributed lock is still held if the legacy moveChunk protocol is in use, but not for the new variant where the balancer holds the collection lock. In order to complete the checking, we could later on add code in commitChunkMigration to ensure that there is a dist lock held on the collection for which it is being called. Let's do it as a separate ticket though. | ||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 08/Jun/16 ] | ||||||||||||||||||||||||||
|
There is no need for now to keep a cache of the migration protocols. The solution you have described, which uses the LockBusy status as a way to guide the call would be sufficient for now. | ||||||||||||||||||||||||||
| Comment by Dianna Hohensee (Inactive) [ 07/Jun/16 ] | ||||||||||||||||||||||||||
|
Also, over here https://mongodbcr.appspot.com/74020002 you wrote this as a rough outline for how the balancer will handle the ScopedDistLock
Do we want to maintain an array of which migration protocol each shard uses? I'm not sure how we would otherwise go about discovering that a V1 protocol shard becomes a V2 protocol shard. I was originally intending to just call moveChunk with the lock, and then recall moveChunk without it if it returns a LockBusy error. It would then be something like
| ||||||||||||||||||||||||||
| Comment by Dianna Hohensee (Inactive) [ 07/Jun/16 ] | ||||||||||||||||||||||||||
|
Currently moveChunk checks that the distlock is still held by the shard https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/move_chunk_command.cpp#L245-L253 right before going into the critical section. It won't be able to check the distlock in the same way if it doesn't hold the distlock itself. We talked about making the balancer use a specific lockSessionID so that if the config server primary goes down the balancer will restart and know what locks it's holding. the lockSessionID is a mongo::OID object that is normally randomly generated. We need to iron out what to do here. |