[SERVER-26250] Balancer holding distlock briefly on recover fails a subsequent split (or potentially any distlock operation) command |
Created: 22/Sep/16 Updated: 31/Oct/16 Resolved: 24/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.0-rc2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Sharding 2016-10-10, Sharding 2016-10-31 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
The moveChunk command returns OK to the mongos, a stepdown then occurs, and the balancer's migration document is left in place. When the balancer recovers, it acquires the distlock because the migration document exists, reloads the chunk metadata, discovers that the chunk has already moved, and then releases the distlock. However, the brief window in which the balancer holds the distlock causes the JS test's subsequent split command, issued right after that moveChunk command, to fail. |
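The sequence described above can be sketched as a toy model. This is only an illustration of the interleaving, not MongoDB's distlock or balancer code: `DistLock`, `split_chunk`, the `test.foo` namespace, and the error message are all hypothetical names chosen for the example, and the split is assumed to fail immediately rather than retry when the lock is held.

```python
class DistLock:
    """Toy stand-in for a per-collection distributed lock (hypothetical)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, who):
        # Non-blocking acquire: succeeds only if nobody holds the lock.
        if self.holder is None:
            self.holder = who
            return True
        return False

    def release(self, who):
        # Only the current holder can release.
        if self.holder == who:
            self.holder = None


def split_chunk(lock):
    """Hypothetical split command: assumed to fail if the lock is taken."""
    if not lock.try_acquire("split"):
        return {"ok": 0, "errmsg": "could not acquire collection lock"}
    try:
        return {"ok": 1}
    finally:
        lock.release("split")


lock = DistLock()
# Migration document left behind by the stepdown, although moveChunk returned OK.
migration_docs = [{"ns": "test.foo"}]

# Balancer recovery: a migration document exists, so it takes the distlock.
assert lock.try_acquire("balancer-recovery")

# The test's split command arrives while recovery still holds the lock.
during_recovery = split_chunk(lock)

# Recovery reloads metadata, sees the chunk already moved, cleans up, releases.
migration_docs.clear()
lock.release("balancer-recovery")

# The same split succeeds once the lock is free again.
after_recovery = split_chunk(lock)
```

In this model `during_recovery` is the failing reply and `after_recovery` the successful one, which is the window the test hits.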
| Comments |
| Comment by Githook User [ 24/Oct/16 ] |
|
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com> Message: |
| Comment by Dianna Hohensee (Inactive) [ 10/Oct/16 ] |
|
Going with option 1) above: extending moveChunk command success to depend on whether the deletion of the migration document succeeded. |
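A minimal sketch of what that option could look like, under stated assumptions: `move_chunk`, `run_migration`, and `delete_migration_doc` are hypothetical names, the reply shape is illustrative, and the actual commit is not shown in this ticket. The point is only that an OK reply then implies no migration document remains, so a later recovery has no reason to take the distlock for that chunk.

```python
from typing import Callable, Dict


def move_chunk(run_migration: Callable[[], bool],
               delete_migration_doc: Callable[[], bool]) -> Dict[str, object]:
    """Hypothetical moveChunk flow: OK only if cleanup also succeeded."""
    if not run_migration():
        return {"ok": 0, "errmsg": "migration failed"}
    # Success now also depends on removing the migration document.
    if not delete_migration_doc():
        return {"ok": 0, "errmsg": "migration document was not removed"}
    return {"ok": 1}


# Both steps succeed: the caller may rely on the document being gone.
ok_reply = move_chunk(lambda: True, lambda: True)

# The chunk moved but cleanup failed (e.g. interrupted): no OK is returned.
failed_cleanup = move_chunk(lambda: True, lambda: False)
```

The trade-off in this sketch is that a caller can see a failure reply even though the chunk has moved, so it must not treat a non-OK reply as proof that nothing happened.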
| Comment by Dianna Hohensee (Inactive) [ 22/Sep/16 ] |
|
Options that come to mind right now: None of these is very appealing... 2) is the cleanest, but increases time spent in drain mode, reloading chunk metadata for every collection in which there are active migrations happening. |