[SERVER-26332] remove unnecessary check that RangeDeleter has run after removeShard in remove2.js Created: 26/Sep/16 Updated: 19/Nov/16 Resolved: 05/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.3.14 |
| Fix Version/s: | 3.4.0-rc1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Sharding 2016-10-10 | ||||||||
| Participants: | |||||||||
| Description |
|
The test removes a shard and then tries to add it back. However, the migration cleanup task may not have completed yet, in which case re-adding the shard fails with a "local database exist in another shard" error. The test attempts to prevent this by checking currentOp for the cleanup task: But the balancer can run with { _waitForDelete: true }, in which case the cleanup happens inline with moveChunk and the check above does not catch it. |
| Comments |
| Comment by Githook User [ 05/Oct/16 ] |
|
Author: Esha Maharishi (esha.maharishi@mongodb.com) Message: |
| Comment by Randolph Tan [ 30/Sep/16 ] |
|
esha.maharishi, before the RangeDeleter existed, the shard spawned a separate thread every time it wanted to perform chunk cleanup. With the RangeDeleter, asynchronous cleanup is instead queued for the RangeDeleter worker to pick up. The RangeDeleter lives for the entire lifetime of the process, so once it is live it never disappears from currentOp. However, it is possible to change the currentOp check to verify that no migrations are running after a shard has been drained, as an alternative to creating a makeshift concurrency barrier by temporarily stopping the balancer. |
| Comment by Esha Maharishi (Inactive) [ 30/Sep/16 ] |
|
renctan, it seems like the existing currentOp check looks for a thread whose name starts with "clean" (regex: /^clean/). At the time that check was added, the range deleter thread was called "cleanupOldData", but it is now called "RangeDeleter": https://github.com/mongodb/mongo/blob/r3.3.15/src/mongo/db/range_deleter.cpp#L415 Do you think the test is failing because it's not actually checking whether the RangeDeleter is running? If we change the check to match "RangeDeleter" instead of /^clean/, will we be correctly checking whether the RangeDeleter thread is active? |
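To illustrate the naming mismatch described in this comment, here is a minimal sketch of a currentOp-style check. The helper name findOpsByDesc and the sample data are hypothetical, and the sketch assumes the check inspects the desc field of each entry in the inprog array; it is not the actual code from remove2.js.

```javascript
// Hypothetical helper (not from remove2.js): returns the in-progress ops whose
// thread description matches `pattern`. `currentOpResult` is assumed to have
// the shape { inprog: [ { desc: "..." }, ... ] }.
function findOpsByDesc(currentOpResult, pattern) {
    return currentOpResult.inprog.filter(function(op) {
        return typeof op.desc === "string" && pattern.test(op.desc);
    });
}

// Illustrative data only, not real server output.
var sample = {inprog: [{desc: "RangeDeleter"}, {desc: "conn12"}]};

// The old pattern targets the former thread name "cleanupOldData".
var oldMatches = findOpsByDesc(sample, /^clean/);

// A pattern for the current thread name.
var newMatches = findOpsByDesc(sample, /^RangeDeleter$/);
```

With this sample, the /^clean/ pattern finds nothing while the "RangeDeleter" pattern finds the worker thread. As the 30/Sep/16 reply from Randolph Tan notes, the RangeDeleter thread stays in currentOp for the life of the process, so a wait loop keyed on that name alone would not show that cleanup has finished.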
| Comment by Randolph Tan [ 26/Sep/16 ] |
|
One potential fix is to call balancerStop after removeShard to wait for the current balancer round to finish, and then call balancerStart again. This would make remove2.js incompatible with the last-stable test suite, though. P.S. We should probably also remove the sleep here. |
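The stop-then-start barrier proposed in this comment could look roughly like the sketch below. The function name bounceBalancer is hypothetical, and admin stands for any object exposing runCommand(cmd), such as the mongos admin database in the shell; this is an assumption-laden outline, not the change that was committed for this ticket.

```javascript
// Hypothetical sketch: use balancerStop as a barrier, then re-enable the balancer.
// `admin` is assumed to expose runCommand(cmd) and return a document with an `ok` field.
function bounceBalancer(admin) {
    // Per the comment above, stopping the balancer waits for the current round to finish.
    var stopRes = admin.runCommand({balancerStop: 1});
    if (!stopRes.ok) {
        throw new Error("balancerStop failed: " + JSON.stringify(stopRes));
    }

    // Re-enable the balancer so the rest of the test runs as before.
    var startRes = admin.runCommand({balancerStart: 1});
    if (!startRes.ok) {
        throw new Error("balancerStart failed: " + JSON.stringify(startRes));
    }
}
```

In the test this would be called once removeShard has completed and before the shard is added back. Taking admin as a parameter keeps the sketch independent of any particular connection object.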