[SERVER-21382] Sharding migration transfers all document deletions
Created: 10/Nov/15 | Updated: 18/Nov/21 | Resolved: 04/Jan/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.9, 3.2.3, 3.3.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-and-test |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Minor Change |
| Backport Completed: | |
| Sprint: | Sharding D (12/11/15), Sharding E (01/08/16) |
| Participants: | |
| Description |
During chunk migration, the donor shard records every document deletion and transfers it to the recipient, regardless of whether the deletion is relevant to the chunk being migrated. This causes significant load on both shards. The reason is that by the time the deletion is logged for transfer, the deleted document is no longer available on the donor shard, so there is nothing left to check against the migrating chunk's range. One solution would be to pass the shard key of the document being deleted into the logOp call so that out-of-range deletions can be filtered out.
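To make the proposal concrete, here is a minimal, hypothetical sketch (simplified standalone C++, not the actual server code): the delete path hands the shard key to the migration logger, which only queues deletions whose key falls inside the migrating chunk's range. `ShardKey`, `ChunkRange`, `MigrationLogger`, and `logDelete` are invented names for illustration.

```cpp
// Hypothetical sketch of filtering deletions by shard key before they are
// queued for transfer to the recipient shard.
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for a shard key value and the migrating chunk's bounds.
using ShardKey = long long;

struct ChunkRange {
    ShardKey min;  // inclusive
    ShardKey max;  // exclusive
    bool contains(ShardKey k) const { return k >= min && k < max; }
};

struct MigrationLogger {
    std::optional<ChunkRange> activeChunk;  // set only while a migration runs
    std::vector<std::string> deletedIdsToTransfer;

    // Instead of logging every delete, the caller passes the shard key of the
    // document being removed so irrelevant deletions can be dropped here.
    void logDelete(const std::string& id, ShardKey shardKey) {
        if (!activeChunk || !activeChunk->contains(shardKey)) {
            return;  // outside the migrating chunk: nothing to transfer
        }
        deletedIdsToTransfer.push_back(id);
    }
};

int main() {
    MigrationLogger logger;
    logger.activeChunk = ChunkRange{0, 100};
    logger.logDelete("doc-a", 42);    // inside the chunk: queued for transfer
    logger.logDelete("doc-b", 5000);  // outside the chunk: dropped
    std::cout << "queued deletes: " << logger.deletedIdsToTransfer.size() << "\n";
}
```

The point of this shape is that the filter runs on the donor before anything is queued, so out-of-range deletes never reach the transfer stream at all.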
| Comments |
| Comment by Githook User [ 22/Jan/16 ] |
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>
Message: (cherry picked from commit 3663e004dfc2f73b82b3d88b5fa1ac6b7dcd1d33)
| Comment by Githook User [ 22/Dec/15 ] |
Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>
Message:
| Comment by Githook User [ 15/Dec/15 ] |
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>
Message:
| Comment by Githook User [ 09/Dec/15 ] |
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>
Message:
| Comment by Githook User [ 09/Dec/15 ] |
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>
Message:
| Comment by Andy Schwerin [ 20/Nov/15 ] |
I spent the afternoon looking at this. I have a solution in mind, but it requires some careful work. The problem is that we need to extract shard key data from the document before deleting it, but we don't want to do it if we're not moving a chunk on the collection. Getting this to fit cleanly into the existing code without introducing unnecessary copies (that hurt large document performance) will take some time.
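A rough sketch of the shape this could take, with made-up helper names and a simplified document type: extract only the shard-key fields before the delete, and only when a migration is active on the collection, so large documents are never copied wholesale.

```cpp
// Hypothetical sketch: capture just the shard key of a document prior to
// deletion, skipping the work entirely when no chunk on the collection is
// being migrated.
#include <iostream>
#include <map>
#include <optional>
#include <string>

using Document = std::map<std::string, std::string>;  // simplified document
using ShardKey = std::string;

struct MigrationState {
    bool migrationActive = false;
    std::string shardKeyField;  // e.g. "userId"
};

// Pull out only the shard-key value; cheap compared to copying the document.
std::optional<ShardKey> extractShardKey(const Document& doc,
                                        const MigrationState& state) {
    if (!state.migrationActive) {
        return std::nullopt;  // no migration on this collection: skip the work
    }
    auto it = doc.find(state.shardKeyField);
    if (it == doc.end()) {
        return std::nullopt;
    }
    return it->second;
}

int main() {
    MigrationState state{true, "userId"};
    Document doc{{"_id", "abc"}, {"userId", "u42"}, {"payload", "...large..."}};

    // Before deleting `doc`, grab the shard key so the migration logger can
    // later decide whether this delete belongs to the migrating chunk.
    std::optional<ShardKey> key = extractShardKey(doc, state);
    // ... delete the document here, then pass `key` to the migration logger ...
    std::cout << (key ? *key : std::string("<none>")) << "\n";
}
```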
| Comment by David Hows [ 13/Nov/15 ] |
Looks like the evergreen patch for this has passed most of the tests. Should I look to make a formal CR and get these changes added into v3.0 and master?
| Comment by David Hows [ 13/Nov/15 ] |
Found a further change that had an impact here. It appears that the code on the "from" shard that logs operations for transfer was adding all deletes for the shard, not just those relevant to the chunk migration in progress (this limiting is already done for both updates and inserts). I'm not sure whether this is intended and I don't fully understand the reasoning, so I have sent a patch to evergreen for testing to see what breaks. The change is testing on evergreen at https://evergreen.mongodb.com/version/5645687d3ff122642500002c_0 and contains the ScopedXact patch from

My local tests went past the normal "hang" point and only stalled when we started migrating the "MaxKey" chunk. That migration finished quickly after I stopped the workload (the last recorded insert was 10 seconds before migration completion, as below), but the RangeDeleter work here was heavy and took quite a while to complete (deleting 13M documents), and this RangeDeleter activity prevented migrations during its entire run.
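For illustration, a minimal sketch of the kind of [min, max) range test described above (invented names; compound shard keys modeled as vectors compared lexicographically). The idea is that deletes would go through the same check that already limits inserts and updates on the "from" shard.

```cpp
// Hypothetical half-open range test over a compound shard key: a chunk owns
// the key space [min, max), and only operations whose key falls inside that
// range need to be queued for transfer.
#include <cassert>
#include <string>
#include <vector>

using KeyValue = std::string;
using ShardKeyValue = std::vector<KeyValue>;  // e.g. {"region", "userId"} values

// True when min <= key < max under lexicographic ordering.
bool isInChunkRange(const ShardKeyValue& key,
                    const ShardKeyValue& min,
                    const ShardKeyValue& max) {
    return !(key < min) && key < max;
}

int main() {
    ShardKeyValue min{"eu", "m"};
    ShardKeyValue max{"eu", "t"};

    assert(isInChunkRange({"eu", "n"}, min, max));   // owned by the chunk
    assert(!isInChunkRange({"eu", "z"}, min, max));  // past max: skip transfer
    assert(!isInChunkRange({"us", "n"}, min, max));  // different prefix: skip
    return 0;
}
```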
| Comment by David Hows [ 12/Nov/15 ] |
Did some more reproduction on this today. My results were in line with what Bruce has been finding. I found that the patches we have been making improved the situation, but there can still be lengthy delays while we send catch-up deletions between members (15 minutes after the inserts stopped in the worst case). This becomes problematic because we can (and in my experiments I have) spend minutes (5 in the most recent test) transferring deletes for a collection that is now empty, as all of its documents have been deleted by the TTL index.
| Comment by Bruce Lucas (Inactive) [ 10/Nov/15 ] |
Another source of inefficiency in processing deletes, if I'm reading the code correctly, is that this test

means that if the _id does not exist on this (recipient) shard, which is a likely case because we are processing deletes executed on the donor shard, then we nevertheless save the object (which I guess is garbage, or actually the last object seen?) and call deleteObjects. Should this be

Or am I missing something?
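In case it helps, a hypothetical sketch (made-up helpers, not the server's actual API) of the behavior this comment seems to be suggesting: look up the incoming _id on the recipient first and only perform the local delete when the document actually exists, rather than unconditionally saving an object and calling deleteObjects.

```cpp
// Hypothetical recipient-side handling of a transferred delete: skip the
// local delete entirely when the _id was never present on this shard.
#include <iostream>
#include <optional>
#include <set>
#include <string>

struct RecipientShard {
    std::set<std::string> idsPresent;  // stand-in for the local collection

    // Returns the stored document's _id if present, otherwise nothing.
    std::optional<std::string> findById(const std::string& id) const {
        if (idsPresent.count(id)) return id;
        return std::nullopt;
    }

    void deleteObject(const std::string& id) { idsPresent.erase(id); }
};

// Apply one transferred delete from the donor.
void applyTransferredDelete(RecipientShard& shard, const std::string& id) {
    std::optional<std::string> found = shard.findById(id);
    if (!found) {
        // Likely case: the delete happened on the donor for a document this
        // shard never had, so there is nothing to remove or record locally.
        return;
    }
    shard.deleteObject(*found);
}

int main() {
    RecipientShard shard;
    shard.idsPresent = {"a", "b"};
    applyTransferredDelete(shard, "a");  // present: deleted
    applyTransferredDelete(shard, "z");  // absent: skipped entirely
    std::cout << "remaining docs: " << shard.idsPresent.size() << "\n";
}
```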
| Comment by Bruce Lucas (Inactive) [ 10/Nov/15 ] |
In addition, if there is a steady stream of deletes on the donor shard, the chunk move may never complete. Even if there is a pause in the deletes, the donor shard may not have time to catch up before the deletes resume. Anything that could be done to improve the efficiency of the processing of the out-of-range deletes on the recipient shard would mitigate this. One thing that doesn't help is that we log every out-of-range delete.