[SERVER-21382] Sharding migration transfers all document deletions Created: 10/Nov/15  Updated: 18/Nov/21  Resolved: 04/Jan/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.0.9, 3.2.3, 3.3.0

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-8950 Sharding migration transfers all docu... Closed
Related
related to SERVER-61611 extend OpObserver::aboutToDelete() to... Closed
is related to SERVER-21366 Long-running transactions in MigrateS... Closed
Backwards Compatibility: Minor Change
Backport Completed:
Sprint: Sharding D (12/11/15), Sharding E (01/08/16)
Participants:

 Description   

During chunk migration, the donor shard records every document deletion and transfers it to the recipient, regardless of whether the deletion is relevant to the chunk being migrated.

This causes significant load on both shards.

The reason is that, by the time the deletion is recorded for transfer, the deleted document (and with it its shard key) is no longer available on the donor shard, so the donor cannot tell whether the deletion falls within the migrating chunk's range.

One solution would be to pass the shard key of the document being deleted into the logOp call, so that out-of-range deletions can be filtered out on the donor.
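
As a rough sketch of that idea (the names below, such as MigrationChunkCloner, onDeleteOp and isInMigratingRange, are illustrative only and not the actual server API), the delete path would hand the migration logger the shard key of the document being removed, so the donor can discard out-of-range deletions instead of queueing every delete for transfer:

    // Hypothetical sketch only: MigrationChunkCloner, onDeleteOp and
    // isInMigratingRange are illustrative names, not the real server API.
    void MigrationChunkCloner::onDeleteOp(const BSONObj& idObj,
                                          const BSONObj& shardKeyOfDeletedDoc) {
        // Only deletions whose shard key falls inside the migrating chunk's
        // range [_min, _max) need to be replayed on the recipient.
        if (!isInMigratingRange(shardKeyOfDeletedDoc, _min, _max)) {
            return;  // out-of-range delete: nothing to transfer
        }
        // In-range deletion: queue the _id so the recipient can apply it.
        _deletedIds.push_back(idObj.getOwned());
    }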



 Comments   
Comment by Githook User [ 22/Jan/16 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-21382 fixing sharding migration to transfer only document deletions relevant to the chunk being migrated, not every deletion

(cherry picked from commit 3663e004dfc2f73b82b3d88b5fa1ac6b7dcd1d33)
Branch: v3.2
https://github.com/mongodb/mongo/commit/d96a296a65826a6ed6f9baf37849252866ca6970

Comment by Githook User [ 22/Dec/15 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-21382 fixing sharding migration to transfer only document deletions relevant to the chunk being migrated, not every deletion
Branch: master
https://github.com/mongodb/mongo/commit/3663e004dfc2f73b82b3d88b5fa1ac6b7dcd1d33

Comment by Githook User [ 15/Dec/15 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-21382 In chunk migration out-of-range deletes on the donor shard.
Branch: v3.0
https://github.com/mongodb/mongo/commit/2282dcdadc9356a711dc7ae60830a48c1ef6426e

Comment by Githook User [ 09/Dec/15 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-21382 Remove unused _id-extracting out parameter from Collection::deleteDocument
Branch: v3.2
https://github.com/mongodb/mongo/commit/f7e4eb5f500305a5a985acca9008ad57d38f3e63

Comment by Githook User [ 09/Dec/15 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-21382 Remove unused _id-extracting out parameter from Collection::deleteDocument
Branch: master
https://github.com/mongodb/mongo/commit/dc9993d8df8f6fd38a46549af0ba89564b11328c

Comment by Andy Schwerin [ 20/Nov/15 ]

I spent the afternoon looking at this. I have a solution in mind, but it requires some careful work. The problem is that we need to extract shard key data from the document before deleting it, but we don't want to do it if we're not moving a chunk on the collection. Getting this to fit cleanly into the existing code without introducing unnecessary copies (that hurt large document performance) will take some time.
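
A minimal sketch of that shape, under the assumption of illustrative names (isMigratingChunkFor, migrationLogger and onDeleteOp are hypothetical, and the shard-key extraction call is used loosely here), would gate the extraction on an active migration so the common, non-migrating delete path copies nothing:

    // Hypothetical sketch only; isMigratingChunkFor, migrationLogger and
    // onDeleteOp are illustrative names, not the actual server interfaces.
    BSONObj shardKeyOfDeletedDoc;  // stays empty when no migration is active
    if (shardingState->isMigratingChunkFor(nss)) {
        // Copy only the shard-key fields, not the whole (possibly large)
        // document, before it is removed.
        shardKeyOfDeletedDoc =
            shardKeyPattern.extractShardKeyFromDoc(doc).getOwned();
    }

    // ... perform the actual delete here ...

    if (!shardKeyOfDeletedDoc.isEmpty()) {
        // Hand the extracted key to the migration logger so it can drop
        // out-of-range deletions on the donor side.
        migrationLogger->onDeleteOp(idObj, shardKeyOfDeletedDoc);
    }

Extracting just the shard-key fields keeps the copy small even for large documents, and the up-front check keeps the non-migrating path copy-free, which is the performance concern mentioned above.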

Comment by David Hows [ 13/Nov/15 ]

Looks like the evergreen patch for this has passed most of the tests. Should I look to make a formal CR and get these changes added into v3.0 and master?

Comment by David Hows [ 13/Nov/15 ]

Found a further change that had an impact here. It appears that the code on the "from" shard that logs operations for transfer was recording all deletes for the shard, not just those relevant to the chunk migration in progress (this limiting is already done for both updates and inserts). I'm not sure whether this is intended, and I don't fully understand the reasoning, so I have sent a patch to Evergreen for testing to see what breaks.

The change is being tested on Evergreen at https://evergreen.mongodb.com/version/5645687d3ff122642500002c_0 and contains the ScopedXact patch from SERVER-21366, the suggestion above from Bruce, and the change mentioned above to only process deletes for documents within the chunk's range.

My local tests went past the normal "hang" point and only stalled when we started migrating the "MaxKey" chunk. That migration finished quickly after I stopped the workload (the last recorded insert was 10 seconds before migration completion, as shown below), but the RangeDeleter work was heavy and took quite a while to complete (it deleted 13M documents), and this RangeDeleter activity prevented migrations for its entire run.

2015-11-13T14:50:26.006+1100 I COMMAND  [conn3] command test.$cmd command: insert { insert: "c", documents: 100, ordered: false, metadata: { shardName: "shard0000", shardVersion: [ Timestamp 67000|4, ObjectId('5645590bad2529169f7b0064') ], session: 0 } } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:40 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 2 } } } 114ms
....
2015-11-13T14:50:36.794+1100 I SHARDING [RangeDeleter] Deleter starting delete for: test.c from { _id: ObjectId('56455c14b9ec1c34eba99420') } -> { _id: MaxKey }, with opId: 3
...
2015-11-13T15:03:18.788+1100 I SHARDING [RangeDeleter] rangeDeleter deleted 13580001 documents for test.c from { _id: ObjectId('56455c14b9ec1c34eba99420') } -> { _id: MaxKey }

Comment by David Hows [ 12/Nov/15 ]

Did some more reproduction on this today. My results were in line with what Bruce has been finding.

I found that the patches we have been making improved the situation, but there can still be lengthy delays while catch-up deletions are sent between members (15 minutes after the inserts stopped, in the worst case). This becomes problematic because we can spend minutes (5 in my most recent test) transferring deletes for a collection that is already empty, since all of its documents have been deleted by the TTL index.

Comment by Bruce Lucas (Inactive) [ 10/Nov/15 ]

Another source of inefficiency in processing deletes, if I'm reading the code correctly, is that this test

                if (Helpers::findById(txn, ctx.db(), ns.c_str(), id, fullObj)) {
                    if (!isInRange(fullObj, min, max, shardKeyPattern)) {
                        log() << "not applying out of range deletion: " << fullObj << migrateLog;
                        continue;
                    }
                }
 
                if (serverGlobalParams.moveParanoia) {
                    rs.goingToDelete(fullObj);
                }
                ....
                deleteObjects(txn,

means that if the _id does not exist on this (recipient) shard, which is a likely case because we are processing deletes executed on the donor shard, then we nevertheless save the object (which I guess is garbage, or actually the last object seen?) and call deleteObjects. Should this be

                if (!Helpers::findById(txn, ctx.db(), ns.c_str(), id, fullObj) ||
                    !isInRange(fullObj, min, max, shardKeyPattern)) {
                    continue;
                }

Or am I missing something?

Comment by Bruce Lucas (Inactive) [ 10/Nov/15 ]

In addition, if there is a steady stream of deletes on the donor shard, the chunk move may never complete. Even if there is a pause in the deletes, the donor shard may not have time to catch up before the deletes resume. Anything that could be done to improve the efficiency of the processing of the out-of-range deletes on the recipient shard would mitigate this. One thing that doesn't help is that we log every out-of-range delete.
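
A small, hypothetical sketch of one such mitigation, building on the snippet quoted in the earlier comment (skippedDeletes is an illustrative name), would be to skip missing or out-of-range deletes silently and emit a single summary line instead of logging each one:

    long long skippedDeletes = 0;
    // ... inside the recipient's loop over transferred deletes ...
    if (!Helpers::findById(txn, ctx.db(), ns.c_str(), id, fullObj) ||
        !isInRange(fullObj, min, max, shardKeyPattern)) {
        ++skippedDeletes;  // skip silently; summarize after the batch
        continue;
    }
    // ... after the loop, log one summary line instead of one per delete ...
    if (skippedDeletes > 0) {
        log() << "skipped " << skippedDeletes
              << " missing or out-of-range deletes for " << ns << migrateLog;
    }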
