[SERVER-81201] Limiting the memory usage during the cloning phase on the recipient shard Created: 19/Sep/23 Updated: 09/Nov/23 Resolved: 29/Sep/23
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 7.0.0, 6.0.5, 7.1.0-rc0, 5.0.16 |
| Fix Version/s: | 7.1.1, 7.2.0-rc0, 5.0.22, 7.0.3, 6.0.12 |
| Type: | Task | Priority: | Critical - P2 |
| Reporter: | Sergi Mateo Bellido | Assignee: | Randolph Tan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Sharding NYC |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v7.0, v6.0, v5.0 |
| Participants: | |
| Description |
As part of PM-3014 (and its backports to 5.0/6.0 done under PM-3001), we changed the amount of memory used during the cloning phase on the recipient shard from a constant amount to a variable one that depends on the rate of fetching data (producer) relative to the rate of inserting it locally (consumer). While cloning an 80 GB chunk, we observed memory spikes of up to 52 GB. |
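For intuition about how a producer/consumer rate mismatch can produce spikes of that size, here is one illustrative combination of rates; the 500 MB/s and 175 MB/s figures below are assumptions chosen for the example, not measurements from this ticket:

```
backlog(t) ≈ (fetch rate - insert rate) * t
80 GB fetched at 500 MB/s takes ≈ 160 s
(500 MB/s - 175 MB/s) * 160 s ≈ 52 GB buffered at peak
```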
| Comments |
| Comment by Githook User [ 20/Oct/23 ] |
Author: Randolph Tan <randolph@10gen.com> (renctan)
Message: (cherry picked from commit 6d6506f4bad7bbb72e16c6f5fc3ca74475f66e2b) |
| Comment by Githook User [ 13/Oct/23 ] |
Author: Randolph Tan <randolph@10gen.com> (renctan)
Message: (cherry picked from commit cbef5c88d77886cb62b085f43d495a0c2af7b6af) |
| Comment by Githook User [ 13/Oct/23 ] |
Author: Randolph Tan <randolph@10gen.com> (renctan)
Message: (cherry picked from commit 957e3523d76a8ce45188021fa51a7fb28f2aecb7) |
| Comment by Githook User [ 13/Oct/23 ] |
Author: Randolph Tan <randolph@10gen.com> (renctan)
Message: (cherry picked from commit 6d6506f4bad7bbb72e16c6f5fc3ca74475f66e2b) |
| Comment by Garaudy Etienne [ 03/Oct/23 ] |
Are we backporting this? |
| Comment by Githook User [ 29/Sep/23 ] |
Author: Randolph Tan <randolph@10gen.com> (renctan)
Message: |
| Comment by Max Hirschhorn [ 20/Sep/23 ] |
In MongoDB 4.4 and in versions of MongoDB prior to the PM-3014 changes, the recipient shard buffered only a small, fixed number of _migrateClone results at a time.
Given that a _migrateClone result is a standard command response of at most 16 MB, at most 3 * 16 MB == 48 MB could be used on the recipient shard to represent the _migrateClone results.

The MigrationBatchFetcher has no equivalent limit on the number of _migrateClone results buffered in its queue, so it runs _migrateClone against the donor shard as quickly as possible and holds many more _migrateClone results in memory at once than before. When running with chunkMigrationConcurrency == 1, we can reasonably expect fetching the documents to outpace the rate of inserting them. But at chunkMigrationConcurrency > 1, we may want multiple _migrateClone results buffered in the queue. We should therefore define a limit on the memory usage of the MigrationBatchFetcher and MigrationBatchInserter to avoid overwhelming the recipient shard. |
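A minimal sketch of the kind of limit proposed above: a byte-bounded producer/consumer queue that blocks fetcher threads once buffered batches reach a cap. The class name BoundedBatchQueue, the 48 MB default, and the std::string stand-in for a batch are illustrative assumptions, not the actual MigrationBatchFetcher/MigrationBatchInserter code.

```cpp
// Sketch of a byte-bounded buffer between fetcher threads (producers) and
// inserter threads (consumers). Hypothetical names; not the server's patch.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

class BoundedBatchQueue {
public:
    explicit BoundedBatchQueue(std::size_t maxBytes = 48 * 1024 * 1024)
        : _maxBytes(maxBytes) {}

    // Blocks the fetcher while buffered batches already hold _maxBytes, so
    // fetching can never run arbitrarily far ahead of inserting. An oversized
    // batch is still admitted when the queue is empty, to avoid deadlock.
    void push(std::string batch) {
        std::unique_lock<std::mutex> lk(_mutex);
        _notFull.wait(lk, [&] {
            return _bytes + batch.size() <= _maxBytes || _queue.empty();
        });
        _bytes += batch.size();
        _queue.push_back(std::move(batch));
        _notEmpty.notify_one();
    }

    // Blocks the inserter until a batch is available, then releases its bytes
    // back to the budget and wakes a waiting fetcher.
    std::string pop() {
        std::unique_lock<std::mutex> lk(_mutex);
        _notEmpty.wait(lk, [&] { return !_queue.empty(); });
        std::string batch = std::move(_queue.front());
        _queue.pop_front();
        _bytes -= batch.size();
        _notFull.notify_one();
        return batch;
    }

private:
    std::mutex _mutex;
    std::condition_variable _notFull;
    std::condition_variable _notEmpty;
    std::deque<std::string> _queue;
    std::size_t _bytes = 0;
    const std::size_t _maxBytes;
};
```

With a bound expressed in bytes rather than in batch count, the cap stays meaningful regardless of chunkMigrationConcurrency: several fetchers can keep multiple batches queued for the inserters, but their combined footprint can never exceed the configured budget. |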