[SERVER-42273] Introduce a "force" option to `moveChunk` to allow migrating jumbo chunks Created: 18/Jul/19  Updated: 29/Oct/23  Resolved: 05/Nov/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: Ratika Gandhi Assignee: Janna Golden
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-13200 Investigate changes in SERVER-42273: ... Closed
Related
is related to SERVER-44476 Include number of jumbo chunks remain... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-09-23, Sharding 2019-10-07, Sharding 2019-10-21, Sharding 2019-11-04, Sharding 2019-11-18
Participants:
Case:

 Description   

Currently, if a chunk is larger than the maximum chunk size (64MB by default, 1GB at most), the balancer will mark it as jumbo and will refuse to move it.

It is possible to manually issue a moveChunk command and pass the unsupported and undocumented maxChunkSizeBytes parameter, which overrides the check for max chunk size. Even with this, given sufficient write load to the chunk being migrated, the memory usage on the donor shard could exceed 500MB, in which case the migration will still fail.

This ticket proposes adding a new forceJumbo option to the moveChunk command in order to allow large chunks to be migrated at the possible expense of blocking writes to the owning collection on the shard in question. With this option set, the migration will deviate from the way it currently operates as follows:

  1. It will skip the step that sorts the cloned chunk's document ids and will instead give out the documents in the order of the shard key (this means it will never return a 'jumbo chunk' error)
  2. If the memory usage exceeds 500MB, it will enter the critical section instead of failing the migration (this means that writes to the collection being migrated will possibly block for a longer period of time)


 Comments   
Comment by Janna Golden [ 05/Nov/19 ]

The following behavior changes were made as a part of this ticket:

Changes to moveChunk command:
A new optional boolean parameter 'forceJumbo' that defaults to false. If set to true and the chunk would otherwise have been deemed too large to move, the donor shard will enter the critical section early and writes will be blocked during the cloning phase. This is important to note as it can cause a long period of time where ops are blocked on this collection.
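For illustration, a manual migration using the new parameter could be issued roughly as sketched below. This is a minimal sketch, not taken from the commit: the helper name buildForceJumboMoveChunk, the namespace "test.users", the shard key field userId and the shard name "shard0001" are all hypothetical. Only the moveChunk command and its find, to and forceJumbo fields are assumed to exist as described.

```javascript
// Sketch only: builds the command document for a manual jumbo-chunk migration.
// The helper name and all example values are hypothetical.
function buildForceJumboMoveChunk(namespace, findQuery, toShard) {
  return {
    moveChunk: namespace, // command name must be the first key: "<db>.<collection>"
    find: findQuery,      // any shard key value that falls inside the chunk to move
    to: toShard,          // name of the recipient shard
    forceJumbo: true,     // parameter added by this ticket; defaults to false
  };
}

const exampleMoveChunk = buildForceJumboMoveChunk("test.users", { userId: 42 }, "shard0001");
// In a mongo shell connected to a mongos, this would be run as:
//   db.adminCommand(exampleMoveChunk)
```

Because forceJumbo can block writes for the whole cloning phase, a command like this is probably best reserved for a maintenance window.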

Changes to balancer configuration settings:
A new field 'attemptToBalanceJumboChunks' in the 'balancer' document in the config.settings collection. This is a boolean field that defaults to false. This document will now look something like:

{"_id": "balancer", "mode": "full", "stopped": false, "attemptToBalanceJumboChunks": false}

If 'attemptToBalanceJumboChunks' is set to true, the balancer will schedule migrations that attempt to move large chunks as long as the chunk is not marked 'jumbo' in config.chunks. A chunk is marked 'jumbo' only after an attempt to split or move a large chunk has failed because of its size or the size of the transfer mods queue. To avoid the risk of forever scheduling the same migration, the balancer should not continually try to schedule the migration of a chunk that has previously failed for either of these reasons. A user can run 'clearJumboFlag' so that the balancer will schedule this migration in the future, or they can choose to use the moveChunk command to manually move the chunk.
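As a rough sketch of the two operations mentioned above, the helpers below build the update for the balancer settings document and a clearJumboFlag command. The helper names and the example namespace and shard key are hypothetical, and the use of upsert is an assumption made so the update also works when no 'balancer' document exists yet.

```javascript
// Sketch only: arguments for updating the 'balancer' document in config.settings.
function balancerJumboSetting(enabled) {
  return {
    filter: { _id: "balancer" },
    update: { $set: { attemptToBalanceJumboChunks: enabled } },
    options: { upsert: true }, // assumption: create the document if it is missing
  };
}

// Sketch only: command document that clears the 'jumbo' flag on one chunk.
function buildClearJumboFlag(namespace, findQuery) {
  return {
    clearJumboFlag: namespace, // "<db>.<collection>"
    find: findQuery,           // any shard key value inside the flagged chunk
  };
}

// In a mongo shell connected to a mongos, these would be used as:
//   const s = balancerJumboSetting(true);
//   db.getSiblingDB("config").settings.updateOne(s.filter, s.update, s.options);
//   db.adminCommand(buildClearJumboFlag("test.users", { userId: 42 }));
```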

Unlike the new behavior of the moveChunk command above, the donor shard will not enter the critical section early, and if the transfer mods queue (queue of writes that modify any documents being migrated) surpasses 500MB of memory the migration will fail. This is to avoid unintended "down time" in the case a user was unaware that moving a large chunk can cause a long period of time where ops are blocked on this collection.

Changes to shard removal:
If a shard is in draining mode, meaning it has been removed, the balancer will also attempt to schedule migrations of any large chunks currently belonging to this shard. The balancer will behave the same as if 'attemptToBalanceJumboChunks' is set to true (described above).
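The three behavior changes above can be summarized as a small decision function. This is only a model of the rules as described in this comment, not the server's implementation: the function name, the input field names, the outcome strings and the reading of 500MB as binary megabytes are all assumptions made for illustration.

```javascript
// Assumption: "500MB" is read as 500 * 1024 * 1024 bytes.
const TRANSFER_MODS_LIMIT_BYTES = 500 * 1024 * 1024;

// Models the outcome of one migration attempt for a single chunk.
// initiator is "moveChunk" (manual command) or "balancer".
function migrationOutcome({
  initiator,
  chunkTooLarge,                       // chunk exceeds the size normally allowed to move
  forceJumbo = false,                  // manual moveChunk only
  attemptToBalanceJumboChunks = false, // balancer setting
  donorDraining = false,               // donor shard is being removed
  markedJumbo = false,                 // 'jumbo' flag set in config.chunks
  peakQueueBytes = 0,                  // peak size of the transfer mods queue
}) {
  if (chunkTooLarge) {
    if (initiator === "moveChunk") {
      // Manual move: either refuse, or block writes for the whole clone.
      return forceJumbo ? "critical-section-early" : "fail-chunk-too-big";
    }
    // Balancer: a draining donor behaves as if the setting were true,
    // but chunks already marked 'jumbo' are never rescheduled.
    const willAttempt = attemptToBalanceJumboChunks || donorDraining;
    if (!willAttempt || markedJumbo) return "not-scheduled";
  }
  // Without an early critical section the queue limit still applies.
  return peakQueueBytes > TRANSFER_MODS_LIMIT_BYTES ? "fail-queue-too-large" : "migrated";
}
```

For example, a large chunk on a draining shard is scheduled by the balancer even when attemptToBalanceJumboChunks is false, but the attempt still fails if the queue grows past the limit.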

Comment by Githook User [ 05/Nov/19 ]

Author:

{'username': 'jannaerin', 'email': 'janna.golden@mongodb.com', 'name': 'Janna Golden'}

Message: SERVER-42273 Introduce 'force' option to 'moveChunk' to allow migrating jumbo chunks
Branch: master
https://github.com/mongodb/mongo/commit/c150b588cb4400e0324becd916de2a699988af99

Comment by Alyson Cabral (Inactive) [ 22/Jul/19 ]

Yes, I agree with everything you said. But for my clarity, this is less about how big the chunk is and more about the write throughput on the chunk, correct?

Comment by Kaloian Manassiev [ 22/Jul/19 ]

alyson.cabral, correct. To be more specific here are the trade-offs:

  • Entering the critical section too early means that too many write operations will get blocked for a possibly unbounded amount of time (e.g., a 1TB jumbo chunk could take a day to migrate).
  • Entering the critical section too late means that the write modifications which accrue in memory could exceed the amount of available memory on the server and cause an OOM crash.
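To make the second trade-off concrete, here is a back-of-the-envelope sketch of how quickly queued writes could reach the 500MB limit discussed in this ticket. It assumes the queue grows linearly at the write rate and that nothing is drained during cloning, which is a simplification; the function names and the reading of 500MB as binary megabytes are assumptions.

```javascript
// Assumption: "500MB" is read as 500 * 1024 * 1024 bytes.
const QUEUE_LIMIT_BYTES = 500 * 1024 * 1024;

// Seconds of sustained writes to the migrating chunk before the limit is hit.
function secondsUntilQueueLimit(writeBytesPerSecond) {
  if (writeBytesPerSecond <= 0) return Infinity; // no writes, the queue never fills
  return QUEUE_LIMIT_BYTES / writeBytesPerSecond;
}

// True when the clone phase would outlast the time it takes to fill the queue.
function cloneWouldOverflowQueue(cloneSeconds, writeBytesPerSecond) {
  return cloneSeconds > secondsUntilQueueLimit(writeBytesPerSecond);
}

// At 1 MiB/s of writes the limit is reached after 500 seconds, so a clone
// that takes 10 minutes would overflow under this model.
const overflowExample = cloneWouldOverflowQueue(600, 1024 * 1024);
```

Under this model the deciding factor is the write rate to the chunk during cloning rather than the chunk size itself, which matches the point made in the question above.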

I'd like us to attempt to automatically move the chunk during shard removal and only require the manual move chunk if you need to enter the critical section early.

To make sure I understand what you are suggesting - moveChunk as part of shard removal should ignore the "jumbo" flag and not skip jumbo chunks, but if as part of migration it is discovered that the in-memory usage of the change log to the chunks has exceeded 500MB, still fail the migration, which would require manual intervention? This effectively requires a third state of that option, which is something like "forceJumbo But If Chunk Is Not Too Big".

Comment by Alyson Cabral (Inactive) [ 22/Jul/19 ]

kaloian.manassiev this is most impactful when you enter the critical section early because you're queueing too many writes to that chunk, right? Stopping all writes to the collection.

I'd like us to attempt to automatically move the chunk during shard removal and only require the manual move chunk if you need to enter the critical section early.

Comment by Kaloian Manassiev [ 22/Jul/19 ]

josef.ahmad/alyson.cabral/cailin.nelson, for this proposal to be used, it still requires the moveChunk command to be manually issued with the forceJumbo parameter, which means that shard removal scenarios will still not work with the balancer alone (because it will not send that option by default).

In order to make remove shard work fully in the presence of jumbo chunks, we can do one of three things:

  1. (Atlas-only change): Make Atlas manually move any leftover jumbo chunks by passing this parameter
  2. (Server + possibly Atlas change): Make the 'forceJumbo' parameter configurable under config.settings so that the balancer can pick it up
  3. (Server-only change): Make the balancer send forceJumbo for any chunks that reside on a shard that is being removed

I don't particularly like options (2) and (3), because they give opportunity for customers to unknowingly expose themselves to long stalls. Do you think implementing option (1) makes sense with possibly some checkbox to warn/opt-in users to this behaviour with the warning that it may cause stalls?

Generated at Thu Feb 08 05:00:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.