[SERVER-50027] Freeze migrations on original sharded collection while the resharding operation is in progress Created: 30/Jul/20  Updated: 29/Oct/23  Resolved: 11/Nov/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: PM-234-M2, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-51891 Make ShardCollectionType not use stri... Closed
Problem/Incident
causes SERVER-76004 Remove incorrect sharding tassert in ... Closed
Related
related to SERVER-53140 Ensure that allowMigrations setting g... Closed
related to SERVER-56226 [v4.4] Introduce 'permitMigrations' f... Closed
related to SERVER-56227 Add user-facing command to set allowM... Closed
related to SERVER-58082 Fail resharding operation if allowMig... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-09-21, Sharding 2020-10-05, Sharding 2020-10-19, Sharding 2020-11-02, Sharding 2020-11-16
Participants:

 Description   
  • disable the balancer
  • prohibit external users from causing migrations via moveChunk

Add an optional isFrozen field  for each collection in config.collections.
isFrozen = true overrules the noBalance field for the collection if specified (even if noBalance = false, isFrozen = true disallows the balancer from running)



 Comments   
Comment by Githook User [ 11/Nov/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-50027 Implement an 'allowMigrations' collection property
Branch: master
https://github.com/mongodb/mongo/commit/7950f071b21957be1ab8e5ac2db8b650695a2bd0

Comment by Githook User [ 03/Nov/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-50027 Convert the CollectionType's UUID field to IDL

... and make it required
Branch: master
https://github.com/mongodb/mongo/commit/ab6d6baf541e762b272738da8d9c5b257caa3be4

Comment by Githook User [ 02/Nov/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-50027 Make CollectionType use IDL (Part 3)
Branch: master
https://github.com/mongodb/mongo/commit/5bd87925a006fa591692e097d7929b6764da6d0c

Comment by Githook User [ 30/Oct/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-50027 Make CollectionType use IDL (Part 2)

This change gets rid of the writing of the 'dropped' field on the
collections in lieu of directly deleting them from the
config.collections.
Branch: master
https://github.com/mongodb/mongo/commit/b9bd00606448c33ea89c2c1246a11d309a9a19ce

Comment by Githook User [ 22/Oct/20 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-50027 Make CollectionType use IDL (Part 1)

This change just starts the basis so that the CollectionType can be
built on IDL in order to avoid conflicts with any changes to the format.
The remaining fields will be moved to IDL in a subsequent commit and the
custom parsing will be removed.
Branch: master
https://github.com/mongodb/mongo/commit/41f2ecbb47072b7b6990f39e4ad7fe27b289e835

Comment by Haley Connelly [ 13/Oct/20 ]

kaloian.manassiev it would be great if y'all could pick this up so we can focus on other resharding MS1 tickets!

Comment by Kaloian Manassiev [ 06/Oct/20 ]

haley.connelly, since PM-1965 requires this functionality too, does it make sense that we pick it up?

CC tommaso.tocci

Comment by Haley Connelly [ 05/Oct/20 ]

setting aside until higher priority milestone 1 tickets are taken care of.

Comment by Kaloian Manassiev [ 10/Sep/20 ]

I was envisioning that this command will not just disable the balancer thread (in fact it will not disable it). It would do exactly what you are suggesting, to add the noBalance field, but also it needs to join the current balancer round and also needs to join any currently ongoing manually issued moveChunk's for that collection. Basically it needs to serve as a fence that after it successfully returns, there are no moveChunk(s) running anywhere, for that collection.

Comment by Max Hirschhorn [ 10/Sep/20 ]

However, may I ask that the ability to block balancing for a particular collection be implemented as a separate internal command, which can be tested in isolation from the resharding code and is not coupled with it? That way we can make sure that it properly joins ongoing migrations, etc and also it can be reused. I was imagining an internal _configsvrBalancerFreezeForCollection: "DB.Coll" or something like that. If the collection doesn't exist, it can return NamespaceNotFound.

I'm new to sharding conversations like this so I wasn't sure when people write "balancer" if they mean "chunk splits and migrations including those triggered explicitly by user commands" or if people mean "the background thread which runs on the config server primary". Disabling the balancer thread isn't sufficient to prevent all sharding metadata operations on the collection being resharded. Acquiring the distributed locks on the database and collection being resharding are likely necessary too.

I was imagining resharding would do {$set: {noBalance: true}} to the config.collections document for the collection being resharded as part of the same transaction which inserts the document into the config.reshardingOperations collection. (And it would preserve the original noBalance setting so it can be restored after the resharding operation completes.) Setting the 'noBalance' field is a best-effort to make the balancer thread not do wasted work where it could somehow start a migration for the collection being resharded on one of the shards. No matter the noBalance setting, committing a chunk migration on the collection being resharded must be mutually exclusive with allowing the resharding operation to succeed.

Comment by Kaloian Manassiev [ 09/Sep/20 ]

max.hirschhorn explained to me over Slack the correctness considerations why we need to disable balancing for a collection being resharded and I am fine with that (as much as I don't love it).

However, may I ask that the ability to block balancing for a particular collection be implemented as a separate internal command, which can be tested in isolation from the resharding code and is not coupled with it? That way we can make sure that it properly joins ongoing migrations, etc and also it can be reused. I was imagining an internal _configsvrBalancerFreezeForCollection: "DB.Coll" or something like that. If the collection doesn't exist, it can return NamespaceNotFound.

Comment by Blake Oler [ 09/Sep/20 ]

To be more specific – we necessitate disabling the balancer for both the source and destination collections, because the collection cloning and oplog fetching depend on the routing table not changing. Amusingly enough, the Slack conversation that I pulled the original context from notes that you didn't want the balancer to need to be disabled, but we had to do it anyway kaloian.manassiev

Comment by Blake Oler [ 09/Sep/20 ]

Hey Kal, the balancer being disabled has been written as step 1.1 in the high-level design for a while now, and the current implementation relies on its being disabled. If you'd like to talk about this more, let haley.connelly and me know and we'll schedule a Zoom call.

Comment by Kaloian Manassiev [ 09/Sep/20 ]

Would it be possible to not disable the balancer for neither the source nor the destination collection, in order to ensure that we adequately handle all cases where the versions change, throw StaleShardVersion, etc. I really don't want that we write the resharding cloning code in a way which relies on the balancer being stopped for correctness. The disabling of the balancer should be just for performance reasons.

Generated at Thu Feb 08 05:21:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.