[SERVER-57790] Minimise the impact of FCV upgrade/downgrade between 4.4 and 5.0 with large routing tables
Created: 17/Jun/21  Updated: 29/Oct/23  Resolved: 16/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.0-rc2
Fix Version/s: 5.0.2, 5.1.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Paolo Polato
Resolution: Fixed
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: Sharding EMEA 2021-07-12, Sharding EMEA 2021-07-26
Participants:

 Description   

The format of the sharding catalog changed in version 5.0. The change affects every entry in the config.databases, config.collections, config.chunks and config.shards system collections, so the FCV upgrade/downgrade steps have to rewrite a large amount of data.

The data updates are synchronised per collection under the chunks lock, so for a collection with many chunks, chunk migrations can be blocked for as long as that collection's entries are being rewritten. Because the chunk commit happens inside the migration critical section, any shard that attempts to commit a chunk migration during that window will block access to the portion of the collection that it holds.

To mitigate this impact, it is proposed that we do two things:

  1. Make setFCV stop the balancer when it starts and re-enable it when it finishes (only if the balancer was enabled before setFCV started).
  2. Make chunk migration fail with a ConflictingOperationInProgress error if it attempts to commit while the upgrade/downgrade is in progress.

Item (1) can be done later, since setFCV needs to remember whether the balancer was already stopped, but (2) is fairly easy and can be done earlier.
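The two proposed changes can be illustrated with a small toy model. This is only a sketch of the intended behaviour, not the server implementation: the class `ConfigServerModel`, its methods `set_fcv` and `commit_chunk_migration`, and the Python exception `ConflictingOperationInProgress` are all hypothetical names introduced here for illustration.

```python
import threading
from contextlib import contextmanager


class ConflictingOperationInProgress(Exception):
    """Hypothetical stand-in for the server error of the same name."""


class ConfigServerModel:
    """Toy model of the state involved in the proposal (not real server code)."""

    def __init__(self, balancer_enabled=True):
        self.balancer_enabled = balancer_enabled
        self.fcv_change_in_progress = False
        self._lock = threading.Lock()

    @contextmanager
    def set_fcv(self):
        # Item (1): remember the balancer state, stop the balancer,
        # and restore the remembered state when setFCV finishes.
        with self._lock:
            was_enabled = self.balancer_enabled
            self.balancer_enabled = False
            self.fcv_change_in_progress = True
        try:
            yield  # the catalog rewrite would run here
        finally:
            with self._lock:
                self.fcv_change_in_progress = False
                self.balancer_enabled = was_enabled

    def commit_chunk_migration(self, namespace):
        # Item (2): fail fast instead of blocking inside the critical section.
        with self._lock:
            if self.fcv_change_in_progress:
                raise ConflictingOperationInProgress(
                    f"FCV change in progress, cannot commit migration for {namespace}"
                )
        return f"committed migration for {namespace}"
```

In this model, a balancer that was already disabled before `set_fcv` stays disabled afterwards, which is the reason item (1) has to remember the previous state.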



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 26/Jul/21 ]

Author: Paolo Polato <paolo.polato@mongodb.com> (ppolato)

Message: SERVER-57790 Stop the balancer thread while serving setFCV.

(cherry picked from commit a3e0e9d2998bd9a1ce0f8f975b364ef614e2d57c)
Branch: v5.0
https://github.com/mongodb/mongo/commit/a61de4f56d4f5d83e29117ebd845385f2dc356a1

Comment by Githook User [ 26/Jul/21 ]

Author: Paolo Polato <paolo.polato@mongodb.com> (ppolato)

Message: SERVER-57790 abort chunk migration while an up/downgrade is in progress.

(cherry picked from commit 804ddfe0dcf1b5f26e1afd83b47cb1ef5df888bd)
Branch: v5.0
https://github.com/mongodb/mongo/commit/9f1d821bb54f7192fc5192e9cf891fc709c051cb

Comment by Githook User [ 16/Jul/21 ]

Author: Paolo Polato <paolo.polato@mongodb.com> (ppolato)

Message: SERVER-57790 Stop the balancer thread while serving setFCV.
Branch: master
https://github.com/mongodb/mongo/commit/a3e0e9d2998bd9a1ce0f8f975b364ef614e2d57c

Comment by Githook User [ 06/Jul/21 ]

Author: Paolo Polato <paolo.polato@mongodb.com> (ppolato)

Message: SERVER-57790 abort chunk migration while an up/downgrade is in progress.
Branch: master
https://github.com/mongodb/mongo/commit/804ddfe0dcf1b5f26e1afd83b47cb1ef5df888bd
