[SERVER-58433] ReshardingCoordinatorService Transaction In bumpCollectionVersionAndChangeMetadataInTxn Possibly Too Large Created: 12/Jul/21  Updated: 29/Oct/23  Resolved: 31/Aug/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.4, 5.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Luis Osta (Inactive)
Resolution: Fixed Votes: 0
Labels: PM-234-M3, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-68495 Resharding a collection with a very l... Closed
related to SERVER-73763 Resharding does not extend zone range... Closed
related to SERVER-73848 Hashed shard keys with zones can caus... Closed
related to SERVER-61035 Increase zones in 'resharding_large_n... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: Sharding 2021-08-23
Participants:
Story Points: 2

 Description   

As discovered in this ticket and discussed in this [CR|https://mongodbcr.appspot.com/799100006], the transaction in ShardingCatalogManager that makes sure the updates in bumpCollectionVersionAndChangeMetadataInTxn happen atomically requires at least 5 GB of WiredTiger cache.
 
This is too large for an Atlas M30 to handle: it has 8 GB of RAM and 50% of that is used for the WT cache, which leaves about 4 GB, less than the 5 GB required.
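The arithmetic behind that claim can be checked with a short sketch. The first function uses the ticket's simplification (50% of RAM). The second uses MongoDB's documented default for the WiredTiger cache, the larger of 50% of (RAM - 1 GB) and 256 MB. Whether Atlas M30 instances use that default is an assumption here, and the function names are illustrative only.

```python
GIB = 1024 ** 3  # sizes are treated as binary gigabytes for simplicity


def half_of_ram(ram_bytes):
    """Ticket's simplification: 50% of RAM is used for the WiredTiger cache."""
    return ram_bytes // 2


def documented_default_cache(ram_bytes):
    """Documented default: the larger of 50% of (RAM - 1 GB) and 256 MB."""
    return max((ram_bytes - GIB) // 2, 256 * 1024 ** 2)


ram = 8 * GIB        # Atlas M30, per the description
required = 5 * GIB   # cache needed by the transaction, per the description

for label, cache in [
    ("50% of RAM", half_of_ram(ram)),
    ("documented default", documented_default_cache(ram)),
]:
    shortfall = required - cache
    print(f"{label}: cache={cache / GIB:.1f} GB, shortfall={shortfall / GIB:.1f} GB")
```

Under either estimate the available cache is smaller than what the transaction needs.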
 
We need to investigate whether this is an issue and how to fix it.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 05/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-58433 Remove multi writes from multi statement transactions from Resharding Coordinator Service
Branch: v5.0
https://github.com/mongodb/mongo/commit/3ee4e6c98cdc9f52d3b6fb85407d25a77ed3ce1a

Comment by Githook User [ 30/Aug/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-58433 Removing multi writes from multi statement transactions from Resharding Coordinator Service
Branch: master
https://github.com/mongodb/mongo/commit/b15ce764a4ee7e3bff5d20193de9ba9b8c4e3494

Comment by Max Hirschhorn [ 30/Jul/21 ]

After discussing this issue with Garaudy, I can confirm we'll need to stop using multi-statement transactions to bulk write the documents in config.chunks and config.tags as part of the different resharding coordinator state transitions.

I believe we can assume that, while a resharding operation is running, the config.chunks documents are keyed off of "uuid" and don't contain an "ns" field. This is because the resharding operation only runs in FCV 5.0+.

  • CoordinatorStateEnum::kPreparingToDonate transition
    • Inserts into config.chunks for temporary resharding collection must first multi-delete any config.chunks documents belonging to {uuid: <reshardingUUID>}.
    • Inserts into config.tags for temporary resharding collection must first multi-delete any config.tags documents belonging to {ns: <tempReshardingNss>}.
  • CoordinatorStateEnum::kCommitting transition
    • Deletes from config.chunks for source collection don't need any special handling to delete by {uuid: <sourceUUID>}.
    • Deletes from config.tags for source collection must include an additional filter on the "min" field requiring that it not have the same shape (the same field names, in the same order) as the new shard key pattern. (See the code snippet below.)
    • Updates to config.chunks for temporary resharding collection don't need any special handling to retry.
    • Updates to config.tags for temporary resharding collection don't need any special handling to retry.

{$expr: {$ne: [
  {$map: {input: {$objectToArray: "$min"}, in: "$$this.k"}},
  {$map: {input: {$objectToArray: {$literal: <reshardingKey>}}, in: "$$this.k"}},
]}}

  • CoordinatorStateEnum::kDone transition
    • Deletes from config.chunks for temporary resharding collection don't need any special handling to retry.
    • Deletes from config.tags for temporary resharding collection don't need any special handling to retry.
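
The $expr filter listed under the kCommitting transition compares the ordered field names of "min" with those of the new shard key pattern. Presumably this keeps a retried delete from removing zone documents that already carry the new key shape. A minimal Python sketch of that comparison follows. The function names, the sample namespace, and the use of an {ns: <sourceNss>} predicate for the source collection are assumptions for illustration, not the server's implementation.

```python
def field_names(doc):
    """Ordered field names, like $objectToArray followed by $map over "$$this.k"."""
    return list(doc.keys())


def build_tags_delete_filter(source_nss, resharding_key):
    """Build the filter document for the config.tags delete (hypothetical helper)."""
    return {
        "ns": source_nss,  # assumption: source zones are selected by namespace
        "$expr": {"$ne": [
            {"$map": {"input": {"$objectToArray": "$min"}, "in": "$$this.k"}},
            {"$map": {"input": {"$objectToArray": {"$literal": resharding_key}},
                      "in": "$$this.k"}},
        ]},
    }


def matches_delete_filter(tag_doc, source_nss, resharding_key):
    """Pure-Python emulation of the filter above for a single config.tags document."""
    return (tag_doc["ns"] == source_nss
            and field_names(tag_doc["min"]) != field_names(resharding_key))


new_key = {"y": 1, "z": "hashed"}
old_zone = {"ns": "db.coll", "min": {"x": 0}, "max": {"x": 10}, "tag": "A"}
new_zone = {"ns": "db.coll", "min": {"y": 0, "z": 5}, "max": {"y": 9, "z": 5}, "tag": "A"}

print(matches_delete_filter(old_zone, "db.coll", new_key))  # old shape: selected for deletion
print(matches_delete_filter(new_zone, "db.coll", new_key))  # new shape: left in place
```

Because the comparison is on arrays of field names, field order matters as well as the names themselves.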
Comment by Max Hirschhorn [ 14/Jul/21 ]

In addition to the WiredTiger cache size limitations, there is also the issue of the multi-statement transaction taking longer than the 1-minute default for the transactionLifetimeLimitSeconds server parameter. I imagine this combination will leave us without any options other than to rewrite the ReshardingCoordinator to not use a multi-statement transaction for its bulk updates and deletes of config.chunks and config.tags documents.
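
Without a multi-statement transaction, each bulk write has to be safe to retry after an interruption. The delete-then-insert pattern can be sketched with a small in-memory stand-in for a collection. Everything here (the class, the helper, the simulated failure) is hypothetical and only illustrates why deleting by uuid first makes the insert step idempotent.

```python
class InMemoryCollection:
    """Tiny stand-in for a collection such as config.chunks (illustration only)."""

    def __init__(self):
        self.docs = []

    def delete_many(self, flt):
        # Remove every document whose fields equal all key/value pairs in flt.
        self.docs = [d for d in self.docs
                     if not all(d.get(k) == v for k, v in flt.items())]

    def insert_many(self, docs, fail_after=None):
        # Insert one document at a time; optionally fail partway through.
        for i, doc in enumerate(docs):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated interruption")
            self.docs.append(dict(doc))


def write_temp_chunks(coll, resharding_uuid, chunks, fail_after=None):
    """Delete leftovers from an earlier attempt, then insert the full set."""
    coll.delete_many({"uuid": resharding_uuid})
    coll.insert_many(chunks, fail_after=fail_after)


coll = InMemoryCollection()
coll.insert_many([{"uuid": "other", "min": 0}])  # unrelated collection's chunk
chunks = [{"uuid": "temp", "min": i} for i in range(3)]

try:
    write_temp_chunks(coll, "temp", chunks, fail_after=2)  # first attempt stops early
except RuntimeError:
    pass
partial = [d for d in coll.docs if d["uuid"] == "temp"]

write_temp_chunks(coll, "temp", chunks)  # the retry converges on the full set
final = [d for d in coll.docs if d["uuid"] == "temp"]
print(len(partial), len(final))
```

The retry leaves exactly one copy of each chunk document and does not touch documents belonging to other collections.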
