[SERVER-57952] Resharding donor shards cannot complete a shard version refresh after acquiring the critical section, stalling the resharding operation
Created: 22/Jun/21  Updated: 29/Oct/23  Resolved: 24/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.0-rc4, 5.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: PM-234-M3, PM-234-T-lifecycle, post-rc0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Documented
is documented by DOCS-14542 Server: Update tables for resharding-... Closed
Related
related to SERVER-57953 _flushReshardingStateChange attempts ... Closed
is related to SERVER-55677 Remove resharding's DonorStateEnum::k... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Sprint: Sharding 2021-06-28
Participants:
Linked BF Score: 134
Story Points: 1

 Description   

During a resharding operation, shards rely on a shard version refresh being triggered after a new primary steps up so that the DonorStateMachine and RecipientStateMachines learn of a change to the coordinator's state. However, a shard version refresh cannot complete while the critical section is held. This means that if the write to acquire the critical section becomes majority-committed but the write to transition to DonorStateEnum::kBlockingWrites does not, the donor shard will be stuck, unable to advance past DonorStateEnum::kDonatingOplogEntries: the refresh is blocked by the critical section, so the donor shard never learns that it is safe to complete its transition to DonorStateEnum::kBlockingWrites.

The DonorStateEnum::kPreparingToBlockWrites state was removed as part of SERVER-55677 but could be reintroduced to solve this issue. A donor shard that comes up in DonorStateEnum::kPreparingToBlockWrites would not need to wait for a shard version refresh to complete before it can complete its transition to DonorStateEnum::kBlockingWrites.
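
To make the stall and the proposed fix concrete, here is a minimal, self-contained sketch of the step-up decision. It is a simplified model, not the server's actual code: the `DonorShard` struct and the `shardVersionRefreshCanComplete` and `resumeAfterStepUp` functions are hypothetical names introduced for illustration, and the enum only mirrors the three donor states named in this ticket.

```cpp
#include <cassert>

// Simplified stand-in for the donor states discussed in this ticket.
// The real DonorStateEnum has more values; this is an illustrative subset.
enum class DonorState {
    kDonatingOplogEntries,
    kPreparingToBlockWrites,  // the state this ticket proposes to re-add
    kBlockingWrites,
};

// Hypothetical snapshot of what a newly stepped-up primary finds on disk.
struct DonorShard {
    DonorState persistedState;  // last majority-committed donor state
    bool criticalSectionHeld;   // critical section acquisition was majority-committed
};

// Per the description: a refresh cannot complete while the critical section is held.
bool shardVersionRefreshCanComplete(const DonorShard& donor) {
    return !donor.criticalSectionHeld;
}

// Hypothetical model of how far the donor can get after a step-up.
DonorState resumeAfterStepUp(const DonorShard& donor) {
    switch (donor.persistedState) {
        case DonorState::kDonatingOplogEntries:
            // The donor must learn the coordinator's state through a refresh.
            if (!shardVersionRefreshCanComplete(donor)) {
                return DonorState::kDonatingOplogEntries;  // stalled
            }
            // Assumption: the coordinator has already asked donors to block writes.
            return DonorState::kPreparingToBlockWrites;
        case DonorState::kPreparingToBlockWrites:
            // The persisted state alone says it is safe to proceed; no refresh needed.
            return DonorState::kBlockingWrites;
        case DonorState::kBlockingWrites:
            return DonorState::kBlockingWrites;
    }
    return donor.persistedState;
}
```

In this model, a donor that persisted only kDonatingOplogEntries while holding the critical section stays where it is, whereas a donor that persisted kPreparingToBlockWrites advances to kBlockingWrites even with the critical section held.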



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 24/Jun/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-57952 Re-add DonorStateEnum::kPreparingToBlockWrites.

(cherry picked from commit bd1a5b70ff899b8a5271136cfbb989094442d75b)
Branch: v5.0
https://github.com/mongodb/mongo/commit/6b14a422616c84e5f8eab8c3bf81618bcf7ebc17

Comment by Githook User [ 23/Jun/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-57952 Re-add DonorStateEnum::kPreparingToBlockWrites.
Branch: master
https://github.com/mongodb/mongo/commit/bd1a5b70ff899b8a5271136cfbb989094442d75b
