[SERVER-53611] Handle errors on donor before donor document is initially inserted Created: 06/Jan/21  Updated: 27/Oct/23  Resolved: 02/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Sarah Zhou Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Gone away Votes: 0
Labels: PM-234-M3, PM-234-T-error-flow
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-54000 Make errors propagate from the Reshar... Closed
is related to SERVER-54513 Create new resharding recipient state... Closed
is related to SERVER-54704 Remove processAbortReasonNoDonorMachi... Closed
Assigned Teams:
Sharding
Participants:
Story Points: 1

 Description   

If the donor state machine errors before the donor document is initially inserted, the transition to kError will not persist on disk because updates to the donor doc are done without upsert.

Two possible workarounds are upserting the doc in transitionToError() and ensuring that the DSM will not error before the donor document is inserted.
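To illustrate the failure mode and the first workaround, here is a minimal Python sketch. It is not the server's C++ code: StateDocStore, its methods, and the field names are hypothetical stand-ins for the on-disk donor state collection, and transition_to_error only mimics the role of transitionToError().

```python
from typing import Dict, Optional


class StateDocStore:
    """Hypothetical in-memory stand-in for the on-disk donor state collection."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def insert(self, doc_id: str, doc: dict) -> None:
        self._docs[doc_id] = dict(doc)

    def update(self, doc_id: str, fields: dict, upsert: bool = False) -> bool:
        """Apply fields to the document; return True if anything was written."""
        if doc_id not in self._docs:
            if not upsert:
                # No matching document: a plain update silently writes nothing.
                return False
            self._docs[doc_id] = {}
        self._docs[doc_id].update(fields)
        return True

    def find(self, doc_id: str) -> Optional[dict]:
        return self._docs.get(doc_id)


def transition_to_error(store: StateDocStore, doc_id: str, reason: str,
                        upsert: bool) -> bool:
    """Hypothetical analogue of transitionToError(): record kError for doc_id."""
    return store.update(
        doc_id, {"state": "kError", "abortReason": reason}, upsert=upsert
    )


if __name__ == "__main__":
    store = StateDocStore()
    # Donor document not inserted yet: the error transition is lost.
    print(transition_to_error(store, "op1", "index build in progress", upsert=False))
    print(store.find("op1"))
    # With upsert, the error state is created even without a prior insert.
    print(transition_to_error(store, "op1", "index build in progress", upsert=True))
    print(store.find("op1"))
```

The second workaround, guaranteeing the insert happens before the state machine can error, avoids the need for upsert altogether.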



 Comments   
Comment by Max Hirschhorn [ 02/Mar/21 ]

The changes from 6ca44ce as part of SERVER-54513 made it so the state documents for donors and recipients have always been inserted before the DonorStateMachines and RecipientStateMachines are started. Closing this ticket as "Gone away".

Comment by Haley Connelly [ 25/Jan/21 ]

I think there is an underlying problem here that extends to both the RecipientStateMachine and the DonorStateMachine:

  • Both state machines must handle that the coordinator may report an error before their Resharding<Recipient/Donor>Document has been created locally or persisted locally.

e.g.:

  • Donor is in preparing-to-donate
  • Donor reports the state DonorStateEnum::kError because there is an index build in progress
  • Coordinator transitions to CoordinatorStateEnum::kError
  • At this point, the ReshardingRecipientMachine hasn’t been created, let alone has the ReshardingRecipientDocument been persisted locally.
  • Right now, if the ReshardingRecipientMachine doesn’t exist, we return
  • For recovery purposes, we probably want to persist that the recipient has seen the error from the coordinator.
  • We need to ensure the error gets set after the original ReshardingRecipientDocument is inserted locally
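
One way to picture the last two points is the Python sketch below. It is an assumption-laden model, not the server's implementation: RecipientService, pending_errors, and the "initial" state name are all hypothetical, and in a real system the pending error would itself need to be stored durably for recovery to work.

```python
from typing import Dict


class RecipientService:
    """Hypothetical model of one shard's recipient-side bookkeeping."""

    def __init__(self) -> None:
        # Stands in for recipient documents persisted locally.
        self.docs: Dict[str, dict] = {}
        # Coordinator errors seen before the recipient document existed.
        # Assumed durable here; this sketch keeps it in memory only.
        self.pending_errors: Dict[str, str] = {}

    def _set_error(self, op_id: str, reason: str) -> None:
        self.docs[op_id]["state"] = "kError"
        self.docs[op_id]["abortReason"] = reason

    def on_coordinator_error(self, op_id: str, reason: str) -> None:
        """Record the coordinator's error instead of returning early."""
        if op_id not in self.docs:
            # Document not inserted yet: remember the error for later.
            self.pending_errors[op_id] = reason
            return
        self._set_error(op_id, reason)

    def insert_recipient_doc(self, op_id: str) -> None:
        """Insert the original document, then apply any remembered error."""
        self.docs[op_id] = {"state": "initial"}  # hypothetical state name
        reason = self.pending_errors.pop(op_id, None)
        if reason is not None:
            self._set_error(op_id, reason)


if __name__ == "__main__":
    svc = RecipientService()
    svc.on_coordinator_error("op1", "donor reported kError")
    svc.insert_recipient_doc("op1")
    print(svc.docs["op1"])
```

The ordering is the point of the sketch: the error is applied only after the original document has been inserted, so it is never dropped and never overwritten by the insert.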